Recovering from a disk failure with Linux-raid mirrored disks

by Mark Foster
March 13th, 2003

WARNING! This article is sorely outdated, please see this link for an update that is both current and useful!

This article presents the steps I used to recover from a hard drive failure while running software RAID1 (mirrored drives) on Red Hat Linux.

Environment

  • host1 was Red Hat 7.1 on a Compaq ProLiant ML370
  • 2 SCSI drives (/dev/sda, /dev/sdb)
  • 1 mirror device (/dev/md0) in a RAID1 configuration
  • lilo for the boot record
This guide should work fine for newer versions of Red Hat or Fedora Linux.

Situation

The first drive, sda, has failed and the software RAID is running in "degraded" mode, as shown in the output of /proc/mdstat. There is one mirrored partition (sda5 paired with sdb5) and a swap partition on each drive (sda1 and sdb1).

Preparation

Read up on software RAID (see References below).

Since sdb is identical to sda, we can use its partition table as a reference.

You'll need a boot floppy; use mkbootdisk(8) to create one.

Examine and understand the output of the following commands:

sfdisk -l /dev/sdb
cat /etc/raidtab
cat /proc/mdstat
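
To make the "degraded" check less dependent on eyeballing, here is a minimal sketch that lists arrays whose status field contains an underscore (for example [_U]), which is how /proc/mdstat marks a missing mirror member. The helper name degraded_arrays is my own invention, and the exact mdstat layout varies between kernel versions, so treat this as a starting point only.

```shell
#!/bin/sh
# List md arrays that look degraded in an mdstat-style file.
# degraded_arrays is a hypothetical helper name, not a standard tool.
# Assumption: the array name starts a line ("md0 : active raid1 ...") and
# the member status appears later as a bracketed field such as [_U] or [UU].
degraded_arrays() {
    mdstat_file="${1:-/proc/mdstat}"
    awk '
        /^md[0-9]+ *:/ { array = $1 }               # remember the current array
        /\[[U_]*_[U_]*\]/ && array != "" {          # status field with a missing member
            print array
            array = ""
        }
    ' "$mdstat_file"
}
```

Running degraded_arrays with no argument reads /proc/mdstat; an empty result means no array matched the degraded pattern.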

Process

Dump the partition table to a file suitable to feed back to sfdisk.
[root@host1 ~]# sfdisk -d /dev/sdb > /tmp/sdb.out
[root@host1 ~]# mv /tmp/sdb.out /tmp/sda.in
Edit the file, changing sdb to sda in the first column.
[root@host1 ~]# vi /tmp/sda.in
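
If you would rather not edit the dump by hand, the same rename can be done with sed. This is only a sketch: retarget_dump is a hypothetical helper name, and it assumes every partition line in the dump begins with the device name, as the sfdisk -d output did on this system.

```shell
#!/bin/sh
# Rewrite the device name at the start of each line of an "sfdisk -d" dump.
# retarget_dump is a hypothetical helper; arguments are the dump file,
# the disk the dump was taken from, and the disk it should describe.
retarget_dump() {
    dump="$1"
    from="$2"
    to="$3"
    # Anchor at line start so comment lines mentioning the old disk are left alone.
    sed "s|^${from}|${to}|" "$dump"
}

# Example with the file names used in this article (writes to a new file
# first, because sed would otherwise read and truncate the same file):
# retarget_dump /tmp/sda.in /dev/sdb /dev/sda > /tmp/sda.new && mv /tmp/sda.new /tmp/sda.in
```

Review the resulting file before feeding it to sfdisk; a wrong device name here means writing a partition table to the wrong disk.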
Now let's re-create the partition table on sda.
[root@host1 ~]# sfdisk /dev/sda < /tmp/sda.in
Add sda5 back into the mirror device md0.
[root@host1 ~]# raidhotadd /dev/md0 /dev/sda5
Check the result in /proc/mdstat; you should see that recovery is underway.
[root@host1 ~]# cat /proc/mdstat
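
Rather than re-running cat by hand, a small polling loop can wait for the rebuild to finish. This is a rough sketch under a couple of assumptions: the helper name wait_for_rebuild is hypothetical, the 30-second default interval is arbitrary, and it relies on the kernel printing the word "recovery" or "resync" in /proc/mdstat while a rebuild is in progress.

```shell
#!/bin/sh
# Poll an mdstat-style file until no array reports a recovery or resync.
# wait_for_rebuild is a hypothetical helper; the default interval of
# 30 seconds is an arbitrary choice.
wait_for_rebuild() {
    mdstat_file="${1:-/proc/mdstat}"
    interval="${2:-30}"
    while grep -Eq 'recovery|resync' "$mdstat_file"; do
        grep -E 'recovery|resync' "$mdstat_file"    # show the current progress line
        sleep "$interval"
    done
    echo "rebuild finished"
}
```

Called with no arguments it watches /proc/mdstat and returns once the progress line has disappeared.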
Reactivate our swap space on the new drive's swap partition (sda1 in this layout).
[root@host1 ~]# mkswap /dev/sda1
[root@host1 ~]# swapon -a
Verify the swap partition is back online.
[root@host1 ~]# swapon -s
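
The same check can be scripted against /proc/swaps, which holds the table that swapon -s prints. A minimal sketch, assuming the first column of each row is the device name; swap_is_active is a hypothetical helper name, and you pass it whichever partition you ran mkswap on.

```shell
#!/bin/sh
# Succeed if the given device appears in a /proc/swaps-style table.
# swap_is_active is a hypothetical helper name, not a standard tool.
swap_is_active() {
    dev="$1"
    swaps_file="${2:-/proc/swaps}"
    # Skip the header row, then compare the first column with the device.
    awk -v dev="$dev" 'NR > 1 && $1 == dev { found = 1 } END { exit !found }' "$swaps_file"
}

# Example: swap_is_active /dev/sdb1 && echo "swap on sdb1 is active"
```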
Once recovery is complete, run lilo to re-create the MBR on /dev/sda.
[root@host1 ~]# /sbin/lilo -v
Note that this will likely not work on Red Hat 7.2 and above, since GRUB became the default boot loader in that release. If you upgraded from 7.1, however, your system might still be using lilo.
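
If you are not sure which boot loader a machine uses, looking for the configuration files gives a rough hint. This sketch is a heuristic only: detect_bootloader is a hypothetical helper name, the paths checked are the usual locations of the lilo and legacy GRUB config files, and a leftover config file does not prove that loader is actually installed in the MBR.

```shell
#!/bin/sh
# Report which boot loader config files exist under a given root directory.
# detect_bootloader is a hypothetical helper; the result is a hint only,
# since an upgraded system may carry config files for both loaders.
detect_bootloader() {
    root="${1:-}"
    found=""
    if [ -f "$root/etc/lilo.conf" ]; then
        echo "lilo"
        found=1
    fi
    if [ -f "$root/boot/grub/grub.conf" ] || [ -f "$root/boot/grub/menu.lst" ]; then
        echo "grub"
        found=1
    fi
    [ -n "$found" ] || echo "unknown"
}
```

With no argument it inspects the running system; if both names are printed, check which one was last written to the MBR before relying on /sbin/lilo.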

Voila! Reboot if you like, although this is not strictly necessary.

[root@host1 ~]# reboot

Conclusion

The key to a successful recovery is to keep a level head and not panic. The sfdisk -d command is wonderful for re-creating the partition layout. It should prove just as useful for a RAID5 recovery, but I cannot attest to that personally.

If I was helpful, please take a moment to let me know.

References

man pages: mkbootdisk(8) sfdisk(8) swapon(8) mkswap(8) raidtab(5)

The Software-RAID HOW-TO
http://unthought.net/Software-RAID.HOWTO/
Penguin Magazine Article
http://www.penguinmagazine.com/Magazine/This_Issue/0011/1
Home-brew High Availability: Booting Linux from a RAID-1 Device
http://www.samag.com/documents/s=1155/sam0101g/0101g.htm


© 2003-2004 Mark Foster
