Posts Tagged ‘hard disk’

Rebuilding a Raid 1 array: Using mdadm to add the new drive back to the array

Friday, May 31st, 2013

Because of frequent disk failures that occur in my Mobile Desktop for reasons I’ve yet to conclusively identify, I decided early on to run two drives in a Raid 1 array. In this way, a single drive failure was easily overcome. A week or so ago one of my current drives started producing read errors and it was removed from the array by mdadm (linux’s multi-disk administrator). Today I’m adding the failed drive back to the array (currently only 1 active drive) and decided to post the commands I use so I can find them easily in the future. For more information on creating, administering, and repairing Raid arrays in linux, I suggest an article on howtoforge.com by Falco Timme.

As of right now, my Raid 1 array (md0) has one connected drive (really partition), /dev/sda1. The drive (partition) that I want to add back to the array is /dev/sdb1. Both drives are from Western Digital but sda is a 250 GB disk while sdb is a 320 GB disk. Because sdb was previously in the array, there’s nothing special that I’ll need to do to use it, but I will be limited to using only 250 GB of the space on that drive. Here is the output from some mdadm commands to clarify the picture:

ecellingsworth@MD1-LMDE ~ $ cat /proc/mdstat
 Personalities : [raid1]
 md0 : active raid1 sda1[3]
 244194841 blocks super 1.2 [2/1] [_U]
 ecellingsworth@MD1-LMDE ~ $ sudo mdadm --detail /dev/md0
 [sudo] password for ecellingsworth:
 /dev/md0:
 Version : 1.2
 Creation Time : Thu Sep 15 10:36:29 2011
 Raid Level : raid1
 Array Size : 244194841 (232.88 GiB 250.06 GB)
 Used Dev Size : 244194841 (232.88 GiB 250.06 GB)
 Raid Devices : 2
 Total Devices : 1
 Persistence : Superblock is persistent
Update Time : Fri May 31 11:41:27 2013
 State : active, degraded
 Active Devices : 1
 Working Devices : 1
 Failed Devices : 0
 Spare Devices : 0
Name : MD1-Ubuntu:0
 UUID : 7d0d271d:04fdd2c1:de65ca3f:4c375489
 Events : 1279820
Number   Major   Minor   RaidDevice State
 0       0        0        0      removed
 3       8        1        1      active sync   /dev/sda1

First I copy the partition table from the good drive to the failed drive. If you use any of these commands yourself, be sure you use the right drive identifier. If you switch sda and sdb, you will destroy the good drive.

ecellingsworth@MD1-LMDE ~ $ sudo sfdisk -d /dev/sda | sudo sfdisk --force /dev/sdb
 Checking that no-one is using this disk right now ...
 OK
Disk /dev/sdb: 38913 cylinders, 255 heads, 63 sectors/track
 Old situation:
 Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0
Device Boot Start     End   #cyls    #blocks   Id  System
 /dev/sdb1   *      0+  30400   30401- 244196001   fd  Linux raid autodetect
 /dev/sdb2          0       -       0          0    0  Empty
 /dev/sdb3          0       -       0          0    0  Empty
 /dev/sdb4          0       -       0          0    0  Empty
 New situation:
 Units = sectors of 512 bytes, counting from 0
Device Boot    Start       End   #sectors  Id  System
 /dev/sdb1   *        63 488392064  488392002  fd  Linux raid autodetect
 /dev/sdb2             0         -          0   0  Empty
 /dev/sdb3             0         -          0   0  Empty
 /dev/sdb4             0         -          0   0  Empty
 Successfully wrote the new partition table
Re-reading the partition table ...
If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
 to zero the first 512 bytes:  dd if=/dev/zero of=/dev/foo7 bs=512 count=1
 (See fdisk(8).)

Now I remove any traces of the previous Raid information from the failed drive (partition).

ecellingsworth@MD1-LMDE ~ $ sudo mdadm --zero-superblock /dev/sdb1

All that’s left to do is put /dev/sdb1 back into the array.

ecellingsworth@MD1-LMDE ~ $ sudo mdadm -a /dev/md0 /dev/sdb1
 mdadm: added /dev/sdb1

Mdadm will then resync the two drives by copying everything from /dev/sda1 to /dev/sdb1. This process takes three to four hours for my 250 GB drives and will be longer for larger drives. You can monitor the rebuilding process using the following ‘cat /proc/mdstat’. I like to run this command with ‘watch’ every few seconds to see the rebuilding progress in real time.

ecellingsworth@MD1-LMDE ~ $ watch -n 5 'cat /proc/mdstat'
Every 5.0s: cat /proc/mdstat                            Fri May 31 11:51:17 2013
Personalities : [raid1]
 md0 : active raid1 sdb1[2] sda1[3]
 244194841 blocks super 1.2 [2/1] [_U]
 [>....................]  recovery =  2.6% (6465024/244194841) finish=124.7
 min speed=31746K/sec
unused devices: <none>

Don’t trust the estimated time. The rebuilding process will run in the background, but if you are using the PC at the time, the rebuilding has to pause every time you need access to the file system. And, for some reason, rebuilding tends to slow way down near the end.

Edit 12/17/14:

I’ve used the above process to add back to the array a failing drive several times now. However, in my case, since I’m adding back the same drive that was previously in the array, it’s not necessary to copy the the partition table from the good drive nor is it necessary to zero the superblock on the failed drive. Instead, I use one command “mdadm -a” to add the drive back to the array and it rebuilds fine. This has the advantage that it’s much less risky. There’s no chance of destroying the good drive by wiping it’s partition table or superblock. And, I doubt anything would happen if you told mdadm to add a drive that’s already in the array, though I’m not going to test this theory myself.