Thursday, May 26, 2011

Hot-removing a SATA drive, and copying partition tables

In a multi-disk setup, one sometimes needs to replace a drive. If one has a hot-swap bay, this is surprisingly easy, but before actually pulling the drive out, one MUST tell Linux that one is about to do so:

# echo 1 > /sys/block/sdX/device/delete
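Before you run that echo, double-check that sdX (a placeholder, as above) really is the physical disk you intend to pull; the serial numbers in the udev-maintained /dev/disk/by-id directory make that easy:

# ls -l /dev/disk/by-id/ | grep sdX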

You can then pull the drive out and replace it with a new one. Note, though, that the new drive will generally get a new device name (e.g., /dev/sdZ) until the next reboot.
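If the new drive does not show up by itself, you can ask the corresponding SATA/SCSI host adapter to rescan its ports (hostN differs per system, so check /sys/class/scsi_host first):

# echo "- - -" > /sys/class/scsi_host/hostN/scan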

Now, say you want the new drive to have partitions identical to those of another drive in your system (say, sdY); then you simply copy the partition table:

# sfdisk -d /dev/sdY | sfdisk /dev/sdZ
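If you want to double-check the result, list the new drive's partition table:

# sfdisk -l /dev/sdZ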

This can all be done without ever rebooting the system :-)

Tuesday, May 24, 2011

A _seriously_ close shave with Linux software RAID

I am running a 4-disk home server in which the first partition of each of the four drives is combined into a single RAID6 array using Linux mdadm software RAID. After freeing up some partitions on the remainder of each drive, I wanted to extend the first partition on each drive so that I could first grow the RAID6 array and then grow the filesystem on it.
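For reference, the grow steps themselves (once the partitions have been enlarged) are short; a sketch, assuming the array is /dev/md0 with an ext4 filesystem on top, as in my case:

# mdadm --grow /dev/md0 --size=max
# resize2fs /dev/md0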

That sounded easy enough: my data drives are /dev/sdb, /dev/sdc, /dev/sdd, and /dev/sde, so I first checked what the starting sector of each partition was, using:

# fdisk -c -u -l /dev/sdb
(and the same for sdc, sdd, and sde)

This showed that the first partition started at sector 2048. Fair enough: I ran

# fdisk -c -u /dev/sdb

deleted (d) partition 1, created a new partition 1 starting at the same sector but with a higher end sector than before, and set the type to Linux raid autodetect (fd).
I did the same on /dev/sdc, /dev/sdd, and /dev/sde, too.

Of course, since the array (/dev/md0) was still active and mounted, the kernel refused to re-read the partition table. That was fine: a reboot would solve that.


I rebooted, and to my dismay I found that the array was no longer recognized! It turned out that my little fdisk adventure had wiped out the RAID superblock, and it had done so on ALL RAID6 members. That is slightly problematic: having even a single superblock still available is enough to get things up and running again with "mdadm --examine --scan", but I had NONE left.
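(Whether a member still carries its superblock is easy to check; for example,

# mdadm --examine /dev/sdb1

prints the superblock contents, or complains that no superblock was found.)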

You can imagine my sinking feeling as I realised that I might just have lost 1.2 TB of private data... What to do? None of the mdadm --assemble options would work, for lack of superblocks, and completely recreating the array would destroy all data, right?

Right?...

Wrong! It turns out that mdadm has a few nice cards up its sleeve... If you create an array with the bare minimum number of devices (N-2 for RAID6, N-1 for RAID5, 1 for RAID1), there is nothing to sync, so mdadm will not start a resync. Now, in RAID6 (and RAID5) the order of the devices is important (because of the data/parity-block rotation), so with my bare minimum of 2 devices, I made a list of all possibilities. If you call the two devices /dev/sd${X}1 and /dev/sd${Y}1, I had the following possibilities:

X Y
----
B C
B D
B E
C B
C D
C E
D B
D C
D E
E B
E C
E D

For each of these combinations, I ran:

# mdadm --create /dev/md0 --verbose --level=6 --raid-devices=4 /dev/sd${X}1 /dev/sd${Y}1 missing missing

(note the two missing devices at the end)
If that succeeded, I tried to run a filesystem check:

# fsck.ext4 /dev/md0

For most of the combinations, the block ordering was wrong, so fsck.ext4 would not find a filesystem, and I would then stop the array again using:

# mdadm --stop /dev/md0

I thus went through all the options, becoming more and more nervous, until the LAST option (seriously!) turned out to be the right one! :-) The filesystem checked out cleanly, and then I could mount it, too:

# mount /dev/md0 /data
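In hindsight, the whole create/check/stop cycle could have been scripted. A rough, untested sketch of that idea (it assumes the original 4-device layout, uses mdadm's --run to suppress the confirmation questions, and runs fsck read-only with -n so nothing gets written during the trials):

for X in b c d e; do
  for Y in b c d e; do
    [ "$X" = "$Y" ] && continue
    mdadm --create /dev/md0 --run --level=6 --raid-devices=4 \
          /dev/sd${X}1 /dev/sd${Y}1 missing missing
    if fsck.ext4 -n /dev/md0; then
      # a clean read-only check means the device order is right
      echo "correct order: sd${X}1 sd${Y}1"
      break 2
    fi
    mdadm --stop /dev/md0
  done
done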

Of course, I was now running with the bare minimum of devices, so I added the other members back in:

# mdadm /dev/md0 --add /dev/sdb1 /dev/sdc1

Linux does this one new device at a time, and it took about 10 hours per device (they are 1.3 TB each). I let it run overnight. When I looked in the morning, adding the first device had succeeded (already putting me back in the safety of N+1 redundancy), and Linux was resyncing the last device.
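The resync progress can be followed in /proc/mdstat:

# cat /proc/mdstat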


That was a close shave, in fact, _way_ too close for comfort! I'll be looking at a hardware RAID HBA next.