4

I had a hardware RAID array; a disk failed during migration, and the controller flipped out and is unable to recover. I've written a small Python script that has correctly mapped out the current state of the array (about 160GB was migrated; the remaining 5-something TB wasn't), and I'm about to use that information to rebuild the file system onto a new disk. This process is likely to take a LOOOOONNNNGGGG time.

So my question is: if I recover, say, the first 250GB, can I mount this partial file system and check that the script is doing the right thing (e.g. if I've got the block/stripe order wrong, the result will be garbage) before processing the whole array? If so, what mount commands will I need to run to mount it?

As an aside, I assume I only need the file system and not the partition table?

Edit: Some more specific details on the failure: I had a 4-disk RAID 5, I did an online capacity expansion to 5 disks, and the new disk failed.

What I have done so far is to calculate the point the migration got to (using A xor B xor C == D: true for the old array, false for the new, ignoring results from empty space).
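The parity test described here can be sketched roughly as follows. This is only an illustration, not the script from the question: the function names are hypothetical, each "row" is assumed to be the four chunks read from the same offset on the four surviving disks, and the migration point is taken to be one past the last non-empty row that fails the old 4-disk parity check.

```python
from typing import Iterable, Optional, Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """XOR equal-length byte strings together."""
    acc = 0
    for block in blocks:
        acc ^= int.from_bytes(block, "little")
    return acc.to_bytes(len(blocks[0]), "little")


def old_parity_holds(row: Sequence[bytes]) -> Optional[bool]:
    """Check one row of chunks against the old 4-disk parity.

    A xor B xor C == D is the same as A xor B xor C xor D == 0, so the
    test does not need to know which disk holds the parity chunk.
    Returns None for empty space (every chunk all zero), which proves nothing.
    """
    if not any(any(chunk) for chunk in row):
        return None
    return not any(xor_blocks(row))


def migration_point(rows: Iterable[Sequence[bytes]]) -> int:
    """Return the index one past the last row that fails the old parity."""
    boundary = 0
    for index, row in enumerate(rows):
        if old_parity_holds(row) is False:
            boundary = index + 1
    return boundary
```

Taking the last failing row rather than the first passing one is a deliberate choice in this sketch: a row in the migrated area can pass the 4-disk test by coincidence if the chunk on the failed disk happened to be all zeros.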

Calculated the 'void space' between the arrays using the old stripe size, the new stripe size and the migrated size; double-checked this result by comparing blocks at the end of the void space with the end of the migrated chunk until I found matches.
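As a rough illustration of that arithmetic (not the actual calculation used): a 4-disk RAID 5 stores 3 data chunks per stripe and a 5-disk RAID 5 stores 4, so the same amount of data takes up less room per disk in the new layout. The sketch below assumes that both layouts start at offset 0 on each disk, that the chunk size is unchanged by the expansion, and that the controller migrates data strictly in order. The function name and the 64 KiB chunk size in the usage note are hypothetical.

```python
def void_space_per_disk(migrated_bytes: int, chunk_size: int,
                        old_disks: int = 4, new_disks: int = 5):
    """Return (new_end, old_start, void) as per-disk byte offsets.

    new_end   : where the new-layout (migrated) region ends on each disk
    old_start : where the first not-yet-migrated old stripe begins
    void      : stale old-layout data lying between the two
    """
    old_stripe = (old_disks - 1) * chunk_size  # data bytes per old stripe
    new_stripe = (new_disks - 1) * chunk_size  # data bytes per new stripe

    # Stripes needed to hold the migrated data in the new layout (rounded up).
    new_stripes = -(-migrated_bytes // new_stripe)
    # Old stripes that were migrated in full (rounded down).
    old_stripes = migrated_bytes // old_stripe

    new_end = new_stripes * chunk_size
    old_start = old_stripes * chunk_size
    return new_end, old_start, old_start - new_end
```

For example, with a 64 KiB chunk and 12,000 chunks of migrated data, `void_space_per_disk(12000 * 65536, 65536)` gives a new region of 3,000 chunks per disk, an old region starting at chunk 4,000, and a void of 1,000 chunks per disk, i.e. one twelfth of the migrated size.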

Now I'm writing some code to reconstruct the stripes, glue the two halves of the array back together and write it out to new disks. Ideally I'd like to be able to check the result of this code without having to write out the full 6TB array. By writing out the first 250GB or so I can check and refine the procedure for recovering the whole array without having to wait several days for the recovery to complete.
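One building block of that reconstruction is rebuilding the chunk that lived on the failed disk. In RAID 5 the XOR of all chunks in a stripe (data plus parity) is zero, so any single missing chunk is the XOR of the others. A minimal sketch follows; the function name is hypothetical, and it says nothing about parity rotation or chunk order, which depend on the controller's layout.

```python
from typing import List, Optional, Sequence


def rebuild_missing(chunks: Sequence[Optional[bytes]]) -> List[bytes]:
    """Fill in the single missing (None) chunk of one RAID 5 stripe.

    `chunks` holds one entry per disk, in disk order, with None for the
    failed disk. Works whether the missing chunk was data or parity.
    """
    missing = [i for i, chunk in enumerate(chunks) if chunk is None]
    if len(missing) != 1:
        raise ValueError("exactly one chunk must be missing")

    present = [chunk for chunk in chunks if chunk is not None]
    size = len(present[0])
    acc = 0
    for chunk in present:
        if len(chunk) != size:
            raise ValueError("all chunks must have the same length")
        acc ^= int.from_bytes(chunk, "little")

    rebuilt = list(chunks)
    rebuilt[missing[0]] = acc.to_bytes(size, "little")
    return rebuilt
```

This only applies to the migrated part of the array, where the failed fifth disk was already part of each stripe; the unmigrated part still has all four of its original disks.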

2
  • There exists a device-mapper "error" target which can be used to simulate a portion of a device that can't be read. You can probably concoct something that takes advantage of that. Be sure to mount the result read-only if you don't want the partial result tampered with.
    – Celada
    Commented Apr 11, 2015 at 17:35
  • What do you mean "migrated"? If it was raid0, then you only have every other block of data if one of two drives failed, which is useless. If it was just concatenated drives, then you may be able to recover some of it.
    – psusi
    Commented Apr 12, 2015 at 0:55

3 Answers

2

You can't mount half a partition. Half a partition isn't a container that holds half the files in the filesystem; it's basically unusable. Reaching a file requires traversing several directories and inodes, which may be spread all over the partition.

You can, however, check the beginning of the partition to see if it seems to have a valid filesystem header. This only requires the beginning of the partition (the first few kilobytes) to be OK. file -s /dev/something will tell you whether that particular device seems to contain a filesystem (that's assuming that you already have a device entry for the RAID array that's being reassembled).
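The same kind of header check can be done directly on a partially written image. The sketch below assumes, purely for illustration, that the filesystem is ext2/3/4 (the question does not say) and that the image starts at the filesystem itself rather than at a partition table. On ext filesystems the superblock starts at byte 1024 and its 16-bit magic number 0xEF53 sits 56 bytes into it. The function name is hypothetical.

```python
import struct

EXT_SUPERBLOCK_OFFSET = 1024  # superblock starts 1 KiB into the filesystem
EXT_MAGIC_OFFSET = 56         # position of s_magic inside the superblock
EXT_MAGIC = 0xEF53


def looks_like_ext(path: str) -> bool:
    """Return True if the file or device at `path` has the ext2/3/4 magic."""
    with open(path, "rb") as image:
        image.seek(EXT_SUPERBLOCK_OFFSET + EXT_MAGIC_OFFSET)
        raw = image.read(2)
    if len(raw) < 2:
        return False  # image is too short to contain a superblock
    (magic,) = struct.unpack("<H", raw)  # stored little-endian
    return magic == EXT_MAGIC
```

A match only shows that the first couple of kilobytes landed in the right place; it says nothing about whether the stripe order is right further into the array.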

1

You should just try, but with read-only methods. If the filesystem superblock is still reachable, you should be able to mount it. Exploring the filesystem will then likely create lots of I/O errors and show botched file names and contents, but that's the best you could get in any case (apart from testdisk-style tools, which scan for specific file types).

I'd try:

  • tune2fs -l <blockdevice> : if that works, you should be able to mount it
  • mount -o ro <blockdevice> /mnt/recovery
0

Please describe your migration in more detail. Did you grow or shrink the array, add or remove disks, or what? You don't want the error target; instead use linear and snapshot, so there is a virtual layer on which you can run experiments such as fsck, debugfs and the like without having to create yet another copy for each experiment. Use a method similar to the one described in https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID#Making_the_harddisks_read-only_using_an_overlay_file

Mounting is probably not a good way to "check if block/stripe order incorrect" etc. If the array was growing to an additional disk there should be an area where the new vs. old representation of data overlaps, so complete recovery might be possible.
