
I have (had?) a RAID5 with 3 devices. One of them died, and after a few days the RAID stopped entirely. I could restart it without any apparent problems, but it stopped again after a few hours; I restarted it, it stopped again after a few moments, and so on. For about a month now, the RAID hasn't started at all. (During that month I didn't do anything with the RAID, as I had no time for it.)

I don't know whether this is a hardware failure (of the drives) or "just" a loose contact in the power cable; I had problems with that a year ago. I'm currently hoping for "just" a loose contact. The RAID mostly holds data of which I have a backup; however, the backup doesn't include the changes made in the last month or so.

I found this blog post about recovering from a RAID5 with two failed disks. It describes a situation similar to the one I (hope to) have: the drives (or at least one of the two failed ones) aren't really defective but have only been detached from the computer. Their approach is to re-create the RAID5 using all devices except the one that failed first.

In my case, I have three disks and one of them is dead, so only two are left: /dev/sda1 and /dev/sdc1, the latter being the one that was "detached" (at least, I hope it isn't dead). So I hope to get the most important information by examining this device:

sudo mdadm --examine /dev/sdc1

          Magic : a92b4efc
        Version : 0.90.00
           UUID : 83cb326b:8da61825:203b04db:473acb55 (local to host sebastian)
  Creation Time : Wed Jul 28 03:52:54 2010
     Raid Level : raid5
  Used Dev Size : 732571904 (698.64 GiB 750.15 GB)
     Array Size : 1465143808 (1397.27 GiB 1500.31 GB)
   Raid Devices : 3
  Total Devices : 2
Preferred Minor : 127

    Update Time : Tue Oct 23 19:19:10 2012
          State : clean
Internal Bitmap : present
 Active Devices : 2
Working Devices : 2
 Failed Devices : 1
  Spare Devices : 0
       Checksum : eaa3f133 - correct
         Events : 523908

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     1       8       33        1      active sync   /dev/sdc1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       0        0        2      faulty removed

So it was October 23rd when the RAID stopped working altogether.

Now I want to recover using the two remaining devices with the following command:

sudo mdadm --verbose --create /dev/md127 --chunk=64 --level=5 --raid-devices=3 /dev/sda1 /dev/sdc1 missing

I hope someone can tell me whether this is the correct command to use. I'm very nervous... It asks me to confirm the following data about the drives that will be used to re-create the array:

mdadm: layout defaults to left-symmetric
mdadm: layout defaults to left-symmetric
mdadm: layout defaults to left-symmetric
mdadm: /dev/sda1 appears to contain an ext2fs file system
    size=1465143808K  mtime=Tue Oct 23 14:39:49 2012
mdadm: /dev/sda1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Wed Jul 28 03:52:54 2010
mdadm: layout defaults to left-symmetric
mdadm: /dev/sdc1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Wed Jul 28 03:52:54 2010
mdadm: size set to 732570816K
Continue creating array? 

Additional info: I originally created the array with 3 × 750 GB drives, so the file system is 1.5 TB (ext2). In particular, I wonder whether the line saying that /dev/sda1 contains a 1.5 TB ext2 file system is correct, because the output in the blog post linked above doesn't show such a line...
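
As far as I understand, with the old 0.90 metadata the md superblock sits at the end of the partition and the array data starts at the very beginning of the first component, so an ext2 signature at the start of /dev/sda1 might actually be expected rather than alarming. These are the read-only checks I thought of (just a sketch):

# read-only: see which signatures sit at the start of each component
sudo file -s /dev/sda1 /dev/sdc1

# read-only: compare both md superblocks (Events, Update Time, device states)
sudo mdadm --examine /dev/sda1
sudo mdadm --examine /dev/sdc1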

I also wonder whether I should zero the superblock on either device first...
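
(For reference, the command for that would be the one below; I have not run it, since as far as I know it irreversibly wipes the md metadata on the components:)

# destructive: erases the md superblocks -- not run yet
sudo mdadm --zero-superblock /dev/sda1 /dev/sdc1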

Are there any checks I can do beforehand to make reasonably sure that this won't permanently destroy something that still has a chance of being recovered?
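
The only non-destructive preparations I could come up with myself are sketched below; /mnt/backup is just a placeholder for a disk with enough free space, and I'm not sure whether a forced assemble is even applicable in my situation:

# image both remaining components first, so a failed attempt can be rolled back
sudo dd if=/dev/sda1 of=/mnt/backup/sda1.img bs=1M conv=noerror,sync
sudo dd if=/dev/sdc1 of=/mnt/backup/sdc1.img bs=1M conv=noerror,sync

# try a forced assemble before re-creating; unlike --create, it only updates
# the existing superblocks instead of writing new ones
sudo mdadm --assemble --force --verbose /dev/md127 /dev/sda1 /dev/sdc1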

  • Ah, and please don't tell me to back up daily in the future. I've learned my lesson, believe me :/
    – leemes
    Commented Nov 21, 2012 at 22:06
  • That is a last-resort command you want to avoid if possible. What does mdadm -E /dev/sda1 show? You should be able to start it with mdadm --run /dev/md0.
    – psusi
    Commented Nov 22, 2012 at 1:01
  • I tried this, but... mdadm: failed to run array /dev/md127: Input/output error. Examining sda1 shows output similar to the above, but with drive number 1 also marked "faulty removed".
    – leemes
    Commented Nov 22, 2012 at 19:21
  • @psusi However, I just noticed that today the create command shows different output: mdadm: super1.x cannot open /dev/sda1: Device or resource busy; mdadm: failed container membership check; mdadm: device /dev/sda1 not suitable for any style of array. I think this has something to do with dmraid. I experienced "busy" errors before and found out that I should add nodmraid to the boot options. That helped yesterday; however, today it doesn't. I'm confused at this point.
    – leemes
    Commented Nov 22, 2012 at 19:27
  • Now I just re-added sdc1. I had tried this yesterday without success, but today it worked and I could then --run the RAID successfully (roughly the sequence sketched after these comments). Here are some command outputs: pastebin.com/pGaAV5rf -- I hope it keeps running while I make another backup of the most important stuff.
    – leemes
    Commented Nov 22, 2012 at 19:38
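
(For reference, a rough sketch of the sequence described in the last comment, assuming the array device is /dev/md127; the actual outputs are in the pastebin linked there:)

# re-attach the "detached" component, then start the degraded array
sudo mdadm /dev/md127 --re-add /dev/sdc1
sudo mdadm --run /dev/md127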
