Long story short, for my first thread here: I have a software RAID5 array set up as follows: four disk devices with a Linux RAID partition on each. Those partitions are /dev/sda1, /dev/sdb1, /dev/sdd1 and /dev/sde1.
/dev/md0 is the RAID5 device, with an encrypted LVM on top of it. I use cryptsetup to open the device, then vgscan and lvscan -a to map my volumes.
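For reference, the sequence to bring the volumes up looks roughly like this (a sketch; the mapper name md0_crypt is a placeholder, not my actual name):

# cryptsetup luksOpen /dev/md0 md0_crypt
# vgscan
# lvscan -a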
Yesterday, I found out that /dev/sdd1 was failing. Here are the steps I followed:
0. remove the failing disk
# mdadm --remove /dev/md0 /dev/sdd1
1. perform a check on the faulty drive
# mdadm --examine /dev/sdd1
I got the "could not read metadata" error.
2. tried to read the partition table
I used parted and discovered that my Linux RAID partition was gone, and when I tried to re-create it (hoping to be able to re-add the drive) I got the "your device is not writable" error.
So it was clear: that hard drive is dead.
3. Extract the hard drive from my case (bad things follow)
So I tried to extract /dev/sdd1 from my case, not knowing which of the four drives it was. I unplugged one SATA cable, only to find out that I had just unplugged /dev/sde1; I plugged it back in and unplugged the next one. Nice catch! It was /dev/sdd1.
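With hindsight, one way to avoid pulling the wrong drive is to match each device name to a physical serial number before opening the case, for example (a sketch, not something I ran at the time; smartctl comes from the smartmontools package):

# ls -l /dev/disk/by-id/ | grep sdd
# smartctl -i /dev/sdd | grep -i serial

The serial number printed there can be compared against the label on the physical drive.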
4. what have I done?! sad face
Using:
# mdadm --detail /dev/md0
I realized that /dev/sde1 had left the array and was marked as "removed". I tried to re-add it, not with --re-add, but with:
# mdadm --add /dev/md0 /dev/sde1
/proc/mdstat showed me the rebuilding process, and mdadm --detail /dev/md0 displayed /dev/sde1 as a "spare"; I know I might have done something terrible here.
I tried to remove /dev/sde1 from the array and use --re-add, but mdadm told me it couldn't do it and advised me to stop and reassemble the array.
5. Where to go from here?
First things first: I am waiting for a new hard drive to replace the faulty one. Once I have it and have set it up with a new Linux RAID partition, known as /dev/sdd1, I will have to stop the array (the LVM volumes are no longer mounted, obviously, and cryptsetup has closed the encrypted device, yet mdadm has not been able to stop the array). I was thinking about rebooting the entire system and working from a clean start. Here is what I figured I should do:
# mdadm --stop /dev/md0
# mdadm --examine /dev/sd*1
# mdadm --assemble --scan --run --verbose
I read that without the --run option, mdadm will refuse to start a degraded array.
Best-case scenario: /dev/sde1 is recognized by the re-assembly process and the new /dev/sdd1 is used to repair the previously faulty one. I will not have lost any data and will be happy.
Worst (and most likely) case scenario: re-assembling the array fails to recover /dev/sde1, and I have to start from a blank new array.
Am I missing something here? What should I review from this procedure?
Best Regards from France
--stop followed by --assemble --force would have sorted it, in as much as that is possible after yanking the wrong drive.

# mdadm --detail /dev/md0
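That is, sketched with the asker's device names (assuming only /dev/sdd1 is missing; --force tells mdadm to accept the member whose event count is out of date):

# mdadm --stop /dev/md0
# mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1 /dev/sde1
# mdadm --detail /dev/md0

Whether any data survives presumably depends on how far the rebuild onto the "spare" got before the array was stopped.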
So I am guessing from your reply that I screwed up my data?