
I have a weird problem with my NAS, a Zyxel 540 with 4 × 2TB drives in a RAID 5 configuration. Everything was working smoothly until I decided to replace one drive with a new WD Red of the same size. As expected, the NAS should detect that a new disk has been inserted and start rebuilding the RAID, with the data staying safe in the meantime... I had already done this operation before and it worked, so... no problem!

In my dreams...

After I replaced the drive, the NAS said the volume was inaccessible.

I panicked, so I put the old drive back in... nothing happened, still problems. The data were visible in the NAS manager, but not reachable over the LAN, and impossible to copy except via the terminal.

Just to be sure, I tried a partial recovery on one drive with PhotoRec. The data are still there, so the problem must be in the headers or something like that.

I tried some commands over SSH to check the status of the RAID, like:

mdadm --create etc
mdadm --examine /dev/md2

and moreover found out that the drive order was gone, so I started trying all the combinations, like:

mdadm --create --assume-clean --level=5  --raid-devices=4 --metadata=1.2 --chunk=64K  --layout=left-symmetric /dev/md2 /dev/sdd3 /dev/sdb3 /dev/sdc3 /dev/sda3
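
(For reference: before re-creating, the original slot of each member can often still be read from its superblock, assuming the 1.2 metadata hasn't already been overwritten by an earlier --create. A minimal sketch:)

# read the saved slot ("Device Role") of each member partition
for p in /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3; do
    echo "== $p =="
    mdadm --examine "$p" | grep -E 'Array UUID|Device Role|Events'
done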

I expected the NAS to work again once I hit the correct combination, until I submitted this one: C / A / D / B

Now I can't change the combination; I'm stuck. When I run mdadm --stop /dev/md2 it responds with:

mdadm: Cannot get exclusive access to /dev/md2:Perhaps a running process, mounted filesystem or active volume group?

I also tried cat /proc/mounts; the volume is not mounted :(

lsof | grep /dev/md2 shows nothing

AND

# e2fsck -n /dev/md2
e2fsck 1.42.12 (29-Aug-2014)
Warning!  /dev/md2 is in use.
ext2fs_open2: Bad magic number in super-block
e2fsck: Superblock invalid, trying backup blocks...
e2fsck: Bad magic number in super-block while trying to open /dev/md2

The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem.  If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
 or
    e2fsck -b 32768 <device>
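
(Side note: lsof only shows userspace opens, so it will not list a kernel-side holder such as LVM/device-mapper. A minimal sketch of how to check what is actually holding the array, using the VG name shown below:)

ls /sys/block/md2/holders    # device-mapper devices holding md2 show up here
dmsetup ls                   # the logical volumes built on top of it
vgchange -an vg_ca74d470     # deactivating the VG releases the array...
mdadm --stop /dev/md2        # ...after which it can be stopped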

LVM configuration:

Output from pvs:

PV       VG          Fmt  Attr PSize PFree 
/dev/md2 vg_ca74d470 lvm2 a--  5.45t 0

Output from vgs:

VG          #PV #LV #SN Attr   VSize VFree
vg_ca74d470   1   2   0 wz--n- 5.45t    0 

Output from lvs:

LV          VG            Attr       LSize   [snip]
lv_be37a89a  vg_ca74d470 -wi-a-----   5.45t
vg_info_area vg_ca74d470 -wi-a----- 100.00m

Software RAID configuration:

# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] 
md2 : active raid5 sda3[1] sdd3[3] sdb3[2]
      5848150464 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/3] [_UUU]

md1 : active raid1 sdb2[4] sdd2[6] sdc2[5] sda2[7]
      1998784 blocks super 1.2 [4/4] [UUUU]

md0 : active raid1 sdb1[4] sdd1[6] sdc1[5] sda1[7]
      1997760 blocks super 1.2 [4/4] [UUUU]

unused devices: <none>

I'm out of options, guys...  What should I do?

  • What does "cat /proc/mdstat" look like? What do "vgs" and "lvs" report?
    – wazoox
    Commented Nov 7, 2019 at 16:46
  • Please put these in your question within code quotes, it's unreadable in the comments.
    – wazoox
    Commented Nov 7, 2019 at 16:59
  • I'm trying but it's not working :( Commented Nov 7, 2019 at 17:02
  • Then post as an answer? I'll edit it back in the question later.
    – wazoox
    Commented Nov 7, 2019 at 17:05

3 Answers


So your setup is as follows (a quick way to confirm it is sketched after the list):

  • 4 hard drives, each split into 3 partitions
  • 3 RAID arrays (2 × RAID1, 1 × RAID5) using the partitions from all 4 disks
  • 1 LVM LV using md2 (the RAID5 array) as a PV
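
A quick way to confirm that layout from the NAS shell (a sketch; lsblk may not exist on this firmware):

cat /proc/partitions    # the three partitions on each of the four disks
cat /proc/mdstat        # which partitions belong to md0 / md1 / md2
pvs; vgs; lvs           # the PV on md2, its VG, and the LVs inside it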

Before replacing the hard drive, you should generate mdadm.conf from the running configuration:

mdadm --detail --scan >> /etc/mdadm/mdadm.conf

Also it's probably simpler to copy the partition structure from the existing drive:

sfdisk -d /dev/sda > partitions

Then replace the disk, and apply the previous partitions to the new drive:

sfdisk /dev/sda < partitions

Finally, you'll have to re-add each partition of the new disk to the 3 RAID arrays:

mdadm --manage /dev/md0 --add /dev/sda1
mdadm --manage /dev/md1 --add /dev/sda2
mdadm --manage /dev/md2 --add /dev/sda3
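
After that, the rebuild progress can be watched with, for example:

cat /proc/mdstat         # shows a progress bar while each array resyncs
mdadm --detail /dev/md2  # reports the array state and rebuild status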

In case you need to stop LVM from locking your arrays (or start it again afterwards), use:

vgchange -an # stop all LVs
vgchange -ay # start all LVs
  • # mdadm --detail --scan >> /etc/mdadm/mdadm.conf -sh: can't create /etc/mdadm/mdadm.conf: nonexistent directory Commented Nov 7, 2019 at 17:49
  • Why not include sdd3? Commented Nov 7, 2019 at 17:54
  • I'm stuck at the first command: I can't create /etc/mdadm/mdadm.conf because the directory doesn't exist, and sfdisk isn't available as a command either... Commented Nov 8, 2019 at 9:27
  • Check if /etc/mdadm.conf exists instead. Otherwise, create it anyway; it will serve as a backup of your configuration.
    – wazoox
    Commented Nov 8, 2019 at 14:19
  • Apparently the faulty drive is sda, not sdd.
    – wazoox
    Commented Nov 8, 2019 at 14:20

Good news!

I finally have my data back!

I tried to recover the superblock with e2fsck using the backup superblocks listed, but none of them worked :(

So I decided to go back to the old plan and try the drive combinations again.

The procedure I followed is this one:

1) Deactivate the volume with vgchange -an
2) Stop md2
3) Create the array with the new combination
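
In commands, each iteration looked roughly like this (a sketch reusing the parameters from the question; substitute the drive order to test on each pass):

vgchange -an vg_ca74d470    # 1) deactivate the LVM volume group so it releases md2
mdadm --stop /dev/md2       # 2) stop the array
mdadm --create --assume-clean --level=5 --raid-devices=4 --metadata=1.2 \
      --chunk=64K --layout=left-symmetric /dev/md2 \
      /dev/sdd3 /dev/sdb3 /dev/sdc3 /dev/sda3    # 3) re-create with the next order to test
vgchange -ay vg_ca74d470    # reactivate the VG and check whether the data is readable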

And when I arrived at C / B / D / A and rebooted, the NAS finally gave my data back.

I'm really happy now. Thanks to everyone involved; I'll definitely be looking around this place more often from now on.

Good Luck and all the best!

# vgs
  VG          #PV #LV #SN Attr   VSize VFree
  vg_ca74d470   1   2   0 wz--n- 5.45t    0 
~ # lvs
  LV           VG          Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lv_be37a89a  vg_ca74d470 -wi-a-----   5.45t                                                    
  vg_info_area vg_ca74d470 -wi-a----- 100.00m                                                    
~ # cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] 
md2 : active raid5 sda3[1] sdd3[3] sdb3[2]
      5848150464 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/3] [_UUU]

md1 : active raid1 sdb2[4] sdd2[6] sdc2[5] sda2[7]
      1998784 blocks super 1.2 [4/4] [UUUU]

md0 : active raid1 sdb1[4] sdd1[6] sdc1[5] sda1[7]
      1997760 blocks super 1.2 [4/4] [UUUU]

unused devices: <none>
  • OK, so you have 3 RAID arrays, and an LV that probably lives on the md2 array. That's why you can't stop the array. BTW if you want to replace the drive, you need to partition it exactly the same as the old one, then reinsert each partition in each of the arrays. That's pretty complicated.
    – wazoox
    Commented Nov 7, 2019 at 17:21
  • It is a 4-drive RAID, and now it is not taking one drive as part of it Commented Nov 7, 2019 at 17:23
  • I forgot to ask you the output for "pvs". Also please provide the output from "lsblk". We should see that your main volume is the LV lv_be37a89a; that's the one you want to fsck and mount, not the md device (a sketch follows these comments). Your storage is structured this way: disks > partitions > RAID array > logical volume > filesystem.
    – wazoox
    Commented Nov 7, 2019 at 17:23
  • Until now it was always possible to stop it and try a new combination Commented Nov 7, 2019 at 17:24
  • # pvs PV VG Fmt Attr PSize PFree /dev/md2 vg_ca74d470 lvm2 a-- 5.45t 0 ~ # lsblk -sh: lsblk: not found Commented Nov 7, 2019 at 17:25
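
As noted above, the filesystem lives on the logical volume, so any check or mount should target the LV rather than md2. A minimal sketch, assuming the standard /dev/<vg>/<lv> device paths:

vgchange -ay vg_ca74d470                  # make sure the LV device nodes exist
e2fsck -n /dev/vg_ca74d470/lv_be37a89a    # read-only filesystem check on the LV
mount /dev/vg_ca74d470/lv_be37a89a /mnt   # mount it once the check looks sane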

