
I have a Linux software RAID6 array (mdadm). I grew it from 6x4TB disks (16TB usable) to 7x4TB (20TB usable). The reshape went fine, but when I ran resize2fs, I hit the fairly well-known EXT4 16TB filesystem limit. I checked, and the filesystem does NOT have the 64bit flag. So, in an effort to reclaim the extra drive I had just added to the array, I did this:

johnny@debian:~$ sudo resize2fs /dev/md0 16000G
johnny@debian:~$ sudo mdadm --grow /dev/md0 --array-size=16000G
johnny@debian:~$ sudo mdadm --grow /dev/md0 --raid-devices=6 --backup-file=/tmp/backup

Notice the backup-file location. That's going to be important in a minute, because I'm on Debian.

So things were going fine: slow, but working. The progress got to 3.7% and then slowed to a crawl. I had assumed this was because I was reshaping a few other arrays at the same time, but when those other jobs finished and this one didn't speed up, I got really worried. Since it said it would take years to finish, I decided I should restart the system and see if the reshape would speed up.
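
In hindsight, before rebooting it would probably have been worth checking md's rebuild/reshape speed throttles first, e.g. (the value in the second command is just an example):

johnny@debian:~$ cat /proc/sys/dev/raid/speed_limit_min /proc/sys/dev/raid/speed_limit_max
johnny@debian:~$ sudo sysctl -w dev.raid.speed_limit_min=50000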

This is when bad things start happening...

I'm on Debian, and it is my understanding that the /tmp folder is wiped when the system comes up, so my backup file from the reshape was lost. Also, because my /etc/fstab was trying to mount md0, which wasn't assembling now, the system failed to come up a few times. I booted from a live CD, fixed the fstab entry, and got the system to come back up.
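
In hindsight, a nofail mount option in /etc/fstab would have let the system boot even with md0 unassembled; something like this (the mount point is the one I use below, the rest is a guess at a typical ext4 line):

/dev/md0    /media/BigRaid6    ext4    defaults,nofail    0    2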

Once I sorted that out, the system was up, and that was the first time I saw that md0 had not simply assembled itself and continued reshaping. Panic set in...

I don't have the output of the following commands, but I managed to find the commands I typed in. Brief explanation of what happened to follow...

johnny@debian:~$ sudo mdadm --assemble /dev/md0
johnny@debian:~$ sudo mdadm --assemble --force /dev/md0
johnny@debian:~$ sudo mdadm --assemble --force /dev/md0 --backup-file=/tmp/backup

The first command failed, so I tried the --force option, which also failed, but the error message told me the failure was because it needed the --backup-file option, so I ran the third command. I expected the backup file to still exist, but it didn't because it was in the /tmp folder and had been deleted. This didn't seem to cause any problems though, because the array assembled.
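
(For the record, newer mdadm versions also have an --invalid-backup option for --assemble, meant for exactly this situation where the backup file is gone; I mention it only as a pointer to the man page, not as something I ran:)

johnny@debian:~$ sudo mdadm --assemble --force /dev/md0 --backup-file=/tmp/backup --invalid-backup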

Here is what md0 looks like now. Notice the disk marked "removed". I suspect this is the disk that was being removed, sdj1.

johnny@debian:~$ sudo mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Fri Jan 11 09:59:42 2013
     Raid Level : raid6
     Array Size : 15627548672 (14903.59 GiB 16002.61 GB)
  Used Dev Size : 3906887168 (3725.90 GiB 4000.65 GB)
   Raid Devices : 6
  Total Devices : 6
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Sat Mar  5 20:45:56 2016
          State : clean, degraded, reshaping 
 Active Devices : 6
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

 Reshape Status : 3% complete
  Delta Devices : -1, (7->6)

           Name : BigRaid6
           UUID : 45747bdc:ba5a85fe:ead35e14:24c2c7b2
         Events : 4339739

    Number   Major   Minor   RaidDevice State
      11       8      224        0      active sync   /dev/sdo
       2       0        0        2      removed
       6       8       80        2      active sync   /dev/sdf
       7       8      176        3      active sync   /dev/sdl
      12       8       16        4      active sync   /dev/sdb
       8       8       32        5      active sync   /dev/sdc

       9       8      128        6      active sync   /dev/sdi

And here is the current progress of the reshape. Notice it's completely stuck at 0K/sec.

johnny@debian:~$ cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4] 
md0 : active raid6 sdo[11] sdi[9] sdc[8] sdb[12] sdl[7] sdf[6]
      15627548672 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/5] [U_UUUU]
      [>....................]  reshape =  3.7% (145572864/3906887168) finish=284022328345.0min speed=0K/sec
      bitmap: 5/30 pages [20KB], 65536KB chunk

unused devices: <none>

Here are the individual disks still in the array.

johnny@debian:~$ sudo mdadm --examine /dev/sd[oflbci]
/dev/sdb:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x5
     Array UUID : 45747bdc:ba5a85fe:ead35e14:24c2c7b2
           Name : BigRaid6
  Creation Time : Fri Jan 11 09:59:42 2013
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 7813775024 (3725.90 GiB 4000.65 GB)
     Array Size : 15627548672 (14903.59 GiB 16002.61 GB)
  Used Dev Size : 7813774336 (3725.90 GiB 4000.65 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262064 sectors, after=688 sectors
          State : clean
    Device UUID : 99b0fbcc:46d619bb:9ae96eaf:840e21a4

Internal Bitmap : 8 sectors from superblock
  Reshape pos'n : 15045257216 (14348.28 GiB 15406.34 GB)
  Delta Devices : -1 (7->6)

    Update Time : Sat Mar  5 20:45:56 2016
       Checksum : fca445bd - correct
         Events : 4339739

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 4
   Array State : A.AAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdc:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x5
     Array UUID : 45747bdc:ba5a85fe:ead35e14:24c2c7b2
           Name : BigRaid6
  Creation Time : Fri Jan 11 09:59:42 2013
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 7813775024 (3725.90 GiB 4000.65 GB)
     Array Size : 15627548672 (14903.59 GiB 16002.61 GB)
  Used Dev Size : 7813774336 (3725.90 GiB 4000.65 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262064 sectors, after=688 sectors
          State : clean
    Device UUID : b8d49170:06614f82:ad9a38a4:e9e06da5

Internal Bitmap : 8 sectors from superblock
  Reshape pos'n : 15045257216 (14348.28 GiB 15406.34 GB)
  Delta Devices : -1 (7->6)

    Update Time : Sat Mar  5 20:45:56 2016
       Checksum : 5d867810 - correct
         Events : 4339739

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 5
   Array State : A.AAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdf:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x5
     Array UUID : 45747bdc:ba5a85fe:ead35e14:24c2c7b2
           Name : BigRaid6
  Creation Time : Fri Jan 11 09:59:42 2013
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 7813775024 (3725.90 GiB 4000.65 GB)
     Array Size : 15627548672 (14903.59 GiB 16002.61 GB)
  Used Dev Size : 7813774336 (3725.90 GiB 4000.65 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262064 sectors, after=688 sectors
          State : clean
    Device UUID : dd56062c:4b55bf16:6a468024:3ca6bfd0

Internal Bitmap : 8 sectors from superblock
  Reshape pos'n : 15045257216 (14348.28 GiB 15406.34 GB)
  Delta Devices : -1 (7->6)

    Update Time : Sat Mar  5 20:45:56 2016
       Checksum : 59045f87 - correct
         Events : 4339739

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : A.AAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdi:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x5
     Array UUID : 45747bdc:ba5a85fe:ead35e14:24c2c7b2
           Name : BigRaid6
  Creation Time : Fri Jan 11 09:59:42 2013
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 7813775024 (3725.90 GiB 4000.65 GB)
     Array Size : 15627548672 (14903.59 GiB 16002.61 GB)
  Used Dev Size : 7813774336 (3725.90 GiB 4000.65 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262064 sectors, after=688 sectors
          State : clean
    Device UUID : 92831abe:86de117c:710c368e:8badcef3

Internal Bitmap : 8 sectors from superblock
  Reshape pos'n : 15045257216 (14348.28 GiB 15406.34 GB)
  Delta Devices : -1 (7->6)

    Update Time : Sat Mar  5 20:45:56 2016
       Checksum : dd2fe2d1 - correct
         Events : 4339739

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 6
   Array State : A.AAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdl:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x5
     Array UUID : 45747bdc:ba5a85fe:ead35e14:24c2c7b2
           Name : BigRaid6
  Creation Time : Fri Jan 11 09:59:42 2013
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 7813775024 (3725.90 GiB 4000.65 GB)
     Array Size : 15627548672 (14903.59 GiB 16002.61 GB)
  Used Dev Size : 7813774336 (3725.90 GiB 4000.65 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262064 sectors, after=688 sectors
          State : clean
    Device UUID : 8404647a:b1922fed:acf71f64:18dfd448

Internal Bitmap : 8 sectors from superblock
  Reshape pos'n : 15045257216 (14348.28 GiB 15406.34 GB)
  Delta Devices : -1 (7->6)

    Update Time : Sat Mar  5 20:45:56 2016
       Checksum : 358734b4 - correct
         Events : 4339739

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 3
   Array State : A.AAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdo:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x5
     Array UUID : 45747bdc:ba5a85fe:ead35e14:24c2c7b2
           Name : BigRaid6
  Creation Time : Fri Jan 11 09:59:42 2013
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 7813775024 (3725.90 GiB 4000.65 GB)
     Array Size : 15627548672 (14903.59 GiB 16002.61 GB)
  Used Dev Size : 7813774336 (3725.90 GiB 4000.65 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262064 sectors, after=688 sectors
          State : clean
    Device UUID : d7e84765:86fb751a:466ab0de:c26afc43

Internal Bitmap : 8 sectors from superblock
  Reshape pos'n : 15045257216 (14348.28 GiB 15406.34 GB)
  Delta Devices : -1 (7->6)

    Update Time : Sat Mar  5 20:45:56 2016
       Checksum : c3698023 - correct
         Events : 4339739

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : A.AAAAA ('A' == active, '.' == missing, 'R' == replacing)

Here is /dev/sdj1, which used to be the only member of the array that was not a "whole disk" member. This was the one being removed from the array during the reshape. I suspect it is still needed to finish the reshape: although it is not currently a member of the array, it still has the pre-reshape data on it.

johnny@debian:~$ sudo mdadm --examine /dev/sdj1
mdadm: No md superblock detected on /dev/sdj1.

So here are my problems...
1. I can't get the reshape to finish.
2. I can't mount the array. When I try, I get this.

johnny@debian:~$ sudo mount /dev/md0 /media/BigRaid6
mount: wrong fs type, bad option, bad superblock on /dev/md0,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.
johnny@debian:~$ sudo dmesg | tail
[42446.268089] sd 15:0:0:0: [sdk]  
[42446.268091] Add. Sense: Unrecovered read error - auto reallocate failed
[42446.268092] sd 15:0:0:0: [sdk] CDB: 
[42446.268093] Read(10): 28 00 89 10 bb 00 00 04 00 00
[42446.268099] end_request: I/O error, dev sdk, sector 2299575040
[42446.268131] ata16: EH complete
[61123.788170] md: md1: data-check done.
[77423.597923] EXT4-fs (md0): bad geometry: block count 4194304000 exceeds size of device (3906887168 blocks)
[77839.250590] EXT4-fs (md0): bad geometry: block count 4194304000 exceeds size of device (3906887168 blocks)
[78525.085343] EXT4-fs (md0): bad geometry: block count 4194304000 exceeds size of device (3906887168 blocks)

I'm sure mounting would succeed if the reshape was finished, so that's probably what is most important. Just FYI, the data on this array is too big to be backed up, so if I lose it, the data is gone. Please help!

EDIT 1:

I could shell out $1000 (or more) and get enough disks to copy everything over, but I'd need to be able to mount the array for that to work.

Also, I just noticed that the "bad geometry" error message I get when trying to mount the array has some interesting info in it.

[146181.331566] EXT4-fs (md0): bad geometry: block count 4194304000 exceeds size of device (3906887168 blocks)

The "size of device", 3906887168, is exactly 1/4 of the array size of md0, 15627548672, from "mdadm --detail /dev/md0":

Array Size : 15627548672 (14903.59 GiB 16002.61 GB)

I don't know where the 4194304000 number is coming from... but doesn't that mean the array is the right size to fit on these disks? Or do these sizes not account for the mdadm metadata? Is 4194304000 including the metadata perhaps?

I could swear I tried a few times to get the sizes right before the reshape would even start, so I thought everything was good to go. Maybe I was wrong.

  • There is no such thing as "too big to backup". If you don't back up your data, it is only a matter of time before you lose it. RAID is not a substitute for backups.
    – psusi
    Commented Mar 7, 2016 at 2:19
  • Yes, backups are essential. For this amount of data, that means either tape (with an automated tape library or loader so you don't have to swap tapes every few hours) or a second server with at least as much storage capacity. Also, I'd recommend using ZFS (on both servers) rather than RAID-5/6, so you can use zfs send for backups rather than rsync... note, though, that while you can expand a ZFS pool, there are restrictions on exactly how that can be done (e.g. you can't just add an extra disk to an existing RAIDZ vdev).
    – cas
    Commented Mar 7, 2016 at 3:59
  • Finally, if you can afford it and have the drive bays and SAS or SATA ports available, I'd recommend using RAID-10 with mdadm (or multiple mirrored pairs of disks in ZFS). Storage capacity is n/2 rather than n-1 or n-2 but performance is far greater and you can easily expand just by adding another pair of disks (of any size). And you can also replace existing disks with larger ones two at a time rather than the entire array.
    – cas
    Commented Mar 7, 2016 at 4:01

1 Answer


Your error is already in the first command: 16000GiB is simply not what you have on 6x4TB disks in RAID6, and even 16000GB might be a stretch, since you lose some space to mdadm metadata and such. The very last error message is exactly that situation (bad geometry: the filesystem believes it is larger than what the device offers, and filesystems absolutely hate that).
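
To put numbers on it (a quick sanity check, assuming ext4's default 4 KiB block size; the other numbers are straight from your mdadm --detail and dmesg output):

echo $((16000 * 1024 * 1024 / 4))   # 16000 GiB expressed in 4 KiB ext4 blocks   -> 4194304000
echo $((4 * 3906887168))            # 4 data members x used dev size, in KiB     -> 15627548672
echo $((15627548672 / 4))           # the same capacity in 4 KiB blocks          -> 3906887168

So the filesystem was resized to 4194304000 blocks, but the shrunk array only offers 3906887168 blocks, which is exactly the mismatch in your "bad geometry" message.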

So you are looking at a multitude of problems right now:

  • your filesystem is too large
  • your shrink is stuck halfway
  • at least one of your disks failed (/dev/sdk)
  • your backup-file was lost
  • possible filesystem inconsistencies (obviously)

The solution to your problem is not to somehow make the shrink finish, but rather to revert to the previous state without damage... this might still be possible, since thankfully the shrink has not progressed very far yet.

In this situation I would:

  • stop the RAID and make sure nothing else starts it either (disable udev auto-assembly rules and such things)
  • make the hard disks read-only using an overlay file (requires an additional spare disk for temporary storage); see the sketch after this list
  • attempt to re-create the RAID using said overlays
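
For the overlay step, a rough per-disk sketch might look like this (untested here; /mnt/spare is a hypothetical mount point on the extra temporary disk, and the linux-raid wiki has a complete script that loops over all members):

truncate -s 50G /mnt/spare/sdf.ovl                        # sparse copy-on-write file on the spare disk
loop=$(losetup -f --show /mnt/spare/sdf.ovl)
size=$(blockdev --getsz /dev/sdf)                         # size of the member in 512-byte sectors
dmsetup create overlay-sdf --table "0 $size snapshot /dev/sdf $loop P 8"

Writes then land in the sparse file instead of on /dev/sdf, so a failed --create attempt can simply be thrown away.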

Re-creating a RAID is a really, really bad idea in general, since it almost always goes wrong. However, I think in your case it might cause less damage than trying to reverse the shrink process in any other way. In the 7-disk RAID6, the filesystem's 16000GiB area might still be untouched... assuming the filesystem was idle while the RAID was shrinking. Otherwise you're looking at even more filesystem inconsistencies.

When re-creating a RAID you have to get all the variables right: metadata version, data offset, RAID level, RAID layout, chunk size, disk order, ... and you have to prevent re-syncs by leaving redundancy disks out as missing.

It might look something like this:

mdadm --create /dev/md0 --assume-clean \
    --metadata=1.2 --data-offset=128M --level=6 --layout=ls --chunk=512 \
    --raid-devices=7 missing missing /dev/mapper/overlay-{sdf,sdl,sdb,sdc,sdi}
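
(The metadata version, data offset (262144 sectors = 128M), layout, and chunk size above match your --examine output; the five overlays are listed in their "Device Role" order, 2 through 6, with two slots deliberately left as missing.)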

No warranty, I obviously haven't tried this myself.
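
By "verified" I mean read-only checks against the overlay-backed array before touching the real disks, for instance (a sketch; /mnt/check is a placeholder mount point):

fsck.ext4 -n /dev/md0             # -n: check read-only, answer "no" to all prompts
mount -o ro /dev/md0 /mnt/check   # read-only mount, then spot-check some files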


Once you've verified that it works, you can commit the changes to disk (apply the steps that were verified to work, without the overlays). Then, this time around, resize2fs the filesystem correctly (14T should work), then shrink the RAID and hope it won't get stuck again. A backup file might not be needed for mdadm --grow; if you do use one, make sure not to lose it.
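
A hedged sketch of that retry, with the backup file somewhere that survives a reboot (the sizes are illustrative; take the exact array size from mdadm --detail before shrinking, and /root/md0-reshape-backup is just a persistent location I picked):

e2fsck -f /dev/md0                                 # resize2fs insists on a fresh check before shrinking
resize2fs /dev/md0 14T                             # well below the ~14903 GiB the 6-disk array offers
mdadm --grow /dev/md0 --array-size=15627548672     # 4 x used dev size; mdadm sizes default to KiB
mdadm --grow /dev/md0 --raid-devices=6 --backup-file=/root/md0-reshape-backup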

  • Only raid4 uses a dedicated redundancy disk. You can't remove the parity by leaving out a disk in raid5/6 and doing so also leaves out a lot of the original data. The --assume-clean is sufficient to prevent a resync.
    – psusi
    Commented Mar 7, 2016 at 2:18
  • @psusi I did not mean parity disk. In RAID6 you can leave out any 2 disks. In this case you have to leave at least one of them out (the defective one); I like to leave two out so there is only one possible interpretation of the data. Basically you want to check md metadata and SMART for all disks and leave out the two worst ones / only keep the best ones, at least for the first try... --assume-clean is enough for initial assembly, but missing is additional insurance in case you have an mdadm check / repair cron job firing up and such things.
    Commented Mar 7, 2016 at 9:58
  • It sounds like I need to reverse the shrink process. Is there a way to cancel mid way through the job and put it back the way it was? Or should I just shrink the filesystem more so it'll fit? Would that let it finish successfully?
    – John
    Commented Mar 7, 2016 at 15:25
  • @frostschutz, I was able to recover the old array with a create command nearly identical to yours! It helped a great deal! Thank you so much! The only thing I did differently was that I only removed (missing) the one failed disk; I left the rest of your command the same. Now I'm at the stage where I have 6 of 7 disks assembled and mounted. What is safe to do now? Check the array? Wipe and re-add the other disk? I just don't want to re-break it now that it's working.
    – John
    Commented Mar 7, 2016 at 23:48
  • Added a paragraph to my answer. You really should make backups of your stuff...
    Commented Mar 8, 2016 at 1:25
