
A 1TB disk containing one partition which was the only PV in a VG containing one LV was failing by becoming intolerably slow without reporting any errors. My first thought was to attempt to use LVM to mirror the data to another disk and continue uninterrupted, so I did this:

  1. Failed a 1TB disk out of an mdadm mirror set in the same machine
  2. Added that disk to the VG of the failing disk
  3. Changed the LV to have one mirror
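The three steps above correspond roughly to the following commands (a sketch only; the device names, array name, and VG/LV names are hypothetical stand-ins, not taken from the source):

```shell
# Step 1: fail the disk out of the mdadm mirror (hypothetical names /dev/md0, /dev/sdb1)
mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1

# Step 2: initialize it as a PV and add it to the failing disk's VG
pvcreate /dev/sdb1
vgextend failing_vg /dev/sdb1

# Step 3: convert the LV to have one mirror leg; LVM copies the data in the background
lvconvert -m1 failing_vg/data_lv
```

Progress of the background copy can be watched with `lvs -a -o name,copy_percent`, which is presumably how the 0.05% figure below was observed.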

I left this overnight to copy, after which it reported having copied 0.05% of the data. Considering this a failure, I bought two replacement 1TB disks with this in mind:

  1. Add the first new disk to the mdadm array (done, no problems)
  2. Restore a backup to a replacement LV on the good disk that was removed from the mdadm set (done, no problems, missing about a day of changes)
  3. Clone whatever's readable from the failing disk to the second new disk (done in a spare computer, ddrescue indicates complete success after some retries)
  4. Mount the clone and merge changes with the stale copy restored from backup (stuck here)
  5. Add the second new disk to the replacement VG and mirror the replacement LV
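Step 3 of this plan, the clone, was presumably something like the standard two-pass ddrescue invocation (device names here are hypothetical):

```shell
# Hypothetical devices: /dev/sdc = failing disk, /dev/sdd = second new disk.
# The map file records which areas succeeded, so later passes only revisit bad spots.
ddrescue -f -n /dev/sdc /dev/sdd rescue.map    # first pass: copy the easy data, skip bad areas
ddrescue -f -r3 /dev/sdc /dev/sdd rescue.map   # retry the remaining bad sectors up to 3 times
```

The `-f` flag is required when the output is a block device, and `-n` skips the slow scraping phase on the first pass.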

The problem is I can't figure out how to get LVM to let me access the clone of the failed disk.

LVM sees that a PV is missing from the VG and won't let me activate the VG (even with --partial) or make any changes (including lvconvert -m0 or lvconvert --repair) until the missing device is found. man vgreduce indicates that vgreduce --removemissing would remove the LV entirely, not just the mirror leg on the missing PV. At least one source suggests the LV could be made accessible by recreating the missing PV with pvcreate --uuid, but that would require physically moving a good disk from the mdadm array to the spare machine (or buying yet another disk), and it apparently requires a backup of the LVM metadata for the VG (which I don't have).
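For reference, the pvcreate --uuid approach that source describes looks roughly like this (a sketch under the assumption that a metadata backup exists; the VG name, device, and UUID are placeholders):

```shell
# Recreate the missing PV with its old UUID, using the metadata backup
# (LVM normally auto-saves one under /etc/lvm/backup/<VG>).
pvcreate --uuid "<missing-PV-UUID>" --restorefile /etc/lvm/backup/myvg /dev/sdX1

# Restore the VG metadata onto the recreated PV, then activate
vgcfgrestore -f /etc/lvm/backup/myvg myvg
vgchange -ay myvg
```

The missing PV's UUID can be read out of the backup/archive file itself, which is why the lack of that file is the sticking point here.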

What's the most straightforward way to access the cloned LV?
Can it be done without physically adding another disk to the spare computer?


UPDATE

I did move another disk into the spare machine, and attempted to recreate the missing PV. LVM does not seem to support automated recreation, so I basically flailed at it until I somehow succeeded at making the LV available. The ext4 FS on it was mangled beyond recognition - e2fsck ran out of memory repeatedly trying to repair it.

I tried again, using ddrescue to copy from the failing disk to the same good disk (this time reporting a few errors), and this time LVM let me make the volume available without any additional disk (much as I expected in the first place). This time the FS was essentially intact, and I was able to recover everything I'd hoped to recover.
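On the second attempt, making the volume available presumably reduced to the ordinary scan-and-activate sequence (hedged: the VG/LV names are hypothetical, and --partial may or may not be needed depending on whether LVM still thinks a PV is missing):

```shell
# Rescan so LVM notices the cloned PV, then activate the VG
pvscan
vgscan
vgchange -ay myvg              # add --partial if LVM still reports a missing PV

# Read-only filesystem check before mounting the recovered LV
fsck.ext4 -n /dev/myvg/data_lv
mount -o ro /dev/myvg/data_lv /mnt/recovery
```

Mounting read-only first is prudent given that the clone came from a failing disk with known read errors.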

I have no idea why LVM refused to activate the VG on the first copy, and worked as expected on the second. With Google not returning any explanation, and no answers yet here, I'm concerned that recovery of LVM volumes is unreliable and it might be better to not use it.

  • I am not sure I fully understand the problem, but doesn't vgextend --restoremissing VG NEWPV allow you to add the new disk with restored data to the VG?
    – Martian, Nov 28, 2012 at 17:04
  • At the point where I was stuck, there was no restored data, only an inaccessible clone of the failing drive. --restoremissing is only for reconnecting a PV that LVM has failed to reconnect automatically, but the disk with the 0.05% mirror was repurposed (in step 2) to replace the failed drive.
    – Dec 2, 2012 at 18:42
