
While planning my RAID setup on a Synology Disk Station I did a lot of reading about various RAID types, and this was a great read: RAID levels and the importance of URE (Unrecoverable Read Error).

However, one thing remains unclear to me:

Let's have two scenarios:

  1. An array is a RAID 1 of 2 drives
  2. An array is a RAID 5 of 3 drives

The same assumptions for both scenarios:

  • Let's have 100,000 files on the RAID array
  • One drive fails (and needs replacement)
  • One bad sector (URE) is encountered while rebuilding the array

What happens? Does the RAID rebuild with 99,999 files doing fine and 1 file lost? Or am I going to lose all 100,000 files?

If the answer requires knowledge of the filesystem type, let's assume the filesystem is BTRFS or ZFS.

  • The logical answer is: it depends. RAID 1 is a direct copy of another drive. RAID 5 requires at least 3 drives to work, whereas RAID 1 only needs 2, at the cost of losing capacity. And it depends on what the error is. In the case of ZFS, there may be a better chance of getting a correct file back. However, RAID will never be a substitute for taking backups.
    – djdomi
    Commented Jun 27, 2021 at 14:45
  • You may want to distinguish these failure modes: 1. a sector is unreadable and unwriteable; 2. a sector is unreadable, but it can be overwritten, and then it is readable again.
    – pts
    Commented Jun 28, 2021 at 9:00
  • "What happens? Does the RAID rebuild with 99,999 files doing fine and 1 file lost? Or am I going to lose all 100,000 files?" Either one might happen. That's why you have backups. RAID is not a backup! Just because your files are on a RAID array doesn't make them safe. If someone runs rm -f -r /all/my/important/files, they're gone - from every disk in the RAID array. The only thing RAID does is improve the availability of your data. Commented Jun 28, 2021 at 10:34
  • @AndrewHenle Can you please elaborate on the 'Either one might happen' part? Thanks Commented Jun 28, 2021 at 13:41
  • You're assuming the read error occurs only in file data. It can happen in filesystem metadata, too. Depending on your filesystem, it's possible that can cause loss of everything stored in the filesystem. Never rely on RAID for data security. All it does is protect your ability to access your data against a few types of disk failure. Commented Jun 28, 2021 at 15:41

1 Answer


The short answer is that it depends.

In the situation you describe (a faulty disk plus some unreadable sectors on another disk), some enterprise RAID controllers will nuke the entire array on the grounds that its integrity is compromised and so the only safe action is to restore from backup.

Some other controllers (most notably from LSI) will instead puncture the array, marking some LBAs as unreadable but continuing with the rebuild. If the unreadable LBAs fall on free space, effectively no real data is lost, so this is the best scenario. If they affect already-written data, some information (hopefully of little value) is inevitably lost.

Linux MDADM is very versatile, with the latest versions having a dedicated "remap area" for such a punctured array. Moreover, one can always use dd or ddrescue to first copy the drive with unreadable sectors to a new disk and then use that disk to re-assemble the array (with some data loss, of course).
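As a rough sketch of that ddrescue route (the device names, array name and retry count below are purely illustrative assumptions, not taken from the answer or the question):

```
# Clone the member with unreadable sectors (/dev/sdb, hypothetical) to a new
# disk (/dev/sdc), recording the sectors that could not be read in a map file.
ddrescue -f /dev/sdb /dev/sdc /root/sdb.map

# Optionally retry the still-unread areas a few more times.
ddrescue -f -r3 /dev/sdb /dev/sdc /root/sdb.map

# Re-assemble the (degraded) array from the surviving member and the clone;
# --run starts it even though one device is missing.
mdadm --assemble --run /dev/md0 /dev/sda /dev/sdc

# If a bad-block log is present, list the blocks mdadm has recorded as bad.
mdadm --examine-badblocks /dev/sdc
```

The areas ddrescue could not read are simply left unfilled on the clone, so whatever data or metadata lived there is what ends up damaged.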

BTRFS and ZFS, by virtue of being more integrated with the block allocation layer, can detect whether the lost data sit in empty or allocated space, and they report the affected files in detail.
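A minimal sketch of what that reporting looks like, assuming a hypothetical ZFS pool named tank and a hypothetical BTRFS filesystem mounted at /mnt/data:

```
# ZFS: scrub the pool, then list the files with unrecoverable errors.
zpool scrub tank
zpool status -v tank   # the "Permanent errors have been detected" section
                       # lists the paths of the affected files

# BTRFS: scrub the filesystem and check the error counters.
btrfs scrub start /mnt/data
btrfs scrub status /mnt/data
# The kernel log usually names the affected paths for checksum errors:
dmesg | grep -iE 'btrfs.*(csum|checksum)'
```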

  • I once had to get a crucial file back from a six-disc RAID-0 array, with two failed drives, under Solstice Disk Suite. I found that ufsdump would still read the data, but stop each time it got to a block it couldn't read, and ask if it should continue. yes | ufsdump gave me a datastream I could pipe into ufsrestore, and since my crucial file was much smaller than the RAID stripe size I figured I had a ~5/6 chance of getting my file back. Which I did, leading to great rejoicing among the developers - ah, good times!
    – MadHatter
    Commented Jun 28, 2021 at 11:35
  • The problem with punctures is that you have no easy way of knowing whether important data has been damaged without doing a full integrity check, and you can't do that unless you have a good backup to verify it against. And if you've got a good backup to verify against, there's no good reason to do an integrity check when you could just restore from the backup and be done with it. Saves a lot of checking. That's why the enterprise controllers consider the whole thing toast for a puncture.
    – J...
    Commented Jun 28, 2021 at 12:19
  • @shodanshok Would it make sense to implement a RAID with 2 redundant copies (3 drives containing exactly the same data), so that if one drive dies and the other two have a few bad sectors it's statistically almost impossible for those bad sectors to overlap, and therefore the reliability of such a setup would be 99.99999%+? Commented Jun 28, 2021 at 14:12
  • Yes, it's called RAID 6. Commented Jun 28, 2021 at 16:19
  • @adamsfamily what you describe is 3-way RAID1. It is perfectly doable with both Linux MDRAID and ZFS (see the sketch after these comments), but not all hardware controllers support it due to the very high space penalty (only 33% of the space is user-available). If dealing with parity RAID, you need RAID6 for double redundancy.
    – shodanshok
    Commented Jun 28, 2021 at 16:32
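
For reference, a minimal sketch of the 3-way mirror mentioned in the comment above, using hypothetical member disks /dev/sdb, /dev/sdc and /dev/sdd and a hypothetical pool name:

```
# Linux MDRAID: a RAID1 set with three members (any two can fail).
mdadm --create /dev/md0 --level=1 --raid-devices=3 /dev/sdb /dev/sdc /dev/sdd

# ZFS: a three-way mirror vdev gives the same layout.
zpool create tank mirror /dev/sdb /dev/sdc /dev/sdd
```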

