Say I have a RAID 5 (Windows parity storage pool) and one drive fails. I will plug in a new one and start rebuilding.

Many people say that if one of the remaining drives has a read failure, even of a single sector, the rebuild process might fail, or will fail outright.

Is the RAID 5 data structure (I know there is no single standard) really intertwined in such a way that all of the data is needed for a successful rebuild? Or is it that only the files affected by the read error will not be rebuilt, which would make much more sense?

Also, what exactly does it mean that a rebuild fails? Those who preach the "one error, rebuild fails" mantra usually add that failing to rebuild means losing all the data. But since RAID 5 can be read even with one drive missing, the data should still be there, with at most some files affected by the read error?

3 Answers

Technically, if you're rebuilding with (all drives - 1) and problems arise on one of the remaining drives, even if it's only a single bad sector, you're exceeding what RAID 5 is designed to handle. And in practice, one bad sector often means more than one bad sector.

I have had RAID controllers drop a drive because of a single bad sector. I cannot imagine those allowing a rebuild to complete if a bad sector surfaces on one of the remaining drives.

I suppose it may depend on the policy of the actual software/controller whether it continues or not. A stricter policy could mean the rebuild is aborted.

"During rebuilding the RAID driver reads every block on all the surviving drives. If it encounters any bit errors, the rebuilding operation is typically aborted. The RAID is basically in limbo. It may stay in degraded mode or it may go into total failure mode. If the RAID consists of a large number of drives with large capacity, the probability of rebuilding failure can be very high. For example if the probability for a 2TB hard drive to have at least one bit error is 1%, then the probability of having at least one error in 12x2TB hard drives is 11%." - source

"In the situation you describe (a faulty disk + some unreadable sectors on another disk) some enterprise RAID controllers will nuke the entire array on the grounds that its integrity is compromised and so the only safe action is to restore from backup. Some other controllers (most notably from LSI) will instead puncture the array, marking some LBAs as unreadable but continuing with the rebuild. If the unreadable LBAs are on free space effectively no real data is lost, so this is the best scenario. If they affect already written data, some information (hopefully of little value) is inevitably lost." - source

For comparison, if we look at SSDs, the more high-end the SSD (enterprise grade), the less tolerant of any type of corruption its firmware policy is. If integrity of the data cannot be guaranteed, some (Intel) enterprise-grade SSDs are programmed to 'brick' themselves. The philosophy is: either we deliver 100% intact data, or, if that can't be guaranteed, no data at all.

So, all I am trying to say is that there probably is no 'one size fits all' answer. It depends on the specific RAID controller (or software).

Bottom line: It depends.

Not all may be lost if the controller refuses a rebuild: if you clone/image the separate members, data recovery tools are often able to virtually rebuild the array and allow you to recover the data.

Is RAID 5 (I know there is no standard) data structure really intertwined in such a way, that all of the data is needed for successful rebuild

No. The rotating parity structure allows for the complete failure of one drive. Otherwise RAID 5 would be no more reliable than JBOD (just a bunch of disks).

or is it that only the files affected by the read error will not be rebuilt, which would make much more sense.

As the RAID 5 structure allows for the complete failure of one drive, a simple read error on an otherwise healthy array does not matter: the unreadable sector can be reconstructed from the remaining drives and the parity.

What matters is whether read errors occur on the remaining drives after losing one drive. The rebuild of the array requires everything to be read, and this is the moment when latent failures, such as unreadable sectors on the other drives, tend to emerge.

Such errors might cause your RAID 5 software to stop immediately, or to ignore them, leading to a partially reconstructed RAID 5 and therefore to data or metadata loss. The phrase "rebuild fails" is not well defined. It could mean that your software stopped the rebuilding process when faced with the first unreadable sector. It could equally mean that the process ran to completion despite the read errors, providing the maximum amount of data that could be recovered.

From a technical viewpoint, looking at the figure below, one unreadable sector in one of the data segments (A, B, C, D) will only affect that segment.

[Figure: data distribution over the drives of a RAID 5 array]

Source: https://de.wikipedia.org/wiki/RAID#RAID_5:_Leistung_+_Parit%C3%A4t,_Block-Level_Striping_mit_verteilter_Parit%C3%A4tsinformation
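
To make the stripe mechanics in the figure concrete, here is a minimal Python sketch (my illustration, with made-up segment contents) of the XOR relationship RAID 5 relies on: parity is the XOR of the data segments, so any single missing member of a stripe can be recomputed from the others, but not two:

    from functools import reduce

    def xor_blocks(blocks):
        # XOR a list of equal-sized byte blocks together.
        return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

    # Hypothetical stripe: three data segments plus their parity.
    A, B, C = b"AAAA", b"BBBB", b"CCCC"
    parity = xor_blocks([A, B, C])

    # The drive holding B dies: B is recoverable from the survivors.
    assert xor_blocks([A, C, parity]) == B

    # But if a second member of the SAME stripe is also unreadable
    # (say a bad sector where C lives), the equation has two unknowns
    # and that stripe's data is gone -- other stripes are unaffected.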

  • Well, the question is how much data is lost when one drive fails completely and I get a single bad sector on another while rebuilding. Do I lose the whole segment? And how big are those segments? If one segment were one file, then I would lose just one file. Commented Oct 27, 2021 at 7:50
  • You might include your question in your initial posting.
    – r2d3
    Commented Oct 27, 2021 at 14:35
  • I feel like I did. I am simply wondering how much data I will lose with one disk failed and a single bad sector on one of the remaining drives. Commented Oct 27, 2021 at 16:28
  • Shoot a torpedo against the deathstar in Star Wars. What will happen? What are the odds? Break a node in the internet. What will happen? In addition to what will really happen with the disk your interface may act differently depending on your RAID5 solution provider.
    – r2d3
    Commented Oct 27, 2021 at 16:33
  • The bigger your drives and the more data you have, the bigger your chance of data loss with RAID 5. It's simple math (see the sketch after these comments). Check drive specs for error rates. Example: 1 in 10^14 means you'll probably see unrecoverable error(s) around the time you have read ~10 TB from that drive. Combine that with MTBF and 2 drives will statistically ~halve your file system's lifetime. 3 drives will statistically reduce it to just a bit above a third of the time before data loss...
    – svin83
    Commented Nov 11, 2022 at 11:14
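
To ground the error-rate math in the last comment, here is a small Python sketch assuming the quoted consumer spec of one unrecoverable read error (URE) per 10^14 bits; the capacities are illustrative, and the model treats bit reads as independent, which real drives are not:

    # Chance of at least one URE while reading a whole drive, given
    # the common spec of 1 unrecoverable error per 1e14 bits read.
    URE_RATE = 1e-14  # errors per bit

    def p_ure(capacity_tb: float) -> float:
        bits_read = capacity_tb * 1e12 * 8  # TB -> bits
        return 1 - (1 - URE_RATE) ** bits_read

    for tb in (2, 4, 8):
        print(f"Full read of a {tb} TB drive: P(URE) ~ {p_ure(tb):.0%}")

    # A rebuild must read every surviving drive in full, so these
    # per-drive probabilities compound across the whole array.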

The short answer is that it depends: on where the bad sector is located, and on the behavior of the RAID software on Windows.

In your case of a Windows storage pool, this is software RAID, so your "RAID controller" is Windows itself, and its behavior in such a case is not well documented.

You have the following factors in your favor:

  • RAID 5 can recreate the failed disk on a replacement disk without any problem, provided the remaining disks read cleanly.
  • The bad sector might not be inside a file. If it is on free space then effectively no real data will be lost.
  • If the bad sector is inside a file, the damage is still contained. Reconstructing a sector requires every other member of its stripe, and with one disk already dead, the stripe holding the bad sector is missing two members, so that sector generally cannot be rebuilt. Everything outside that one stripe remains recoverable, so at worst you lose the file(s) touching it rather than the whole array (see the sketch after this list).
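
As a sketch of that last point (a toy model I'm adding for illustration, not actual Storage Spaces behavior), marking one disk dead and one sector on a survivor bad shows that only the stripe containing both losses becomes unrecoverable:

    # Toy 4-disk RAID 5: each stripe holds one chunk per disk (3 data
    # + 1 parity). A stripe survives if at most one chunk is unreadable,
    # since XOR parity can rebuild a single missing chunk.
    N_DISKS, N_STRIPES = 4, 1000
    dead_disk = 2           # completely failed member
    bad_sector = (1, 417)   # (disk, stripe) of a single bad sector

    def stripe_recoverable(stripe: int) -> bool:
        missing = sum(
            1 for disk in range(N_DISKS)
            if disk == dead_disk or (disk, stripe) == bad_sector
        )
        return missing <= 1

    lost = [s for s in range(N_STRIPES) if not stripe_recoverable(s)]
    print(f"Unrecoverable stripes: {lost}")          # [417] -- just one
    print(f"Intact stripes: {N_STRIPES - len(lost)} / {N_STRIPES}")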

All in all, you will need to try it to know what will happen. I suggest, however, taking a backup of your data before rebuilding the array.
