Let's say a hardware RAID 1 controller with disks A and B was taken offline, and a file present on both disks was altered on disk B (all within an offline state). When the RAID controller is turned back on, and the user requests the altered file, what will happen? I am assuming that most RAID 1 controllers don't detect any errors until they attempt to read from that file.

  1. Will the RAID controller identify a difference in file size/date/signature and therefore report an error, or will it pass either of the 2 disks' contents for that file, unaware of the changes?

  2. Would the Operating System detect any errors?

  3. Would a software RAID 1 controller act any different?

  4. And finally, in any RAID 1 array of N disks, which disk(s) does the OS or controller actually read from? All N? Do some RAID controllers always use disk 0, will they randomly pick a disk, or do they have access to the file-system and check the integrity (even if the disk is encrypted)?

  • 1
    If you ever are to try that out please post it in this thread naming operating system and the RAID hardware/software used. Thank you!
    – r2d3
    Commented Jun 30, 2020 at 17:45

1 Answer

"Do they have access to the file-system and check the integrity (even if the disk is encrypted)"? No. The RAID controller does not know about files – it just pretends to be a disk controller, and it only deals with blocks (sectors). And because the individual sectors on both disks were correctly written, there are no errors to be detected at block level.
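
To make the block-level point concrete, here is a minimal Python sketch of a mirror that only knows about numbered blocks. The `Raid1` class, its `read`/`write` methods, and the `disk_index` read policy are all hypothetical and do not model any real controller; the only point is that nothing in the read path can notice that the mirrors disagree.

```python
class Raid1:
    """Toy block-level mirror (hypothetical): it has no notion of files."""

    def __init__(self, disks):
        # Each "disk" is just a list of fixed-size blocks (bytes objects).
        self.disks = disks

    def write(self, lba, data):
        # A normal write goes to every mirror.
        for disk in self.disks:
            disk[lba] = data

    def read(self, lba, disk_index=0):
        # A read is served from ONE mirror; the others are not consulted.
        # Which mirror is picked is an assumption here, not real firmware logic.
        return self.disks[disk_index][lba]


disk_a = [b"\x00" * 4] * 8
disk_b = [b"\x00" * 4] * 8
array = Raid1([disk_a, disk_b])
array.write(3, b"old!")

# Simulate the offline modification: disk B is changed behind the array's back.
disk_b[3] = b"new!"

from_a = array.read(3, disk_index=0)  # returns the old block, no error raised
from_b = array.read(3, disk_index=1)  # returns the new block, no error raised
```

Both reads succeed, because each sector was written correctly on its own disk; which version the user sees depends only on which mirror happens to serve the read.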

I don't know of any documentation on how hardware RAID controllers handle mismatched mirrors. However, NeilBrown – the long-time maintainer of the Linux mdraid software RAID feature – has posted this LWN comment explaining how mdraid would handle it:

[...] If two devices in a RAID1 do not contain identical data, or if the sum of the data in a RAID4/5/6 doesn't match the parity block(s), then this is an inconsistency, not a corruption.
The most likely explanation is that multiple devices in the array were being written to when something went wrong (e.g. power loss) and some writes succeeded while others didn't. It doesn't matter which succeeded and which didn't.

In this case NEITHER BLOCK IS WRONG. I need to say that again. BOTH BLOCKS ARE CORRECT. They are just correct at different points in time.
There is NO CORRUPTION here, there is just an inconsistency.
Each block will either contain the new data or the old data, and both are correct in some sense.
(If a block got half-written to the device, which is possible if the device doesn't have a big enough capacitor, then you would get a read-error because the CRC wouldn't be correct. When you get a CRC error, md/raid knows the data is wrong - and cannot even read the data anyway).

In the case of RAID1 it REALLY DOESN'T MATTER which device is chosen to use and which device gets it's data replaced. md arbitrarily chooses the earliest in the list of devices.
In the case of a parity array it makes sense to use the data and ignore the parity because using the parity doesn't tell you which other device it is inconsistent with. (If you have reason to believe that the parity might not be consistent with the data, and one of the data block is missing - failed device - then you cannot use either date or parity, and you have a "write hole").
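
The RAID 1 rule described in the quote (take the earliest device in the list as the source and overwrite the others) can be illustrated with a small Python sketch. This is not mdraid's actual code; `resync_raid1` and the list-of-blocks representation are assumptions made purely for illustration.

```python
def resync_raid1(mirrors):
    """Make all mirrors identical by copying from the first device in the list.

    `mirrors` is a list of disks, each a list of blocks (hypothetical model).
    Returns the number of blocks that had to be overwritten.
    """
    source = mirrors[0]
    repaired = 0
    for lba, block in enumerate(source):
        for mirror in mirrors[1:]:
            if mirror[lba] != block:
                # Neither copy is "corrupt"; one is simply chosen arbitrarily.
                mirror[lba] = block
                repaired += 1
    return repaired


disk_a = [b"old", b"same"]
disk_b = [b"new", b"same"]
fixed = resync_raid1([disk_a, disk_b])
```

In this model the change made to disk B while the array was offline is silently discarded during the resync, simply because disk A comes first in the list.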


So would the operating system detect any errors? It depends on what filesystem is being used on the disks, and on whether the inconsistent sectors belonged to files or the filesystem's own metadata. If there was an inconsistency in a metadata sector, it is more likely to be detected – but in most cases it would be reported as generic filesystem corruption, not as RAID inconsistency.

It is less likely with file contents, as most filesystems do not checksum file data at all – what they read from the disk is what you get. There are only a few exceptions, such as Btrfs, ZFS, and ReFS, which do checksum.

Some of those (Btrfs, ZFS) actually have their own disk mirroring feature, which has an advantage over hardware RAID: the filesystem knows which disk has bad data and can automatically repair the file by reading from the other disk. On top of hardware RAID, however, they would have no way of asking the RAID controller for "the other version", and repair wouldn't be possible.
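
As a rough illustration of why filesystem-level mirroring can repair itself, here is a simplified Python sketch. It is not how Btrfs or ZFS are implemented; `read_with_self_heal` is a hypothetical helper, and it assumes the expected checksum comes from metadata that is itself trustworthy.

```python
import hashlib


def checksum(data):
    """Checksum of one block (SHA-256 chosen only for illustration)."""
    return hashlib.sha256(data).hexdigest()


def read_with_self_heal(copies, expected):
    """Return a copy matching `expected` and overwrite any copy that does not.

    `copies` is a mutable list holding one version of the block per mirror.
    """
    good = next((c for c in copies if checksum(c) == expected), None)
    if good is None:
        raise IOError("no mirror holds data matching the stored checksum")
    for i, data in enumerate(copies):
        if checksum(data) != expected:
            copies[i] = good  # repair the mismatching mirror from the good one
    return good


stored = checksum(b"original contents")
copies = [b"altered contents", b"original contents"]
data = read_with_self_heal(copies, stored)
```

The key difference from the hardware RAID case is that the filesystem can see every copy individually; behind a RAID controller it only ever sees one logical disk, so the loop over `copies` has nothing to iterate over.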


Would a software RAID 1 controller act any different? It might (see the quoted LWN comment above), but there is also another important difference.

With software RAID (as well as filesystem mirroring), the disks themselves carry information about belonging to a mirror set. So if you moved one disk elsewhere, it would still be recognized as part of an incomplete RAID 1 array, and normally the software wouldn't allow it to be written to in the first place – it would remain read-only unless you broke up the mirror.
