That's because when there is silent data corruption, md does not have enough information to know which block holds the silently corrupted data.
You can technically create a bad sector with hdparm --make-bad-sector, but how do you know which disk holds the block affected by silent data corruption? It's not practical.
Consider this simplified example:
Parity formula: PARITY = DATA_1 + DATA_2
+--------+--------+--------+
| DATA_1 | DATA_2 | PARITY |
+--------+--------+--------+
|      1 |      1 |      2 | # OK
+--------+--------+--------+
Now let's silently corrupt each of the blocks, one at a time, with a value of 3:
+--------+--------+--------+
| DATA_1 | DATA_2 | PARITY |
+--------+--------+--------+
|      3 |      1 |      2 | # Integrity failed – Expected: PARITY = 4
|      1 |      3 |      2 | # Integrity failed – Expected: PARITY = 4
|      1 |      1 |      3 | # Integrity failed – Expected: PARITY = 2
+--------+--------+--------+
If you didn't have the first table to look at, how would you know which block was corrupted?
You can't know for sure.
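The ambiguity in the tables above can be shown in a few lines of code. This is a toy model (my own illustration, not md's actual implementation): one parity value over two data blocks, where a single silent corruption always produces the same symptom, a parity mismatch, regardless of which block was corrupted.

```python
# Toy model of the stripes above: PARITY = DATA_1 + DATA_2.
def parity_ok(data_1, data_2, parity):
    return data_1 + data_2 == parity

# Three different single-block corruptions (value 3 written silently):
corrupted_stripes = [
    (3, 1, 2),  # DATA_1 corrupted
    (1, 3, 2),  # DATA_2 corrupted
    (1, 1, 3),  # PARITY corrupted
]

for stripe in corrupted_stripes:
    # Every case fails in exactly the same way, so the parity check
    # alone cannot tell you which of the three blocks is lying.
    print(stripe, "OK" if parity_ok(*stripe) else "parity mismatch")
```

All three stripes print "parity mismatch", which is all the information md has to work with.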
This is why Btrfs and ZFS both checksum blocks. It takes a little more disk space, but this extra information lets the storage system figure out which block is lying.
From Jeff Bonwick's blog article "RAID-Z":
Whenever you read a RAID-Z block, ZFS compares it against its checksum. If the data disks didn't return the right answer, ZFS reads the parity and then does combinatorial reconstruction to figure out which disk returned bad data.
To do this with Btrfs on md, you would have to recalculate each block in turn until the Btrfs checksum matches, a time-consuming process with no easy interface exposed to the user or to scripts, and you would still be out of luck if a parity block were silently corrupted.
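The quoted combinatorial-reconstruction idea can be sketched as follows. This is my own simplified illustration, not ZFS's actual algorithm: assuming we hold a known-good checksum over the data blocks, we try replacing each data block with the value recomputed from parity, and keep the candidate whose checksum matches.

```python
import hashlib

def checksum(blocks):
    # Stand-in for a real block checksum (ZFS uses e.g. fletcher4/sha256).
    return hashlib.sha256(repr(blocks).encode()).hexdigest()

def reconstruct(data, parity, expected_checksum):
    """Return repaired data blocks, or None if parity can't cover the damage."""
    if checksum(data) == expected_checksum:
        return data  # data is fine; any mismatch was in the parity block
    for i in range(len(data)):
        candidate = list(data)
        # Recompute block i from the parity and the other data blocks.
        candidate[i] = parity - sum(candidate[:i] + candidate[i + 1:])
        if checksum(candidate) == expected_checksum:
            return candidate
    return None  # more corruption than a single parity block can correct

good_checksum = checksum([1, 1])
print(reconstruct([3, 1], parity=2, expected_checksum=good_checksum))  # [1, 1]
print(reconstruct([1, 3], parity=2, expected_checksum=good_checksum))  # [1, 1]
```

Note that the checksum is what makes this decidable: without it, every candidate in the loop would look equally plausible.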
- When ZFS detects silent data corruption, it is automatically and immediately corrected on the spot without any human intervention.
- If you need to rebuild an entire disk, ZFS will only "resilver" the actual data instead of needlessly running across the whole block device.
- ZFS is an all-in-one solution to logical volumes and file systems, which makes it less complex to manage than Btrfs on top of md.
- RAID-Z and RAID-Z2 are reliable and stable, unlike
- Btrfs on md RAID-5/RAID-6, which only offers error detection on silently corrupted data blocks (plus silently corrupted parity blocks may go undetected until it's too late) and no easy way to do error correction, and
- Btrfs RAID-5/RAID-6, which "has multiple serious data-loss bugs in it".
- If I silently corrupted an entire disk with ZFS RAID-Z2, I would lose no data at all, whereas on md RAID-6 I actually lost 455,681 inodes.