In general, a newly created redundant array on zeroed disks needs no prior syncing, as long as the checksum (or copy, for RAID1) of those zeroed input blocks is also zero. There is no functional difference in how a block gets zeroed: prior to RAID creation, or through the RAID sync process. So --assume-clean can indeed be used safely to skip the time-consuming (and, on SSDs, wear-inducing and thus undesirable) (re)writing of blocks from zero to zero.
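As a minimal sketch of that (device and partition names are placeholders, assumed to be fully zeroed beforehand):

```
# Create a RAID5 array and skip the initial sync. Safe here only
# because parity calculated over all-zero data blocks is itself zero.
mdadm --create /dev/md0 --level=5 --raid-devices=3 \
      --assume-clean /dev/sdb1 /dev/sdc1 /dev/sdd1
```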
To my understanding, the mdadm write-intent bitmap is a device-local (not array-local) indicator of whether the individual devices are consistent with each other. I'm not sure whether the bitmap itself serves as an inconsistency indicator at the array level, i.e. whether an all-zero bitmap means the array can be assumed in sync, while set bits mean checksums must be rewritten (or data copied, for RAID1).
Within the constraints of the assumptions outlined above, the safest way to create an array that is fully redundant without a prior sync seems to be to create it on guaranteed-zeroed disks with --assume-clean --bitmap=none and, if desired, to add a bitmap in a second step. This provides consistency without a sync in every case, is also safe in degraded mode, and gives a clean result from a checkarray run. Again, this is true only for RAID levels where the calculated checksum of zeroes is itself zero, or for RAID1, where a copy of a zero also yields a zero.
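A sketch of that two-step approach as a whole (device names are placeholders; RAID1 chosen for brevity):

```
# 1. Guarantee zeroed members. Slow; on SSDs, blkdiscard may be a
#    wear-free alternative if the device reads back zeroes after
#    discard (see the DISC-ZERO column of lsblk -D).
dd if=/dev/zero of=/dev/sdb1 bs=1M status=progress
dd if=/dev/zero of=/dev/sdc1 bs=1M status=progress

# 2. Create the array without initial sync and without a bitmap.
mdadm --create /dev/md0 --level=1 --raid-devices=2 \
      --assume-clean --bitmap=none /dev/sdb1 /dev/sdc1

# 3. Optionally add an internal write-intent bitmap afterwards.
mdadm --grow /dev/md0 --bitmap=internal
```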
Here comes some speculation. I don't know enough about the inner workings of mdadm to know for sure what happens when non-zeroed disks are used with --assume-clean --bitmap=none, so take the following statements with caution.
Assuming checksum calculation on reads happens in degraded mode only (very likely, for performance reasons), it is even safe not to zero the disks before bundling them into an array: block checksums will be corrected lazily, with each write to the array. Data blocks that have never been written through the array (and thus have mismatching checksums) can be considered unimportant: from the file system's point of view, they are free space. And because reads of unallocated blocks do not trigger a checksum fault, there should be no functional difference from reading unallocated blocks off a single disk, for whatever reason.
The same holds for RAID1: data that has already been written is consistent across all mirror members; never-written data giving inconsistent reads doesn't matter.
If a partially written array is used in degraded mode, the already-written data has correct checksums/copies and can thus be reconstructed correctly. All free blocks still don't matter: if mdadm returns garbage when reconstructing never-written blocks from the checksums, it's just different garbage, still irrelevant because it is not in use by the file system.
In short: the file system keeps track of allocated blocks. Since those blocks are written through the array before they eventually need to be re-read, the data is consistent.
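This speculation can be tested without risking real disks, e.g. with a throwaway array on loop devices (a rough sketch, all names hypothetical):

```
# Back the array with deliberately non-zeroed files.
for i in 0 1 2; do dd if=/dev/urandom of=disk$i.img bs=1M count=256; done
L0=$(losetup --find --show disk0.img)
L1=$(losetup --find --show disk1.img)
L2=$(losetup --find --show disk2.img)

# Parity is now wrong nearly everywhere, but non-degraded reads
# never consult it, so the file system behaves normally.
mdadm --create /dev/md9 --level=5 --raid-devices=3 \
      --assume-clean --bitmap=none "$L0" "$L1" "$L2"
mkfs.ext4 /dev/md9
mount /dev/md9 /mnt
echo hello > /mnt/test && cat /mnt/test
```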
Regarding checkarray: it cannot know which blocks have ever been written, so it will have to correct all not-yet-written blocks, be it checksum-based or, as with RAID1, by plain copying. Unless the write-intent bitmap plays a more important role than I anticipate, that is.
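That correction can be watched (and triggered) through sysfs, which is what checkarray wraps; md0 is a placeholder:

```
# Start a consistency check; once it has finished, mismatch_cnt
# reports how many sectors were found inconsistent.
echo check > /sys/block/md0/md/sync_action
cat /sys/block/md0/md/mismatch_cnt

# Rewrite checksums/copies for the inconsistent stripes.
echo repair > /sys/block/md0/md/sync_action
```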
What I have not yet mentioned is the problem of software bugs, file systems corrupted by power outages, and faulty disk sectors. Possible scenarios and effective mitigations (such as the data=ordered mount option for ext4) are left as an exercise to the reader.
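For the ext4 example at least, the mitigation is a one-liner (mount point hypothetical; data=ordered is in fact ext4's default journaling mode):

```
# Commit data blocks to disk before their metadata hits the journal.
mount -o data=ordered /dev/md0 /mnt
```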