With hard-drive capacities growing far faster than cost, parity RAID faces increasing challenges (URE probability during rebuilds, multi-day rebuilds that stress the remaining drives and risk a secondary failure). Is there a case to be made that this also creates an opportunity for, even a recommendation of, multi-disk (>2) RAID 1 configurations at home and in small businesses? That is, RAID 1 'wastes' space compared to a parity scheme, but with cheap multi-terabyte drives, isn't space arguably the commodity we can afford to waste, compared to the memory, bandwidth, CPU time, downtime, etc. required for striping, rebuilds, and so on?
If I understand the technology correctly, a RAID 1 configuration with more than two disks:
- ..cannot fail a rebuild, in the sense that as long as one drive is accessible, so are the files.
- ..provides greater fault tolerance relative to the drive count. It's basically an inversion of the ratio, right? RAID 5 on three disks provides 66% usable capacity and can survive the failure of 33% of its drives (one disk); three-way RAID 1 provides only 33% usable capacity but survives 66% failure (two disks).
- ..has a simpler and more robust recovery mechanism. In dire conditions, the array need not be rebuilt to access the data, and drives can be migrated between different systems.
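To make the capacity/fault-tolerance inversion in the second point concrete, here is a quick sketch of the arithmetic for n equal-sized drives (an illustration only, not a real configuration tool):

```python
# Usable-capacity fraction vs. fraction of drives that may fail,
# for n equal-sized drives.

def raid5(n):
    """RAID 5: one drive's worth of parity; tolerates one failure."""
    return (n - 1) / n, 1 / n

def raid1(n):
    """n-way RAID 1: one drive's worth of data; tolerates n-1 failures."""
    return 1 / n, (n - 1) / n

for n in (3, 4):
    cap5, tol5 = raid5(n)
    cap1, tol1 = raid1(n)
    print(f"{n} disks: RAID5 {cap5:.0%} capacity / {tol5:.0%} may fail; "
          f"RAID1 {cap1:.0%} capacity / {tol1:.0%} may fail")
```

For three disks this prints the 2/3 vs. 1/3 inversion described above; at four disks the gap widens (RAID 5: 75% capacity, one failure; four-way RAID 1: 25% capacity, three failures).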
That last point feels akin to the Unraid philosophy, where files are saved whole across drives rather than striped. The second point suggests something attractive to a small business: prioritizing data security over data capacity. What's the point of being able to store more data if it's at greater risk?
Questions:
Would a strategy of 'manually mirroring' the content, as opposed to RAID 1's copy-at-time-of-writing, avoid potential problems with non-ECC memory corruption? Since the job of writing to each disk would be a distinct process, the same memory fault that corrupts one write should not, in theory, recur in the others.
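The idea above can be sketched as follows: each destination gets its own independent copy-and-verify pass, so a transient in-memory corruption can affect at most one copy. This is a minimal Python sketch (the mount points are hypothetical placeholders; in practice each pass might be a separate queued `rsync -c` invocation):

```python
# Mirror a file to several disks as independent jobs: each destination
# is copied and then verified by re-reading and checksumming, so one
# corrupted in-memory buffer cannot silently poison every mirror.
import hashlib
import shutil
from pathlib import Path

def sha256(path):
    """Checksum a file in chunks to avoid loading it whole into RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def mirror(source, dest_dirs):
    """Copy `source` into each directory in `dest_dirs`, one job at a time."""
    for d in dest_dirs:
        dest = Path(d) / Path(source).name
        shutil.copy2(source, dest)            # a distinct copy per disk
        if sha256(dest) != sha256(source):    # verify this copy on its own
            raise IOError(f"verification failed for {dest}")
```

Note the caveat: the verification re-read may be served from the page cache rather than the platters, so this narrows but does not eliminate the non-ECC window.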
Is there RAID 1 software/hardware designed to work with more than two drives that can therefore handle traditionally un-RAID-1-like failure modes such as bit rot and UREs? With a traditional two-disk mirror, if disk A reports a 0 and disk B reports a 1, you have no idea which one flipped; but with three or more disks, could you not 'correct' the odd one out? A=0, B=1, C=0, therefore B must have flipped.
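The odd-one-out logic is just a majority vote over per-disk checksums. A minimal sketch, assuming at most one copy is corrupt at a time:

```python
# Majority vote across three (or more) mirror copies: the checksum held
# by the fewest disks identifies the copy that flipped.
from collections import Counter

def odd_one_out(checksums):
    """Return the index of the minority checksum, or None if all agree.

    Assumes at most one copy is corrupt; with only two disks (or a
    two-way tie) there is no majority and the answer is undecidable.
    """
    counts = Counter(checksums)
    if len(counts) == 1:
        return None                      # all copies agree
    minority = min(counts, key=counts.get)
    return checksums.index(minority)
```

So `odd_one_out(["0", "1", "0"])` identifies disk B (index 1) as the one that flipped, which is exactly what a two-disk mirror cannot do.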
At the same time, would such a RAID 1 solution support parallel reads from more than two disks to provide accelerated read speeds? This seems especially important, since I imagine many businesses read the same data far more often than they edit it.
RAID is not a backup solution. So if you follow good practice and keep copies in two to three storage locations, is it intended that only one of them be protected from drive failure by a RAID scheme, or would you expect a second, offsite location to also have multi-disk redundancy? Is even a single drive stored in a different location considered protected from device failure, because it is in an 'effective' mirror relationship with the drives in the other locations? If not, doesn't RAID 1 offer some redundancy at the lowest possible drive count (two)?
Update:
Re point 1: as some have pointed out, this does rely on a data source free of errors. That is, however, true of pretty much any storage/backup strategy. We may use sophisticated data-integrity measures in our NAS/SAN solutions, but the data we store in them is typically generated by workstations and devices without such measures. Most production PCs are built for either cost or speed; it is highly unlikely, especially in a small business, that the computer you do your CAD, finances, or PowerPoint on uses ZFS-formatted drives and ECC memory.
In my specific example, one of the things I will be looking to store is the output from a camera. I have to assume the photos and videos it saves to the SD card are 'correct'; there's not much I can do if the corruption occurs at that point in the data chain.
Suggestion:
If this strategy is not supported at the RAID level, are there any software packages that can be used to manually perform some of the activities I'm describing? Writing copies via queued, discrete rsync tasks? Could a bash script perform a periodic bit-rot scrub: checksum all copies of a particular file across all disks, then overwrite any copy with a wrong checksum using the copy from a disk with a correct one?
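As a proof of concept, the scrub step could look roughly like this. A sketch in Python rather than bash (the replica directories are hypothetical, and it assumes three or more replicas with at most one bad copy per file):

```python
# Periodic bit-rot scrub over N replica directories holding identical
# trees: checksum every copy of each file, take the majority checksum
# as truth, and overwrite any minority copy from a majority copy.
import hashlib
import shutil
from collections import Counter
from pathlib import Path

def sha256(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def scrub(replicas):
    """replicas: list of directory paths that should be identical trees."""
    roots = [Path(r) for r in replicas]
    # Walk the first replica's tree; assume the others mirror its layout.
    for rel in sorted(p.relative_to(roots[0])
                      for p in roots[0].rglob("*") if p.is_file()):
        copies = [root / rel for root in roots]
        sums = [sha256(c) for c in copies]
        counts = Counter(sums)
        if len(counts) > 1:
            good = counts.most_common(1)[0][0]   # majority checksum wins
            source = copies[sums.index(good)]
            for copy, s in zip(copies, sums):
                if s != good:
                    shutil.copy2(source, copy)   # repair the bad copy
```

A cron job running something like this would approximate the scrub-and-repair behaviour that ZFS mirrors provide natively, without any RAID layer at all.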