Timeline for RAID0 instead of RAID1 or 5, is this crazy?

Current License: CC BY-SA 4.0

11 events

when	what		by	license	comment
Sep 19, 2019 at 3:06	comment	added	Greg		For scenario #1, replication would still fall behind, but if your primary loses a single drive while your secondary falls behind, you won't lose any data.
Sep 19, 2019 at 3:05	comment	added	Greg		@zsqlman - having fewer disks does not reduce your risk of failure, just the risk of 2 disks failing. If there is a 10% chance of any disk failing (just making up a number), then without redundancy there is a 10% chance that your RAID goes offline, regardless of how many disks there are. Having fewer disks does not reduce the risk of 1 failure.
Sep 5, 2019 at 19:09	comment	added	zsqlman		As for scenario #1, replication would fall behind regardless of the RAID level. RAID0 actually writes faster, so it reduces the chance of this, though the network is more likely to be the bottleneck.
Sep 5, 2019 at 19:03	comment	added	zsqlman		Regarding your edge case #2, there are 2 secondaries which should mitigate that issue as long as I don't patch both simultaneously.
Sep 5, 2019 at 18:50	comment	added	zsqlman		You are correct that each disk has the same odds of failure. Fewer disks mean fewer chances of failure.
Aug 30, 2019 at 20:26	comment	added	Greg		@zsqlman - I've added an extra example of when you might lose data because you don't have RAID. Also, I think the logic you apply to reduced failure is still flawed. The odds of one disk failing with fewer disks in the RAID are the same as the odds of 1 disk failing with redundancy in the RAID. Reducing the number of disks doesn't reduce the risk of any one disk failing - each disk is just as likely to fail as any other disk.
Aug 30, 2019 at 20:24	history	edited	Greg	CC BY-SA 4.0	Adding extra example of why RAID0 might be a bad idea
Aug 30, 2019 at 14:38	comment	added	zsqlman		@Greg Good follow-up questions, and some I had not fully fleshed out. There are numerous layers of redundancy, with the servers in triplicate. Restoring all the databases can be easily scripted. If a node fails, we would kick that replica from the AG, removing the Tlog backlog issue, and even if we don't remove the node, we have plenty of space to hold a few days' worth of log growth. Regarding recovery time, I only have one data point and don't have more spare hardware to test. We've only had 1 RAID failure; it took 2+ days to recover, and we can do the restores in 8ish hours.
Aug 30, 2019 at 14:24	comment	added	zsqlman		@Greg The fact that I might not have thought everything through is why I'm asking this question. I guess I would say I'm looking for where I can improve efficiency as a whole. To answer your questions: 1. Yes. The failure of the array will immediately cause the AG to fail over to a different node. The impact of a bad sector depends on whether it was a recoverable bit error or not, but this would cause a failure whether the disk was in any kind of RAID or not. 2. Fewer disks would decrease the chance of failure IN the array. RAID0 would increase the chance of failure OF the array. 3. No, the money savings is a perk.
Aug 30, 2019 at 8:01	comment	added	user		Adding to your point on #3, if the cost of an extra disk (or three) is what makes or breaks the budget, then from where will the money come to replace it when one disk fails?
Aug 29, 2019 at 23:03	history	answered	Greg	CC BY-SA 4.0
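The comments above argue about two different quantities: the chance that any disk in a set fails, and the chance that the array as a whole is lost. A minimal sketch that computes both, assuming each disk fails independently with the same probability p over some fixed period. The 10% value is the made-up number from the comments, not measured data, and the model ignores rebuild windows, correlated failures, and read errors during rebuild. The function names are hypothetical.

```python
from math import comb


def p_raid0_fails(n_disks: int, p: float) -> float:
    """RAID0 has no redundancy: the array is lost if ANY disk fails.

    Assumes independent failures with per-disk probability p.
    """
    return 1 - (1 - p) ** n_disks


def p_raid1_fails(p: float) -> float:
    """Two-disk RAID1 mirror: the array is lost only if BOTH disks fail."""
    return p ** 2


def p_raid5_fails(n_disks: int, p: float) -> float:
    """RAID5 tolerates one failed disk: the array is lost if two or more fail."""
    p_none = (1 - p) ** n_disks
    p_exactly_one = comb(n_disks, 1) * p * (1 - p) ** (n_disks - 1)
    return 1 - p_none - p_exactly_one


if __name__ == "__main__":
    p = 0.10  # illustrative per-disk failure probability (made up, as in the comments)
    for n in (1, 2, 4):
        print(f"RAID0, {n} disk(s): {p_raid0_fails(n, p):.4f}")
    print(f"RAID1, 2 disks:    {p_raid1_fails(p):.4f}")
    print(f"RAID5, 4 disks:    {p_raid5_fails(4, p):.4f}")
```

With p = 0.10, this gives roughly 0.10, 0.19, and 0.34 for one-, two-, and four-disk RAID0, 0.01 for a two-disk mirror, and about 0.052 for a four-disk RAID5. Under this simple model, the per-disk probability never changes, but the array-level probability depends on both the disk count and the redundancy scheme.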