14

There are a few questions around here that make it clear there's no simple way to recover data from a RAID array after a motherboard crash.

The answer to this particular question suggests using an add-on PCI card (that can be moved with all disks to a new system without losing data). However, that just moves the problem (what if the PCI card gets fried?). Then there are NAS systems, but then again, what if the NAS motherboard gets fried?

[The "one and only" statement following this edit is incorrect, as Peregrino69 has pointed in his answer below.]

Keeping in mind that the one and only reason for RAID systems to exist at all is to preserve user data in the event of hardware failure (not counting RAID-0 here), I'd expect RAID technology to have solved this glaringly obvious problem long ago.

I find it plain ridiculous that, every time someone asks what to do with a RAID system and a fried motherboard, answers seem to go "hey, I once managed to recover from a similar situation using this one weird hackish trick - it might work for you". Also ridiculous is that data on a single non-backed-up SATA disk connected to the cheapest PC motherboard would be trivially recoverable after a motherboard crash, while data on an expensive RAID-5 NAS system would be mostly lost forever in the same situation.

Why isn't there a standard solution to this problem, designed at least 20 years ago, and implemented since then by all RAID systems worth the name?

8
  • 4
What exactly do you consider to be a "motherboard crash"? If something happens to the motherboard of any device, you have bigger problems than restoring the data on the RAID it was handling. I think the general thought is that RAID cards are a great deal cheaper than motherboards, so if something happens to the RAID card, it can easily be replaced. The expense of the motherboard is the reason RAID functionality built into motherboards should be avoided.
    – Ramhound
    Commented Sep 23, 2021 at 17:35
  • 2
@Ramhound "if something happens to the motherboard of any device, you have bigger problems than restoring the data on the RAID it was handling" - I don't agree with this. It is entirely possible that the cost of the motherboard is far less than the cost of losing your data. In fact, it would not make sense to spend more money protecting your data than the value of the data itself.
    – Jojonete
    Commented Sep 23, 2021 at 17:41
  • 6
    If the data is expensive enough that you cannot replace hardware then you should certainly put in a backup solution (which RAID is not).
    – doneal24
    Commented Sep 23, 2021 at 17:45
Yes. I've heard of this before; colleagues have explained cases where they could not move the RAID array to another system because the motherboard wasn't an exact match due to age, so they lost all the data on the array and had to start over and restore everything from backup. Yeah, I was surprised too!
    – hookenz
    Commented Sep 24, 2021 at 2:59
  • 3
    RAID is not a substitute for separate backups. #JustSaying. Commented Sep 24, 2021 at 10:22

11 Answers

35

the one and only reason for RAID systems to exist at all is to preserve user data in the event of hardware failure

RAID was developed to ensure the availability of data in the event of a specific hardware failure, namely a disk failure. RAID 0, which the question excludes, can also be used to extend a volume beyond the capacity of a single physical drive.

The data preservation tools are backup and long-term archiving.

18
  • 2
    @doneal24 Yes indeed, but remember that the OP stated "not counting RAID-0" already at the get-go :-) And the sentence I quoted above still remains incorrect. By my count that's 2 invalid premises :-D Commented Sep 23, 2021 at 17:51
  • 1
    You can create very large volumes using other RAID levels. I currently have several raid6 volumes with 14 10TB drives per volume. But yes, the OP has a couple of incorrect ideas.
    – doneal24
    Commented Sep 23, 2021 at 17:54
  • 3
The biggest risk with hardware RAID controllers is when a controller goes out and the wrong disks in the RAID were connected to the failed controller. Linus did a video years ago on a failure of their main storage device: due to the type of RAID his company implemented, their entire RAID array failed because of the number of RAID cards that had failed. Once the wrong disks go offline in a RAID, data preservation and data recovery can become very problematic. They were able to recover the data by getting each disk online and allowing a company to use a tool to rebuild the data.
    – Ramhound
    Commented Sep 23, 2021 at 18:17
  • 3
RAID isn't a backup. RAID protects you against an individual HDD or SSD failure. RAID allows you to spread the risk of a single disk (or multiple disks) failing. However, if more disks go offline than the RAID configuration can tolerate, then you have reached the limit of the protection a RAID array provides. So multiple online and offline copies of the data are required to protect you against that particular risk.
    – Ramhound
    Commented Sep 23, 2021 at 18:21
  • 2
    RAID can't prevent problems due to motherboard failure for the same reason it can't prevent problems due to software bugs, bad device drivers, or human error. It isn't designed to detect those things.
    – barbecue
    Commented Sep 24, 2021 at 21:40
22

As someone who has moved hardware RAID disks from a crashed server to a new server on several occasions, I disagree with the premise of the question. Software RAID has also been moved between systems. These have always been Linux servers, so I've never had to deal with situations where part of the RAID configuration is built into an operating system driver (somewhat common in Windows systems).

9
  • 1
    The question is: why don't all RAID systems work like that? The premise of the question is based on other questions, like: example 1: "... but many other will store it in a format only readable by that type of RAID controller". And from example 2: "you can try disk cloning software, your layout may be recognized".
    – Jojonete
    Commented Sep 23, 2021 at 17:25
  • 4
I don't believe @doneal24 is suggesting you can seamlessly migrate between different RAID controllers, but you certainly should be able to migrate a RAID from system to system with the same RAID controller without an issue.
    – Ramhound
    Commented Sep 23, 2021 at 17:36
  • 3
    @Jojonete Why don't all disk systems work the same after all these years? Shouldn't I be able to take my SAS disk formatted with xfs and put it into my Windows desktop? Different vendors have different priorities and have no incentive to be compatible with each other.
    – doneal24
    Commented Sep 23, 2021 at 17:44
  • 8
    @Jojonete They do all work "like that", but each in its own way. I've never seen a hardware RAID implementation where you couldn't just buy another card of the same model, swap it for the broken one and have the array come up as usual. This is obviously a non-issue for OS-based SW raid as well. The only situation where this sometimes is an issue is BIOS-raid ("fakeraid"), which is actually software RAID done by the disk driver. The issue there is that you might need to get the exact same motherboard, which might be fairly hard to find on the market.
    – TooTea
    Commented Sep 24, 2021 at 8:09
  • 2
@TooTea: Or get software that understands the BIOS fakeraid metadata format to copy the data out, and back into a new array created differently, either SW RAID or a different HW or fake RAID. This of course requires somewhere to put the data in the meantime, unless you can get Linux md software RAID to recognize the existing metadata, or hard-configure it with sector offsets, so it keeps using the existing disks rather than just reading them once. Commented Sep 24, 2021 at 9:27
15

So, my question is: why isn't there a standard solution to this problem, designed at least 20 years ago, and implemented since then by all RAID systems worth the name?

The phrase you are looking for is "import foreign RAID configuration"

Not all RAID systems work the same way. Many store metadata on the disks so the controller can rebuild the array configuration if it is reset or replaced (with the same model), and there is a massive amount of variation between controllers and their supported features.
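
To illustrate the idea (with an entirely made-up metadata format, not any real vendor's layout), here is a minimal Python sketch: the controller writes a small superblock describing the array onto every member disk, and a replacement controller that knows the offset and format can reconstruct the configuration from the disks alone. A foreign controller has no idea where to look or how to parse it.

```python
import io
import struct

# Hypothetical metadata format -- NOT any real vendor's layout.
# A controller of "vendor A" writes this superblock 4 KiB from the
# start of every member disk; its same-vendor replacement knows to look there.
MAGIC = b"RAIDMETA"
OFFSET = 4096          # bytes from the start of the disk (made up)
FMT = ">8sBBHI"        # magic, raid_level, disk_index, disk_count, stripe_kib

def write_superblock(disk, raid_level, index, count, stripe_kib):
    """Record the array configuration on the member disk itself."""
    blob = struct.pack(FMT, MAGIC, raid_level, index, count, stripe_kib)
    disk.seek(OFFSET)
    disk.write(blob)

def import_foreign_config(disk):
    """What a same-vendor replacement controller does on first boot."""
    disk.seek(OFFSET)
    magic, level, index, count, stripe_kib = struct.unpack(
        FMT, disk.read(struct.calcsize(FMT)))
    if magic != MAGIC:
        raise ValueError("no recognisable metadata: foreign or blank disk")
    return {"level": level, "index": index,
            "disks": count, "stripe_kib": stripe_kib}

# Simulate one member disk with an in-memory file.
disk = io.BytesIO(bytes(1024 * 1024))
write_superblock(disk, raid_level=5, index=0, count=12, stripe_kib=1024)
print(import_foreign_config(disk))
```

A controller from a different vendor would not only fail the magic check, it would not even know to read at that offset in the first place.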

Imagine I have a RAID5 array with a 1MB stripe size across 12 disks on an Adaptec controller, and the controller fails, so I replace it with an Intel controller hoping to get my data.

Well it turns out the Intel controller does not support 12 disks in an array, only 8, and it has a max stripe size of only 256KB. Of course it is not going to work. Even if it did support those, the controller metadata on the disks is not even close in format to allow them to be read. And even if it did, does it use the same parity algorithm? The same stripe alignment?
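
To make the layout problem concrete, here is a rough Python sketch (a toy model, not any vendor's real algorithm) of how a RAID 5 controller might spread data and XOR parity across member disks. Even with the same disks and the same payload, changing the stripe size or the parity rotation puts different bytes at the same physical locations, so a controller that assumes different parameters simply cannot find the data.

```python
import os

def raid5_layout(data, disk_count, stripe_size):
    """Toy RAID 5 writer: split data into stripes, XOR the chunks of each
    stripe into a parity chunk, and rotate which disk holds the parity.
    Real controllers differ in chunk size, rotation order, alignment and
    metadata, so each produces a different on-disk arrangement."""
    per_stripe = stripe_size * (disk_count - 1)
    disks = [bytearray() for _ in range(disk_count)]
    for s in range(0, len(data), per_stripe):
        stripe = data[s:s + per_stripe].ljust(per_stripe, b"\0")
        chunks = [stripe[i:i + stripe_size]
                  for i in range(0, per_stripe, stripe_size)]
        parity = bytearray(stripe_size)
        for c in chunks:
            parity = bytearray(x ^ y for x, y in zip(parity, c))
        pdisk = (s // per_stripe) % disk_count      # rotating parity disk
        data_chunks = iter(chunks)
        for d in range(disk_count):
            disks[d].extend(parity if d == pdisk else next(data_chunks))
    return disks

# The same payload written by two "controllers" with different parameters
# ends up with different bytes at the same physical locations.
payload = os.urandom(16384)
a = raid5_layout(payload, disk_count=4, stripe_size=1024)   # "vendor A"
b = raid5_layout(payload, disk_count=4, stripe_size=256)    # "vendor B"
print(a[0][:8].hex())    # start of disk 0 as vendor A laid it out
print(b[0][:8].hex())    # vendor B put different data there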

As far as I know:

  • There is no standard for where to store array/controller metadata on the array disks
  • There is no standard for the format in which to store array/controller metadata on the array disks
  • There is no standard requiring that array/controller metadata be stored on the array disks at all
  • There are many different ways to do RAID, and the internal method may be proprietary

And why should there be a standard solution? The solution is simple: buy another controller of the same brand that supports import from the old controller model. There is not much incentive for competitors to abandon their own methods or neuter their feature sets in the name of compatibility, when in reality the lack of compatibility is not a big problem.

Now, several of your examples point to on-board RAID. There are two types here: consumer and professional. Consumer on-board RAID has no interest in portability unless it is the same chipset vendor; I have moved an Intel RAID 5 array from one motherboard to another, and it read the metadata from the disks and rebuilt the configuration. Professional on-board RAID almost always expects that you will just replace the motherboard or server with an identical model, and once again it should just work. Many on-board pro-grade controllers can also be purchased in add-on card form factor, giving you another option.

As Peregrino69 answered, RAID is not backup; if you are trying to use RAID to protect against something other than disk failures, you are doing it wrong.

RAID is more useful at providing large increases in both logical volume size and performance when using multiple disks together, and doing it in a semi fault-tolerant manner.

3
This is why I only use mirroring. I can actually drop the good disk into another computer and, without even setting up RAID, mount the disk and copy all the files to another RAID if I lose the motherboard.
    – Joshua
    Commented Sep 24, 2021 at 21:20
  • @Joshua When you say set up mirroring, do you mean you have some app that you use to mirror data, or that you only use RAID 1?
    – edo101
    Commented May 4, 2023 at 7:57
  • @edo101: At the time it meant hardware RAID 1 with certain hardware that didn't write RAID headers to the disk, but those are too hard to find now so I'm pushed to software mirroring.
    – Joshua
    Commented May 4, 2023 at 13:58
4

"the one and only reason for RAID systems to exist at all is to preserve user data in the event of hardware failure" ...

No. In the context of all "RAID systems" that exist, this is the second or third most important reason, and it should be treated as a reason only when a lot of details of the setup around it (what kind of failure it will protect from, and what kind it will not) are understood. The assumption might be mostly correct, however, when it comes to home/small-business NAS boxen.

RAID is a technology that originates in the server world, where the reasons "provide continuity of operation until repair can safely and/or conveniently take place", "provide a performance benefit to read and/or write operations" (this is not limited to RAID 0), and "provide an easy-to-manage abstraction of the space provided by several disks" are at least as common and important as providing the limited backup capability you describe.

In the server world, systems (orders of magnitude more expensive) exist that use multiple relatively independent "motherboards", power supplies etc.

The recoverability of a RAID system if you lose the controller hardware, provided the drives did not get damaged or overwritten in the process, is really only a function of how the controller hardware handles this, what configuration data beyond the disks it needs in order to recover, etc. A RAID controller or NAS mainboard that leaves you with no easy recourse in that situation should be considered a faulty design.

However, a misbehaving or misconfigured (e.g. cache policies) controller can mean data gets actively overwritten or mixed with nonsense data. An electrical defect (e.g. a power supply that suddenly outputs too high a voltage, or a transceiver that shorts power into a data port) can result in physical damage to the drives. In such cases, recovery can become a non-trivial, non-automatic, incomplete or even impossible operation. A perfectly working RAID will also apply, perfectly consistently, data alterations or deletions caused by user error, malfunctioning software, or malicious software. These are some of the reasons why the reliability of RAID alone to provide backup is considered limited.
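
As a toy illustration of that last point (a pure simulation, no real RAID code involved): a healthy mirror applies every write to every copy, so an accidental or malicious overwrite destroys all copies at once, which is exactly why redundancy is not the same thing as a backup.

```python
class Mirror:
    """Toy RAID 1: every write is applied to every member 'disk'."""
    def __init__(self, disks=2, size=64):
        self.disks = [bytearray(size) for _ in range(disks)]

    def write(self, offset, data):
        for disk in self.disks:          # a healthy array mirrors everything
            disk[offset:offset + len(data)] = data

    def read(self, offset, length, disk=0):
        return bytes(self.disks[disk][offset:offset + length])

array = Mirror()
array.write(0, b"important data")
print(array.read(0, 14))                 # b'important data' on either disk

# User error / ransomware / a buggy script: the mirror dutifully replicates it.
array.write(0, b"\0" * 14)
print(array.read(0, 14, disk=0), array.read(0, 14, disk=1))   # both copies gone
```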

4
  • Was going to comment the same thing somewhere on this answer if you hadn't already. A desktop motherboard could start to fail in many ways, including potentially causing memory corruption (e.g. a scratch on a DDR4 memory bus trace leading to bit errors). This can corrupt the filesystem (i.e. the data in the RAID array). Although more usually, memory errors come from failing DIMMs, or from software bugs, and yes corruption of files, or worse filesystem metadata, is a real risk and has happened to people. As you say, RAID is not a backup. Commented Sep 25, 2021 at 12:09
  • 2
    @PeterCordes I chose the phrasing "limited backup capability" very carefully. I am tired of hearing both "RAID is a backup" and "RAID is not a backup", finding both statements can be harmfully misleading depending on context. Commented Sep 25, 2021 at 14:26
  • Great point, yeah I was mainly commenting to point out a mobo failure mode that could lead to corrupting the data/FS on the RAID array (so you're screwed if that's your only copy), didn't mean to raise the debate of whether RAID is or isn't a "backup". Commented Sep 25, 2021 at 14:37
  • @PeterCordes It felt like, if assumptions like that in the question are becoming common, the debate needs raising :) Commented Sep 25, 2021 at 22:35
3

I can say with 100% certainty that you can replace a RAID controller card without loss of data, because I've done it on at least two occasions. The reason you can do this is that RAID controllers typically store the array configuration on the drives in addition to in their own memory, and the new card will just ask you if you want to import the configuration found on the drives. When doing this, you don't even necessarily need to have an identical replacement card, although that's certainly safest; a newer card from the same manufacturer will typically work as well. (eg, I recall that Dell supported moving an existing array from a PERC5/i to a PERC6/i, but not vice-versa.) The one thing to watch out for is that you need to make sure the drives are still in the same ports on the controller, or Bad Things happen. (On real server hardware with a proper backplane, the cables are often keyed so that you can't get them in the wrong order.)

However, I'd have far less faith in doing that with on-motherboard RAID. In my experience, built-in RAID tends to be terrible in many ways, and I wouldn't trust it with any configuration other than RAID 1 (and that only because you can just take one of the mirrored drives and use it as a single drive in another computer with no data loss; I still wouldn't try to actually move the mirror and trust the new motherboard to import the configuration correctly).

So, the answer is that there is a standard solution to the problem, and it is implemented on all RAID systems worth the name. It's just that motherboard RAID isn't worth the name.

3
  • This isn't strictly germane to the question, which is why I'm putting it in a comment rather than an answer; but it's worth noting that there has been a trend toward software-defined storage, which typically doesn't care much about what hardware it's connected to. Personally, I still trust real hardware RAID controllers more, but that's a readily debatable topic! Commented Sep 24, 2021 at 18:23
  • 1
    I'm an old fart, I agree with you abt the HW controllers :-) But I'm generally very careful to state anything in IT business with 100% certainty. In my view the operative word in IT business is "should" - "this won't cause any damage" has more than once been heard as the famous last words :-D Commented Sep 24, 2021 at 18:31
  • 3
    More-or-less agreeing with the previous comment. You take your chances with fakeraid on Windows. md raid arrays on Linux are more recoverable. Hardware raid using the same controller is safe as long as you don't physically shuffle the disks. A motherboard replacement does not affect the configuration on the raid controller.
    – doneal24
    Commented Sep 24, 2021 at 18:31
1

The reason RAID doesn't protect against a failing motherboard is that it was specifically designed to solve the problem of failing disks - not failing disk controllers, motherboards, or other system components.

In my experience, if you have hundreds of systems with spinning hard disks, those disks are the component most likely to fail, by far. Protecting against it can easily reduce the average failure rate of such a system by a factor of 5 or 10. That is what RAID protects you from.
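
As a back-of-the-envelope illustration (with an assumed, hypothetical per-disk failure rate, and ignoring rebuild windows, correlated failures and unrecoverable read errors), simple probability arithmetic shows roughly that kind of improvement:

```python
# Back-of-the-envelope only: assumes independent disk failures and ignores
# the rebuild window, so the real-world benefit is smaller than this suggests.
p = 0.03          # assumed annual failure probability of one disk (hypothetical)
n = 4             # disks in the array

p_single = p                                   # lone disk: any failure loses data
p_raid0  = 1 - (1 - p) ** n                    # striping: any disk failure loses data
p_raid5  = 1 - (1 - p) ** n - n * p * (1 - p) ** (n - 1)   # needs >= 2 failures

print(f"single disk : {p_single:.4f}")
print(f"RAID 0 (n={n}): {p_raid0:.4f}")
print(f"RAID 5 (n={n}): {p_raid5:.4f}")
```

With these made-up numbers, the four-disk RAID 5 loses data roughly six times less often per year than a single disk, which is in the ballpark of the factor quoted above.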

That being said, it sucks if you use hardware RAID to make your system more reliable, then have your disk controller fail, and lose the data on your disks, because you don't know how to replicate the controller's configuration, or the controller wrote something to the disks that made the state harder to restore, even when the disks are perfectly OK. This does happen. It would be nice if hardware RAID offered a standard solution for that.

I'm not sure what that would be. Where to save the configuration?

In any case, this is much rarer than a hard disk failure, and even if it were easier to recover from, the data will still be unavailable until you recover.

So if you want to be resilient against this, make your storage resilient against any failure of a particular system, by using some form of distributed storage, such as Ceph or DFS.

1

Adding to the excellent answers already, you might also like to look at the ZFS implementation of RAID.

ZFS is interesting because it acts as both a volume manager and a file system. That means it controls the storage devices at both a low level (device RAID, rebuilding, pooling) and a high level (file system, data caching).

The nice thing about ZFS' implementation of RAID is that, provided there is one copy of the file system's data available somewhere in the redundant disk array, the pool of disks can be moved to (almost) any other hardware running a compatible version of ZFS, and the array will be just as recoverable. No RAID cards or onboard RAID are involved, so no such issues arise.

1

RAID systems do protect against motherboard crashes. Including chipset (motherboard) based RAID.

RAID information, describing the array, is encoded on the member disks themselves - not in the controller or the motherboard.

The schema by which it is encoded depends on the controller or motherboard. If you replace your dead motherboard with the exact same model, you will be able to recover your array. This will either happen automatically once you set your storage ports into RAID mode in the BIOS, or you may need to select an "import foreign configuration" option somewhere while booting.

In my experience, an Intel-chipset, consumer-grade motherboard automatically imported my arrays (RAID 0 and RAID 10) from a 10-year-old Intel motherboard without incident. Different model, same vendor. No manual steps needed. This is despite the fact that the old arrays were made before there was Intel VROC / IRST, so the backwards compatibility is flawless.

Your experience may vary depending on the vendor (I never used AMD chipset RAID, so I don't know if it works the same way), but my experience with motherboard RAID importing has been perfectly smooth so far.

I've also replaced hardware RAID controllers (LSI / Dell PERC) with the same model and had no problems importing. Again, this is due to the same principle: the array configuration is stored on the disks themselves.

1
  • Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.
    – Community Bot
    Commented Feb 10, 2023 at 21:26
0

A failure I have observed in the real world:

The controller blew; sector zero of every attached drive was zeroed out.

Actually, all the data was unharmed but copying it all off was tricky!
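
For the curious, here is a rough Python sketch of the kind of scan a recovery tool performs in a situation like this: with sector zero (and thus the partition table) gone, it hunts through the raw disks for known filesystem signatures, such as the ext2/3/4 superblock magic, to re-locate the data. The demo image and the planted offset below are made up for illustration; only the ext superblock constants are real.

```python
import os
import tempfile

EXT_SB_OFFSET = 1024          # ext superblock starts 1 KiB into the filesystem
EXT_MAGIC_OFF = 56            # s_magic field within the superblock
EXT_MAGIC = b"\x53\xef"       # 0xEF53, little-endian

def find_ext_candidates(image_path, align=512, limit=1 << 20):
    """Scan a raw disk image for offsets that look like the start of an
    ext2/3/4 filesystem, even if sector 0 (partition table) was wiped."""
    hits = []
    with open(image_path, "rb") as f:
        for offset in range(0, limit, align):
            f.seek(offset + EXT_SB_OFFSET + EXT_MAGIC_OFF)
            if f.read(2) == EXT_MAGIC:
                hits.append(offset)
    return hits

# Tiny demo: a blank "disk" image with a fake ext magic planted 64 KiB in.
with tempfile.NamedTemporaryFile(delete=False) as img:
    img.write(bytearray(1 << 20))
    img.seek(65536 + EXT_SB_OFFSET + EXT_MAGIC_OFF)
    img.write(EXT_MAGIC)
    path = img.name
print(find_ext_candidates(path))   # [65536]
os.remove(path)
```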

2
  • How did you do it? Commented Sep 25, 2021 at 20:41
@PeterMortensen R-Tools. I figured out the RAID layout (its guesser couldn't find the answer) and told it to find what it could in the space. Since the filesystem was actually intact, it read it fine, but I needed a target to copy everything to; a recovery in place wasn't an option. Note that it was the only tool I found that could figure it out. Commented Sep 25, 2021 at 22:16
0

Because different RAID controllers use incompatible formats when they spread the data over multiple disks. You can mitigate the problem by using external (PCI-Express) RAID controllers that can be replaced or moved to another motherboard, and by purchasing spare controllers from the very beginning so that all version numbers match. Even then, some controllers do not tolerate these actions, so you must test in advance whether your recovery scenario actually works. Another alternative is using plain software RAID, as is common on Linux servers.

0

You are wrong; solutions exist, and they are very simple:

  • RAID 5, or any mirroring, protects against one or more HDD/SSD failures.
  • If the RAID is hardware (Adaptec etc.), you can buy an additional battery and/or memory-protection module. On power problems, or a motherboard failure like the one you describe, the data in the controller's cache stays intact for as long as the battery lasts, and regular operation continues once the hardware is restored. I haven't tested, and would be curious to know, what happens if the motherboard dies and the RAID card is moved to another motherboard without disconnecting the battery.

In old datacenters, and as the newer, stronger solution:

  • A fully redundant system: you duplicate incoming data to a mirrored PC with the same capacity and its own RAID. When the first system dies (or stops responding), the reserve system takes over. It can be cold or hot: a cold standby only comes alive and processes data if the first one dies (not the best option, but low on energy), while a hot standby is always online and duplicates the data from the first system in real time. There are different designs, but the base logic is the same.

Good datacenters have similar workflows based on that logic, and use hardware that already offers different options to achieve the best protection of the data and prevent any losses.
