
I have a server with 4 drives in a RAID 10 array. Recently my server went down because the array was not detecting the drives. Only a single drive is currently being seen by the RAID card while a second drive is showing as inaccessible. And 2 drives are showing multiple errors. Unfortunately, I don't have an up-to-date offsite backup.

I was recommended to clone the RAID 10 array using Acronis but there is a possibility that it could clone the data over but not be bootable or it could completely fail at any point.

What is the safest way to recover the data in this case? I don't want to lose my data.

  • If 3 drives have failed with RAID 10, then you have already exceeded the number of drives that can fail.
    – Ramhound
    Commented Sep 17, 2017 at 9:22
  • Thanks for your reply. What I was told by my provider was that Drive A & D are not accessible. And others could be failing. I can try to rebuild the array but there are chances that the drives will lose the data during the process. So what are the safest options to recover the data?
    – Car12
    Commented Sep 17, 2017 at 9:31
  • You won't be able to rebuild your array due to the number of drives that are not accessible and/or have failed. You have 0% chance of data recovery with the number of drives you indicated that are inaccessible.
    – Ramhound
    Commented Sep 17, 2017 at 9:38
  • All drives are currently showing. The array was detected once again but is currently in offline status. I was unable to force the array online as it is corrupt, and it displayed a message stating that there were not enough segments to bring it online. Could anything be done in this case?
    – Car12
    Commented Sep 17, 2017 at 10:02
  • @Carmin : with four responding but faulty drives, things are potentially salvageable, but it will be neither quick nor easy. See TOOGAM's answer (there are firms that do this professionally; in which case it won't be cheap). You might still lose some files. It looks like your drive went too long with an unchecked faulty state, until the fault became unrecoverable; this may hint at an error in the maintenance/inspection process.
    – LSerni
    Commented Sep 17, 2017 at 10:32

1 Answer

First, know that lots of people like to say, "RAID is not backup". The reason why a business should use RAID is to minimize downtime. The reason why a business should back up data is to be able to restore data to a prior version. Yes, technically RAID 1 does essentially "back up" data from one drive to another drive, but lots of threats to data will affect not just one drive, but both. So, the purposes of RAID and backup accomplish very different things, which is why many people like to say "RAID is not backup".

Only a single drive is currently being seen by the RAID card while a second drive is showing as inaccessible. And 2 drives are showing multiple errors.

I agree with Ramhound. It sounds like you're doomed. Sorry.

If just one drive had problems, you certainly may be able to get by with that. However, if you want things restored to "great shape", you'll need at least 2 drives (and they need to be the right drives!) to pull off a good restore for a scenario like this. It sounds like you have 3 drives with problems (one being unresponsive, and 2 others with errors). If that's the case, you don't have enough working to resolve this completely (if at all), in which case you're doomed to experience data loss (possibly catastrophically losing it all). If that language sounds overly harsh, then I'm sorry: I don't mean to be insensitive, but rather I'm just trying to favor bluntness and clarity.

If you try to restore a RAID 1 (which has two parts), you need to restore from the part that has no errors. Otherwise, you'll end up with errors. If you can't tell which drives have errors, you might need to start by backing up all of the drives (using bit-by-bit/forensic copies, as discussed further below), so that if you restore with the wrong drives, you can try again. So, you may need quite a bit of available storage capacity to pull this off most safely.

If continued efforts result in you being able to get your non-working drive to function again, and that drive is good, then you may be able to get a good restore despite two drives that aren't able to properly give you all of the data. That might be possible. Maybe. The rest of this answer will explore that possibility.

Sadly, RAID terminology is not universal enough to provide clarity for us to know what drives you lost, based just on the information you provided so far. You mentioned using RAID 10. Well, is that:

  • a set of RAID 1 mirrors that then got striped into a RAID 0,
  • or a RAID 0 that then got placed into a RAID 1 mirror?

The correct answer is...

this is vendor-dependent.

Yup. We just don't know. I'm basing that conclusion on the PC Guide article on multiple RAID levels, which says that RAID10 usually means RAID 1 and then RAID 0 (which will be the better scenario for you), but some...

other companies reverse the terms! They might call the RAID 0 and then RAID 1 technique "RAID 1/0" or "RAID 10" (perhaps out of fear that people would think "RAID 01" and "RAID 1" were the same thing). Some designers use the terms "RAID 01" and "RAID 10" interchangeably. The result of all this confusion is that you must investigate to determine what exactly a company is implementing when you look at multiple RAID. Don't trust the label.

So, anytime that someone says RAID10, don't assume which order they've used. Figure it out.

If you have mirrors that are striped, so your layout looked like this:

AB = RAID1
CD = RAID1
(RAID 0 stripe across the two RAID 1s)

Then losing drives A and D causes each RAID1 to be degraded but still functioning, so your stripe has both parts working and it is fine.

If you have stripes that are mirrored, so your layout looked like this:

AB = RAID0
CD = RAID0
(RAID 1 mirror of the two RAID 0s)

Then losing drives A and D breaks each RAID0, and you are left with a mirror of two lost stripes, so you have nothing salvageable.

Fortunately, it looks like most RAID10 implementations will be mirrors that get striped, so odds are in your favor.
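To make the difference between the two layouts concrete, here is a minimal Python sketch of the reasoning above. The function name `survives` and the layout labels "1+0" and "0+1" are just illustrative choices, not anything a RAID controller exposes, and the sketch only models whether enough drives respond, not whether the data on them is intact.

```python
def survives(layout, groups, failed):
    """Return True if the nested array still has a complete copy of the data.

    layout: "1+0" for mirrors that are striped, "0+1" for stripes that are mirrored.
    groups: the inner arrays, e.g. [("A", "B"), ("C", "D")].
    failed: the drives that are gone.
    """
    failed = set(failed)
    if layout == "1+0":
        # Each inner group is a mirror: one healthy drive per mirror is enough,
        # but every mirror must survive because the outer level is a stripe.
        return all(any(d not in failed for d in group) for group in groups)
    if layout == "0+1":
        # Each inner group is a stripe: it needs all of its drives,
        # but only one intact stripe is needed because the outer level is a mirror.
        return any(all(d not in failed for d in group) for group in groups)
    raise ValueError("layout must be '1+0' or '0+1'")


groups = [("A", "B"), ("C", "D")]
print(survives("1+0", groups, {"A", "D"}))  # True: B and C still cover both mirrors
print(survives("0+1", groups, {"A", "D"}))  # False: both stripes are broken
```

The same two failed drives give opposite outcomes, which is why the actual layout has to be confirmed before deciding what is recoverable.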

Assuming that the layout is favorable for you, you want to get a backup of the drives before doing any re-build. Let me clarify: you don't want to back up files. You don't want to back up partitions. You want to back up drives. Entire drives. Make sure your backup process does a full "bit for bit" archive, sometimes called a "forensic copy", which copies ALL data on the drive, including unused bits and (quite importantly for you) meta-data like drive signatures that the RAID "software" may be using. (By "software", I don't necessarily mean a program stored on the hard drive, but the logic which might be embedded into some circuitry you have, depending on just what RAID you're using.)

I was recommended to clone the RAID 10 array using Acronis

I don't recommend Acronis due to problems I've experienced professionally. That said, I know that Acronis is pretty popular. My preference for this scenario would be any Unix (which could possibly include an Acronis boot CD) and use dd, possibly in conjunction with netcat (if the drives are remote). This may take a bit of learning to pull off, but if everything goes smoothly then I would have a fair amount of confidence in the end result of the backup task (depending on whether the destination drive ends up being suitably reliable).
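For readers who have not seen what a whole-drive copy involves, the following Python sketch shows the idea behind a block-by-block image. It is not dd itself, and `copy_image` is a hypothetical helper written for illustration. It assumes the source can be opened like a file (on Linux a drive appears as a device node), and it fills any block that raises a read error with zeros so that later data keeps its original offset, a very simplified version of what dedicated rescue tools do. For a real recovery, a purpose-built tool is likely the better choice.

```python
import os


def copy_image(src_path, dst_path, block_size=1024 * 1024):
    """Copy every byte of src_path to dst_path, block by block.

    Returns (bytes_written, bad_offsets). A block that raises a read error is
    replaced with zeros so the data after it stays at the same offset.
    """
    bad_offsets = []
    written = 0
    # buffering=0 gives unbuffered reads, so one failed read maps to one block.
    with open(src_path, "rb", buffering=0) as src, open(dst_path, "wb") as dst:
        size = src.seek(0, os.SEEK_END)  # total size of the source in bytes
        while written < size:
            want = min(block_size, size - written)
            try:
                src.seek(written)
                block = src.read(want)
            except OSError:
                block = b"\0" * want
                bad_offsets.append(written)
            if not block:
                break  # source ended earlier than its reported size
            dst.write(block)
            written += len(block)
        dst.flush()
        os.fsync(dst.fileno())  # make sure the image has reached the destination disk
    return written, bad_offsets
```

A call might look like `copy_image("/dev/sdb", "/mnt/backup/sdb.img")`, where both paths are placeholders. A non-empty `bad_offsets` list means the image is not a perfect copy and should be treated accordingly.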

there is a possibility that it could clone the data over but not be bootable or it could completely fail at any point

I would say "yes", there is that threat. I do believe that sometimes rebuilds may fail for not-very-great reasons... and retrying from another disk may work wonderfully. That is why you really, really, really should get a very clean backup before you start the re-build. Always make sure you're NOT using your only copy of any of the data when you start a re-build.

Once you do have a perfect backup (which you can verify rather easily if the hardware still functions right, by doing a bit-for-bit comparison of every byte on the drive, which might be easier to do in Unix than some other operating systems), then you've got rather little to lose by trying to re-build. So, be very paranoid about getting that backup made quite correctly, but then relax during the possibly (much) longer process of the re-build occurring. (At that point, nothing that happens during the re-building process should give you any trouble unless you also have problems with your backups. So besides the multiple drives that you lost, there would need to be yet another problem, which is rather unlikely, for you to be doomed... if your backup was made well.)
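The bit-for-bit comparison mentioned above can be sketched in a few lines of Python. The helper name `first_difference` is made up for this example, and it assumes both the source drive and the backup image can be opened as files.

```python
def first_difference(path_a, path_b, chunk_size=1024 * 1024):
    """Compare two files byte for byte.

    Returns None if they are identical, otherwise the offset of the first
    byte that differs (or the length of the shorter file if one is a prefix
    of the other).
    """
    offset = 0
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            chunk_a = a.read(chunk_size)
            chunk_b = b.read(chunk_size)
            if chunk_a != chunk_b:
                # Find the first differing byte inside this chunk.
                for i, (x, y) in enumerate(zip(chunk_a, chunk_b)):
                    if x != y:
                        return offset + i
                # No byte differs, so one chunk is simply shorter.
                return offset + min(len(chunk_a), len(chunk_b))
            if not chunk_a:
                return None  # both files ended at the same point
            offset += len(chunk_a)
```

A result of `None` means the copy matches the source exactly. Any other value is a reason to redo the backup before touching the array.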

Once you verify that the rebuild reports success, verify that your data seems to be restored okay (check data from different sections, hoping to verify multiple drives), and then don't consider your "fix this problem" process to be complete until you have a working routine backup solution.

  • I love this answer. Would you care to get a little more detailed with dd (e.g. why not ddrescue?)? A good red from a wiki would do, but I would assume that everybody who is in the situation of RAID degradation would be close to a nervous breakdown (I know I would be ;-) ), so giving them confidence in their options is possibly the best way to protect them from a complete loss of data.
    – flolilo
    Commented Sep 17, 2017 at 10:37
  • ddrescue: it sounds like there may be physical damage, so: good suggestion. It is less commonly easily available on various bootable Unix systems. That inconvenience may be worth overcoming in order to restore valuable data. My answer was written with the idea of aiming for full recovery, which is why I noted "if everything goes smoothly" when I discussed dd. I would expect that skipping just some data on a drive would have an amplified negative effect (being more prone to cause trouble for more data) when RAID0 is part of the picture. Maybe it would even affect the rebuild process?
    – TOOGAM
    Commented Sep 17, 2017 at 12:05
  • @flolilolilo : About "a re[a]d from a wiki": I've published my own guide for backing up w/ dd & netcat, so I don't intend to invest effort searching for other 3rd party docs to hyperlink to. Actually, this answer already includes some info I just found from part of my own online documentation. When I 1st joined Stack Exchange, moderators complained my posts referred to my own online docs too much, concerned my posts' goals may be capturing traffic. So I behave (sometimes by not answering, @ other times @ some sad cost to my answers' quality), by pointing to my own site only extremely rarely.
    – TOOGAM
    Commented Sep 17, 2017 at 12:09
  • Concerning "a re[a]d from a wiki": Oh, I see. I have to admit that I wasn't aware of the traffic-generation-"problem". Full withdrawal from my request, then! Concerning ddrescue: My knowledge of rescuing data is purely non-professional, so I'd go with everything you say and fill the gaps with knowledge from other sites, not daring to state those things here (because of missing confidence in my own not-yet-approved knowledge).
    – flolilo
    Commented Sep 17, 2017 at 15:23
  • @flolilolilo : Be wary of giving somebody too much credit; just because they seem to know quite a bit about a lot doesn't mean they are accurate about everything they say. In my case, I try hard to be accurate, but the last two sentences of my ddrescue comment (starting with "I would expect") are based more on my own reading than on actual professional experience.
    – TOOGAM
    Commented Sep 17, 2017 at 17:26
