
First of all, I'd like to say that I'm very new to managing RAID arrays in Linux, so my questions are probably very basic, but I can't seem to find my exact scenario on the internet.

I have a RAID1 system with 2 HDDs, and all partitions are installed on top of the RAID array, including the /boot partition. Today mdadm warned me that the array was degraded (probably one of the HDDs is failing), and when I checked, /proc/mdstat confirmed it:

# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[0]
      204736 blocks super 1.0 [2/1] [U_]

md2 : active raid1 sdb2[0]
      151858048 blocks super 1.1 [2/1] [U_]
      bitmap: 2/2 pages [8KB], 65536KB chunk

md1 : active raid1 sdb3[0]
      4092864 blocks super 1.1 [2/1] [U_]

I then proceeded to shut down the server, replaced the failed drive, and tried to boot from the remaining drive, but I am now greeted with the following message when GRUB tries to boot the kernel:

Error 17: Cannot mount selected partition.

Does anyone know where to go from here to boot the system from my working drive? I have basically zero knowledge of GRUB; I've tried reading the documentation a few times, but it's still too complex for me, and I'm in a hurry since I now have a powered-off server on my hands. Any help would be appreciated.

2 Answers


If I understand correctly, your first drive (/dev/sda) is the one you have replaced. It's also possible that when installing GRUB you missed installing it to both disks, as is needed with a RAID1 setup.

The issue is that GRUB is looking at the first hard drive, which does not have GRUB installed. A quick internet search for the error brings up https://www.novell.com/support/kb/doc.php?id=7010670. Follow the steps there and on related pages as needed. For /dev/sdb you will need to use hd1 in GRUB.
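
For reference, since "Error 17" and the hd1 notation come from GRUB legacy, the reinstall from the grub shell (for example, booted from rescue media) would look roughly like the sketch below. Treat it as a sketch only: (hd1,0) assumes /boot is the first partition of the second disk, so adjust the numbers to your layout.

grub
grub> root (hd1,0)
grub> setup (hd1)
grub> quit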

  • Sorry for the late comment. I had read it back then, but it didn't really help me; it seems to refer to GRUB legacy. Please read my answer detailing how I solved it.
    – Railander
    Commented Jun 25, 2018 at 23:08

Answering my own question a good 2 years later...

I ended up migrating everything off this server and reformatting it, but I faced the problem again recently and decided to tackle it head-on.

Everything I'm mentioning here is done with:

  • CentOS 7 with the ml (mainline) kernel from ELRepo.
  • Both devices were added to a RAID1 during OS installation (like so) and all partitions were mounted on top of the RAID1 (/boot, / and swap). Whether you use LVM or not doesn't seem to make a difference for this topic. (There's a quick layout check right after this list.)
  • As with default CentOS 7 installations, it comes with GRUB2 rather than legacy GRUB, an important distinction since most of the documentation you find online pertains to v1.
  • My setup DOES NOT contain an EFI partition. From what I found, an EFI system partition can't live on top of RAID, so pay close attention here: the procedure below won't work correctly if you do boot via EFI.
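
As a quick, read-only sanity check that a setup matches the one described above (every filesystem sitting on an md device, no EFI partition), something like the following can be used; the exact output will of course differ per system:

lsblk -o NAME,TYPE,SIZE,MOUNTPOINT
cat /proc/mdstat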

It became clear to me that the issue was that CentOS only installs the bootloader (GRUB2) onto the first physical device. On an msdos-labelled (MBR) disk, GRUB embeds its boot code in the MBR and the small gap right after it, an area that sits outside any partition and therefore outside the RAID. Because of this, even simply swapping the disks leaves the system unable to boot: the secondary disk has a full copy of the system, but it doesn't have the bootloader.
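
A quick, read-only way to see whether a given disk actually has GRUB boot code in its MBR is to dump the first sector and look for the GRUB marker string (the device name below is just an example):

dd if=/dev/sdb bs=512 count=1 2>/dev/null | strings | grep -i grub

If this prints nothing for the secondary disk, that disk has no bootloader, which is exactly the situation described above.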

Now to explore the solution, which hopefully is straightforward enough: copy the bootloader from the main disk to the secondary disk.

I was unsure how to proceed, as I didn't know exactly how the bootloader refers to the devices. Maybe simply mirroring the bootloader from the first disk would cause conflicts, and I might have to swap its device entries so it knew which device was actually which.

Exploring this, I found suggestions to use dd to copy the bootloader onto the second disk. One user posted that they had always done it that way and it had always worked, but for whatever reason it didn't work for me.
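
For completeness, the dd variant I've seen suggested copies just the boot-code portion of the MBR from the first disk to the second (device names are only an example):

dd if=/dev/sda of=/dev/sdb bs=446 count=1

My guess as to why this wasn't enough in my case: with GRUB2 the MBR only holds a tiny stub that jumps to a core image embedded in the gap right after the MBR, and that core image is never copied by the command above.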

Trying to understand better how GRUB2 works, I found that it includes an install tool that easily writes the bootloader to the desired device, and voilà, it was exactly what I needed! All I had to do was:

grub2-install /dev/sdb

Then I swapped the disks, and the system still booted as expected! Half of the mission accomplished.

After confirming both devices could boot, I just had to remove the failed disk (now 'sdb') from the array (documentation for this is easy to find online; it involves marking the device as failed and then removing it from the array), swap it for the new disk, and finally add the new disk back into the mdadm array, after which it should start syncing automatically (you can watch the progress with watch cat /proc/mdstat).
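
Roughly, and using the device names from my case (surviving disk sda, replaced disk sdb; repeat the per-partition commands for each md device and double-check the names on your own system), that step looks like:

mdadm --manage /dev/md0 --fail /dev/sdb1
mdadm --manage /dev/md0 --remove /dev/sdb1
sfdisk -d /dev/sda | sfdisk /dev/sdb
mdadm --manage /dev/md0 --add /dev/sdb1
watch cat /proc/mdstat

The --fail step is only needed if the old member still shows up in /proc/mdstat, and the sfdisk line copies the MBR partition table from the surviving disk onto the blank replacement so matching partitions exist before you add them back.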

Also, don't forget to run grub2-install again on the new device after you're done.

Hopefully this can help someone else facing the same issue.

If you have an EFI partition, you'd probably have to find a way to copy it to the beginning of the secondary device (dd should do the job, though I've no idea what parameters you'd have to give it).
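
As a sketch only (I haven't tested this; the partition numbers are assumptions and the target partition must be at least as large as the source), a raw copy of the EFI system partition would look something like:

dd if=/dev/sda1 of=/dev/sdb1 bs=4M

You may also need a firmware boot entry pointing at the second disk (efibootmgr can create one), but that's beyond what I've tried.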

