
I'm quite new to Linux (Debian) and Btrfs. I have started testing it on my DIY NAS at home.

Setup:
- Medium-level hardware with 2 WD RED 3TB disks
- Debian (latest stable)
- Btrfs-tools (latest stable)
- Configured a full disk raid1 setup and copied several gigabytes of data to it

Then, as a test, I unplugged one of the two HDDs while continuously reading data. Surprisingly, the read could not continue from the mirror; instead I got lots of alarming error messages on a red background.

I would expect a RAID1-like system to handle such things silently for me. Is this normal behaviour, or do I have an error somewhere in the setup?

  • You should mention exactly which version of btrfs-progs (btrfs --version) and which kernel (uname -r) you used. Old versions of BTRFS had a lot of bugs which have been fixed by now.
    – basic6
    Commented Aug 22, 2016 at 14:35

1 Answer


The idea of mirroring is obviously that if one side of the mirror fails, the other should take over. In an ideal world, the sides should also work in tandem to increase read performance when both sides of the mirror are available.

That said, if one side of the mirror fails, then all the in-flight reads to the failed device will fail, possibly after a delay. This is normal and expected: a command was sent to a device that is suddenly no longer there to respond to it, which will result in an error condition of some kind. The kernel will most likely log these failures to give the administrator a heads-up that "something bad just happened". The system may be configured to output these important kernel events to the console.

The litmus test for any mirroring solution is whether these errors actually propagate to the userspace layer, resulting in user applications receiving I/O errors (or worse, invalid data). If a mirror setup is working properly, then as long as the other side of the mirror is healthy, userspace applications should be unaffected, save for the fact that the read took a little longer than usual and the system spat out some diagnostics about I/O errors occurring on the now-unavailable device. Neither of these should appreciably impact well-behaved userspace software.
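To make that check concrete, here is a minimal sketch of a userspace reader that reports whether an I/O error actually reached the application. The function name `read_until_error` and the path in the usage comment are my own placeholders, not anything Btrfs provides. Keep in mind that data already in the page cache may be served from RAM, so I would assume a file that is larger than memory, or freshly written and not yet read.

```python
import errno


def read_until_error(path, chunk_size=1 << 20):
    """Read `path` sequentially and return (bytes_read, error_or_None).

    An OSError here means the failure reached userspace, which a working
    mirror should prevent while one copy is still readable.
    """
    total = 0
    try:
        # buffering=0 disables Python's own buffering; the kernel page
        # cache is still in play.
        with open(path, "rb", buffering=0) as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    return total, None  # reached end of file cleanly
                total += len(chunk)
    except OSError as exc:
        return total, exc


def describe(total, err):
    """Turn the result of read_until_error into a one-line verdict."""
    if err is None:
        return f"read {total} bytes, no I/O error reached userspace"
    if err.errno == errno.EIO:
        return f"EIO after {total} bytes: the error propagated to userspace"
    return f"stopped after {total} bytes: {err}"


# Hypothetical usage (the path is a placeholder for a file on the mirror):
#   print(describe(*read_until_error("/mnt/nas/bigfile.bin")))
```

If this reports a clean read while the kernel log fills with errors for the unplugged disk, the mirror did its job; an `EIO` verdict is the case worth reporting.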

If the userspace processes (rather than just the Btrfs code in the kernel) saw I/O errors as a result of your experiment, and you can reproduce the behavior at least reasonably consistently, then you may have come across a bug in the Btrfs code. In that case, you may want to file a bug report. Especially given that this is Debian, I would suggest first filing the bug report in Debian's bug tracking system and letting them escalate it to the kernel developers if they feel that is warranted. Make sure to include as much pertinent detail as possible, including the exact commands you are running, exact versions of everything involved, the exact text of the error messages, an exact description of your storage setup, and anything else you can think of that may help in tracking down the problem.
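Gathering those details can be scripted. The following is only a sketch: the helper names `run_cmd` and `collect_report` and the mount point `/mnt/nas` are assumptions of mine, and some of the `btrfs` subcommands may need root to show anything useful. The commands themselves (`uname -r`, `btrfs --version`, `btrfs filesystem show`, `btrfs filesystem df`, `btrfs device stats`) are standard.

```python
import subprocess


def run_cmd(argv):
    """Return a command's combined output, or a placeholder if it cannot run."""
    try:
        result = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            timeout=30,
            check=False,  # a failing command is still worth recording
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"<not available: {exc}>"
    return result.stdout.strip()


def collect_report(mountpoint):
    """Collect version and layout details for a Btrfs bug report."""
    commands = {
        "kernel": ["uname", "-r"],
        "btrfs-progs": ["btrfs", "--version"],
        "filesystems": ["btrfs", "filesystem", "show"],
        "profiles": ["btrfs", "filesystem", "df", mountpoint],
        "device stats": ["btrfs", "device", "stats", mountpoint],
    }
    return {label: run_cmd(argv) for label, argv in commands.items()}


# Hypothetical usage ("/mnt/nas" is a placeholder mount point):
#   for label, text in collect_report("/mnt/nas").items():
#       print(f"== {label} ==\n{text}\n")
```

The `profiles` entry is worth a look before filing anything: it shows whether both data and metadata really use the raid1 profile.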

  • Thank you for the answer. Yes, the IO error propagated to the UI, namely to MC where I started a long copy for the test. Even when I hit retry it couldn't reestablish the read operation. I will try another test to see if it's consistently reproducible. Commented Nov 26, 2015 at 10:35
  • I could not reproduce this behaviour any more (tried 3-4 times with several combinations); the read continued successfully. It will remain a mystery, I'm afraid. I'm gonna accept your answer though, thank you. Commented Nov 27, 2015 at 9:54
