
I run a number of CentOS 6 64-bit servers with ext3/ext4 file systems. As far as I can tell, none of them have been shut down improperly, yet all of them have accumulated file system errors that fsck now reports.
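For reference, this is roughly how I check (a sketch; /dev/md0 stands in for whichever array is being checked, and the filesystem should be unmounted or mounted read-only first):

    # Read-only check: report problems, change nothing (-n answers "no" to all prompts)
    fsck.ext4 -n /dev/md0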

Now, a few drives (not file systems) are reporting I/O errors that look like they will end in outright hard drive failure (we run RAID 1). Is that what is causing the file system errors? I wouldn't have thought those errors would be allowed to propagate up to the file system.
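The drive-level errors show up like this (again a sketch; device names are placeholders, and smartctl requires the smartmontools package):

    # Kernel-level I/O errors logged against the raw devices
    dmesg | grep -i 'i/o error'
    # SMART health summary for a suspect drive
    smartctl -a /dev/sda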

At least one server shows no signs of hard drive failure at all but still has fsck errors.

So, do ext3/ext4 file systems naturally accumulate errors over time, or is something bad going on?

  • Why would you think an I/O error wouldn't interact with the file system? If the I/O error happens while reading a file, what do you think the file system will do? It's going to error if it can't read the file, no matter the cause. Commented Jan 16, 2017 at 16:16
  • Without more details it's difficult to say what happened exactly. ext3 is quite mature; I haven't seen any actual FS accumulating errors naturally over use in years. Unrecoverable I/O errors (unlikely with RAID 1) will lead to FS errors if they happen inside the FS structures. If RAID 1 somehow screws up error recovery (I don't have personal experience with that), that could also lead to FS errors. I'd look closely at which blocks had errors, how the RAID behaved, and which blocks led to FS errors (see the command sketch after these comments).
    – dirkt
    Commented Jan 16, 2017 at 16:19
  • Thanks for the replies, @djsmiley2k, @dirkt. The I/O errors reported by dmesg are at the device level, and only on one device, so I figured RAID 1 would do the right thing and serve data from the good device. Also, at least one server doesn't have any drive errors but does have file system errors.
    – Shovas
    Commented Jan 16, 2017 at 16:39
  • So I presume you're using mdadm or some software raid, not hardware raid? Commented Jan 16, 2017 at 16:43
  • @djsmiley2k Yes, mdadm software raid1 mirror.
    – Shovas
    Commented Jan 16, 2017 at 16:50
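A sketch of the array-level checks suggested above (array and device names are placeholders; the md scrub interface assumes a reasonably recent kernel and root privileges):

    # Overall state of all software RAID arrays
    cat /proc/mdstat
    # Per-member detail for one array: failed devices, degraded state, event counts
    mdadm --detail /dev/md0
    # Ask md to scrub the array, i.e. compare both mirror halves block by block
    echo check > /sys/block/md0/md/sync_action
    # After the scrub: a nonzero count means the mirrors disagreed somewhere
    cat /sys/block/md0/md/mismatch_cnt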

2 Answers


File system errors do not cause I/O errors, and I/O errors do not cause hard drive failures. In fact, you have the causality completely reversed: hard drive failures cause I/O errors, which in turn lead to file system corruption.

I/O errors will be reported as errors to user space. In some cases they may cause file system corruption (which can be fixed by fsck), but in other cases they may only result in corrupted data blocks.
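For example, a read that lands on an unreadable sector typically fails with EIO, visible to any user-space program (a sketch; the file path is hypothetical):

    # Force a read of the whole file; a bad underlying sector usually
    # surfaces as "Input/output error" (EIO) and a nonzero exit status
    dd if=/srv/data/somefile of=/dev/null bs=4096
    echo $?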

So in general, it is not "normal" for file system corruption to accumulate in ext3/ext4 file systems. It generally means you have some kind of hardware problem: a memory problem, failing hard drives, etc. In fact, if you are seeing I/O errors, you need to fix them first. Software bugs in general do not cause hardware failures!
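One way to see whether the kernel has already recorded errors against a filesystem is to inspect the ext superblock (a sketch; /dev/md0 is a placeholder, and the per-error record fields only appear with newer kernels and e2fsprogs):

    # "Filesystem state" shows e.g. "clean" or "clean with errors"
    tune2fs -l /dev/md0 | grep -i 'filesystem state'
    # Recent kernels also log FS-level errors as they happen
    dmesg | grep -i 'ext4-fs error'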

  • Thank you for responding, @Theodore. I recognize your name from reading up on file systems :) I clarified my question to be clear I wasn't thinking FS errors lead to drive failures. I meant: would drive errors lead to FS errors in an mdadm RAID 1 setup where one drive is good? We definitely need to get those bad drives replaced, but in real-world dedicated server hosting (e.g. 1and1.com) providers don't seem eager to replace drives for mirrors that are still intact :/ (see the replacement sketch after these comments).
    – Shovas
    Commented Jan 18, 2017 at 19:49
  • Marking as the answer for confirming that physical device I/O errors can lead to FS errors: "I/O errors will be reported as errors to user space. In some cases they may cause file system corruption (which can be fixed by fsck), but in other cases they may only result in corrupted data blocks." I must have been hoping for more of an answer at the time, but this answers the question. Thanks.
    – Shovas
    Commented Mar 25, 2018 at 16:43
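For completeness, replacing a failing member of an mdadm RAID 1 mirror looks roughly like this (a sketch; device and partition names are placeholders, and the new disk must be partitioned to match before it is added):

    # Mark the failing member as failed, then pull it from the array
    mdadm /dev/md0 --fail /dev/sdb1
    mdadm /dev/md0 --remove /dev/sdb1
    # After physically swapping the disk, add the new partition; md rebuilds the mirror
    mdadm /dev/md0 --add /dev/sdb1
    # Watch the rebuild progress
    cat /proc/mdstat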

Ext3 is a completely reliable filesystem, which is not true of ext4 (it depends more on the kernel version).

However, some errors can be caused by loose data cables/connectors, or even by vibration or shock to the hard drive (kicking the PC case, moving a running laptop, etc.).
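Cable and connector problems in particular tend to show up as a growing UDMA CRC error counter in SMART (a sketch; the device name is a placeholder and smartctl comes from smartmontools):

    # Attribute 199 (UDMA_CRC_Error_Count) rising over time usually points at cabling
    smartctl -A /dev/sda | grep -i udma_crc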

  • How many bugs are in a particular file system codebase is going to depend on the kernel version, but in general ext4 is just as reliable, if not more reliable, than ext3. In fact, when we put ext4 into production use at Google, the fact that it was running on so many machines, and we could look for correlated failures, meant that we found and fixed a bug that was also in ext3; but it was so rare that it had survived multiple enterprise Linux certification test processes. (It almost certainly triggered on ext3, but it was probably written off as a hardware failure.) Commented Jan 17, 2017 at 4:49
  • Well, that's an unexpected answer, since you're the ext3 maintainer and one of the ext4 creators... On the other hand, the same would certainly be true of ext4; for any software there will always be bugs that can take years to spot. But despite following the Linux world closely for years, how did I (and a lot of people on the internet) not become aware that ext4's main problem was solved back in the 2.6.30 kernel?! Anyway, I'll still stick to ext3 because of its maturity, and will probably switch to ext4 when people jump to btrfs...
    – X.LINK
    Commented Jan 17, 2017 at 8:53

