Filesystem corruption (?) and appropriate use of fsck

Question

Last week at work, our Linux server (CentOS 5.5) stopped responding to login attempts, so I had to hard shut it down. After I pulled out a couple of disks, on boot-up it reported a degraded RAID array and that fsck -p had failed, prompting me to run fsck manually. The server has 5x 2TB disks in a hardware RAID 5 array. On the software side, I believe this is just arranged into one big logical volume that includes /boot, /, and /home, and a second logical volume for swap.

I reimported the RAID configuration on the removed disks, at which point the RAID array still showed degraded status and the machine still returned an fsck error on boot. The fifth disk started auto-rebuilding but failed, probably due to filesystem corruption. Fortunately, I was able to recover the 2+ TB of data off the server using rescue mode (whew!). Then I ran fsck -yf on the logical volume, which made some changes. Now fsck returns clean on boot, but when I get to the CentOS login screen, all of the text is replaced with boxes. An error of some kind pops up that prevents me from logging in, but I can't read it, since it is also all boxes. I also can't log in via a text terminal (I am continually reprompted with login:, with no chance to enter a password) or SSH (the server responds but reports an incorrect password).

At this point, when I try to run fsck, it tells me the filesystem is clean. I am still able to get into the filesystem in rescue mode from the install DVD, and the files I have looked at all seem to be OK. I'd really rather avoid a total reinstall, since this would require a lot of reinstallation and copying data back, and from rescue mode the files look to be intact. Did I totally bork it up by running fsck on the logical volume, or by letting the RAID auto-rebuild? What are your recommendations on how to proceed?

larstobi · Accepted Answer · 2012-04-16 08:53:50Z

The RAID layer (MD, or a hardware controller as in your case) knows nothing about the filesystem, so if it fails to rebuild, the cause is not filesystem corruption but more likely a hardware error, possibly one of your disks failing. Check for S.M.A.R.T. errors using smartmontools and run a self-test.
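As a rough sketch of what that check could look like with smartctl from smartmontools: the device name /dev/sda is only a placeholder, and the helper function names are made up for illustration. Because your disks sit behind a hardware RAID controller, smartctl will probably also need a controller-specific -d option, which depends on the controller model and is an assumption here.

```shell
#!/bin/sh
# Sketch only: /dev/sda is a placeholder device name.
# Disks behind a hardware RAID controller usually need a controller-specific
# "-d" device type as well (for example "-d megaraid,0"); check smartctl(8).

# Print the health verdict and attributes, then start a short self-test.
smart_check() {
    dev="$1"
    smartctl -H "$dev"          # overall health self-assessment
    smartctl -A "$dev"          # vendor attributes, e.g. Reallocated_Sector_Ct
    smartctl -t short "$dev"    # short self-test; it runs on the drive itself
}

# Show the self-test log once the test has had time to finish.
smart_selftest_log() {
    smartctl -l selftest "$1"
}

# Decode the health-related bits of smartctl's exit status (a bitmask
# documented under "EXIT STATUS" in smartctl(8)).
smart_exit_summary() {
    status="$1"
    problems=""
    if [ $(( status & 8 )) -ne 0 ]; then problems="$problems disk-failing"; fi
    if [ $(( status & 16 )) -ne 0 ]; then problems="$problems prefail-attribute-below-threshold"; fi
    if [ $(( status & 64 )) -ne 0 ]; then problems="$problems error-log-has-entries"; fi
    if [ $(( status & 128 )) -ne 0 ]; then problems="$problems self-test-log-has-errors"; fi
    if [ -z "$problems" ]; then
        echo "no health problems flagged"
    else
        echo "flagged:$problems"
    fi
}

# Usage (as root):
#   smart_check /dev/sda
#   smartctl -H /dev/sda; smart_exit_summary $?
#   smart_selftest_log /dev/sda    # a few minutes later
```

A disk that reports reallocated or pending sectors, or that fails the self-test, is a likely reason for the rebuild aborting.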

When you run fsck -yf, it tries its best to fix the filesystem, and in the process it may delete problematic inodes (files); some files may be moved to the lost+found directory. The boxes you saw on the graphical login may be due to required files that fsck deleted. Not being able to log in via console or SSH may also point to missing files. Are you able to get shell access if you boot to recovery mode? You could attempt to fix things by restoring OS files from backup, or by force-reinstalling software packages.
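To find out what fsck actually removed, something along these lines might help from rescue mode. This is a sketch under assumptions: it assumes the installed system is mounted at /mnt/sysimage (the usual CentOS rescue-mode location) and that the RPM database itself survived; the function names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: the /mnt/sysimage mount point is an assumption.

# List whatever fsck moved into lost+found on a mounted filesystem.
# Recovered entries are named after their inode number, e.g. "#12345".
list_recovered() {
    mountpoint="${1%/}"
    ls -A "$mountpoint/lost+found" 2>/dev/null
}

# Print every installed package that has at least one file missing on disk.
# "rpm -V" compares files against the RPM database and prefixes absent
# files with the word "missing". This walks all packages, so it is slow.
packages_with_missing_files() {
    root="${1:-/}"
    rpm -qa --root "$root" | while read -r pkg; do
        if rpm -V --root "$root" "$pkg" | grep -q '^missing'; then
            echo "$pkg"
        fi
    done
}

# Usage (as root, from rescue mode):
#   list_recovered /mnt/sysimage
#   packages_with_missing_files /mnt/sysimage
# Each package listed can then be reinstalled from the install media, e.g.
#   rpm -Uvh --replacepkgs --root /mnt/sysimage <package>.rpm
```

If the list of damaged packages is short, reinstalling just those may be enough to get logins working again; if it is long, that is a further argument for a clean reinstall.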

However, at this point, replacing the disks and doing a clean reinstall might be better.

Additionally, you may want to back up whatever salvageable data you can. You don't have to start from a completely clean slate if you transfer some of your data. - J03L, Commented Sep 4, 2019 at 15:24
