
My workstation, an Ubuntu 14.04 (Trusty) machine, has two boot options. I can run Windows (for Pro Tools and music recording), but more often I boot into Ubuntu for software development work.

I need to upgrade the old version of Ubuntu on it to 18.04 LTS. Before I do this, I am concerned about the health of the system drive. I'd like to scan this partition for bad sectors and/or other problems before I commit to the trouble of upgrading the OS. If the drive is in poor health, I might choose instead to buy a new drive and install the upgraded OS there.

I considered running fsck, but saw this warning:

There is one thing to note before you run FSCK; You need to unmount the file system using the ‘umount’ command. Fixing a mounted file system with FSCK could end up creating more damage than the original problem.

I can't unmount the system drive, can I? That sounds like a good way to destabilize my system.

I'll also point out that the disk partition where I installed the currently running Ubuntu OS is just one ext4 partition on a 2TB spinning disk. The other partitions on it are NTFS, for a separate Windows 7 boot.


2 Answers


Scanning the file system doesn't tell you much about the health of the hardware.

The normal way to do this is to look at the SMART data of the disk. On Linux, this is done with the smartctl command from the smartmontools package.

Note: the contents of some fields aren't well defined. A huge raw value usually means that several byte or short values are packed into one long integer and have to be decoded, and the packing can depend on the manufacturer and the drive.
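To make the decoding idea concrete, here is a minimal sketch that splits one raw value into three 16-bit words. The function name split_raw_words and the 16-bit layout are assumptions for illustration only; the real packing differs between vendors and attributes, so check your drive's documentation before trusting any particular split.

```shell
# Split a SMART raw value into three 16-bit words: high, middle, low.
# Assumption: the 3 x 16-bit layout is illustrative, not a documented format.
split_raw_words() {
    srw_raw="$1"
    srw_high=$(( (srw_raw >> 32) & 65535 ))   # bits 32..47
    srw_mid=$(( (srw_raw >> 16) & 65535 ))    # bits 16..31
    srw_low=$(( srw_raw & 65535 ))            # bits 0..15
    printf '%s %s %s\n' "$srw_high" "$srw_mid" "$srw_low"
}
```

For example, split_raw_words 4295098371 prints "1 2 3", because 4295098371 is 1*2^32 + 2*2^16 + 3.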

The following is copied from one of my memo files; it may not be entirely my own writing:

SMART data

See https://help.ubuntu.com/community/Smartmontools and https://www.isalo.org/wiki.debian-fr/Smartmontools

Setting up

Installation

You can install the smartmontools package from the Synaptic Package Manager (see SynapticHowto), or by typing the following into the terminal:

sudo apt-get install smartmontools

Checking a drive for SMART Capability

To ensure that your drive supports SMART, type:

sudo smartctl -i /dev/sda 

where /dev/sda is your hard drive. This will give you brief information about your drive. The last two lines may look something like this:

SMART support is: Available - device has SMART capability.
SMART support is: Enabled 
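If you want to check those two lines from a script, a small sketch along these lines might do. The function name smart_ready is hypothetical, and it assumes the output contains the "SMART support is:" lines in the form shown above.

```shell
# Read `smartctl -i` output on stdin.
# Succeed only if SMART is reported as both Available and Enabled.
smart_ready() {
    sr_info=$(cat)
    printf '%s\n' "$sr_info" | grep -q 'SMART support is:.*Available' &&
    printf '%s\n' "$sr_info" | grep -q 'SMART support is:.*Enabled'
}

# Usage (needs root and a real device; /dev/sda is just the example device):
#   sudo smartctl -i /dev/sda | smart_ready && echo "SMART is ready"
```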
Enabling SMART

In the case that SMART is not enabled for your drive, you can enable it by typing:

sudo smartctl -s on /dev/sda

Testing a Drive

You may run any type of test while the drive is mounted although there may be some drop in performance. There are three types of test that can be conducted on a drive:

Short
Extended (Long)
Conveyance 

To find an estimate of the time it takes to conduct each test, type:

sudo smartctl -c /dev/sda 

The most useful test is the extended test (long). You can initiate the test by typing:

sudo smartctl -t long /dev/sda

Results

You can view a drive's test statistics by typing:

sudo smartctl -l selftest /dev/sda 
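As a rough sketch, the most recent entry of that log can be checked like this. The function name last_selftest_ok is hypothetical, and the code assumes the usual ATA log layout, where the newest test is the line starting with "# 1" and a clean run reads "Completed without error"; other drive types may format the log differently.

```shell
# Read `smartctl -l selftest` output on stdin.
# Succeed if the newest entry (the "# 1" line) completed without error.
last_selftest_ok() {
    grep -m1 '^# *1 ' | grep -q 'Completed without error'
}

# Usage (needs root and a real device):
#   sudo smartctl -l selftest /dev/sda | last_selftest_ok || echo "check the log"
```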

To display detailed SMART information for an IDE drive, type:

sudo smartctl -a /dev/sda 

To display detailed SMART information for a SATA drive, type:

sudo smartctl -a -d ata /dev/sda

Understand the results:

https://lime-technology.com/wiki/Understanding_SMART_Reports

Note: This also works for IDE drives in new kernels that are being run through the SCSI stack and show up as /dev/sdX
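For the question of bad sectors specifically, the attribute table printed by smartctl -A is usually the part to look at. Below is a minimal sketch that pulls out three commonly watched attributes. The function name sector_attrs is hypothetical, the choice of attributes is my assumption, and not every drive reports all of them or uses these exact names.

```shell
# Read `smartctl -A` output on stdin and print "NAME RAW_VALUE" for
# attributes that commonly relate to bad sectors.
# Assumes the standard 10-column ATA attribute table (raw value in column 10).
sector_attrs() {
    awk '$2 == "Reallocated_Sector_Ct" ||
         $2 == "Current_Pending_Sector" ||
         $2 == "Offline_Uncorrectable" { print $2, $10 }'
}

# Usage (needs root and a real device):
#   sudo smartctl -A /dev/sda | sector_attrs
```

Raw values of 0 are what you hope to see; values that keep growing between runs suggest the drive is deteriorating.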

  • I see that SMART data is also available through Ubuntu's GUI tools. While your answer is informative, it doesn't provide much detail as to how one would perform tests or view SMART data on an ubuntu system. Looking at it myself, I don't understand any of it.
    – S. Imp
    Commented Jul 6, 2019 at 16:58
  • Add some info from my own memo files
    – xenoid
    Commented Jul 6, 2019 at 17:45
  • is it safe to run these smartctl commands while the drive is mounted?
    – S. Imp
    Commented Jul 8, 2019 at 18:40

SMART vs. fsck

xenoid is correct about SMART statistics being a more reliable indicator of drive health. However, older drives may not capture the statistics. For other readers who might be booting Linux on a USB drive, you may not have access to the SMART statistics through the USB interface.

fsck can find and fix current problems. It doesn't tell you anything about past problems, or give you statistics that shed light on where the drive is in its life cycle. If you can't get SMART statistics, fsck is still better than nothing, and if you aren't in a hurry, repeated scans can at least indicate whether the drive is in the process of dying.

Also, note that it isn't an "either-or" situation. SMART tells you about the drive's condition, but it does nothing to actually fix current problems. So even with SMART, it is still a good idea to run fsck.

Using fsck to indicate drive condition

Running fsck will fix current corruption and deal with bad sectors. If any bad sectors were random events, or marginal spots as manufactured but not originally discovered, those will get caught and fixed. It's why the manufacturer leaves some spare sectors on the drive. Discovering those should be a rare but not unexpected event.

However, when the drive starts to die due to deterioration of the platters, new bad sectors will be discovered regularly. If you repeat the fsck scan every few days for a few weeks or a month, and any new bad sectors show up in that time, at least you know that the drive is already end-of-life. Stop using it as soon as possible, and use whatever life remains to get your files onto something else while you still can.

Running fsck

This brings us to your original question. fsck is similar in function to chkdsk and, for the same reasons, can't scan a drive that's mounted. The solutions are pretty much the same: do the scan before the drive is in use, or do it from another drive so the scanned drive can be unmounted.

Running chkdsk from the booted drive will give you a message that the scan will be scheduled for the next time you boot. You can similarly run fsck by selecting "Advanced options" in GRUB, when you boot. That will give you a choice of kernels, and booting normally or in recovery mode. Pick recovery mode for the most recent kernel. That will run through some checks and cleanup, then put you at a menu of options for what to do next. One of those options will be running fsck. That option will unmount the drive and run fsck.

You can see more detail on this process here: https://www.tecmint.com/fsck-repair-file-system-errors-in-linux/.

Another method is to boot a live Linux session and run fsck from there. As the above link shows, you can specify which drive/partition to scan by adding it as a parameter. Verify the drive identity in the live session (if it is the main drive on the computer, it will probably be labeled sda, but booting from another device may not label devices the same way).
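Before running fsck from a live session, it is worth confirming that the target partition really is unmounted. Here is a minimal sketch, assuming the partition shows up in /proc/mounts under its plain device name (some systems list it by UUID or another alias instead). The function name is_mounted and its optional second argument are hypothetical, and /dev/sdXN is a placeholder.

```shell
# Succeed if the given device is listed as a mounted source.
# The optional second argument overrides /proc/mounts (handy for testing).
is_mounted() {
    im_dev="$1"
    im_table="${2:-/proc/mounts}"
    awk -v d="$im_dev" '$1 == d { found = 1 } END { exit !found }' "$im_table"
}

# Usage from a live session:
#   lsblk -f                      # identify the partition by type and label
#   if is_mounted /dev/sdXN; then
#       echo "still mounted, do not run fsck"
#   else
#       sudo fsck -f /dev/sdXN    # -f forces a check even if marked clean
#   fi
```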

fsck vs. bad blocks

The actions are a little more complicated than just running fsck. fsck deals with filesystem problems. Those can come from corruption or bad sectors, and fsck will attempt to recover corrupted files. However, fsck itself doesn't deal with the underlying problem of getting any bad blocks detected and marked so they don't get reused. That part is central to knowing whether the drive is end-of-life. The drive controller might remap a bad block on its own if fsck's activity exposes the problem, but then you won't be aware of the bad blocks, which are your indicator of drive EOL.

To explicitly deal with bad blocks, another utility, called badblocks, is used. Here's where things get a little complicated. You can manually run that utility using a process like the one described here: https://mintguide.org/system/283-how-to-check-and-fix-the-disk-for-errors-and-bad-sectors-in-linux-mint.html. There's a less complicated way, though.

When you run fsck, it may call another utility designed to work with the specific filesystem. For ext filesystems (there's a good chance that's what was used for Ubuntu), it calls e2fsck. If you run e2fsck directly, instead of fsck, it can run the badblocks utility itself if you add the lowercase c parameter to the command (see https://linux.die.net/man/8/e2fsck). This article describes using e2fsck to do this: https://www.techwalla.com/articles/how-to-fix-bad-sectors-in-linux.

fsck doesn't have a lowercase c parameter, but it does have an uppercase C parameter that does something totally different (displays a completion bar, see https://linux.die.net/man/8/fsck).
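Putting the e2fsck route together, a sketch of the workflow could look like the following. It assumes an unmounted ext partition (/dev/sdXN is a placeholder) and uses dumpe2fs -b, which lists the blocks the filesystem has recorded as bad, to get a number you can compare between scans. The function name count_bad_blocks is hypothetical.

```shell
# Read `dumpe2fs -b` output on stdin and print how many bad blocks are listed.
# Each recorded bad block is expected to appear as a number on its own line.
count_bad_blocks() {
    # grep -c exits non-zero when the count is 0, so do not treat that as failure
    grep -c '^[0-9][0-9]*$' || true
}

# Usage from a live session, with the partition unmounted:
#   sudo e2fsck -f -c /dev/sdXN     # -c: read-only badblocks scan
#   sudo dumpe2fs -b /dev/sdXN | count_bad_blocks
```

Noting the count with the date after each scan gives you the "are new bad sectors still appearing" signal described earlier.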

Linux commands are case sensitive, and I've seen articles, like https://mintguide.org/system/283-how-to-check-and-fix-the-disk-for-errors-and-bad-sectors-in-linux-mint.html, that show the lowercase c parameter being used with fsck. I don't know whether the article is incorrect or fsck passes the parameter on to e2fsck (the fsck man page does say that options it doesn't understand are passed to the filesystem-specific checker). I wouldn't trust it to do the job without testing it first, and then I would be meticulous with the typing.
