10

In the days of old, I remember getting drive errors, but modern drives seem never to report errors; instead they make a best effort to return you something. I recently had a hard drive fail, rather badly, but while it was failing it never reported errors (or at least WinXP never surfaced those errors). I knew it was failing because programs began behaving badly, and it finally died during boot. When I attached the drive to another machine to read everything off, I was able to copy everything (after some permissions flailing), and the copy completed without error, yet the actual content was damaged, as archive testing proved. The manufacturer's drive-testing software determined there were no errors, but SpinRite hard-stops while scanning the drive. I'm beginning to wonder how much of the instability of modern software is attributable to modern hard drives.

So the question is: are hard drives now just lying to us? Specifically, when faced with an unreadable sector, are modern drives prone to returning corrupted data without reporting it as such to the OS?


7 Answers

9

Yes, newer hard drives lie to us. You can usually monitor those lies with SMART.

I think it has to do with the information density on typical platters. The designers assume that there will be flaws in the platters, and design the firmware around that - if a sector fails, it's automatically re-written and no data is lost. It's only when the drive runs out of spare sectors that the typical OS will notice, and at that point your data is at risk.

So, I guess the moral of the story is to use something like smartmontools to monitor the lies.
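
For example, here is a minimal sketch of keeping an eye on those lies with smartctl (the device name /dev/sda is an assumption; adjust for your system):

# show overall health plus the vendor attribute table
smartctl -H -A /dev/sda

# nonzero raw values here mean the drive has already been quietly remapping
smartctl -A /dev/sda | grep -i -E 'reallocated|pending'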

2
  • 6
    The bad sector isn't rewritten - modern disks contain spare sectors that the firmware automatically maps in to replace defective ones.
    – harrymc
    Commented Nov 7, 2009 at 12:15
  • On top of that, you need a separate tool that will read the SMART data.
    – surfasb
    Commented Feb 14, 2011 at 6:12
1

I know that ZFS, the new file system, actually reports when it finds bad sectors on your hard drive. Maybe the problem isn't so much the hard drives themselves as the lack of a modern enough file system. Hard drives do detect bad sectors sometimes, and re-map them to good ones, but it's clearly not enough.
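
As an illustration, a scrub of a mirrored pool makes ZFS read every block, verify its checksum, and repair what it can from the other copy. A minimal sketch (the pool name 'tank' and the device names are hypothetical):

# create a two-disk mirror
zpool create tank mirror /dev/sdb /dev/sdc

# verify every checksum, repairing silent corruption from the mirror copy
zpool scrub tank

# the CKSUM column counts corrupted reads that ZFS caught
zpool status -v tank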

0

As far as I know, errors can typically be detected (each sector is stored with error-correcting-code bytes, essentially a built-in checksum), and if a sector is failing, the drive will retire that sector.

If the failure is in the read head itself, or some mechanism other than the bits on the platter, then you might be hard pressed to actually detect it.
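
One way to exercise the whole read path, heads included, is the drive's built-in extended self-test, which reads the entire surface. A sketch using smartmontools (device name assumed):

# start the full-surface read test; it runs inside the drive, in the background
smartctl -t long /dev/sda

# some hours later, check whether it aborted at a bad spot
smartctl -l selftest /dev/sda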

0

Tough to say if hard drives are lying to us. I'm at the point where a solid RAID controller and multiple disks are what I rely on. If one or two of them die, so be it. Moving parts are more difficult to deal with. With SSDs slowly making headway in the marketplace, who knows how hard drives may 'lie' to us in the future.

I think newer disks these days find bad sectors and then mark them as bad so nothing can be written to them. I can't recall the details accurately, but I know newer disks do this. Is this (a preventative measure) really lying? Tough to say. But if you really want to know what your hard drive is doing, get SpinRite. It'll tell you everything you've ever wanted to know about your hard drive.
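
If you want a free way to hunt for unreadable sectors from Linux, badblocks can scan the whole disk (device name assumed; this read-only mode is safe, but the -w write mode destroys data):

# non-destructive read-only scan with progress output;
# any block numbers it prints are sectors the drive could not return
badblocks -sv /dev/sdb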

0

I just had a disk die (I had to freeze it to save what I could from the click of death) and bought an external drive to do a backup. For about a week I had Ubuntu on an old drive (a first-generation SATA drive that was really IDE with a SATA interface). I knew the disk was old and wouldn't last long, but it wasn't until I installed Fedora on a different drive that I got warnings that drive failure was imminent.

My theory: it's quite likely that consumer-friendly operating systems like Windows XP and Ubuntu do not surface these SMART errors by default.
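
If your OS won't warn you by default, you can wire the warnings up yourself with smartd from smartmontools. A sketch of an /etc/smartd.conf entry (the device, schedule, and mail address are placeholders):

# monitor all attributes, run a short self-test nightly at 02:00,
# and send mail when anything looks like impending failure
/dev/sda -a -s S/../.././02 -m admin@example.com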

0

Lately I've been told of 2.5" hard drives in laptops crashing, but I've never really experienced a true hard drive crash in 30 years of computing. I have one now, because a power surge in a desktop corrupted my Mac's memory, which corrupted the file system. A $40 line conditioner would have eliminated the power spike, and daily backups (and good partitioning, e.g. a separate /Users) will repair it. Soon I hope to add a larger, second PATA drive and mirror the /Users partition.

Ironically, this was less likely to occur on my 1984 IBM PC, whose memory had a 9th parity bit for every 8 bits. (In those days I used SpinRite, and I'm pleased to read it's still doing well.) There are free, resident (TSR-style) programs that check your disk regularly and log, mail, or (in Windows) pop up a warning if things seem bad. (I'd feel more comfortable comparing two logs.)

My machine is for scientific computing: I repeat all important computations. Servers and desktop machines (formerly workstations, like Suns), for those who cannot afford the time to do this, should have ECC memory (with an extra bit per byte), which costs very little extra time & money. However, it's available today only on professional servers, workstations, the fast 2009 Mac Pros, and no doubt some expensive Windows machines. If you're a physicist post-processing supercomputer data, or just an actuary, you might need one of these. Memory in the future will likely count errors to predict upcoming problems with a memory bank.

A book I've found useful is 'Minimizing Hard Disk Drive Failure & Data Loss', online at http://en.wikibooks.org/wiki/Minimizing_hard_disk_drive_failure_and_data_loss

Hard drives, both ATA & SCSI, have used S.M.A.R.T. for about 15 years now to predict upcoming drive failure. Though different companies use different criteria for throwing up a warning window, the meanings of many of S.M.A.R.T.'s numerous measurements are clearly given in the Wikipedia article on it. You needn't rely upon your software company to boil them down to a single number, like an IQ. :-) Check the red sections of en.wikipedia.org/wiki/S.M.A.R.T.#Known_ATA_S.M.A.R.T._attributes

Those who can use a command line can measure these attributes using a free package from Sourceforge called 'smartmontools'. (The Windows version pops up a window.) Find it at sourceforge.net/apps/trac/smartmontools/wiki/TocDoc

Try /usr/local/sbin/smartctl -i /dev/hda for a PATA drive, or /usr/local/sbin/smartctl -i /dev/sda for a SATA drive (the Windows build accepts the same device names).

All the numbers it gives can be evaluated by examining the above Wikipedia page. The package also provides a resident program, smartd, that tests the drive every now & then for slow degradation. If you wish to tune your drive, whether for more speed or to make it slower, quieter, & more reliable, you can also try setting the hard drive parameters with 'hdparm', found at sourceforge.net/projects/hdparm/ (a tuning sketch follows at the end of this answer).

I haven't the Windows documentation, but on Debian Linux I use:

/sbin/hdparm -i /dev/hda

for my PATA drive, just for information (and information on secure deletes). Thus far, I've left the default settings alone.
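
As the tuning sketch promised above (the values and device are assumptions, and not every drive honors acoustic management):

# query the current acoustic setting
hdparm -M /dev/hda

# 128 = quietest & slowest seeks, 254 = fastest & loudest
hdparm -M 128 /dev/hda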

-1

Modern hard disks use SMART, but this only works up to a point. When the disk's data is sufficiently "broken", the disk will give up and you've lost the data.

There are tools like GRC's SpinRite that can look past SMART - and these can sometimes rescue your data even when hope seems lost.

I regularly run SpinRite on my disks. SpinRite tests the written data, and optionally refreshes or even recovers it.
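
On the free-software side, GNU ddrescue works in a similar last-ditch spirit: it images the readable areas first and then goes back for the difficult ones, tracking progress in a map file. A sketch (the paths are hypothetical; always rescue to a different disk):

# first pass: copy everything easily readable, skipping bad areas (-n)
ddrescue -n /dev/sda /mnt/backup/disk.img rescue.log

# second pass: retry the bad areas up to three times
ddrescue -r 3 /dev/sda /mnt/backup/disk.img rescue.log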

2
  • It looks like SpinRite hasn't been updated in a long time. I found a page mentioning limitations with its SATA support (grc.com/sr/kb/sata.htm) and mentioning it might be improved in version 6.1, but that release seems to have never arrived. I sent an email to the creator to find out the current status of the product. I'll add another comment here if I find out anything more. Commented Dec 5, 2009 at 23:08
  • True, the version is old but it is still valid (except as noted in the SATA page). Commented Dec 13, 2009 at 10:37

