
Update, 23 later: the drive failed, and most data can't be read, even though the SMART values did not deteriorate much. I had planned to replace it in two weeks, since I was busy. Don't fall into a "this won’t happen to me" mindset like I did. Listen to good advice and make a backup ASAP!

I am using a USB HDD (WD Elements 5TB) as a fairly non-critical NAS. Two months ago, I started monitoring SMART values, and they look concerning.

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

March 9
  5 Reallocated_Sector_Ct   0x0033   179   179   140    Pre-fail  Always       -       19848

May 19
  5 Reallocated_Sector_Ct   0x0033   177   177   140    Pre-fail  Always       -       21144

Here is why I am confused:

  • The RAW_VALUE indicates that over 20,000 reallocations have occurred, which seems implausible (?). Other reports I have read with similar counts described obvious problems, but this drive performs fine, and there have not been any filesystem inconsistencies.
  • The VALUE/WORST/THRESH are normalized values. To my knowledge, the drive starts at 180 and assumes (imminent) failure as soon as 140 is reached. Currently, 177 is reported.

A decline of 2 points (179 to 177) over ~10 weeks seems high, but it also seems to imply that the drive still has a large amount of unused reserve sectors and considers itself to be in decent shape.
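As a rough sanity check on the two readings above, they can be extrapolated in a straight line. This is a minimal Python sketch under that assumption only: it treats the drop in VALUE and the growth in RAW_VALUE as constant over time, which nothing about the drive guarantees. The year in the dates is a hypothetical placeholder, since only the 71-day gap between March 9 and May 19 matters.

```python
from datetime import date

THRESH = 140  # THRESH column of attribute 5 (Reallocated_Sector_Ct)

# (date, normalized VALUE, RAW_VALUE) for attribute 5, as posted above.
# The year 2000 is a hypothetical placeholder: the post gives only month and day,
# and March 9 -> May 19 is 71 days in any year.
readings = [
    (date(2000, 3, 9), 179, 19848),
    (date(2000, 5, 19), 177, 21144),
]


def extrapolate(readings, thresh):
    """Straight-line extrapolation between the first and last reading."""
    (d0, v0, r0), (d1, v1, r1) = readings[0], readings[-1]
    days = (d1 - d0).days
    raw_per_day = (r1 - r0) / days
    value_drop = v0 - v1
    # Days until VALUE reaches THRESH if it keeps dropping at the same rate.
    days_to_thresh = (v1 - thresh) * days / value_drop if value_drop > 0 else float("inf")
    return days, raw_per_day, days_to_thresh


days, raw_per_day, days_left = extrapolate(readings, THRESH)
print(f"{days} days apart, {raw_per_day:.1f} raw count increase/day, "
      f"~{days_left:.0f} days until VALUE reaches {THRESH} (linear)")
```

With these readings it works out to 71 days, about 18 additional reallocations per day, and roughly 1,300 days until VALUE would reach 140. Given that the drive failed within weeks (see the update at the top), that straight-line figure clearly should not be read as a remaining lifetime.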

No other metrics seem to imply imminent failure or read errors. Western Digital's own software and a short self-test reported no issues. Could anyone shed some light on whether this value is actually correct and/or concerning?

Here is the full SMART output.

Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       27
  3 Spin_Up_Time            0x0027   253   253   021    Pre-fail  Always       -       4200
  4 Start_Stop_Count        0x0032   095   095   000    Old_age   Always       -       5352
  5 Reallocated_Sector_Ct   0x0033   177   177   140    Pre-fail  Always       -       21144
  7 Seek_Error_Rate         0x002e   200   193   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   084   084   000    Old_age   Always       -       12145
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       436
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       355
193 Load_Cycle_Count        0x0032   189   189   000    Old_age   Always       -       34277
194 Temperature_Celsius     0x0022   100   091   000    Old_age   Always       -       52
196 Reallocated_Event_Count 0x0032   001   001   000    Old_age   Always       -       2346
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0
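To track these attributes over time instead of eyeballing them, the attribute table can be parsed. This is a hypothetical Python helper (not part of smartctl) that assumes the column layout shown above; the SECTOR_HEALTH watch list is my own choice of attributes related to remapped or unreadable sectors, and the sample is a subset of the rows posted above.

```python
import re

# One row of the attribute table:
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]+)\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"(\S+)\s+(\S+)\s+(\S+)\s+(.+?)\s*$"
)

SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       27
  5 Reallocated_Sector_Ct   0x0033   177   177   140    Pre-fail  Always       -       21144
196 Reallocated_Event_Count 0x0032   001   001   000    Old_age   Always       -       2346
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
"""

# Hypothetical watch list: counters of remapped or unreadable sectors.
SECTOR_HEALTH = (5, 196, 197, 198)


def parse_attributes(text):
    """Map attribute ID -> parsed columns; header and other lines are skipped."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue
        id_, name, flag, value, worst, thresh, typ, updated, when_failed, raw = m.groups()
        attrs[int(id_)] = {
            "name": name, "flag": flag, "value": int(value), "worst": int(worst),
            "thresh": int(thresh), "type": typ, "updated": updated,
            "when_failed": when_failed, "raw": raw,
        }
    return attrs


attrs = parse_attributes(SAMPLE)
for id_ in SECTOR_HEALTH:
    a = attrs[id_]
    raw = int(a["raw"].split()[0])  # some raw fields carry extra text after the number
    margin = a["value"] - a["thresh"]
    print(f"{id_:>3} {a['name']:<24} raw={raw:<6} VALUE-THRESH margin={margin}")
```

On this output it shows, for example, that attribute 196 already sits at a normalized VALUE of 001, although with a THRESH of 000 it will not trip the usual VALUE <= THRESH failure comparison.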

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     10552         -
# 2  Extended offline    Aborted by host               90%     10552         -

Selective Self-tests/Logging not supported
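The self-test log can be checked the same way. A minimal sketch, again assuming the column layout shown above (the regular expression is hypothetical and may need adjusting for other smartctl versions); POWER_ON_HOURS is copied from attribute 9 so the age of each test can be computed.

```python
import re

LOG = """\
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     10552         -
# 2  Extended offline    Aborted by host               90%     10552         -
"""

POWER_ON_HOURS = 12145  # attribute 9 (Power_On_Hours) from the table above

# "# N  <description>  <status>  <remaining>%  <hours>  <lba or ->"
# The description is separated from the status by at least two spaces.
ENTRY = re.compile(r"^#\s*(\d+)\s+(.+?)\s{2,}(.+?)\s+(\d+)%\s+(\d+)\s+(\S+)\s*$")


def parse_selftests(text):
    tests = []
    for line in text.splitlines():
        m = ENTRY.match(line)
        if not m:
            continue  # header and unrelated lines
        num, kind, status, remaining, hours, lba = m.groups()
        tests.append({
            "num": int(num),
            "kind": kind,
            "status": status,
            "remaining_pct": int(remaining),
            "hours": int(hours),
            "first_error_lba": None if lba == "-" else lba,
        })
    return tests


tests = parse_selftests(LOG)
for t in tests:
    age = POWER_ON_HOURS - t["hours"]
    print(f"#{t['num']} {t['kind']}: {t['status']}, "
          f"{t['remaining_pct']}% remaining, ran {age} power-on hours ago")
```

Here it shows that the only extended test, the one that reads the whole surface, was aborted with 90% remaining, and that both tests ran 12145 - 10552 = 1593 power-on hours before this snapshot.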
  • It's too bad you don't have the reallocated event count over the time period, but my advice is: if you check again and see additional reallocated sectors, back up and replace the disk. Bad sectors tend to grow exponentially with time, so if you have more than a few and they continue to grow, the disk can no longer be trusted. Commented May 19 at 23:52
  • @FrankThomas I have checked: initially it was at 2204. The current value of 2346 was after rewriting everything on the disk, and it hasn't changed since, but would it even increase if (almost) nothing is being written? At any rate, I now think it might fail soon. Commented May 23 at 18:47
  • Bad sector counts generally increase as you READ data, though they can also be the result of write errors. If you never read a sector that was previously good, you'll never notice that it went bad between the last write and a later failed read. Commented May 23 at 21:08
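The rule of thumb in the first comment (check again, and if the counts keep growing, back up and replace) is easy to automate. A minimal sketch, assuming raw values from two checks are stored as {attribute_id: raw_count} dictionaries; the function name and the watch list are hypothetical. The numbers are the ones quoted in the question and comments, and pairing the first Reallocated_Event_Count reading (2204) with the March 9 Reallocated_Sector_Ct value is my assumption.

```python
from typing import Dict, Iterable

# Hypothetical watch list: raw counters that should stay flat on a healthy drive.
WATCH = (5, 196, 197, 198)


def grown(before: Dict[int, int], after: Dict[int, int],
          ids: Iterable[int] = WATCH) -> Dict[int, int]:
    """Return {attribute_id: increase} for watched counters that went up."""
    return {
        i: after[i] - before[i]
        for i in ids
        if i in before and i in after and after[i] > before[i]
    }


# Raw values quoted in the post; 197/198 were not posted for the first check.
first = {5: 19848, 196: 2204}
later = {5: 21144, 196: 2346, 197: 0, 198: 0}

delta = grown(first, later)
if delta:
    print(f"Counters grew since last check: {delta}; back up and plan a replacement")
else:
    print("No growth since last check")
```

Only attributes present in both snapshots are compared, so a counter that was never recorded earlier does not raise a false alarm.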

1 Answer


I consider this value concerning. Duplicate your disk using ddrescue under Linux, using the mapfile feature. If you have no storage space available but can afford to lose the contents of this disk, you can duplicate it into /dev/null. This way every sector gets read, and any unreadable sector will increase SMART attribute 197 (Current_Pending_Sector). Save the mapfile for later.
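For reference, a GNU ddrescue call has the form ddrescue [options] infile outfile mapfile. The sketch below only builds that argument list in Python so it can be inspected before running it as root; /dev/sdX and rescue.map are placeholders, and the two-pass split (a fast first pass with -n/--no-scrape, then a second pass with retries that reuses the same mapfile) is a common ddrescue pattern, not something the answer prescribes.

```python
import shlex
import subprocess
from typing import List


def ddrescue_cmd(device: str, output: str = "/dev/null", mapfile: str = "rescue.map",
                 no_scrape: bool = False, retry_passes: int = 0) -> List[str]:
    cmd = ["ddrescue"]
    if no_scrape:
        cmd.append("-n")                  # --no-scrape: skip the slow scraping phase
    if retry_passes:
        cmd.append(f"-r{retry_passes}")   # --retry-passes: retry bad areas N times
    if output.startswith("/dev/"):
        cmd.append("-f")                  # --force: allow a device as output (harmless for /dev/null)
    cmd += [device, output, mapfile]      # reusing the same mapfile resumes the previous run
    return cmd


# /dev/sdX is a placeholder: check the real device name (e.g. with lsblk) first.
first_pass = ddrescue_cmd("/dev/sdX", no_scrape=True)
second_pass = ddrescue_cmd("/dev/sdX", retry_passes=3)
for cmd in (first_pass, second_pass):
    print(shlex.join(cmd))
    # subprocess.run(cmd, check=True)  # needs root; uncomment to actually read the disk
```

Because both passes share one mapfile, the second run only works on areas the mapfile does not yet mark as finished.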

  • If the goal is just 'read every sector', then the SMART extended test will accomplish that in less time than "duplicate into the /dev/null file".
    – sawdust
    Commented May 19 at 22:51
  • The last output I initially posted was taken after or near the end of rewriting everything (I was switching file systems), and the SMART values have not deteriorated since. But I will replace the drive. Out of curiosity: if the drive is almost always only read and not written to, is the chance of finding bad sectors lower compared to a mix of reads and writes? Commented May 23 at 18:52
  • I don't know. Sorry! :)
    – r2d3
    Commented May 24 at 14:45
  • Thanks for the ddrescue tip: after the drive failed, it miraculously rescued 100%. I had ~50 read errors initially, but 2 short scrapes rescued all of them. Now I'm trying to figure out how ddrescue managed that, because I tried reading a file 5-10 times while still in denial and it failed every time. Commented Jun 27 at 20:06
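To see how much a run actually recovered, the mapfile can be summarized. A minimal sketch, assuming the plain-text mapfile layout of GNU ddrescue: comment lines start with #, the first data line holds the current position and status, and every following line is pos size status in hex, with + for finished, - for bad sectors, and ?, * and / for areas not yet tried, trimmed or scraped. The sample below is a made-up example, not the poster's mapfile.

```python
from collections import Counter

STATUS = {"?": "non-tried", "*": "non-trimmed", "/": "non-scraped",
          "-": "bad sector", "+": "finished"}

# Hypothetical example mapfile, not from the post.
SAMPLE = """\
# Mapfile. Created by GNU ddrescue
# current_pos  current_status  current_pass
0x00120000     +               1
#      pos        size  status
0x00000000  0x00117000  +
0x00117000  0x00000200  -
0x00117200  0x00008E00  +
"""


def summarize_mapfile(text: str) -> Counter:
    """Return total bytes per block status."""
    totals: Counter = Counter()
    status_line_seen = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if not status_line_seen:
            status_line_seen = True  # current_pos current_status [current_pass]
            continue
        _pos, size, status = line.split()[:3]
        totals[STATUS.get(status, status)] += int(size, 16)
    return totals


totals = summarize_mapfile(SAMPLE)
for name, size in totals.items():
    print(f"{name:>12}: {size} bytes")
```

A fully rescued drive, as described in the comment above, should end up with all bytes counted as finished.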
