Background

I've got a Windows 10 desktop with a few different disks/filesystems installed:

  • 1TB SSD, primary/Windows is installed here
  • 500GB SSD, secondary, used for VMs
  • 120GB SSD, secondary, used for installing games
  • 4TB RAID-1 (2x4TB Seagate HDDs), secondary, media storage
  • 2TB RAID-1 (2x2TB WD HDDs), secondary, media storage

The problem is, I think, with the 2TB RAID volume. What's been happening lately is that if I browse or otherwise access this filesystem, it eventually hangs completely, to the point where I can't even end-task the hung process or shut the computer down gracefully (I don't even get a BSOD; the system is basically stuck until I do a hard reboot). I can still move the mouse and interact with programs that were already running (so long as they're not also trying to access that volume, I guess).

This doesn't happen immediately, but if I go through several different folders or try to copy lots of new data to it (from the 4TB volume) that seems to be enough to trigger it.

So anyway, my assumption is that this volume is the problem (it's also using the two oldest HDDs in the system), though I could be wrong about that; I'm happy to hear alternative explanations if there are any.

Question

I installed a SMART-checker utility and checked the disks in the suspect array. Both passed the 'short self test'. The attributes for one disk report as:

drive-1

...and the other looks like:

drive-1

There doesn't seem to be a huge difference, although the second disk does show a non-zero 'Raw Read Error Rate' and a much larger 'Multi Zone Error Rate'.

Is it plausible that these errors are responsible for the system hanging when accessing this RAID volume? Should I be heading out to pick up a replacement disk?

UPDATE (from comments)

The RAID is using the integrated controller supplied by the Asus B360M-K mainboard. Here's the device-manager screenshot:

[Device Manager screenshot: i.imgur.com/gLrqlIO.png]

The only obvious thing in the system event log is a couple of "Reset to device, \Device\RaidPort1, was issued" messages.

  • Are you able to isolate the system from the RAID, in order to verify that the problem does not happen while the volume(s) are offline?
    – Ramhound
    Commented Jul 11, 2019 at 4:55
  • Is your RAID-1 hardware or software? What do you see in the system event log related to this problem? The HDDs' SMART data looks fine; no HDD replacement is needed.
    – Akina
    Commented Jul 11, 2019 at 4:57
  • Add a Device Manager screenshot in the "Devices by connection" view, with all HDDs visible.
    – Akina
    Commented Jul 11, 2019 at 4:59
  • @Akina The RAID is using the integrated controller supplied by the Asus B360M-K mainboard. Here's the device-manager screenshot: i.imgur.com/gLrqlIO.png. The only obvious thing in the system event log is a couple of "Reset to device, \Device\RaidPort1, was issued" messages.
    – aroth
    Commented Jul 11, 2019 at 5:47
  • I think it is a RAID controller problem, not an HDD problem. But I am afraid to advise experimenting with physically or logically disconnecting the 4Gb RAID or its HDDs, because I am not convinced that you will be able to reconnect it afterwards without problems. Additionally, is your power supply sufficiently powerful?
    – Akina
    Commented Jul 11, 2019 at 6:29

1 Answer

The default setting for hard disks is to retry on error, because there is a chance that the data might still be recoverable. The disk then either returns the data once a read succeeds or reports an error after a (long) timeout; while it is retrying, everything waiting on that read stalls as well. In a RAID set, you should reconfigure the individual disks to report errors immediately and never retry, so the RAID controller can fetch the data from another disk and rewrite the unreadable sector immediately.
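On many drives the per-disk setting described above is exposed as SCT Error Recovery Control (WD markets it as TLER), and smartmontools can set it with `smartctl -l scterc,READTIME,WRITETIME`. The following is only a sketch under several assumptions: smartmontools is installed, the drive actually supports SCT ERC (many desktop drives do not), and the individual disk is visible to smartctl at all behind the motherboard RAID controller. The device name `/dev/sdb` and the 7-second limit are placeholders, not values taken from this system, and a short limit bounds the retrying rather than disabling it outright.

```python
import subprocess


def scterc_args(device, read_seconds, write_seconds):
    """Build a smartctl command line that sets the SCT ERC time limits.

    smartctl takes the limits in units of 100 ms, so 7 seconds becomes 70.
    """
    read_ds = round(read_seconds * 10)
    write_ds = round(write_seconds * 10)
    return ["smartctl", "-l", f"scterc,{read_ds},{write_ds}", device]


def set_scterc(device, read_seconds=7.0, write_seconds=7.0):
    """Run smartctl with the arguments built above (needs admin/root rights)."""
    return subprocess.run(
        scterc_args(device, read_seconds, write_seconds),
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # Only print the command here; call set_scterc() to actually apply it.
    print(" ".join(scterc_args("/dev/sdb", 7, 7)))
```

On many drives this setting does not survive a power cycle, so it may need to be reapplied at every boot. Running `smartctl -l scterc` with no values against the device reports the current limits, or says that the drive does not support the feature.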

When the disks properly report errors, the RAID controller can decide whether to mark a disk as failed. It usually does that when a disk reports an error writing a sector (because the disk has then run out of spare sectors for remapping), and if your disk is truly bad, it will reach that state quickly.

You should also do periodic read-only checks where all sectors are read and their checksums verified. I normally run these on a 14-day cycle.
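A proper check of this kind is normally the RAID controller's own verify/scrub function, which reads every sector on both halves of the mirror. Where that is not available, a rough substitute is to read every file on the volume and note which ones fail. The sketch below makes that assumption explicit: it only touches allocated files through the filesystem, it never reads free space, and it cannot compare the two disks against each other. The function name `read_check` and the drive letter in the usage note are hypothetical.

```python
import os


def read_check(root, chunk_size=1024 * 1024):
    """Read every file under `root`; return (bytes_read, failed_paths)."""
    bytes_read = 0
    failed = []
    # os.walk skips directories it cannot list (its default onerror is None).
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    while True:
                        chunk = handle.read(chunk_size)
                        if not chunk:
                            break
                        bytes_read += len(chunk)
            except OSError:
                # Unreadable file: record it and carry on with the rest.
                failed.append(path)
    return bytes_read, failed
```

Calling `read_check("E:\\")` (with whatever letter the volume has) returns the number of bytes read and the list of files that raised an error. If the disks are still set to retry for a long time, such a scan can stall in the same way as ordinary browsing does.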

  • "you should reconfigure the individual disks to report errors immediately and never retry" - Where/how is this accomplished?
    – aroth
    Commented Jul 11, 2019 at 14:49
  • I'd love to know this too!
    – Goz
    Commented Nov 14, 2019 at 12:44
