Is this RAID-1 disk causing system hangs?

Question

Background

I've got a Windows 10 desktop with a few different disks/filesystems installed:

1TB SSD, primary/Windows is installed here
500GB SSD, secondary, used for VM's
120GB SSD, secondary, used for installing games
4TB RAID-1 (2x4TB Seagate HDD's), secondary, media storage
2TB RAID-1 (2x2TB WD HDD's), secondary, media storage

The problem is, I think, with the 2TB RAID volume. What's been happening lately is that if I browse/access this filesystem it'll eventually hang completely, to the point where I can't even end-task the hung process or shut the computer down gracefully (I don't even get a BSOD; the system is basically stuck until I do a hard reboot). I can still move the mouse and interact with preexisting programs (so long as they're not also trying to access that volume, I guess).

This doesn't happen immediately, but if I go through several different folders or try to copy lots of new data to it (from the 4TB volume) that seems to be enough to trigger it.

So anyways, my assumption is that volume is the problem (it's also using the two oldest HDD's in the system). Though I could be wrong about that; happy to hear alternative explanations for the problem if there are any.

Question

I installed a SMART-checker utility and checked the disks in the suspect array. Both passed the 'short self test'. The attributes for one disk report as:

...and the other looks like:

There doesn't seem to be a huge difference, although the second disk does show a non-zero 'Raw Read Error Rate' and a much larger 'Multi Zone Error Rate'.

Is it plausible that these errors are responsible for the system hanging when accessing this RAID volume? Should I be heading out to pick up a replacement disk?

UPDATE (from comments)

The RAID is using the integrated controller supplied by the Asus B360M-K mainboard. Here's the device-manager screenshot:

The only obvious thing in the system event log is a couple of "Reset to device, \Device\RaidPort1, was issued" messages.

Are you able to isolate the system from the RAID, in order to verify that the problem does not happen while the volume(s) are offline? — Ramhound, Commented Jul 11, 2019 at 4:55
Is your RAID-1 hardware or software? What do you see in the system event log related to this problem? The HDDs' SMART data looks fine; no replacement needed. — Akina, Commented Jul 11, 2019 at 4:57
Please add a screenshot of Device Manager in the "Devices by connection" view with all HDDs visible. — Akina, Commented Jul 11, 2019 at 4:59
@Akina The RAID is using the integrated controller supplied by the Asus B360M-K mainboard. Here's the device-manager screenshot: i.imgur.com/gLrqlIO.png. The only obvious thing in the system event log is a couple of "Reset to device, \Device\RaidPort1, was issued" messages. — aroth, Commented Jul 11, 2019 at 5:47
I think it is a RAID controller problem, not an HDD problem. But I am reluctant to advise an experiment that physically or logically disconnects the 4TB RAID or its HDDs, because I am not convinced you would be able to connect it back afterwards without problems. Additionally, is your power supply sufficiently powerful? — Akina, Commented Jul 11, 2019 at 6:29

Simon Richter · Accepted Answer · 2019-07-11 13:21:37Z

The default setting for hard disks is to retry on error, because there is a chance that the data might still be recoverable. The disk will then return the data once it has a successful read, or report an error after a (long) timeout. While the disk is retrying, any I/O waiting on it is stuck. In a RAID set, you should reconfigure the individual disks to report errors immediately and never retry, so the RAID controller can fetch the data from another disk and rewrite the unreadable sector immediately.
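
One common way to do this, assuming the drives support SCT Error Recovery Control and smartmontools is installed, is `smartctl -l scterc,READTIME,WRITETIME`, with both times given in tenths of a second. Note that this bounds the retry time rather than disabling retries outright. The sketch below only builds and optionally runs that command; the helper names, the 7-second value, and the `/dev/sdX` placeholder are assumptions for illustration. Drives sitting behind an onboard RAID controller may not be reachable this way, and many drives forget the setting after a power cycle.

```python
import shutil
import subprocess


def build_scterc_command(device, read_ds=70, write_ds=70, smartctl="smartctl"):
    """Return the smartctl argv that sets SCT Error Recovery Control.

    read_ds and write_ds are in tenths of a second (70 means 7.0 s).
    """
    if read_ds < 0 or write_ds < 0:
        raise ValueError("timeouts must be non-negative")
    return [smartctl, "-l", f"scterc,{read_ds},{write_ds}", device]


def apply_scterc(device, read_ds=70, write_ds=70):
    """Run the command if smartctl is on PATH; return its stdout, else None."""
    exe = shutil.which("smartctl")
    if exe is None:
        return None  # smartmontools not installed
    cmd = build_scterc_command(device, read_ds, write_ds, smartctl=exe)
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout


# Show the command for a placeholder device without running anything.
print(" ".join(build_scterc_command("/dev/sdX")))
```

Running `smartctl -l scterc /dev/sdX` without values should report the current setting, which is a quick way to see whether a given drive supports the feature at all.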

When the disks properly report errors, the RAID controller can decide whether to mark a disk as failed. It usually does that when a disk reports an error writing a sector (because the disk has then run out of spare sectors for remapping), and if your disk is truly bad, it will reach that state quickly.

You should do periodic read-only checks where all sectors are read and their checksums verified. I normally run these in a 14-day cycle.
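
To illustrate what such a check does, here is a toy model in Python. It is only a sketch: the per-sector CRC32 stored next to the data is an assumption made for the example (a real controller relies on the disks' own error detection and its own verify routine), and `make_sector` and `scrub` are hypothetical names.

```python
import zlib


def make_sector(data: bytes):
    """Store a payload with its CRC32 (a hypothetical stored checksum)."""
    return {"data": data, "crc": zlib.crc32(data)}


def scrub(mirror_a, mirror_b):
    """Read every sector on both mirrors and return (index, problem) tuples.

    Read-only: nothing is repaired, problems are only reported.
    """
    problems = []
    for i, (a, b) in enumerate(zip(mirror_a, mirror_b)):
        a_ok = zlib.crc32(a["data"]) == a["crc"]
        b_ok = zlib.crc32(b["data"]) == b["crc"]
        if not a_ok:
            problems.append((i, "mirror A checksum mismatch"))
        if not b_ok:
            problems.append((i, "mirror B checksum mismatch"))
        if a_ok and b_ok and a["data"] != b["data"]:
            problems.append((i, "mirrors disagree"))
    return problems


payloads = [b"alpha", b"beta", b"gamma"]
mirror_a = [make_sector(p) for p in payloads]
mirror_b = [make_sector(p) for p in payloads]
mirror_b[1]["data"] = b"bet!"  # simulate silent corruption on one disk
print(scrub(mirror_a, mirror_b))  # [(1, 'mirror B checksum mismatch')]
```

The point of running this on a schedule is that a bad sector gets noticed while the other mirror still holds a good copy, rather than during a rebuild.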

"you should reconfigure the individual disks to report errors immediately and never retry" - Where/how is this accomplished? — aroth, Commented Jul 11, 2019 at 14:49
