I recently had an HDD start failing in my RAID 1 array (two 2 TB HDDs, no operating system installed on the RAID). The failing drive had to be replaced, but while debugging another issue (which turned out to be unrelated), one of the disks in the array was taken offline, so the array showed as Failed Redundancy. Once that issue was resolved, I clicked the Reactivate Disk button on the drive.
That was on June 6th (my local time), so reactivation has now been running for about 14 days. I did use the volume during the process, which I know slows reactivation down, but the access was mostly reads (and not heavy ones), and no more than 1 GB of data was written.
The only information about run times I could find was a single forum thread. A user with two RAID 1 volumes of approximately 100 GB and 500 GB reactivated the disk(s?) in one or both of them (the wording was ambiguous, IIRC) in 3 days. Assuming that figure was rounded down, and that my hard drives may be slower than that user's, 3 days for the 500 GB volume would scale linearly to roughly 12 days for a 2 TB volume, so a run time of up to 16 days would not be unexpected.
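For what it's worth, here is that extrapolation spelled out. The 3-day / 500 GB data point is from that thread; the 1.3x slack factor for slower drives is just my own guess:

```python
# Scaling the forum thread's data point (3 days for a 500 GB volume)
# up to my 2 TB volumes. The slack factor is my own assumption,
# not something from the thread.
reference_days = 3.0
reference_gb = 500.0
my_volume_gb = 2000.0
slack = 1.3  # assume my drives are up to ~30% slower

scaled = reference_days * (my_volume_gb / reference_gb)
print(f"linear scaling: {scaled:.0f} days")          # 12 days
print(f"with slack:     {scaled * slack:.0f} days")  # ~16 days
```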
No system software seems to provide any progress information. The Disk Management window is unusable: it doesn't display any volumes at all, and shows "Connecting to Virtual Disk Service..." as its status while the service is busy reactivating the disk.
I had Process Explorer launched before I even started reactivating the disabled disk, and it has been logging `vds.exe`'s disk access all this time. I also used `perfmon` to check averaged disk access rates in bytes/second for these two HDDs.
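As an aside, here is a rough sketch of an equivalent way to sample per-disk throughput programmatically, using the third-party `psutil` package. The disk names are placeholders; use whatever keys `psutil.disk_io_counters(perdisk=True)` reports on your machine:

```python
import time
import psutil  # third-party: pip install psutil

INTERVAL = 10  # seconds between samples

def sample(disks):
    """Return cumulative (read_bytes, write_bytes) for each named disk."""
    counters = psutil.disk_io_counters(perdisk=True)
    return {d: (counters[d].read_bytes, counters[d].write_bytes) for d in disks}

# Placeholder names; on Windows psutil typically reports "PhysicalDriveN".
disks = ["PhysicalDrive1", "PhysicalDrive2"]

before = sample(disks)
while True:
    time.sleep(INTERVAL)
    after = sample(disks)
    for d in disks:
        read_rate = (after[d][0] - before[d][0]) / INTERVAL
        write_rate = (after[d][1] - before[d][1]) / INTERVAL
        print(f"{d}: read {read_rate / 1e6:.1f} MB/s, "
              f"write {write_rate / 1e6:.1f} MB/s")
    before = after
```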
It seems the Virtual Disk Service process is reading from one disk at its peak linear read rate (which is most likely the bottleneck), doing some processing (not enough to cause a CPU bottleneck), and writing to the other HDD at a rate of 2 million bytes per second. At first I assumed it would only need to write the disk's full volume once, so by my calculations the process should have taken approximately 12 days.
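The arithmetic behind that figure, using the volume size and the write rate observed above:

```python
# Time to write the full 2 TB volume once at the observed ~2 MB/s:
volume_bytes = 2 * 10**12   # 2 TB
write_rate = 2 * 10**6      # 2 million bytes per second
days = volume_bytes / write_rate / 86_400
print(f"{days:.1f} days")   # ~11.6, i.e. roughly 12
```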
Process Explorer's logs tell me that writing 2000 gigabytes may not have been enough: the VDS process has already written over 2200 gigabytes of data (assuming all of it went to the one RAID 1 drive that has been written to all this time).
Another estimate comes from `perfmon`'s average read rate logging. I did a full scan of one of the RAID HDDs before the malfunction, and I believe I can recall its peak and minimum linear read rates; currently the average over 1000 seconds sits approximately one third of the way down from the maximum rate towards the minimum, and reading seems to be slowing over time. If the rate decline is not substantially different from linear, covering the remaining two thirds of the rate range would take about twice the 14 days already elapsed, meaning I would need to wait another month before the drive is reactivated. That is absolutely unacceptable, especially since I think no state is saved (so the process has to start from scratch on any system shutdown), and I expect to perform a hardware installation soon.
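Here is that back-of-the-envelope projection. The rate values are placeholders (I only remember the peak and minimum approximately); under the linear-decline assumption, only the one-third fraction matters:

```python
# Projection assuming the read rate declines roughly linearly with
# time. Rate values are placeholders; only the "one third of the way
# from max to min" observation drives the result.
rate_max = 120e6   # bytes/s, peak linear read rate (placeholder)
rate_min = 60e6    # bytes/s, minimum linear read rate (placeholder)
rate_now = rate_max - (rate_max - rate_min) / 3

days_elapsed = 14
# Under linear decline, fraction of the rate range covered equals
# fraction of total time elapsed.
fraction_elapsed = (rate_max - rate_now) / (rate_max - rate_min)  # 1/3
days_total = days_elapsed / fraction_elapsed
print(f"total: {days_total:.0f} days, "
      f"remaining: {days_total - days_elapsed:.0f} days")  # ~42 / ~28
```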
My questions are:
- How long is Windows 7 software RAID 1 HDD reactivation supposed to take, and how would one approximate the run time of that process given known properties of the array and its drives?
- If this process takes too long, is there any safe way to stop it that reverts the RAID 1 array to Failed Redundancy status instead of destroying it (so that the array could be backed up, recreated, and restored from the backup, which should not take long) and that does not require reinstalling the OS?
- How much does read and write access to the RAID volume slow down reactivation?
- Windows does not seem to identify any USB storage devices I connect. Does this have the same cause as Disk Management not displaying anything?
- Would reactivation be faster with RAID-optimized HDDs?