
I had an HDD start failing in my RAID 1 array recently (two 2 TB HDDs, no operating systems installed on the RAID). The failing drive had to be replaced, but while debugging another issue (which turned out to be unrelated), one of the disks in the array was taken offline, so the array showed as Failed Redundancy. After that issue was solved, I clicked the Reactivate Disk button on the drive.

That was on June 6th (my local time), so reactivation has been running for about 14 days now. I did use the volume during the process, which I know slows reactivation down, but it was mostly light read access and no more than 1 GB of data was written.

The most information about run times I could find was a single forum thread. A user with two RAID 1 volumes of approximately 100 GB and 500 GB reactivated the disk(s?) in one or both of them (the wording was ambiguous, IIRC) in 3 days. Assuming that figure is rounded down, and that my hard drives may be less performant than that user's, 3 days for the 500 GB volume suggests that run times of up to 16 days would not be unexpected for a 2 TB volume.
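
Worked out explicitly (this linear scaling is my own rough arithmetic, not anything stated in that thread):

    3 days × (2000 GB / 500 GB) = 12 days, plus a margin for rounding and slower drives ≈ up to 16 days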

No system software seems to provide any progress information. The Disk Management window is unusable: it doesn't display any volumes at all and instead shows "Connecting to Virtual Disk Service..." as its status while the service is busy reactivating the disk.

I had Process Explorer launched before I even started reactivating the disabled disk. This program was logging vds.exe's disk access all this time. I also used perfmon to check averaged disk access rates in bytes/second for these two HDDs.

It seems the Virtual Disk Service process is reading from one disk at its peak linear read rate (which is most likely the bottleneck), doing some processing (not enough to cause a CPU bottleneck), and writing to the other HDD at a rate of 2 million bytes per second. At first I assumed it would only need to write the volume's full size, so by my calculations the process should have taken approximately 12 days.
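
For what it's worth, the 12-day figure works out as follows (treating 2 TB as 2 × 10^12 bytes and assuming the roughly 2 MB/s write rate stays constant, which it apparently does not):

    2 000 000 000 000 bytes ÷ 2 000 000 bytes/s = 1 000 000 s ≈ 11.6 days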

Process Explorer logging tells me that writing 2000 gigabytes may not have been enough. The VDS process has already written over 2200 gigabytes of data (assuming all of it went to the one RAID 1 drive that has been written to all this time).

Another estimate comes from perfmon's average read rate logging. I did a full scan of one of the RAID HDDs before the HDD malfunction, and I believe I can recall its peak and minimum linear read rates; the current average over 1000 seconds is roughly one third of the way from the maximum rate towards the minimum rate (reading seems to be slowing down over time). If the rate decline is not substantially different from linear, that could mean I need to wait another month before the drive is reactivated. That is absolutely unacceptable, especially since no state appears to be saved (so reactivation has to start from scratch after any system shutdown) and I expect to perform a hardware installation soon.
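
For anyone wanting to log the same counters without keeping perfmon open, typeperf can write them to a CSV file from a command prompt. A minimal sketch is below; the PhysicalDisk instance names are placeholders and need to be replaced with whatever typeperf -qx PhysicalDisk (or perfmon itself) reports for the two RAID disks on the system in question:

    rem Sample the read rate of one RAID disk and the write rate of the other every 10 seconds
    typeperf "\PhysicalDisk(1 D:)\Disk Read Bytes/sec" "\PhysicalDisk(3 E:)\Disk Write Bytes/sec" -si 10 -o raid-rates.csv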

My questions are:

  • How long is Windows 7 software RAID 1 HDD reactivation supposed to take, and how would one approximate the run time of that process assuming properties of the array and drives in the array are known?
  • If this process takes too long, is there any safe way to stop it which will revert the RAID 1 array to Failed Redundancy status instead of destroying it (so that the array could be backed up, recreated and restored from the backup, which should not take a long time) and will not require OS reinstallation?
  • How much does read and write access to the RAID volume slow down reactivation?
  • Windows does not seem to recognize any USB storage devices I connect; does this have the same cause as Disk Management not displaying anything?
  • Would reactivation be faster with RAID-optimized HDDs?

1 Answer


This is a partial answer, exclusively addressing how to force that process to terminate (point 2). Needless to say, this is a last resort, and I do not recommend doing it under normal circumstances. I believe the result is equivalent to what would have happened if a power outage had occurred during reactivation. It is also what I did, because I was fed up with having to wait indefinitely.

  1. I forced a reboot. My OS would not shut down by itself, so that required pressing the button on my chassis.
  2. I attempted to boot Windows. This showed a screen telling me that the previous shutdown was unsuccessful and asking me to choose a startup option. I chose normal startup.
  3. Windows would not start (because of the RAID thing), so I waited half a minute (it is installed on an SSD, so it should have loaded by then), did a forced reboot (reset button on the chassis) and tried booting Windows again.
  4. This showed a screen similar to the unsuccessful shutdown one, which told me Windows failed to start. I chose the startup recovery option.
  5. I waited until automatic startup recovery failed (it took several minutes). After this, I was presented with a list of other recovery options.
  6. One of these options was launching a command prompt, which is what I did. Then (only the meaningful actions are listed; a condensed diskpart transcript follows this list):
    • I typed diskpart, starting this program.
    • Once this program started and displayed its own command prompt, I typed list disk and memorized the numbers of disks which had the mirrored volume. In my case these were 1 and 3.
    • I typed break disk=1 nokeep (nokeep because the volume is effectively in Failed Redundancy status and thus only has one working copy), which failed and reported that the sole healthy plex cannot be removed, so I typed break disk=3 nokeep instead. Within several seconds this disabled the copy on the "bad" disk and left me with a functional simple volume, with all its contents intact, on the other disk (so no backup was needed).
    • I exited diskpart, the command prompt, and rebooted.
  7. I successfully booted into Windows and opened Disk Management. One of the two ex-RAID disks showed as healthy; the other had only unallocated space and showed as having errors. The "errors" disk allowed no interaction other than taking it offline (which took a couple of seconds), after which I brought it back online (also a couple of seconds). The disk then showed as healthy and allowed creating volumes (the menu items were no longer grayed out).
  8. I right-clicked the ex-mirror simple volume (still called "Mirror"), then clicked "Add Mirror". It let me choose the other disk as the mirror, after which it immediately created the volume and started resynchronizing. In my case I know this takes several hours, which is about as fast as one could expect with 2 TB hard drives, and a lot faster than 15 (or 45) days.
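
For reference, here is the diskpart portion of step 6 condensed into a transcript. The disk and volume numbers are from my system and will differ elsewhere, and note that break acts on the volume that currently has focus, so a select volume step is shown even though I only listed the commands I remember typing:

    DISKPART> list disk
    DISKPART> list volume
    DISKPART> select volume 2        <- the mirrored volume; the number is an example
    DISKPART> break disk=1 nokeep    <- failed: the sole healthy plex cannot be removed
    DISKPART> break disk=3 nokeep    <- succeeded: dropped the plex on the "bad" disk
    DISKPART> exit

The re-mirroring from step 8 also has a diskpart equivalent (select volume followed by add disk=<n>), but I used the Disk Management GUI for that part.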

Needless to say (again), I do not recommend attempting this. I must also note this is better than what I had to do when I tried to configure onboard RAID (which failed).
