Yesterday, all my Hyper-V guests were running normally. Today, all guests stall during startup. The Windows guests seem to boot but then hang on a screen showing the Windows logo or the timer, with no progress for hours. The host system itself seems to be running fine.

The host system is Windows Server 2012 R2. The guests are Server 2012 R2 and Server 2016, plus some Linux guests (which also no longer start up). The system has been running for years without problems. The hardware is pretty cheap (AMD Ryzen 5 3600 CPU with SATA drives, HDD and SSD, 64 GB RAM). Four HDDs are bundled in a mirrored storage pool. The OS runs on an SSD. It's not the most powerful system, but it serves my personal needs.

I have checked the system event log. One SSD, which is currently not used for VM guests, and which is also not in the storage pool, reported a bad block. I ran chkdsk to "fix" it. Otherwise, no relevant errors show up in the event viewer. The storage pool reports no errors. I ran HWINFO64 but found no indications of any faults.

After a VM is started, the guest OS takes an unusual amount of time to become active. In one example (Linux), booting seems normal at first: the bootloader shows up. After that, everything is very slow, and the boot process takes minutes instead of seconds. When it gets to starting GNOME, it stalls.

I suspect an I/O problem, so as a next step I am thinking of replacing the mainboard.

Is there anything else I could/should check beforehand?

  • Restart the server and update it. Try that.
    – anon
    Commented Jul 15, 2022 at 11:45
  • I didn't mention it, but I have done that... no effect.
    – lzydrmr
    Commented Jul 15, 2022 at 11:53
  • Run chkdsk, examine the Event Viewer, check the SMART status of the disks, run sfc /scannow.
    – harrymc
    Commented Jul 15, 2022 at 17:45
  • Thanks for your suggestion. I have now changed the mainboard, and the VMs are starting right up again! However, one of the four hard disks in the storage pool is now reported as faulty (before the switch, it wasn't). I replaced it, and the storage pool is now being reconstructed. I can run the VMs in the meantime, so hooray for RAID! After the reconstruction, I will change the mainboard back again. I think a mainboard fault is unlikely, so I expect everything to run smoothly with the old mainboard (which is actually newer than my replacement mainboard).
    – lzydrmr
    Commented Jul 15, 2022 at 19:08

1 Answer

As the storage pool did not indicate a problem at first, I suspected a mainboard fault (however unlikely). So I replaced the mainboard, which enabled me to run my VMs again. However, one of the disks in the storage pool was now flagged and shown as "lost communication". So my guess is that in the original configuration, the bad disk slowed the storage pool down massively without being reported. As far as I understand, replacing the mainboard should not have had any influence on that, but it made the fault obvious.

So I replaced the bad disk and let the storage pool reconstruct itself overnight. During the reconstruction, the storage remained usable. Eventually I switched back to the original mainboard. Now everything is running smoothly again.
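
In hindsight, a quick throughput comparison between the volumes might have pointed at the storage pool before I touched the mainboard. The following is only a rough sketch in Python, not something I actually ran at the time. It writes a small temporary file to each directory, forces it to disk, and flags any volume that is far slower than the others. The function names, the drive letters, and the "5 times slower than the median" threshold are all my own assumptions. It measures a whole volume, so it cannot tell which disk inside a pool is the bad one.

```python
import os
import statistics
import tempfile
import time


def write_throughput_mb_s(directory, size_mb=16):
    """Write size_mb of data to a temp file in `directory`, fsync it,
    and return the observed write throughput in MB/s."""
    block = os.urandom(1024 * 1024)  # 1 MiB of incompressible data
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(size_mb):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # push the data to the device, not just the cache
        elapsed = max(time.perf_counter() - start, 1e-9)
    finally:
        os.remove(path)
    return size_mb / elapsed


def flag_slow(throughputs, factor=5.0):
    """Return the labels whose throughput is below median / factor.
    `factor` is an arbitrary threshold chosen for illustration."""
    median = statistics.median(throughputs.values())
    return sorted(k for k, v in throughputs.items() if v < median / factor)


if __name__ == "__main__":
    # Hypothetical paths: one directory per volume to compare.
    volumes = {"OS SSD": "C:\\Temp", "Storage pool": "D:\\Temp"}
    results = {name: write_throughput_mb_s(path) for name, path in volumes.items()}
    for name, mb_s in results.items():
        print(f"{name}: {mb_s:.1f} MB/s")
    print("Suspiciously slow:", flag_slow(results))
```

With only two volumes, the median is simply their average, so the comparison is crude. It becomes more meaningful with three or more volumes, or when compared against numbers recorded while the system was healthy.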
