5

EDIT/TLDR: See my answer below: https://superuser.com/a/1281893/855012

I have a brand new build that has a random problem that happens typically once or twice daily, but it runs great the remainder of the time. I've only had this build running for about 1 week and it has exhibited the problem since day 1.

Hardware/Build:
i7-8700K retail box with Hyper 212 EVO cooler
Gigabyte Aorus Z370 Gaming 5 motherboard (latest BIOS, F4)
32GB Crucial DDR4 2133 memory (8GB x 4) Part #: CT8G4DFD8213.C16FA (CAS: 15, RAStoCAS: 15, RAS PreCharge: 16, tRAS: 36, tRC: 50, 1.2V) (set to auto detect in the Gigabyte BIOS).
NVIDIA Founders Edition GTX 1070
Samsung 960 PRO SSD
Samsung 850 PRO SSD
Thermaltake Toughpower Grand RGB 750W PSU
NZXT H440 case (4 case fans hooked up directly to the motherboard fan controllers + 1 CPU fan to motherboard fan controller. LED/Fan hub connected to Molex from PSU).
Fresh install of Windows 10 Professional

Problem
My recurring problem is that the computer just freezes and reboots at random. It typically doesn't happen more than once per day, but obviously something is wrong. I never get a blue screen or a memory dump, and it doesn't seem to matter what I'm doing at the time: I can be sitting at the desktop with a few browser windows open (very light usage) or in the middle of a game, and the behavior is the same. The mouse locks up, the system stops responding to keystrokes, and it either reboots on its own immediately or I have to power it off and back on.

Event Viewer/System log at time of crash and reboot
I see a gap in the event viewer from 5:10pm to 6:44pm where there are no events, then at 6:44pm the next event is the restart.
Event ID 12: The operating system started at system time 2017-12-20T01:44:46.359321600Z.
Event ID 6008: The previous system shutdown at 6:15:17 PM on 12/19/2017 was unexpected.
Event ID 41: The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.
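
As a rough sanity check on those timestamps, the sketch below converts the Event ID 12 boot time from UTC to local time and subtracts the last-alive time reported by Event ID 6008. The UTC-7 offset is an assumption, inferred only from 01:44Z appearing as 6:44pm locally, and the fractional seconds are dropped.

```python
from datetime import datetime, timedelta, timezone

# Assumption: local time is UTC-7, inferred from 01:44Z showing as 6:44pm.
LOCAL = timezone(timedelta(hours=-7))

# Event ID 12: operating system start time, logged in UTC (fraction dropped).
boot_utc = datetime(2017, 12, 20, 1, 44, 46, tzinfo=timezone.utc)
boot_local = boot_utc.astimezone(LOCAL)

# Event ID 6008: time of the unexpected shutdown, logged in local time.
last_alive_local = datetime(2017, 12, 19, 18, 15, 17, tzinfo=LOCAL)

# Time between the last moment the system was known alive and the next boot.
downtime = boot_local - last_alive_local

print("Boot (local):", boot_local.strftime("%Y-%m-%d %H:%M:%S"))
print("Time between last-alive and boot:", downtime)
```

If the offset assumption holds, this gives the window in which the machine was frozen or powered off before it came back up.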

What I've tried/Troubleshooting
1: I ran the Windows memory test and MemTest86+ v5 on all memory overnight and received zero errors.
2: I ran the Windows memory test and MemTest86+ on each 8GB memory stick individually, again with zero errors.
3: I tested each memory stick individually in Windows, running Prime95 on all threads and FurMark at the same time, while also opening browser windows etc. to try to push it. I never got a crash or lockup; it powered through these tests perfectly while using all 8GB of RAM. I ran each test for 20-30 minutes with no reboots or errors. CPU temperature never went much above 60C during the torture tests, and GPU temperature leveled off at 80C.

My initial thoughts were that I have memory issues, but that it's just hard to reproduce, however I'm not sure that's the case any longer. I'm not really sure where I should go from here for testing.

I noticed with CPU-Z that my CAS latency is different for each bank of memory (SPD).
Stick 1: CAS 15
Stick 2: CAS 16
Stick 3: CAS 18
Stick 4: CAS 20
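
To put those CAS numbers on a common scale, here is a small sketch that converts a CAS latency in clock cycles to nanoseconds using the usual formula, latency_ns = CL x 2000 / data rate in MT/s. It assumes every stick is running at DDR4-2133, which may not match what the SPD tab is actually showing (SPD tables can list several profiles), and the function name is just illustrative.

```python
def cas_latency_ns(cl: int, data_rate_mts: float) -> float:
    """Convert CAS latency in clock cycles to nanoseconds.

    The memory clock is half the data rate (DDR), so one cycle lasts
    2000 / data_rate_mts nanoseconds.
    """
    return cl * 2000.0 / data_rate_mts


# Assumption: all four sticks run at DDR4-2133 (2133 MT/s).
DATA_RATE = 2133.0
reported_cas = {"Stick 1": 15, "Stick 2": 16, "Stick 3": 18, "Stick 4": 20}

for stick, cl in reported_cas.items():
    print(f"{stick}: CL{cl} = {cas_latency_ns(cl, DATA_RATE):.2f} ns")
```

This only shows how far apart the reported values are in real time; it does not say whether the mismatch is causing the freezes.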

I have not overclocked the system. I've left everything on "auto" in the BIOS, thinking that would be the most stable way to proceed. Should I be concerned about the CAS difference and begin futzing with memory settings?

Any help or thoughts are much appreciated if you have the time.

  • Broken or under-specified power supply?
    – DavidPostill
    Commented Dec 20, 2017 at 18:47
  • Power shortfalls shouldn't cause a freeze (usually it's an immediate black screen), so I'd guess you have a motherboard issue (since you have thoroughly checked the RAM modules) or a bad CPU. It's very difficult to isolate which, but I've never had a CPU just end up dead, and I have been through my share of mobos. Commented Dec 20, 2017 at 19:02
  • @DavidPostill, I thought it might be the PSU, too. I checked out a tool I just heard about, outervision.com, making some basic assumptions about hardware not listed, and that site suggests only a 450W PSU, so a 750W should be more than adequate... And yes, I've seen a bad PSU do weirder stuff than simply freeze the OS. Commented Dec 20, 2017 at 19:37
  • With something as random as this seems to be, I'd suggest starting with the most basic hardware you can get down to while still having a running rig, and seeing if you can recreate it with only a specific set of hardware. E.g., remove all but the CPU fan, use 1 stick of RAM, 1 SSD, etc. If you find that swapping out different hardware doesn't fix the issue with what you have to swap with (mobo, PSU, CPU, GPU), then try to find a way to swap those, too. Once you have a stable system, you can start ruling things out and re-adding them 1 at a time. Commented Dec 20, 2017 at 19:41
  • @DavidPostill I don't think the new PSU would make it randomly freeze and reboot, but anything is possible at this point. I might just exchange it for a new one to rule it out. The PSU is well oversized for what I'm doing, and in my tests it handles the system under full load (both CPU and video) for hours on end, whereas it sometimes locks up while using almost no power.
    – Hopeless
    Commented Dec 20, 2017 at 21:01

2 Answers

1

“Event ID 41: The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.”

This may be due to the power supply, which, I note, uses “Active Power Factor Correction”. Some of these power supplies are finicky about the quality of the sine-wave power they receive.

I had a similar problem about three years ago when I replaced the power supply in my Dell with one that had an ultra-high efficiency rating. I had it plugged into a basic UPS whose output was quasi-sinusoidal rather than a true sine wave. At the time I didn’t realize what “Active Power Factor Correction” meant, and it took a while to figure out why the computer had started to shut down randomly. After some research, and once I began to understand how power supplies get to be ultra efficient, I replaced the UPS with one that generates a true sine wave. These are more expensive, but since installing it several years ago I have not had the problem recur.

I don’t know whether the Thermaltake power supply used here can continue to power through a less-than-ideal sine wave, but it is worth checking, at least with the manufacturer, to clarify whether a purity standard is required for the input power.

0

Well I now have a stable system, but the answer is more speculation than it is definitive.

I ended up returning all of the hardware from the first build (case, power supply, motherboard, processor, CPU fan, GPU, etc.). I replaced every single part with the same hardware (new) from Microcenter. Kudos to Microcenter for being awesome about the issue I was having.

The one thing I changed was that I purchased new memory, so it's very hard to tell what the real issue was with the first build.

When tearing down the first build, I noticed that I had connected the fan hub from the NZXT case to one of the motherboard headers. The fan hub has a cable labeled PWM. In the second build I did not reconnect this cable, I instead just connected the case fans directly to the motherboard fan headers.

The second build also now contains new Crucial Ballistix DDR4 ram.

I don't believe the issue was with the power supply in the first build, mainly because of the excruciating testing the system endured with Prime95/FurMark, all of it stable. I think the problem was either with the case fan hub PWM header or with the old 2133 Crucial memory that I'm no longer using. Either way, I'm stable. Even if this isn't a definitive answer for someone else, hopefully it guides someone further in troubleshooting.

Here's a great resource for setting up and troubleshooting the H440 case fan hub. I personally just disconnected all fans from it and plugged them directly into the motherboard to have a more silent build.

H440 Case Fan Hub Instructions
