0

Follow-up Q is at Computer restarts randomly - all parts A/B tested


So, I have this system which has a Gigabyte H97 board with an i5 CPU and 2x 4GB Corsair DDR3-1600 sticks. Two HDDs - Windows 7 on the WD and OpenSuSE Linux on the Seagate.

Background: One day in June, it wouldn't resume from sleep into Win 7. The case buttons didn't do anything, and shorting the power pins didn't work either. I sent the MoBo to my dealer, who sent it to Gigabyte.

When returned, it worked but with one quirk. The 2nd HDD - Seagate - would occasionally disappear in Windows, i.e. the drive letters would vanish. Disabling and re-enabling the relevant ATA channel in Device Manager would restore drive visibility. There was no data corruption on that drive. SMART indicators looked normal in Speedfan. This happened maybe 3-4 times within a few weeks of the board's return and then stopped. When booted into Linux, the system never froze as if the drive had died or was missing. This problem never occurred with the first (WD) drive.

Current problem: One day in mid-October, shortly after resuming from sleep, the system suddenly restarted. There was negligible CPU load at the time. Once back in Windows, I checked the system events and spotted this:

The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.

There were no other event messages that threw light on what had happened.

It happened again the next day. Thrice. I ran the Windows 7 memory diagnostics tool available from the boot menu and ran the two passes. No errors reported. The next restart happened three days later. The one after that, two days later. I called my dealer to take a look. On the day before he took it, it happened four times: twice while in Linux and once while in the BIOS. He took the whole case, minus the drives. He was able to reproduce the behaviour at his office with his Win 7 HDD. So he moved the MoBo along with my CPU/RAM to another case and PSU. After a day and a half, there were no restarts, so he inferred the problem was with either my PSU or case - both Corsair.

I brought back the system - my MoBo in his case + PSU - to test for a few days. About 15 minutes after reconnecting my drives and booting into Windows, it restarted. I decided to keep it for a few days to see if the frequency of restarts had changed. Now, the old problem reared its head again. The 2nd HDD's drive letters disappeared but this time cycling the ATA channel didn't work. I rebooted into the BIOS and the drive wasn't visible. I shut down the system and checked the data & power cables. The drive reappeared and I booted into Linux normally. The data seemed fine. Back into Windows, the SMART indicators still reported no errors.

Over the next week, the system restarted once. The drive disappeared once, and so I switched the SATA port to which it was connected. Again, the drive functioned normally for a few days before it disappeared again. Cycling the ATA channel didn't help. It didn't show up in BIOS either, but I decided to try to POST again by changing some inconsequential setting in the BIOS (actually UEFI). The 2nd drive's EFI entry showed up in the boot menu and I could boot into Linux. During all my Linux sessions, this 2nd HDD never 'disappeared' or froze.

Then the system resumed the spontaneous restarting with some frequency - ~2x in 3 days. So I decided to get it sent for repair (yes, yes .. why the hurry?). My dealer sent it to them - just the Mobo - a week ago.

I spoke to the Gigabyte service centre today who have given a clean bill of health to the MoBo, saying that they ran a 'burn-in' test for two days and have found no issues.

Hence, my question: what do I tell them to look for, in order to reproduce the behaviour? They have suggested that if I send across my CPU+RAM, they can test with that combo. I may do that, but is there anything specific I can ask them to test?

Thanks for the patience.

  • The system restarted in BIOS settings, so it’s not a software issue. If the Seagate drive is not going bad, then this definitely sounds like a bad motherboard. It doesn’t matter what the manufacturer says about their burn-in test; they aren’t testing it the same way you are. However, it could be the CPU or RAM or another component plugged into the motherboard, like a video card, in addition to the Seagate drive going bad. This type of problem takes methodical testing and a process of elimination: all assumptions are “wrong” until proven right.
    Commented Nov 23, 2017 at 13:48
  • @Appleoddity - the video runs on the on-die iGPU. The restarts have occurred at my dealer's premises with only his HDD connected, so the motherboard or the CPU/RAM on it look to be the culprit. Re: Southbridge, can the manufacturer specifically test that? At this stage, it looks like if I can't get them to reproduce the problem, I'll be left holding a defective board despite it being within the warranty period.
    – Gyan
    Commented Nov 23, 2017 at 13:58
  • I would think that this should be done in two steps: (1) Hand over your CPU+RAM and see if they can duplicate the problem and fix it. (2) If the problem cannot be analyzed, you should ask for a replacement for the whole thing, motherboard+CPU+RAM.
    – harrymc
    Commented Nov 25, 2017 at 20:08
  • Did you check your CPU fan? Sometimes, if it gets damaged, it will stop and start again or fail to hold the appropriate speed. This can cause irregular restarts, depending on how long it stops and on the CPU load. The other problems could be unrelated to this, so you may need to check everything (even the HDD controller and PSU!).
    – Omid PD
    Commented Nov 26, 2017 at 8:51
  • Unlikely to be the case. Could be anything else that is used regularly - PSU, motherboard, cables, perhaps CPU. Sometimes, unfortunately, these scenarios can simply be very difficult to diagnose correctly, and adding a bounty to incentivize answerers doesn't change that difficulty. Perhaps the only strategy likely to work is to keep replacing parts until every single thing has been replaced. The only significant, conclusive and useful finding that I see so far is that it isn't the operating system, and that's concluded just because the reboot happened "once while in the BIOS".
    – TOOGAM
    Commented Nov 30, 2017 at 12:24

3 Answers

1

The main problem with service centers is that they will not look for your problem unless you specify it or it is really obvious. So you will have to figure out the exact problem yourself and then demand that the service center solve it. Below are general suggestions.

The "usual" suspects for restarts are:

  1. overheating
  2. power supply problems
  3. bad drivers
  4. memory

Before you start, examine your motherboard. Look at the capacitors near the power connector, and make sure all expansion card connections are OK by moving the cards a little bit.

How to test:

  1. Run system diagnostics and watch the temperature values. Run a system-intensive test (CPU, memory, I/O). Blowing the dust off the cooler helps greatly ;)
  2. Plug in a UPS. Fluctuations in your home power can badly affect the system's power supply unit, especially if it is running above 80% of its rated capacity.
  3. Run a different OS, such as an Ubuntu live CD or even a Windows live CD (a USB version is better).
  4. Run memory tests. They take forever, but you get a clear result at the end. If a problem is found, you can try changing memory slots.
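Because the restarts leave almost nothing in the event log, it can help to have your own record of exactly when the machine died while the tests above are running. Here is a minimal sketch of a heartbeat logger in Python. The function names, the `heartbeat.log` file name and the 10-second interval are all my own assumptions, not part of any tool. The idea is to append a timestamp every few seconds and force it to disk, then look for holes in the log afterwards.

```python
import os
import time
from datetime import datetime, timedelta


def write_heartbeat(path):
    """Append the current time to the log and force it onto the disk."""
    with open(path, "a", encoding="ascii") as f:
        f.write(datetime.now().isoformat() + "\n")
        f.flush()
        os.fsync(f.fileno())  # so the line survives a sudden power loss


def run_logger(path="heartbeat.log", interval_s=10):
    """Write a heartbeat every interval_s seconds until the machine dies."""
    while True:
        write_heartbeat(path)
        time.sleep(interval_s)


def read_heartbeats(path="heartbeat.log"):
    """Parse the log back into a list of datetime objects."""
    with open(path, encoding="ascii") as f:
        return [datetime.fromisoformat(line.strip()) for line in f if line.strip()]


def find_gaps(timestamps, interval_s=10, slack=3.0):
    """Return (last_seen, next_seen) pairs where the log went silent
    for longer than slack * interval_s seconds."""
    limit = timedelta(seconds=interval_s * slack)
    return [(prev, cur)
            for prev, cur in zip(timestamps, timestamps[1:])
            if cur - prev > limit]
```

Start `run_logger()` in a terminal, and after a restart call `find_gaps(read_heartbeats())`. Each gap begins at the last moment the system was known to be alive, which you can line up against temperature readings or against what you were doing at the time. Deliberate sleeps and shutdowns show up as gaps too, so those have to be ruled out by hand.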

In your case I suspect something is wrong with the HD controller. It may be a driver problem, or it may be a hardware problem. Try to update all the drivers, especially those related to the system and storage. Google whether anyone with the same MoBo has had similar problems. If that doesn't help, try removing one of the HDs and see if there is an improvement. It might be the HD itself, but this is not very likely. You may also try flashing a new "BIOS" version.

1

Based on your description, it sounds like it could be a power supply issue. What type of power supply do you have? (The wattage is most helpful here.) What hardware do you have running in the system? Perhaps it is overdrawing power. Of course, there could be two issues here: one with the reboots and one with the hard drive.

0

The possibilities for the problem are simply endless. The burn-in test is not conclusive, as it tests one environment and does not reproduce every possible case. It is possible that the problem never occurs once the computer is warm and only happens shortly after start-up, which this test does not cover.
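To put a rough number on "not conclusive": the question mentions restarts at about 2 in 3 days. If one assumes, purely for illustration, that restarts arrive at random with a constant average rate (a Poisson model, which is my assumption and not something anyone has established for this machine), the chance that a test of a given length sees no restart at all is exp(-rate × days). The function names below are made up for this sketch.

```python
import math


def p_no_restart(rate_per_day, test_days):
    """Chance of seeing zero restarts during the test (Poisson assumption)."""
    return math.exp(-rate_per_day * test_days)


def days_needed(rate_per_day, confidence=0.95):
    """Test length needed to see at least one restart with the given confidence."""
    return -math.log(1.0 - confidence) / rate_per_day


rate = 2 / 3  # "~2x in 3 days", taken from the question

print(round(p_no_restart(rate, 2), 2))  # two-day burn-in: 0.26
print(round(days_needed(rate), 1))      # days for 95% confidence: 4.5
```

Under that assumption, a two-day test would come back clean about one time in four even on a faulty board, and it would take roughly four and a half days to be 95% sure of catching a restart. The real fault may well depend on temperature, sleep/resume or the attached parts, so treat these figures as an order-of-magnitude argument for asking for a longer test, not as a measurement.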

I would advise you to hand over your CPU+RAM and see if the Gigabyte service center can duplicate the problem and then fix it.

If the problem cannot be analyzed because it is outside of the scope of their tests, you should ask Gigabyte to replace the whole thing, meaning motherboard and CPU and RAM, thus exchanging the computer for a new one. The warranty should be applicable in this case.

  • I don't think a motherboard manufacturer replaces CPU and RAM out of their own pocket, unless the motherboard can be proven to have destroyed them.
    – Peter
    Commented Nov 30, 2017 at 9:01
  • @Peter: I'm talking in effect of exchanging the computer. I clarified it in my answer.
    – harrymc
    Commented Nov 30, 2017 at 9:02
