
CPUs can dynamically reduce their clock speed according to their temperature to avoid overheating. At work I have two servers, one of which shows some bad behaviour (random reboots).

The snippet further down is something I see in the system logs of both machines. Is this a consequence of the normal operation of the CPU's dynamic frequency scaling, or does it indicate an error (e.g. badly applied thermal paste)?

I would expect that something as mundane as the dynamic frequency scaling of a modern CPU would not show up in the system logs.

As a side note: no over-clocking has been done or attempted at any point in the servers' time with us.

The kernel log indicates that hardware errors were detected.
System log may have more information.
The last 20 mcelog lines of system log are:
==========================================
Jan 31 17:13:12 apollo3 mcelog: Family 6 Model 4f CPU: only decoding architectural errors
Feb  2 15:07:50 apollo3 mcelog: Family 6 Model 4f CPU: only decoding architectural errors
Feb  2 15:07:50 apollo3 mcelog: Hardware event. This is not a software error.
Feb  2 15:07:50 apollo3 mcelog: MCE 0
Feb  2 15:07:50 apollo3 mcelog: CPU 1 THERMAL EVENT TSC 15900247053fc
Feb  2 15:07:50 apollo3 mcelog: TIME 1486044329 Thu Feb  2 15:05:29 2017
Feb  2 15:07:50 apollo3 mcelog: Processor 1 heated above trip temperature. Throttling enabled.
Feb  2 15:07:50 apollo3 mcelog: Please check your system cooling. Performance will be impacted
Feb  2 15:07:50 apollo3 mcelog: STATUS 88000bcb MCGSTATUS 0
Feb  2 15:07:50 apollo3 mcelog: MCGCAP 7000c16 APICID 4 SOCKETID 0
Feb  2 15:07:50 apollo3 mcelog: CPUID Vendor Intel Family 6 Model 79
Feb  2 15:07:50 apollo3 mcelog: Family 6 Model 4f CPU: only decoding architectural errors
Feb  2 15:07:50 apollo3 mcelog: Hardware event. This is not a software error.
Feb  2 15:07:50 apollo3 mcelog: MCE 1
Feb  2 15:07:50 apollo3 mcelog: CPU 1 THERMAL EVENT TSC 15900247241ad
Feb  2 15:07:50 apollo3 mcelog: TIME 1486044329 Thu Feb  2 15:05:29 2017
Feb  2 15:07:50 apollo3 mcelog: Processor 1 below trip temperature. Throttling disabled
Feb  2 15:07:50 apollo3 mcelog: STATUS 88010a8a MCGSTATUS 0
Feb  2 15:07:50 apollo3 mcelog: MCGCAP 7000c16 APICID 4 SOCKETID 0
Feb  2 15:07:50 apollo3 mcelog: CPUID Vendor Intel Family 6 Model 79
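To compare the two machines, the throttling messages in an excerpt like the one above can be tallied per processor. The following is only a minimal sketch: the function name `count_throttle_events` is hypothetical, and the regular expression assumes the exact message wording shown in this log, which may differ between mcelog versions.

```python
import re
from collections import Counter

# Matches the two message shapes seen in the excerpt above, e.g.
#   "... mcelog: Processor 1 heated above trip temperature. Throttling enabled."
#   "... mcelog: Processor 1 below trip temperature. Throttling disabled"
# Assumption: other mcelog versions may word these lines differently.
THROTTLE_RE = re.compile(
    r"mcelog: Processor (\d+) (heated above|below) trip temperature"
)


def count_throttle_events(lines):
    """Return two Counters (throttle enabled, throttle disabled) keyed by processor number."""
    enabled, disabled = Counter(), Counter()
    for line in lines:
        match = THROTTLE_RE.search(line)
        if match is None:
            continue  # not a throttling message
        cpu = int(match.group(1))
        if match.group(2) == "heated above":
            enabled[cpu] += 1
        else:
            disabled[cpu] += 1
    return enabled, disabled


if __name__ == "__main__":
    # Two lines copied from the excerpt above.
    sample = [
        "Feb  2 15:07:50 apollo3 mcelog: Processor 1 heated above trip temperature. Throttling enabled.",
        "Feb  2 15:07:50 apollo3 mcelog: Processor 1 below trip temperature. Throttling disabled",
    ]
    enabled, disabled = count_throttle_events(sample)
    for cpu in sorted(set(enabled) | set(disabled)):
        print(f"processor {cpu}: enabled {enabled[cpu]}x, disabled {disabled[cpu]}x")
```

Running this over the full system log of each server would show whether the machine with the random reboots throttles noticeably more often than the other one.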
  • It's pretty explicitly telling you that there is an error condition. Also just look at the documentation for that log?
    – Seth
    Commented Feb 3, 2017 at 13:05

1 Answer


As the log says, the CPU is overheating.

  1. Clean all the fans and check that they are working correctly

  2. Replace the thermal paste (or if it's still under warranty, go to the C)

  3. Contact the manufacturer if the problem still occurs
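To see whether a step above actually helped, the kernel's throttle counters can be read before and after the fix and compared. This is a rough sketch that assumes a Linux system with an Intel CPU where the `thermal_throttle` directory is exposed under `/sys/devices/system/cpu`; the function name `read_throttle_counts` is hypothetical, and on machines without those files it simply returns an empty dict.

```python
from pathlib import Path


def read_throttle_counts(sysfs_root="/sys/devices/system/cpu"):
    """Return {cpu_name: core_throttle_count} for every CPU that exposes the counter.

    Assumption: the kernel provides cpuN/thermal_throttle/core_throttle_count,
    which is typical for Intel CPUs on Linux but not guaranteed everywhere.
    """
    counts = {}
    pattern = "cpu[0-9]*/thermal_throttle/core_throttle_count"
    for counter in Path(sysfs_root).glob(pattern):
        cpu_name = counter.parent.parent.name  # e.g. "cpu1"
        try:
            counts[cpu_name] = int(counter.read_text().strip())
        except (OSError, ValueError):
            continue  # unreadable or malformed counter, skip it
    return counts


if __name__ == "__main__":
    for cpu_name, count in sorted(read_throttle_counts().items()):
        print(f"{cpu_name}: throttled {count} times since boot")
```

If the counters keep climbing under normal load after the fans are cleaned and the paste is replaced, that points towards the last step.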
