
We have an unusual problem where some of our Linux servers go into a state where CPU utilisation is very high, and when we dig deeper, most of it is in kernel space: %sys utilisation is around 80%. So far we have not found a good way to debug this, and any help would be appreciated.

"perf top" shows native_queued_spin_lock_slowpath as the major culprit (90%).

Below is a brief snapshot of sar. [screenshot of sar output]


1 Answer


You are quite possibly exceeding the ability of the hardware to do its work: too many CPUs are simultaneously trying to queue and to handle work, more than the physical hardware can actually get through.

Your heavily multi-threaded software is spending its time waiting for ... itself, because every thread is contending for the same shared resources. Those shared resources might be shared memory, a shared server, or even the disk.
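A few quick checks can hint at which shared resource is the choke point; the sampling intervals and counts below are arbitrary examples:

    # Run queue length versus the number of CPUs (runq-sz column)
    sar -q 5 3

    # Is %sys high on every CPU, or concentrated on just a few?
    mpstat -P ALL 5 3

    # Is the disk the shared resource? Watch %util and the await columns
    iostat -x 5 3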

native_queued_spin_lock_slowpath is the kernel's slow path for acquiring a contended spinlock. A spinlock should "spin" only briefly and only occasionally, but yours are spinning constantly, and CPU time spent spinning is 100% wasted.
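To see which code paths are fighting over that lock, call-graph profiling is usually enough. A minimal sketch, assuming a 30-second system-wide capture is acceptable on the affected server:

    # System-wide profile with call graphs
    perf record -a -g -- sleep 30

    # The call chains above native_queued_spin_lock_slowpath identify the
    # subsystem (scheduler, VFS, networking, memory management, ...) whose
    # lock is contended
    perf report --stdio --no-children

On recent kernels and perf builds, "perf lock contention" can also report contended kernel locks directly, but whether it is available depends on your kernel and perf versions.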

You only need to dedicate enough CPUs to handle the task. You accomplish nothing of value by dedicating more CPUs than that: they merely wait for one another, literally wasting CPU time in a spin-lock when they should be doing something useful.

You should reduce the number of CPUs (or threads) that the workload uses. If you really do need that level of performance, you can use "affinity" rules to partition the available CPUs between workloads, as sketched below.
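A minimal sketch of what that can look like, assuming a single heavyweight multi-threaded service is the culprit (the CPU range, path and PID are placeholders):

    # Start the service confined to 16 CPUs
    taskset -c 0-15 /path/to/your/service

    # Or restrict an already-running process after the fact
    # (-a applies the mask to all of the process's threads)
    taskset -acp 0-15 <pid>

Many applications also have their own settings to cap the number of worker threads, which is often a simpler fix than OS-level affinity.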

See also the post "Why having more and faster cores makes my multithreaded software slower?"
