30

I have a cloud server with ~14G of RAM and no swap. However, I occasionally see kswapd0 taking up some CPU when I run top. Why would kswapd0 be running at all if there's no swap space for it to manage?

6 Answers

36

Swap space is only used for data that is not backed by any file. Data that is mapped from files on disk (such as executable programs) can still be evicted from RAM even if you don't have a swap device: it is written back to its own file if it was modified, or simply dropped and read back from that file later.
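
To see how this applies to your own machine, here is a rough sketch that reads /proc/meminfo. The function names are made up for illustration, and the sum of the two file LRU lists is only an approximation of how much file-backed memory the kernel could reclaim without swap.

```shell
# Print the configured swap size in kB (0 means no swap at all).
# Reads a meminfo-style file; defaults to /proc/meminfo.
swap_total_kb() {
    awk '$1 == "SwapTotal:" {print $2}' "${1:-/proc/meminfo}"
}

# Print the amount of file-backed page memory in kB by adding the
# Active(file) and Inactive(file) lines. This is an approximation of
# what can be evicted without a swap device, not an exact figure.
file_backed_kb() {
    awk '$1 == "Active(file):" || $1 == "Inactive(file):" {sum += $2}
         END {print sum + 0}' "${1:-/proc/meminfo}"
}

if [ -r /proc/meminfo ]; then
    echo "swap total:  $(swap_total_kb) kB"
    echo "file-backed: $(file_backed_kb) kB"
fi
```

If the swap total is 0 and the file-backed figure is small, kswapd0 has very little left to evict.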

1
  • 19
    For example, consider a case where you have zero swap and the system is nearly out of RAM. The kernel will take memory from e.g. Firefox (it can do this because Firefox is running executable code that has been loaded from disk - the code can be loaded from disk again if needed). If Firefox then needs to access that memory again N seconds later, the CPU generates a "hard fault", which forces Linux to free some RAM (e.g. take some RAM from another process), load the missing data from disk and then allow Firefox to continue as usual. This is pretty similar to normal swapping, and kswapd0 does it. Commented Feb 15, 2018 at 13:08
23

It is a well-known problem that when Linux runs out of memory it can enter swap loops instead of doing what it should be doing: killing processes to free up RAM. There is an OOM (Out of Memory) killer that does this, but only when both swap and RAM are full.

However, this should not really be a problem. If there are a bunch of offending processes, for example Firefox and Chrome, each with tabs that are both using and grabbing memory, then these processes will cause swap read-back. Linux then enters a loop where the same memory is moved back and forth between RAM and the hard drive. This in turn causes priority inversion, where swapping a few processes back and forth makes the whole system unresponsive.

If you disable swap you make this problem worse, as kswapd0 now has no option other than to evict mapped memory such as executables. If you evict executables, it is even more likely that they will be read back in again rather quickly.

I tried triggering this behavior in NetBSD for testing, and what happened there is that the offending process became incredibly slow while the OS itself stayed very responsive. That means the swapping problem does occur, but there is no priority inversion. However, NetBSD doesn't have AMDGPU drivers, so I am sticking to Linux for the time being. Perhaps NetBSD doesn't memory-map executables and that is why it doesn't enter swap loops, but I don't really know enough about its implementation to say why it doesn't become unresponsive.

Facebook had this problem as well and created OOMD, the Out Of Memory Daemon. This is a daemon that detects kswapd0 activity and starts killing processes. According to Facebook, this almost entirely removed the problem of Linux servers becoming unresponsive. However, I have not tested it and I don't know how well it will work on other servers or on desktops/laptops. Apparently OOMD has some logic for deciding which processes to kill first, in order to preserve system processes and the part of their server system that is responsible for relaunching whatever was killed.

However, this is not how it should be solved. OOMD is an UGLY HACK. The real solution is to fix the priority inversion that a swap loop causes, as well as making the kernel OOM killer more aggressive at killing processes to free memory. The fix belongs in the kernel because that's the only place where we can be sure that the problem is detected in time and processes are properly killed.

Setting swappiness=0 is no solution, because when the system is out of free RAM it starts swapping no matter what. There is no option that guarantees that the system doesn't start swapping.

And fixing the offending applications is not a fix either, especially not if a user wants to exploit this bug in order to intentionally make the OS unresponsive. Being responsive is the responsibility of the kernel. If Firefox makes itself unresponsive, then the fix belongs in the application. However, it is not only making itself unresponsive but causing the entire OS to become very slow and unresponsive, to the level that it can take half an hour to log in over SSH. SSH has nothing to do with the problem, and if it doesn't get to run, that is a bug in the kernel, not in any other part of the system.

And it is not one bug, it is two bugs. One bug is priority inversion, where an off-the-rails swapping cycle is allowed to interfere with processes other than the offending process(es), and that in itself is bad. The other bug is that the kernel doesn't detect that it is in a swap loop, which causes insane wear on the HDD/SSD or whatever storage is backing the swap. When swapping executables this is less of a problem, as they are read-only memory maps which aren't written back to disk, but kswapd0 still gets stuck reading back what it is at the same time evicting from memory.

Oh, and there is a third bug: there is no way to protect the disk CACHE from being eaten when memory-hungry applications swallow all available memory. This is one of the reasons that kswapd0 makes the system unresponsive. The hottest memory-mapped data is usually held in the disk cache, but when Firefox has eaten that cache, well, it obviously means that disk reads will have to occur.

It's not necessarily Firefox that is causing your problem, but Firefox, not Chrome, is the default browser. Both are widely known to trigger this problem, as they treat available memory as something that is wasted, including cache and swap, which Linux counts as "available memory". So in order not to let "available memory" go to waste, they use it for caching and other stuff. Obviously using SWAP for DISK CACHE is a VERY BAD IDEA, but the fellows at both Firefox and Chrome respond to that with "free memory is wasted memory".

So what we have here is three kernel bugs that the kernel team does not seem to consider bugs, and a bug in Firefox, Chrome and all derivatives that they do not consider a bug. I tried building Firefox on my Fedora laptop in order to look into this problem and perhaps patch it. Guess what. Building Firefox with GCC on a 4-core CPU with 4 GB of RAM triggers a SWAP LOOP with PRIORITY INVERSION. So one of the applications that has to be rewritten is GCC. On NetBSD what happens is just that the 4 running instances of GCC get slower than a single instance would, but the system doesn't freeze.

Yeah this is a bit of a rant but I hope that it clarifies the current problem with the Linux memory subsystems as well as the applications that cause it.

4
  • is there a workaround to ensure ssh remains responsive?
    – jiggunjer
    Commented Jun 3, 2020 at 3:00
  • No. There is the possibility to limit RAM usage with cgroups. And there is the early OOM killer that can catch many cases of memory overuse before they occur. But the only way that I know to ensure it is to run NetBSD. Commented Jun 6, 2020 at 7:59
  • @jiggunjer you could probably increase the ssh timeout. If the system is thrashing the disk it could take a couple of minutes for the ssh login to get CPU time, so a longer timeout might work in some cases, depending on how long the latency is.
    – Owl
    Commented Jun 24, 2020 at 17:19
  • 1
    @Owl thanks, turned out my kswapd0 was a crypto miner.
    – jiggunjer
    Commented Jun 25, 2020 at 6:39
10

It still has a process to check if there's any swap. To reduce its activity, you'll need to set your swappiness:

Edit /etc/sysctl.conf as root, then change (or add):

vm.swappiness = 0
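
A small sketch for checking that the setting actually took effect. The function names are made up for illustration; the file path and the /proc entry are the standard ones.

```shell
# Print the last vm.swappiness value set in a sysctl-style file
# (default: /etc/sysctl.conf). Prints nothing if the key is not set.
# Commented-out lines are ignored because they do not start with the key.
configured_swappiness() {
    awk -F= '$1 ~ /^[ \t]*vm\.swappiness[ \t]*$/ {v = $2; gsub(/[ \t]/, "", v)}
             END {if (v != "") print v}' "${1:-/etc/sysctl.conf}"
}

# Print the value the running kernel is using right now.
running_swappiness() {
    cat /proc/sys/vm/swappiness
}
```

After editing the file, `sudo sysctl -p` reloads it without a reboot. If `configured_swappiness` and `running_swappiness` print different numbers, the new value has not been loaded yet.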
3
  • 4
    Ok, but why is it using 1% of my cpu?
    – benathon
    Commented Jun 10, 2015 at 23:00
  • 4
    If kswapd0 is taking any CPU and you do not have swap, the system is nearly out of RAM and is trying to deal with the situation by (in practice) swapping pages from executables. The correct fix is to reduce workload, add swap or (preferably) install more RAM. Adding swap will improve performance because the kernel will have more options about what to swap to disk. Without swap the kernel is practically forced to swap application code. Commented Feb 15, 2018 at 13:12
  • 1
    If you have swap enabled and kswapd0 is using some CPU and you do not want that, lower the swappiness setting. However, unless your swap is backed by an SSD that suffers from writing (e.g. a bad wear leveling algorithm), lowering the swappiness reduces overall system performance. The idea is to keep a copy of RAM in the swap in case more RAM is needed - in that case the copy in RAM is thrown away immediately instead of starting to swap that out before the RAM can be used. This optimistic swapping is only done while the system is idle enough, so it should never slow down your system. Commented Feb 15, 2018 at 13:15
4

Malware running as guest

If you have ever enabled Ubuntu's guest account and later enabled SSH, you may have malware running using your guest account.

sudo find /home -name kswapd0

You might find it under /home/guest/.configrc/

Many people are finding this question today on machines that require no swapping, and seeing that their available memory is normal, only to find out mining software has been installed and is running under the guest account, even automatically on startup.

One sign of this is that a single CPU core is pegged at 100%, while disk activity is normal. In my case, the computer was running incredibly hot with an abnormal amount of outgoing network data.
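
One way to tell an impostor from the real thing is sketched below. It relies on a heuristic rather than an official test: genuine kernel threads are children of kthreadd (PID 2) and have an empty command line in /proc. The function name is made up for illustration.

```shell
# Return 0 if PID looks like a genuine kernel thread, 1 otherwise.
# Heuristic (an assumption): parent PID is 2 and /proc/PID/cmdline is empty.
looks_like_kernel_thread() {
    pid=$1
    [ -r "/proc/$pid/status" ] && [ -r "/proc/$pid/cmdline" ] || return 1
    ppid=$(awk '/^PPid:/ {print $2}' "/proc/$pid/status")
    cmdline=$(tr -d '\0' < "/proc/$pid/cmdline")
    [ "$ppid" = "2" ] && [ -z "$cmdline" ]
}

# Check every process whose name is exactly kswapd0.
for pid in $(pgrep -x kswapd0 2>/dev/null); do
    if looks_like_kernel_thread "$pid"; then
        echo "PID $pid: looks like the real kernel thread"
    else
        echo "PID $pid: NOT a kernel thread, inspect it"
    fi
done
```

Anything called kswapd0 that runs as an ordinary user, or that has an executable path under /home, deserves a closer look.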

Joining any network (conference, cafe, city) that has a compromised machine, or using a service like ngrok, while SSH is open on your system leaves your computer exposed to this simple guest-account vulnerability.

An entry dedicated to this problem is here: CPU 100% with kswapd0 process, although no swap is needed

2

If you have no swap and kswapd0 is running, your system is actually using nearly all of the RAM at that moment. It's time to get better tools to monitor memory usage (or free/available memory in the system).
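
As a starting point for monitoring, here is a minimal sketch that reads the kernel's own estimate of available memory from /proc/meminfo. The function names are made up for illustration.

```shell
# Print MemAvailable in kB from a meminfo-style file (default: /proc/meminfo).
mem_available_kb() {
    awk '/^MemAvailable:/ {print $2}' "${1:-/proc/meminfo}"
}

# Print available memory as a whole-number percentage of MemTotal.
mem_available_percent() {
    awk '/^MemTotal:/ {total = $2} /^MemAvailable:/ {avail = $2}
         END {if (total > 0) printf "%d\n", avail * 100 / total}' "${1:-/proc/meminfo}"
}

if [ -r /proc/meminfo ]; then
    echo "available: $(mem_available_kb) kB ($(mem_available_percent)%)"
fi
```

If the percentage stays in the low single digits while kswapd0 is busy, the machine really is short of RAM.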

For example, consider a case where you have zero swap and the system is nearly out of RAM. The kernel will take memory from e.g. Firefox (it can do this because Firefox is running executable code that has been loaded from disk - the code can be loaded from disk again later if needed). If Firefox then needs to access that memory again N seconds later, the CPU generates a "hard fault", which forces the Linux kernel to free some RAM (e.g. take some RAM from another process), load the missing data from disk and then allow Firefox to continue as usual. This is pretty similar to normal swapping, and kswapd0 does it.
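
You can watch this happening through the counters in /proc/vmstat. The sketch below is an illustration only: the function names are made up, and the counter names (pgscan_kswapd, pgmajfault) are assumed from recent kernels, while older kernels split the kswapd counters per memory zone.

```shell
# Print one counter from a vmstat-style file (default: /proc/vmstat).
# Prints nothing if the counter does not exist.
vmstat_counter() {
    awk -v key="$1" '$1 == key {print $2}' "${2:-/proc/vmstat}"
}

# Show how much the counters grew over an interval (default: 5 seconds).
# Missing counters are treated as 0.
reclaim_activity() {
    interval=${1:-5}
    scan1=$(vmstat_counter pgscan_kswapd); fault1=$(vmstat_counter pgmajfault)
    sleep "$interval"
    scan2=$(vmstat_counter pgscan_kswapd); fault2=$(vmstat_counter pgmajfault)
    echo "pages scanned by kswapd: $(( ${scan2:-0} - ${scan1:-0} ))"
    echo "major page faults:       $(( ${fault2:-0} - ${fault1:-0} ))"
}
```

Running `reclaim_activity 10` while the system feels sluggish shows whether kswapd is scanning pages and whether processes are taking hard faults at the same time.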

The real fix is to reduce memory usage (run processes with fewer memory leaks, run fewer processes, skip running some processes at all, limit the number of children/worker processes of some server software) or to get more RAM. If the need for RAM is caused by memory leaks, you may opt to use swap instead. Linux should be pretty smart about getting the leaked parts to swap given enough time. Having swap is better than nothing, but it is not a real substitute for having an adequate amount of RAM.

3
  • There is good information here as well as in your comments, but enabling swap isn't a solution in the limit where all of the available memory (ram + swap) is getting filled. It's a particularly bad solution in the case of a memory leak, because it's inevitable that all memory will eventually become full. The result when swap+ram is full is the same as when ram is full and swap is disabled.
    – Codebling
    Commented Nov 4, 2019 at 2:17
  • There are no systems with infinite RAM which is what you would need to ensure system stability in the event of a memory leak. Eventually even the smallest leak will bring the system down. Commented Jun 6, 2020 at 8:02
  • Just to make things clear: adding swap to handle memory leaks is a WORKAROUND, not a fix. Adding swap to hold the leaked memory is cheaper than getting RAM to hold the leaks, and as long as the memory is actually leaked, it will never be read back from the swap. However, the kernel can never know whether a single page is a leak or just seldom used, so you always have to store all pages and be prepared for a slowdown if an active page has accidentally been swapped out instead of a purely leaked page. Usually a better choice is to use memory cgroups and kill+restart leaking processes when the leak is too bad. Commented Jun 7, 2020 at 9:57
-2

Well, I had this issue: something was taking lots of RAM, there was no swap, and my laptop hung completely. Automatic killing of the offending processes did not seem to work, so I made sure I had swap set up to stabilize my machine. This seems to have worked. Some 20 years ago this also worked in Linux: it just killed what didn't fit, and that was that. Now I had to hard-reboot my machine every time it took up too much RAM, and that's really not what I had expected. At all.

2
  • When Linux runs out of RAM and there is no swap, the kernel will start reaping processes. If that process happens to somehow be critical to the operation of your system... well your system crashes. Commented Sep 10, 2020 at 8:22
  • @BrianTurek, it's not my observation. kswapd0 just went haywire, no processes seem to have been killed, reproducibly my system hung with lots of disk io kicking in (saw the hdd access light blinking). If the kernel would automatically kill crucial processes instead of userspace processes, it would also be a bug. I put my response here to give a pointer to anyone having this problem, too, and not knowing what's happening - hope anybody reads this, with -2 right now, the chances are bad.
    – Frischling
    Commented Sep 10, 2020 at 9:57
