
When my Linux system gets close to paging (in my case, 16 GB of RAM almost full and 16 GB of swap completely empty), if a new process X tries to allocate some memory, the system locks up completely. It stays locked until a disproportionate number of pages (relative to the total size and rate of X's memory allocation requests) have been swapped out. Note that not only does the GUI become completely unresponsive, but even basic services like sshd are completely blocked.

Here are two pieces of code (admittedly crude) that I use to trigger this behavior in a more "scientific" way. The first one takes two numbers x and y from the command line and allocates and initializes chunks of y bytes until more than x bytes in total have been allocated, then just sleeps indefinitely. This will be used to bring the system to the brink of paging.

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char** argv) {
    long int max = -1;
    long int size = 0;
    long int total = 0;
    char* buffer;

    if (argc > 2) {
        max  = atol(argv[1]);   /* stop once more than this many bytes are allocated */
        size = atol(argv[2]);   /* size of each chunk */
    }
    printf("Max: %ld bytes\n", max);
    while ((buffer = malloc(size)) != NULL && total < max) {
        memset(buffer, 0, size);   /* touch the pages so they are actually backed by RAM */
        total += size;
        printf("Allocated %ld bytes\n", total);
    }
    sleep(3000000);   /* hold on to the memory "indefinitely" */
    return 0;
}

The second piece of code does exactly what the first does, except that it has a sleep(1); right after the printf (I'm not going to repeat the whole code; the modified loop is sketched below). This one will be used when the system is on the brink of paging, to get it to swap pages out in a "gentle" way, i.e. by slowly requesting the allocation of new chunks of memory (so that the system should certainly be able to swap out pages and keep up with the new requests).
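
For clarity, the loop of the second program is just this (the rest is identical to the code above):

while ((buffer = malloc(size)) != NULL && total < max) {
    memset(buffer, 0, size);
    total += size;
    printf("Allocated %ld bytes\n", total);
    sleep(1);   /* the only difference: wait one second between chunks */
}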

So, with the two pieces of code compiled (let's call the respective executables fasteater and sloweater), let's do this:

1) start your favorite GUI (not strictly necessary, of course)

2) start some mem/swap meter (e.g. watch -n 1 free)

3) start multiple instances of fasteater x y, where x is of the order of gigabytes and y is of the order of megabytes. Do this until you have almost filled the RAM.

4) start one instance of sloweater x y, again where x is of the order of gigabytes and y is of the order of megabytes.

After step 4, what should happen (and it always happens on my system) is that just after the RAM has been exhausted, the system locks completely: the GUI is locked, sshd is locked, etc. BUT not forever! After sloweater has finished its allocation requests, the system comes back to life (after minutes of lock-up, not seconds...) in this situation:

a) RAM is about full

b) swap is also about full (remember, it was empty at the beginning)

c) no OOM killer intervention.

And note that the swap partition is on an SSD. So the system seems unable to gradually move pages from RAM to swap (presumably pages from the fasteaters, which are just sleeping) to make room for the slow requests (of just a few megabytes each) coming from the sloweater.

Now, someone correct me if I'm wrong, but this does not seem like the way a modern system should behave in this situation. It seems to behave like the old systems (waaaay back) that had no support for demand paging, where the virtual memory system swapped out the entire address space of some process instead of a few pages.

Can someone test this too? And maybe also someone who has a BSD system.

UPDATE 1: Following Mark Plotnick's advice in the comments below, I started vmstat 1 >out before proceeding with the paging test. You can see the result below (I cut the whole initial part, where the RAM is filled without any swap involvement):

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
0  0   6144 160792      8 272868    0    0     0     0  281 1839  1  0 99  0  0
0  0   6144 177844      8 246096    0    0     0     0  425 2300  1  1 99  0  0
0  0   6144 168528      8 246112    0    0    16     0  293 1939  1  0 99  0  0
0  0   6144 158320      8 246116    0    0     0     0  261 1245  0  0 100  0  0
2  0  10752 161624      8 229024    0 4820 17148  4820  845 3656  1  2 97  0  0
2  0  10752 157300      8 228096    0    0 88348     0 2114 8902  0  5 94  1  0
0  0  10752 176108      8 200052    0    0 108312     0 2466 9772  1  5 91  3  0
0  0  10752 170040      8 196780    0    0 17380     0  507 1895  0  1 99  0  0
0 10  10752 160436      8 191244    0    0 346872    20 4184 17274  1  9 64 26  0
0 29 12033856 152888      8 116696 5992 15916880 1074132 15925816 819374 2473643  0 94  0  6  0
3 21 12031552 295644      8 136536 1188    0 11348     0 1362 3913  0  1 10 89  0
0 11 12030528 394072      8 151000 2016    0 17304     0  907 2867  0  1 13 86  0
0 11 12030016 485252      8 158528  708    0  7472     0  566 1680  0  1 23 77  0
0 11 12029248 605820      8 159608  900    0  2024     0  371 1289  0  0 31 69  0
0 11 12028992 725344      8 160472 1076    0  1204     0  387 1381  0  1 33 66  0
0 12 12028480 842276      8 162056  724    0  3112     0  357 1142  0  1 38 61  0
0 13 12027968 937828      8 162652  776    0  1312     0  363 1191  0  1 31 68  0
0  9 12027456 1085672      8 163260  656    0  1520     0  439 1497  0  0 30 69  0
0 10 12027200 1207624      8 163684  728    0   992     0  411 1268  0  0 42 58  0
0  9 12026688 1331492      8 164740  600    0  1732     0  392 1203  0  0 36 64  0
0  9 12026432 1458312      8 166020  628    0  1644     0  366 1176  0  0 33 66  0

As you can see, as soon as swap gets involved there is a massive swap-out of 15916880 KB all at once, which, I guess, lasts for the whole duration of the system freeze. And all of this is apparently caused by a process (the sloweater) that asks for just 10 MB every second.

UPDATE 2: I did a quick installation of FreeBSD and repeated the same allocation scheme used with Linux... and it was as smooth as it should be. FreeBSD swapped pages out gradually while the sloweater allocated all of its 10 MB chunks of memory. Not one hitch of any kind... WTF is going on here?!

UPDATE 3: I filed a bug with the kernel bug tracker. It seems to be getting some attention, so... fingers crossed...

  • As I mentioned, everything is locked. I tried ssh'ing from another system; it just times out. Commented May 3, 2018 at 16:59
  • If I start vmstat 1 with stdout output, I think it's going to freeze. But you're right, I could just start vmstat 1 >somefile directly from the system and then see what it reports after the system has come back to life. I'll try that. Commented May 4, 2018 at 11:22
  • I used vmstat. Results are in the update above. Commented May 4, 2018 at 13:05
  • swappiness is at the default of 60 (not that changing it gives a better result). The kernel used for the vmstat run is 4.14.35, but I've tried 4.15, 4.16 and have even gone back to the 4.0 series (!): always the same behavior. And it's not that I'm using some weird distribution; it's just Debian. I don't use the kernel images from Debian (not that mine have unusual configs), but I've tried one of those... same behavior. Commented May 4, 2018 at 14:59
  • Very interesting discussion on the kernel bug! And it looks like you isolated this problem to a swap partition encrypted with LUKS. You might want to edit your question, or possibly post an answer yourself with the workarounds known so far, and perhaps keep updating it as the LKML discussion reaches more conclusive results. Really impressive to see the Linux kernel community at work! 😁
    – filbranden Commented Oct 12, 2018 at 20:34

2 Answers


This is exactly what thrash-protect exists for.

It constantly monitors the swapping state and, when something accidentally starts occupying a lot of RAM, temporarily freezes the RAM-greedy processes so that the kernel has time to swap some memory out without making the whole system unresponsive.
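
Very roughly, the mechanism it describes could look like the following sketch. This is not thrash-protect's actual implementation, just an illustration of the idea: poll the kernel's pswpout counter in /proc/vmstat and briefly SIGSTOP the process with the largest resident set when a swap-out burst is detected. The 1000-page threshold and 5-second pause are arbitrary values chosen for the example, and signaling other users' processes requires root.

/* Illustration only: not thrash-protect's actual code. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>
#include <unistd.h>
#include <dirent.h>
#include <sys/types.h>

/* Read the pswpout counter (total pages swapped out) from /proc/vmstat. */
static long read_pswpout(void) {
    FILE *f = fopen("/proc/vmstat", "r");
    char key[64];
    long val, result = -1;

    if (f == NULL) return -1;
    while (fscanf(f, "%63s %ld", key, &val) == 2) {
        if (strcmp(key, "pswpout") == 0) { result = val; break; }
    }
    fclose(f);
    return result;
}

/* Find the PID with the largest VmRSS (resident set size) under /proc. */
static pid_t biggest_rss_pid(void) {
    DIR *d = opendir("/proc");
    struct dirent *e;
    pid_t best = -1;
    long best_rss = -1;

    if (d == NULL) return -1;
    while ((e = readdir(d)) != NULL) {
        char path[300], line[256];
        long rss;
        FILE *f;
        pid_t pid = (pid_t)atoi(e->d_name);

        if (pid <= 1 || pid == getpid()) continue;   /* skip non-PID entries, init, ourselves */
        snprintf(path, sizeof path, "/proc/%d/status", (int)pid);
        if ((f = fopen(path, "r")) == NULL) continue;
        while (fgets(line, sizeof line, f)) {
            if (sscanf(line, "VmRSS: %ld", &rss) == 1 && rss > best_rss) {
                best_rss = rss;
                best = pid;
            }
        }
        fclose(f);
    }
    closedir(d);
    return best;
}

int main(void) {
    long prev = read_pswpout();

    for (;;) {
        long cur;
        sleep(1);
        cur = read_pswpout();
        /* A burst of swap-out activity: pause the biggest RAM consumer for a
           moment so the kernel can catch up, then let it continue. */
        if (prev >= 0 && cur >= 0 && cur - prev > 1000) {
            pid_t victim = biggest_rss_pid();
            if (victim > 0) {
                kill(victim, SIGSTOP);
                sleep(5);
                kill(victim, SIGCONT);
            }
        }
        prev = cur;
    }
    return 0;
}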

  • Sorry, but no. That's not a greedy-process problem. There is no greedy process. There's only kswapd behaving very badly for no apparent reason, as was recognized in the discussion on the bugzilla entry I filed. Commented Mar 14, 2020 at 16:54
  • @JohnTerragon great then. I've read the thread; your case seems to be a very configuration-specific issue, which is going to be fixed. Though it's still possible that thrash-protect relieves the situation, due to its problem-agnostic nature. If the kernel interrupts swapping out the whole 15 GB when the swapping process is stopped, you may get a series of short freezes instead of one prolonged freeze: not ideal, but still better for the user experience, as it gives a chance to resolve the situation manually (e.g., kill the eater or something less important).
    – bodqhrohro Commented Mar 15, 2020 at 20:06

You are only allocating memory; you don't actually put anything in it. A "normal" program would allocate a chunk and then start using it. Allocation is not the same as memory usage.

  • Welcome to posting on Unix StackExchange. It does put data in it; that data just happens to be zero. See the memset(). The Linux kernel provides a physical page of RAM as soon as you write to the virtual page; it does not look at the specific value that is written.
    – sourcejedi Commented May 23, 2018 at 13:50
  • Actually, I compiled & ran this on my desktop, starting with 2GB used and 6GB free. It actually swapped out at a slow rate initially, and only when it hit the limit did it swap out aggressively, which then caused various GUI actions to seize up. Commented May 25, 2018 at 20:04
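
To see sourcejedi's point in isolation, here is a small standalone sketch (not part of the original post): VmRSS in /proc/self/status barely moves after the malloc, but grows by roughly the allocation size after the memset, even though only zeros are written.

/* Standalone illustration of demand paging, added for clarity. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Print the VmRSS line from /proc/self/status, prefixed with a label. */
static void print_rss(const char *when) {
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];

    if (f == NULL) return;
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, "VmRSS:", 6) == 0)
            printf("%s: %s", when, line);
    }
    fclose(f);
}

int main(void) {
    size_t size = 512UL * 1024 * 1024;   /* 512 MB */
    char *buffer;

    print_rss("before malloc");
    buffer = malloc(size);                /* virtual allocation only */
    if (buffer == NULL) return 1;
    print_rss("after malloc ");
    memset(buffer, 0, size);              /* writing faults real pages in */
    print_rss("after memset ");
    free(buffer);
    return 0;
}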

