
I'm running a virtual machine at a provider where you pay for everything you use (CPU, memory, disk IO, bandwidth, etc.). Currently my machine has 512MB RAM and a 1GB swap area.

The kernel version is 3.8.0-21 and the distribution is Ubuntu 13.04, but I suspect the kernel may be custom-built, because kernel updates are held back.

I run a few cron jobs where Python processes are doing things with a PostgreSQL database. Lately the Python processes have been killed by the OOM killer.

I don't need help solving the actual problem: I can increase the memory, reschedule the cron jobs, turn off memory overcommitting, tune the PostgreSQL settings, or change how the Python programs work. But first I would like to know exactly why this happens, so I can make the correct fix. The thing is that I have plenty of swap free at the time of the kill, but no physical memory.
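(For context, the overcommit setting I mention is the vm.overcommit_memory sysctl; a minimal sketch of what "turning it off" would look like, assuming root access inside the guest, with the ratio value only as an example:)

# Show the current overcommit policy (0 = heuristic, 1 = always overcommit, 2 = never)
sysctl vm.overcommit_memory vm.overcommit_ratio
# Disallow overcommit: commits beyond swap + overcommit_ratio% of RAM will fail
sudo sysctl -w vm.overcommit_memory=2
sudo sysctl -w vm.overcommit_ratio=80   # 80% is only an example value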

I have the following in the kernel log:

[4003981.650341] select 1 (init), adj 0, size 249, to kill
[4003981.650349] select 609 (rsyslogd), adj 0, size 609, to kill
[4003981.650359] select 17139 (postgres), adj 0, size 635, to kill
[4003981.650361] select 10381 (postgres), adj 0, size 6719, to kill
[4003981.650365] select 14153 (postgres), adj 0, size 7296, to kill
[4003981.650367] select 14159 (postgres), adj 0, size 7300, to kill
[4003981.650370] select 26802 (python3), adj 0, size 70767, to kill
[4003981.650372] send sigkill to 26802 (python3), adj 0, size 70767

I ran vmstat every second around the same time (the Python process was killed right before 12:13:48):

2014-02-01 12:13:43 procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
2014-02-01 12:13:43  r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa
2014-02-01 12:13:43  1  0     55      5    216    217    0    0   964   396  386  514 71 10  0 18
2014-02-01 12:13:44  1  1     55     87    185    166    0    0 22304 14536  907 1241 53  9  0 38
2014-02-01 12:13:45  1  0     55     60    190    189    0    0 21768 17344 1216 4581 21 26  0 53
2014-02-01 12:13:46  1  1     57      6    218    214    0    0 22264  4836 1031 3696 22 43  0 35
2014-02-01 12:13:47  2  1     59      4    217    218    0    0 28228 29892 1045 6234 22 34  0 44
2014-02-01 12:13:48  1  2     73    272     97     74    0    0 39436 14372  975 3708 10 38  0 52
2014-02-01 12:13:49  1  0     73    185    173     85    0    0 78400   356 1154 1943 23 33  0 44
2014-02-01 12:13:51  1  1     73    247    132     65    0    0  1936     0  165  188  7 13 69 11
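(A capture in this shape can be produced with something like the following; the exact flags are an illustration, and ts comes from the moreutils package:)

# -a shows active/inactive pages instead of buff/cache; -S M reports in megabytes
vmstat -a -S M 1 | ts '%Y-%m-%d %H:%M:%S'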

Now to the actual question: I have seen other questions here (like this one), where there is a log with more details about the memory (DMA, Normal memory, etc.). I can't seem to find it anywhere on my system. Is it something that needs to be turned on, or where can I find it?

The information I'm looking for looks something like this:

Free pages:        6524kB (0kB HighMem)
Active:20 inactive:23 dirty:0 writeback:0 unstable:0 free:1631 slab:874 mapped:20 pagetables:90
DMA free:6524kB min:1200kB low:1500kB high:1800kB active:80kB inactive:92kB present:16384kB pages_scanned:41 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 411*4kB 252*8kB 113*16kB 27*32kB 1*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 6524kB
Normal: empty
HighMem: empty
Swap cache: add 24282, delete 24269, find 7705/11790, race 0+0
Free swap  = 124768kB
Total swap = 128448kB
Out of Memory: Killed process 453 (smbd).
  • In every x86 Linux system I've used, the oom-killer logs DMA and Normal memory counts etc. to the kernel log. In the Ubuntu 13.10 Linux kernel source I looked at, the strings in your sample output above, "select.*size.*to kill", occur only in the Android driver, in lowmemorykiller.c. They don't occur at all in mm/oom_kill.c. Could you tell us a bit more about the architecture and hypervisor of the virtual machine you're running in? Commented Feb 1, 2014 at 22:35
  • @MarkPlotnick I do not know what architecture and hypervisor they use. I don't know how to see that from within the guest instance; maybe I could email their support and ask for it.
    – ygram
    Commented Feb 3, 2014 at 14:45
  • Now dstat provides a feature to find out which process in your running system is a candidate for getting killed by the OOM mechanism: dstat --top-oom. Example output: ---out-of-memory--- kill score: java 77, java 77, java 77. stackoverflow.com/a/25622401/4268594
    – jamlee
    Commented Sep 18, 2019 at 7:01

2 Answers


I'm not sure which log you're actually referring to, but there is a logging system that's often included with most distros called sar; it's typically in a package called sysstat.
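For example, a minimal sketch of getting historical memory and swap numbers out of sar on a Debian/Ubuntu-style system (the package name and config path are assumptions about that distro family):

sudo apt-get install sysstat                                          # provides sar plus the sadc data collector
sudo sed -i 's/ENABLED="false"/ENABLED="true"/' /etc/default/sysstat  # enable periodic collection
sar -r                                                                # memory utilization history for today
sar -S                                                                # swap utilization history for today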

Also, I wrote up this U&L Q&A that covers a variety of methods for logging performance information, titled: "Commands for determining level of usage of server".

Additional sar references

  • I added an example of the log I am interested in. I will also look into sar and see if I can get the information there.
    – ygram
    Commented Feb 3, 2014 at 14:55
  • @ygram - look around the OOM project's site: linux-mm.org/OOM
    – slm
    Commented Feb 3, 2014 at 15:13
  • The function responsible for the output: "..The dump comes from mm/page_alloc.c show_free_areas() function...", spinics.net/lists/newbies/msg49751.html
    – slm
    Commented Feb 3, 2014 at 15:18

Let me first explain when and why the OOM killer gets invoked.

In your case you have 512MB RAM + 1GB of swap. So, in theory, your processes have access to a total of 1.5GB of virtual memory.

Now, for some time everything runs fine within that 1.5GB of total memory. But all of a sudden (or gradually) your system starts consuming more and more memory, until it reaches a point where around 95% of the total memory is in use.

Now say some process (in your case python) requests a large chunk of memory from the kernel. The kernel checks the available memory and finds that there is no way it can allocate your process more memory. So it will try to free some memory by invoking the OOM killer (http://linux-mm.org/OOM).

The OOM killer has its own algorithm to score and rank every process. Typically the process using the most memory becomes the victim (python in your first set of logs, and smbd in the later example).
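You can see the score the kernel has assigned to each process yourself; a minimal sketch (the /proc files are standard, the loop itself is just an illustration):

# List oom_score, oom_score_adj and process name, highest badness score first
for pid in /proc/[0-9]*/; do
    [ -r "${pid}oom_score" ] || continue
    echo "$(cat "${pid}oom_score") $(cat "${pid}oom_score_adj") $(cat "${pid}comm")"
done | sort -rn | head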

Where can I find the OOM killer logs?

Typically in the /var/log directory, either /var/log/kern.log or /var/log/dmesg.
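For example, a quick way to pull the relevant entries out (a sketch; adjust the file names to whatever your distribution actually writes):

# Search the kernel ring buffer and the persistent logs for OOM killer activity
dmesg | grep -iE 'killed process|out of memory'
sudo grep -iE 'killed process|out of memory' /var/log/kern.log /var/log/syslog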

Some typical solutions:

  1. Increase memory (not swap)
  2. Find the memory leaks in your program and fix them
  3. Restrict the amount of memory any one process can consume (for example, JVM memory can be restricted using JAVA_OPTS); see the sketch after this list
  4. Read the logs and google :)
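As a concrete illustration of point 3, a minimal sketch of capping a cron job's address space with ulimit (the limit, path and schedule are placeholders; cgroup memory limits are an alternative):

# Cap the job's virtual address space to roughly 300 MB (ulimit -v takes KB); a bigger
# allocation then fails inside the process (a MemoryError in Python) instead of
# pushing the whole machine toward an OOM kill.
0 3 * * * ulimit -v 300000 && /usr/bin/python3 /path/to/job.py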
