
I have found that when running into an out-of-memory (OOM) situation, the UI of my Linux box freezes completely for a very long time.

I have set up the magic SysRq key using echo 1 | tee /proc/sys/kernel/sysrq. When I next encountered an OOM situation with an unresponsive UI, I was able to press Alt-SysRq-f, which, as the dmesg log showed, caused the OOM killer to kill a process and thereby resolve the OOM situation.
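For reference, the value written to /proc/sys/kernel/sysrq is a bitmask: 0 disables SysRq, 1 enables every function, and in larger values the bit 64 covers signalling of processes, which includes the manual OOM kill behind Alt-SysRq-f. A minimal sketch of that check, assuming a POSIX shell; the helper name sysrq_allows_oom_kill is hypothetical and not part of any tool:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given kernel.sysrq value permits
# the "f" (manual OOM kill) function.
# 0 disables SysRq, 1 enables all functions, and any larger value is a
# bitmask in which 64 enables signalling of processes (term, kill, oom-kill).
sysrq_allows_oom_kill() {
    value=$1
    [ "$value" -eq 1 ] && return 0
    [ $(( value & 64 )) -ne 0 ]
}

# Usage: check the running system, if the file is readable.
if [ -r /proc/sys/kernel/sysrq ]; then
    if sysrq_allows_oom_kill "$(cat /proc/sys/kernel/sysrq)"; then
        echo "Alt-SysRq-f is permitted"
    else
        echo "Alt-SysRq-f is not permitted"
    fi
fi
```

As root, writing f to /proc/sysrq-trigger requests the same action from a shell, without the keyboard.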

My question is now: why does Linux become so unresponsive that the GUI freezes, yet apparently not trigger the same OOM killer that I was able to trigger manually via the Alt-SysRq-f key combination?

Considering that in the "frozen" OOM situation the system is so unresponsive that it does not even respond in a timely manner (< 10 sec) to Ctrl-Alt-F3 (switch to tty3), I would have to assume the kernel must be aware of its unresponsiveness, and yet it did not by itself invoke the OOM killer the way Alt-SysRq-f does. Why?

These are some settings that might have an impact on the described behaviour.

$> mount | grep memory
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
$> cat /sys/fs/cgroup/memory/memory.oom_control 
oom_kill_disable 0
under_oom 0
oom_kill 0

which, as I understand it, states that the memory cgroup has the OOM killer neither activated nor disabled (evidently there must be a good reason to have oom_kill both active and disabled, or maybe I am not interpreting the output correctly; the under_oom 0 is also still somewhat unclear to me).
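For reference, the cgroup-v1 memory documentation describes oom_kill_disable as a switch (1 means the OOM killer is disabled for the cgroup), under_oom as a status flag, and oom_kill as a counter of processes killed. A minimal sketch that spells this out for a given file, assuming a POSIX shell with awk; the helper name explain_oom_control is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read memory.oom_control text on stdin and print a
# one-line interpretation of each field (cgroup-v1 semantics).
explain_oom_control() {
    awk '
        $1 == "oom_kill_disable" { print "OOM killer for this cgroup: " ($2 == 0 ? "enabled" : "disabled") }
        $1 == "under_oom"        { print "cgroup currently out of memory: " ($2 == 0 ? "no" : "yes") }
        $1 == "oom_kill"         { print "processes killed by the OOM killer so far: " $2 }
    '
}

# Usage: interpret the file from the question, if it exists on this system.
f=/sys/fs/cgroup/memory/memory.oom_control
if [ -r "$f" ]; then
    explain_oom_control < "$f"
fi
```

Read this way, the output above would mean that the OOM killer is enabled, the cgroup was not out of memory at the moment of the cat, and nothing has been killed yet.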

2 Answers

6

The reason the OOM killer is not called automatically is that the system, although already slowed down to the point of unresponsiveness when close to out-of-memory, has not actually reached the out-of-memory situation.

Oversimplified, the almost-full RAM contains 3 types of data:

  1. kernel data, that is essential
  2. pages of essential process data (e.g. any data the process created in ram only)
  3. pages of non-essential process data (e.g. data such as the code of executables, for which there is a copy on disk in the filesystem, and which, while currently mapped into memory, could be reread from disk when needed)
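To get a rough feel for how these three categories are distributed on a running system, the counters in /proc/meminfo can be grouped accordingly. This is only a sketch under loud assumptions: the mapping of fields to categories (Slab, KernelStack and PageTables as kernel data, AnonPages as essential process data, Active(file) and Inactive(file) as file-backed pages) is my own approximation, and the helper name meminfo_categories is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read /proc/meminfo text on stdin and print an
# approximate split into the three categories from the list above.
# The field-to-category mapping is an assumption, not an exact accounting.
meminfo_categories() {
    awk '
        $1 == "Slab:" || $1 == "KernelStack:" || $1 == "PageTables:" { kernel += $2 }
        $1 == "AnonPages:"                                           { anon += $2 }
        $1 == "Active(file):" || $1 == "Inactive(file):"             { file += $2 }
        END {
            printf "kernel_kB %d\n", kernel
            printf "anonymous_kB %d\n", anon
            printf "file_backed_kB %d\n", file
        }
    '
}

# Usage: summarise the running system, if /proc/meminfo is readable.
if [ -r /proc/meminfo ]; then
    meminfo_categories < /proc/meminfo
fi
```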

In a memory-starved situation the Linux kernel (as far as I can tell, it is the kswapd0 kernel thread) cannot throw away 1. and 2. without losing data and functionality, but it is at liberty to remove from RAM, at least temporarily, those file-backed mapped pages that belong to processes that are not currently running.

While this behaviour, which involves disk thrashing (constantly throwing away data and rereading it from disk), can be seen as helpful because it avoids, or at least postpones, the necessity of killing a process and freeing, but also losing, its memory, it has a high price: performance.

[load pages from disk to ram with code of executable of process 1]
[ run process 1 ] 
[evict pages with binary of process 1 from ram]
[load pages from disk to ram with code of executable of process 2]
[ run process 2 ] 
[evict pages with binary of process 2 from ram]
[load pages from disk to ram with code of executable of process 3]
[ run process 3 ] 
[evict pages with binary of process 3 from ram]
....
[load pages from disk to ram with code of executable of process 1]
[ run process 1 ] 
[evict pages with binary of process 1 from ram]

This cycle is clearly I/O-expensive, and the system is likely to become unresponsive, even though technically it has not yet run completely out of memory.
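One way to observe the load/evict cycle above is the pgmajfault counter in /proc/vmstat, which counts page faults that had to go to disk. A minimal sketch, assuming a POSIX shell with awk; the helper names read_pgmajfault and majfault_rate are hypothetical, and what counts as a "high" rate is system-dependent and not something I can give a threshold for:

```shell
#!/bin/sh
# Hypothetical helper: print the cumulative major page fault counter from
# a vmstat-style file (defaults to /proc/vmstat).
read_pgmajfault() {
    awk '$1 == "pgmajfault" { print $2 }' "${1:-/proc/vmstat}"
}

# Hypothetical helper: integer faults per second between two samples.
# Arguments: earlier count, later count, seconds between the samples.
majfault_rate() {
    echo $(( ($2 - $1) / $3 ))
}

# Usage: sample the running system over one second, if possible.
if [ -r /proc/vmstat ]; then
    before=$(read_pgmajfault)
    sleep 1
    after=$(read_pgmajfault)
    if [ -n "$before" ] && [ -n "$after" ]; then
        echo "major faults per second: $(majfault_rate "$before" "$after" 1)"
    fi
fi
```

A rate that stays high while free memory is near zero would be consistent with the thrashing described here.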

From a user perspective, however, the system seems to be hung/frozen, and the resulting unresponsive UI might not really be preferable to simply killing a process (e.g. a browser tab, whose memory usage may very well have been the root cause to begin with).

This is where, as the question indicated, using the Magic SysRq key to start the OOM killer manually seems great, as Magic SysRq is less affected by the unresponsiveness of the system.

While there might be use cases where it is important to preserve the processes at all (performance) costs, on a desktop it is likely that users would prefer the OOM killer over the frozen UI. There is a patch that claims to exempt clean, mapped, filesystem-backed pages from being evicted from memory in such a situation, in this answer on stackoverflow.

2

You could watch the file /sys/fs/cgroup/memory/memory.oom_control during a stress test.

or

You could look at its last-modified date to see if it was changed around the time of the last lockup. This will tell you if it was even attempting to do its job.
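Watching the file could look like the following sketch, assuming a POSIX shell; the helper name watch_oom_control and its arguments are hypothetical. It prints a fixed number of timestamped samples so that the output can be compared against the time of a lockup afterwards:

```shell
#!/bin/sh
# Hypothetical helper: print timestamped one-line snapshots of a file.
# Arguments: file to read, number of samples, seconds between samples.
watch_oom_control() {
    file=$1
    samples=$2
    interval=$3
    i=0
    while [ "$i" -lt "$samples" ]; do
        # Join the lines of the file into one line, prefixed with the time.
        printf '%s %s\n' "$(date +%T)" "$(tr '\n' ' ' < "$file")"
        i=$(( i + 1 ))
        sleep "$interval"
    done
}

# Example: one sample per second for a minute, appended to a log file.
# watch_oom_control /sys/fs/cgroup/memory/memory.oom_control 60 1 >> oom_watch.log
```

Keep in mind that under heavy memory pressure the loop itself may stall, so gaps between timestamps are to be expected.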

under_oom 0

That is your issue:

under_oom    0 or 1 (if 1, the memory cgroup is under OOM, tasks may
             be stopped.)

If set to 1, it means it's under oom control. Enabled.
If set to 0, as in your output, it means it's not under oom control. Disabled.

  • According to this documentation, it seems that under_oom is additional information: "The memory.oom_control file also reports the OOM status of the current cgroup under the under_oom entry. If the cgroup is out of memory and tasks in it are paused, the under_oom entry reports the value 1". As to your answer, it seems under_oom is clearly nothing to be set at all, but only indicates whether the cgroup is in an OOM condition (which it was not when I ran cat on the oom_control file). Commented Nov 23, 2018 at 16:16
  • lwn.net/Articles/432224 Seems that is the case. Commented Nov 23, 2018 at 16:25
  • Thank you, the resource you linked is basically /usr/src/linux/Documentation/cgroup-v1/memory.txt and states that under_oom is, as I wanted to hint at in the former comment, an indicator of whether an out-of-memory situation has happened in the memory control group. I think when you say "If set to 0, it means it's not under oom control. Disabled", then this might not really be correct, as it may very well be under out-of-memory control: under_oom only indicates whether an out-of-memory situation exists, not whether control is set up. The documentation is not ideal here. Commented Nov 23, 2018 at 16:52
  • With under_oom not being a setting/configuration at all, I fear it does not provide me with insight regarding the question "why is the OOM killer not triggered automatically?" Commented Nov 23, 2018 at 16:54
  • In my previous comment I agreed with you, by saying "That seems to be the case." I did the digging myself and saw that that is the way it is. Commented Nov 23, 2018 at 16:57
