
If I type x=`yes` in my shell, I eventually get cannot allocate 18446744071562067968 bytes (4295032832 bytes allocated), because yes keeps writing into x until the system runs out of memory. I get the cannot allocate <memory> message because the kernel's OOM-killer told xrealloc there are no more bytes to allocate and that it should exit immediately.
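
For reference, this is roughly how I reproduced it and checked whether the OOM-killer was actually involved (commands are from my Linux box; the ulimit is only there so the test doesn't take the whole machine down, and dmesg may need sudo):

    # run the experiment in a throwaway shell, capped at ~1 GiB of address space
    bash -c 'ulimit -v 1048576; x=`yes`'

    # if the OOM-killer fired, the kernel logs the kill in its ring buffer;
    # if nothing shows up here, the allocation simply failed and bash bailed out on its own
    dmesg | grep -iE 'out of memory|oom-killer'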

But when I ask a game_engine to allocate more graphics memory than my card actually has, it falls back to system RAM (and the CPU) and allocates the requested memory there instead.

Why doesn't the kernel's OOM-killer ever catch any game_engine trying to allocate tons of memory, like it does with x=`yes`?

That is, if I'm running game_engine and my user hasn't spawned any new processes since starting the memory-hog game_engine, why does said game_engine always succeed in bringing my system to its unresponsive, unrecoverable knees without the OOM-killer killing it?


I use game engines as an example because they tend to allocate tons and tons of memory on my poor little integrated card, but this seems to happen with many resource-intensive X processes.

Are there cases in which the OOM-killer is ineffective or unable to revoke a process's memory?

1 Answer


Really, the best solution for the OOM killer is not to have one. Configure your system not to use overcommitted memory, and refuse to use applications and libraries that depend on it. In this day of infinite disk, why not supply infinite swap? No need to commit to swap unless the memory is used, right?
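
As a rough sketch of what that looks like on Linux (the swap-file size is arbitrary; pick whatever "infinite" means for your disk):

    # strict accounting: commit limit = swap + overcommit_ratio% of RAM,
    # so allocations fail at malloc() time instead of summoning the OOM killer later
    sudo sysctl vm.overcommit_memory=2
    sudo sysctl vm.overcommit_ratio=100

    # back that commit limit with a big swap file ("infinite disk, infinite swap");
    # pages only hit the disk if the memory is actually touched
    sudo dd if=/dev/zero of=/swapfile bs=1M count=65536
    sudo chmod 600 /swapfile
    sudo mkswap /swapfile
    sudo swapon /swapfile

    # CommitLimit vs Committed_AS shows how much headroom is left
    grep -i commit /proc/meminfo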

The answer to your question may be that the OOM killer doesn't work the way you think it does. The OOM killer uses heuristics to choose which process to kill, and the rules don't always mean that the last requestor dies. Cf. Taming the OOM killer. So it's not a question of the OOM killer being "ineffective", but rather one of it making a choice other than the one you'd prefer.
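
If you do keep the OOM killer, you can at least see and influence its choice. Something along these lines (game_engine is just a stand-in for whatever process you care about):

    # rank processes by current OOM badness; the highest score is the likeliest victim
    for p in /proc/[0-9]*; do
        printf '%6s %6s %s\n' "$(cat "$p/oom_score" 2>/dev/null)" "${p#/proc/}" \
            "$(tr '\0' ' ' < "$p/cmdline" 2>/dev/null | cut -c1-50)"
    done | sort -rn | head

    # nudge the heuristic: +1000 makes a process the preferred victim, -1000 exempts it
    echo 1000 | sudo tee "/proc/$(pidof game_engine)/oom_score_adj"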

  • Have you considered what would happen if you combine terabytes of swap with a runaway process? You'd be looking at days of swapping as you tried to get things back under control. Disk random access times have not kept up with the increase in capacity -- average seek times are tied to RPM, and that hasn't changed in fifteen years.
    – Mark
    Commented May 10, 2016 at 4:06
  • @Mark, that's a straw-man argument on two levels. First, lots of OSes do and have run without an OOM killer, and they don't suffer days of downtime. Second, I'm sure you've heard of ulimit? We've had ways of controlling process size for 30 years. Instead of running every process with unlimited virtual memory and waiting for the OOM killer to fire, why not limit each one to 100 GB or so? Then your days become seconds, and your system is deterministic again. Commented May 10, 2016 at 4:27
  • ulimit doesn't solve the general case because it only limits memory per process. The system can still be taken down by a misbehaving process that both forks and consumes lots of memory. You need cgroups to really keep the system from going down because of misbehaving processes, or you have to rely on the OOM killer; the OOM killer requires zero configuration, which is why most people just live with it (a sketch of both the ulimit and cgroup approaches is below). In pretty much all cases you're still hosed, though: if cgroups or the OOM killer has to kill your misbehaving but important process, how good is your system after that process has been killed? Commented Feb 3, 2018 at 12:02
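
A minimal sketch of the two approaches mentioned in the comments above (the 4 GiB cap and ./game_engine are arbitrary examples; the second command assumes cgroup v2 with systemd):

    # per-process cap: each process is limited to ~4 GiB of address space,
    # but a forking memory hog can still exhaust RAM across many processes
    ulimit -v 4194304          # value is in KiB
    ./game_engine

    # whole-tree cap via cgroups: the limit covers the process and everything
    # it forks, so the rest of the system stays usable
    systemd-run --user --scope -p MemoryMax=4G ./game_engine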
