  • I've never seen any mention of the OOM killer being triggered by slow swap, and certainly not witnessed it myself due to thrashing where waits for swap appear to exceed 1000ms. Do you have any citations for this behaviour? Commented Dec 9, 2022 at 13:09
  • @PhilipCouling: From linux-mm.org itself, under "What causes these OOM events?" / "The kernel is not using its swap space properly", second paragraph ("it is also possible for the system to find itself in a sort of deadlock…"): linux-mm.org/OOM
    – MC68020
    Commented Dec 9, 2022 at 13:19
  • Thank you, this is a great explanation. I think your B.1 answers my second question. I'm amazed that Linux works this way - it seems like a truly stupid design to trigger an OOM event when there's still plenty of memory - but if it really does work by just waiting a fixed period for swapping to free enough memory then that would explain the behaviour I saw. I'd have to do more tests to conclusively rule out B.2 but it seems very unlikely, given the successful runs.
    – c--
    Commented Dec 9, 2022 at 13:33
  • I'm not convinced by B.1 as worded. The reference given in the comment discusses an issue caused by deadlock, not timing. The only way this involves timing is that you can attempt to avoid deadlock by trying to avoid exhausting available physical memory. To hit this situation, the storage driver would need to request memory to complete the IO. In any case, it's the deadlock that causes it, not the timing. Commented Dec 9, 2022 at 13:36
  • I don't think either your C.1 or C.2 answers my first question though. I can't see why either malloc returning NULL or the process receiving SIGBUS could cause other processes to die. Also, I think unhandled signals are usually logged by the kernel too these days, aren't they?
    – c--
    Commented Dec 9, 2022 at 13:39