Revisions to OOM killer doesn't work properly, leads to a frozen OS

removing Edits as requested in the comments by dev. Kusalananda

edited Jan 17, 2019 at 12:14

user306023

I've found two explanations (of the same thing) as to why ~~kswapd0 does~~ constant disk reading happens well before the OOM-killer kills the offending process:

1. see the answer and comment of this askubuntu SE answer
2. see the answer and David Schwartz's comments of this answer on unix SE

I'll quote here the comment from 1. which really opened my eyes as to why I was getting constant disk reading while everything was frozen:

For example, consider a case where you have zero swap and system is nearly running out of RAM. The kernel will take memory from e.g. Firefox (it can do this because Firefox is running executable code that has been loaded from disk - the code can be loaded from disk again if needed). If Firefox then needs to access that RAM again N seconds later, the CPU generates "hard fault" which forces Linux to free some RAM (e.g. take some RAM from another process), load the missing data from disk and then allow Firefox to continue as usual. This is pretty similar to normal swapping and kswapd0 does it. – Mikko Rantalainen Feb 15 at 13:08
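
One way to see the effect the comment describes is to watch the kernel's major page fault counter, since every such fault means a page had to be read back from disk. The sketch below is only an illustration and not something from the linked answers: it assumes the `pgmajfault` counter in /proc/vmstat, and the helper names (`parse_vmstat`, `read_vmstat`, `major_fault_rate`) are made up for this example.

```python
def parse_vmstat(text):
    """Parse /proc/vmstat-style 'name value' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def read_vmstat(path="/proc/vmstat"):
    """Read and parse the counters from a Linux system (not called on import)."""
    with open(path) as handle:
        return parse_vmstat(handle.read())


def major_fault_rate(before, after, seconds):
    """Major (disk-backed) page faults per second between two samples."""
    return (after["pgmajfault"] - before["pgmajfault"]) / seconds
```

Sampling `read_vmstat()` twice a few seconds apart and passing both results to `major_fault_rate` should show the rate climbing sharply while the system is thrashing. What counts as "too high" depends on the machine, so no threshold is suggested here.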

If anyone knows a way to disable this behavior (maybe by recompiling the kernel with certain options?), please let me know as soon as possible! Much appreciated, thanks!

UPDATE: The only way I've found thus far is to patch the kernel. It works for me with swap disabled (i.e. CONFIG_SWAP is not set), but it seems it doesn't work for others with swap enabled; see the patch inside this question.

found a preliminary working way to avoid OS freezing via disk thrashing when about to run out of RAM

edited Aug 29, 2018 at 11:16

user306023

I've found two explanations (of the same thing) as to why ~~kswapd0 does~~ constant disk reading happens well before the OOM-killer kills the offending process:

1. see the answer and comment of this askubuntu SE answer
2. see the answer and David Schwartz's comments of this answer on unix SE

I'll quote here the comment from 1. which really opened my eyes as to why I was getting constant disk reading while everything was frozen:

For example, consider a case where you have zero swap and system is nearly running out of RAM. The kernel will take memory from e.g. Firefox (it can do this because Firefox is running executable code that has been loaded from disk - the code can be loaded from disk again if needed). If Firefox then needs to access that RAM again N seconds later, the CPU generates "hard fault" which forces Linux to free some RAM (e.g. take some RAM from another process), load the missing data from disk and then allow Firefox to continue as usual. This is pretty similar to normal swapping and kswapd0 does it. – Mikko Rantalainen Feb 15 at 13:08

If anyone knows a way to disable this behavior (maybe by recompiling the kernel with certain options?), please let me know as soon as possible! Much appreciated, thanks!
EDIT: I've just found out and tested that vm.overcommit_memory=2 avoids the disk thrashing that leads to a frozen OS. The OOM-killer doesn't get a chance to trigger, which makes sense because with vm.overcommit_memory=0 it only triggered well after the disk thrashing anyway; instead the allocation itself fails, like: cc1plus: out of memory allocating 127440 bytes after a total of 897024 bytes
EDIT2: OK, the above worked for me with the default vm.overcommit_ratio=50, but if I set it to 200 then the disk thrashing is back!
EDIT3: Ignore these 2 EDITs; they are not the way to fix this, because processes that wouldn't have died before now die sooner. Also, vm.overcommit_ratio=0 will cause everything to die, or something. I'll try to find a better way, likely needing a kernel recompile ~~and I'm looking into the relevant GFP flags...~~
EDIT4: I found a way, through patching the kernel, that works for me; see the patch inside this question.
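
For context on why the ratio matters: with vm.overcommit_memory=2 the kernel refuses allocations beyond a commit limit which, per the kernel's overcommit documentation, is roughly swap plus overcommit_ratio percent of RAM (huge pages excluded, and assuming vm.overcommit_kbytes is 0). The sketch below just does that arithmetic; the function names and the 16 GiB figure are hypothetical, picked for illustration.

```python
def parse_meminfo_kb(text):
    """Parse /proc/meminfo-style 'Name:   value kB' lines into a dict of ints."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio, hugetlb_kb=0):
    """Approximate commit limit in kB for vm.overcommit_memory=2.

    Assumes vm.overcommit_kbytes is 0, so overcommit_ratio is what applies.
    """
    return (mem_total_kb - hugetlb_kb) * overcommit_ratio // 100 + swap_total_kb


# Hypothetical machine: 16 GiB of RAM and no swap.
RAM_KB = 16 * 1024 * 1024
limits = {ratio: commit_limit_kb(RAM_KB, 0, ratio) for ratio in (0, 50, 200)}
```

With no swap, a ratio of 50 caps commitments at half of RAM, a ratio of 200 allows twice the RAM (so memory can fill up and thrash again before anything is refused), and a ratio of 0 leaves a limit of zero, which is consistent with what EDIT2 and EDIT3 report.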

realization that that was a bad mitigation

edited Aug 26, 2018 at 17:28

user306023

I've found two explanations (of the same thing) as to why ~~kswapd0 does~~ constant disk reading happens well before the OOM-killer kills the offending process:

1. see the answer and comment of this askubuntu SE answer
2. see the answer and David Schwartz's comments of this answer on unix SE

I'll quote here the comment from 1. which really opened my eyes as to why I was getting constant disk reading while everything was frozen:

For example, consider a case where you have zero swap and system is nearly running out of RAM. The kernel will take memory from e.g. Firefox (it can do this because Firefox is running executable code that has been loaded from disk - the code can be loaded from disk again if needed). If Firefox then needs to access that RAM again N seconds later, the CPU generates "hard fault" which forces Linux to free some RAM (e.g. take some RAM from another process), load the missing data from disk and then allow Firefox to continue as usual. This is pretty similar to normal swapping and kswapd0 does it. – Mikko Rantalainen Feb 15 at 13:08

If anyone knows a way to disable this behavior (maybe by recompiling the kernel with certain options?), please let me know as soon as possible! Much appreciated, thanks!
EDIT: I've just found out and tested that vm.overcommit_memory=2 avoids the disk thrashing that leads to a frozen OS. The OOM-killer doesn't get a chance to trigger, which makes sense because with vm.overcommit_memory=0 it only triggered well after the disk thrashing anyway; instead the allocation itself fails, like: cc1plus: out of memory allocating 127440 bytes after a total of 897024 bytes
EDIT2: OK, the above worked for me with the default vm.overcommit_ratio=50, but if I set it to 200 then the disk thrashing is back!
EDIT3: Ignore these 2 EDITs; they are not the way to fix this, because processes that wouldn't have died before now die sooner. Also, vm.overcommit_ratio=0 will cause everything to die, or something. I'll try to find a better way, likely needing a kernel recompile, and I'm looking into the relevant GFP flags...

solution is likely just an edge case - unlikely to work all the time

edited Aug 26, 2018 at 9:44

user306023

one potential solution found

edited Aug 26, 2018 at 9:39

user306023

it's not kswapd0 that does the disk reading, though it may be the indirect trigger; it's the executables themselves (according to iotop)

edited Aug 19, 2018 at 11:15

user306023

created Aug 19, 2018 at 10:23

user306023
