Timeline for What can cause ALL services on a server to go down, yet still responding to ping? and how to figure out
Current License: CC BY-SA 3.0
13 events
when | what | action | by | comment | license |
---|---|---|---|---|---|
Oct 22, 2012 at 15:42 | comment | added | matteo | My server is a dedicated server at OVH; they have a thing they call Manager on their site which remotely monitors your server. That's where I see the memory-usage graph I'm talking about (which, of course, stopped collecting data when the server stalled) and from where I hard-rebooted it. | |
Oct 22, 2012 at 15:36 | comment | added | matteo | @ewwhite if by "OOM issue" you mean "many processes like ssh, httpd and the like having been killed by OOM-killer", no, it is most surely not an OOM issue, but if by "OOM issue" you mean "my server ran out of memory", then I do have the absolute certainty that it was an OOM issue, because I saw the graph where there was a spike in RAM usage from normal levels to 100% and in swap from almost 0 to about 80% within a matter of minutes (an "instant" in the precision of the graph) | |
Oct 22, 2012 at 12:22 | comment | added | ewwhite | @matteo I see no indication that this is an OOM issue. Typically, the OOM-killer will pick specific processes that meet certain criteria, but it wouldn't always kill a daemon like ssh. This is definitely on the I/O side. You didn't explain your hardware situation/specs as I requested in my answer. | |
Oct 22, 2012 at 8:33 | comment | added | matteo | So, I guess OOM-killer never kicked in at all. On one side I'm curious about why (why didn't it kill any process when there was a sudden peak of ram usage to 100% and swap to 80%), but that's just curiosity. On the other hand, is there something I could do so that, next time this happens, I will be able (after the crash and reboot) to figure out WHAT process ate up so much memory? | |
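One low-tech answer to the question above (how to identify the culprit process after a crash and reboot) is to log the top memory consumers periodically, so the last entries written before the crash name the offender. A minimal sketch using GNU ps; the log path and the cron schedule shown are illustrative assumptions, not anything from the thread:

```shell
# Hypothetical cron entry: * * * * * /usr/local/bin/memtop.sh
# Appends a timestamped snapshot of the five biggest processes by RSS.
LOG=/tmp/memtop.log   # on a real server, something like /var/log/memtop.log
{
  date
  ps -eo pid,rss,comm --sort=-rss | head -n 6
  echo
} >> "$LOG"
```

After a crash, the tail of the log shows which processes were growing just before the server stalled. Tools like sar (sysstat) or atop provide the same history with less plumbing.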
Oct 22, 2012 at 6:25 | comment | added | matteo | @JonathanCallen: thank you, but "grep -i kill /var/messages*" returned nothing. | |
Oct 21, 2012 at 21:56 | comment | added | Coops | @matteo - more details of finding the log entry here: stackoverflow.com/questions/624857/… | |
Oct 21, 2012 at 18:16 | comment | added | Jonathan Callen | @matteo The log message would appear as "Out of Memory: Killed process [PID] [process name]", so grepping for "oom" or "killer" wouldn't find it. | |
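To illustrate Jonathan Callen's point concretely: the pattern below matches the kernel's wording, tried here against a hypothetical sample line (the PID and process name are made up). On the real server you would run the same grep against /var/log/messages* and the dmesg output.

```shell
# Hypothetical log line mimicking the kernel's OOM-killer message format:
line='Oct 21 13:00:01 host kernel: Out of memory: Killed process 1234 (httpd)'

# Case-insensitive search; note that grepping for "oom" or "killer"
# would NOT match this line, but "out of memory" or "killed process" does.
echo "$line" | grep -iE 'out of memory|killed process'
```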
Oct 21, 2012 at 16:34 | comment | added | matteo | No sign of oom-killer in either dmesg or messages, btw - at least I grepped (case-insensitively) for both "oom" and "killer". | |
Oct 21, 2012 at 14:13 | comment | added | DerfK | @matteo Linux has what it calls "overcommit": just because you malloc() 1 GB of RAM doesn't mean you're actually going to use it, so the memory manager keeps track of both how much memory your program thinks it has and how much it has actually used. This works well most of the time - at least until more than one program actually wants to use all of the 1 GB it thinks it has. | |
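The overcommit behaviour DerfK describes is tunable. Matteo's wish (in the comment below) that the kernel would simply refuse allocations it cannot back corresponds to strict accounting, mode 2. A sketch for inspecting the policy; the values noted in the comments are common defaults, not guaranteed on every distro:

```shell
# 0 = heuristic overcommit (typical default), 1 = always grant, 2 = strict
cat /proc/sys/vm/overcommit_memory
# Used only in mode 2: commit limit = swap + overcommit_ratio% of RAM
cat /proc/sys/vm/overcommit_ratio
# To enable strict accounting (root required; persist in /etc/sysctl.conf):
#   sysctl vm.overcommit_memory=2
```

With mode 2, malloc() starts failing before the system is exhausted, so a runaway process gets an error instead of triggering the OOM-killer, at the cost of some wasted memory headroom.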
Oct 21, 2012 at 13:53 | comment | added | matteo | refuse to allocate memory to programs asking for it when there's not enough ram for the system to keep working correctly... I mean a buggy or even malicious program should never be able to destroy the whole system... | |
Oct 21, 2012 at 13:53 | comment | added | matteo | Thanks a lot, I'm almost sure this is the problem, as both the RAM and the swap were full prior to the server failure (I can see this in OVH's Manager stats), and it's probably some of my crazy PHP scripts using a lot of memory. It does puzzle me, however, for a couple of reasons: (1) it looks like the memory eaten up by PHP is not freed afterwards, but that wouldn't make sense; (2) in any case, I wouldn't expect a proper operating system to die completely just because of one (or even a few) processes using too much memory... I would expect it to | |
Oct 21, 2012 at 13:45 | vote | accept | matteo | | |
Oct 21, 2012 at 13:06 | history | answered | Coops | | CC BY-SA 3.0 |