How to find out why a server hangs but is still reachable with ping

Question

One of my servers, which runs in a German data center, "hangs" every night, but I can't find out why. No errors show up in /var/log/messages or /var/log/syslog.

The server responds to ping, but all services are down (ssh, apache, ...). After a reset everything runs normally.

A hardware test has been performed. It looks like a software issue.

Is it possible to log in on the local console when the server "hangs"? If so, examine the top output. Maybe some process is taking almost all the CPU time, so all network services just time out.
– HUB
Commented Jun 20, 2011 at 16:27
It's possible to log in on a local console, but a little complicated. That was going to be my next step. For now I will log top output every minute, as Eduardo suggested.
– Martin Schlagnitweit
Commented Jun 20, 2011 at 18:10
It seems to be a kernel panic; I can see this on the local console. But which log files should it be written to?
– Martin Schlagnitweit
Commented Jun 21, 2011 at 5:41
In the kern.log file there are some entries like this: Jun 25 14:05:39 solunic kernel: [369632.475072] php-cgi[15194] general protection ip:6914c9 sp:7fffaf0f84d0 error:0 in php5-cgi[400000+6f9000] Today the server crashed again at 16:00 CET, but the last error in kern.log was one hour before this.
– Martin Schlagnitweit
Commented Jun 25, 2011 at 15:27

Eduardo Ivanec · Accepted Answer · 2011-06-20 18:59:33Z

3

I'd leave some lightweight profiling commands logging to files, so you can look at what went wrong after the fact. For example:

nohup top -b -d 60 >> top.log & # runs every 60 seconds
nohup vmstat 5 >> vmstat.log &
nohup iostat 5 >> iostat.log &

nohup is there so they aren't killed when you lose connection to the server. You can also use screen for that.
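
As a rough sketch of the same idea, the three commands can be started through one small helper so that every log gets a start marker and a PID file. The names `start_logger` and `LOG_DIR` and the `hang-logs` directory are hypothetical, chosen only for illustration; they are not part of the answer above.

```shell
#!/bin/sh
# Sketch only: LOG_DIR and start_logger are illustrative names.
LOG_DIR="${LOG_DIR:-$HOME/hang-logs}"

# start_logger NAME COMMAND [ARGS...]
# Runs COMMAND in the background under nohup, appending its output
# to $LOG_DIR/NAME.log and saving its PID in $LOG_DIR/NAME.pid.
start_logger() {
    name="$1"
    shift
    mkdir -p "$LOG_DIR" || return 1
    # Mark each start so runs before and after a reset can be told apart.
    printf '=== started %s ===\n' "$(date '+%Y-%m-%d %H:%M:%S')" >> "$LOG_DIR/$name.log"
    # ">>" appends, so restarting after a reset keeps the earlier samples.
    nohup "$@" >> "$LOG_DIR/$name.log" 2>&1 &
    echo "$!" > "$LOG_DIR/$name.pid"
}
```

It would then be called once per tool, for example `start_logger top top -b -d 60`, `start_logger vmstat vmstat 5` and `start_logger iostat iostat 5`. After the next hang, the last samples before the most recent start marker are the ones to read.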

A more robust alternative to the last two commands would be to set up sar.
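
A minimal sketch of reading sar data back after a hang, assuming a Debian-style setup: the sysstat package is installed, collection is enabled, and the daily files are named `saDD` under /var/log/sysstat. The directory, the file naming and the helper name `sa_file` are assumptions and may differ on other distributions or sysstat versions.

```shell
#!/bin/sh
# Assumption: on Debian, sysstat collection is typically switched on by
# setting ENABLED="true" in /etc/default/sysstat after installing the package.

# sa_file DD
# Prints the path of the daily sar data file for a two-digit day of month.
# SA_DIR is a hypothetical override for systems that store the files elsewhere.
sa_file() {
    printf '%s/sa%s\n' "${SA_DIR:-/var/log/sysstat}" "$1"
}
```

For a hang at 16:00 on the 25th, the hour before it could then be reviewed with something like `sar -u -f "$(sa_file 25)" -s 15:00:00 -e 16:00:00` for CPU, or with `-r` or `-b` in place of `-u` for memory or I/O.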

edited Jun 20, 2011 at 18:59

answered Jun 20, 2011 at 16:59

Eduardo Ivanec


I bet you meant to use ">>" instead of ">", right?
– mfinni
Commented Jun 20, 2011 at 17:08
It'd be better in some cases, but it's not necessary - each command keeps running and so opens/overwrites the output files only once at start. But I'll change it just in case, thanks!
– Eduardo Ivanec
Commented Jun 20, 2011 at 17:16
If those keep running after the "symptom" happens, then the output files will get overwritten. That's why I'm suggesting the change.
– mfinni
Commented Jun 20, 2011 at 17:40
Oh wait - you mean the commands continue to run in the background, with the output redirected only once? Neat trick.
– mfinni
Commented Jun 20, 2011 at 17:46
Well, actually I meant the commands to be executed separately, and yes - they do keep running/cycling by themselves. But I'll add nohup and & because I wasn't that clear, thanks.
– Eduardo Ivanec
Commented Jun 20, 2011 at 18:58
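
The point discussed in these comments can be checked with a small experiment. This is only an illustration using a temporary file and `echo` in place of the real monitoring commands: a redirection is opened once per command, so `>` truncates only when the command starts, while `>>` keeps whatever an earlier run left behind.

```shell
#!/bin/sh
log="$(mktemp)"

# First "run": the redirection opens the file once; both samples land in it.
{ echo "run 1, sample 1"; echo "run 1, sample 2"; } > "$log"

# Restart with ">>": the samples from run 1 survive and run 2 is appended.
{ echo "run 2, sample 1"; } >> "$log"
appended_lines=$(grep -c '' "$log")

# Restart with ">": the file is truncated at open, so runs 1 and 2 are lost.
{ echo "run 3, sample 1"; } > "$log"
truncated_lines=$(grep -c '' "$log")

echo "lines after >>: $appended_lines; lines after >: $truncated_lines"
rm -f "$log"
```

For a server that gets reset and then has its loggers restarted, this is the case where `>>` matters: with `>`, the restart would wipe the samples taken just before the hang.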


Matt Beckman · Accepted Answer · 2011-06-20 17:09:00Z

1

When I have seen issues like this, it usually ends up being a problem with a cron job.

Check your syslog for cron jobs running at the same time of day that the server hangs. Also, check your root crontab (crontab -e) and jobs in /etc/cron.daily for anything that might be responsible.
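
One way to do the syslog check is sketched below. It assumes the traditional syslog timestamp format ("Mon DD HH:MM:SS" in the first three fields) and that cron entries contain the word "cron"; the function name `cron_runs_in_hour` is made up for this example.

```shell
#!/bin/sh
# cron_runs_in_hour HOUR [FILE]
# Prints cron-related lines from FILE (default /var/log/syslog) whose
# timestamp falls within the given hour of the day (0-23).
cron_runs_in_hour() {
    hour="$1"
    file="${2:-/var/log/syslog}"
    # Field 3 is HH:MM:SS in the traditional syslog format; compare its hour part.
    grep -i 'cron' "$file" | awk -v h="$hour" '{ split($3, t, ":"); if (t[1] == h) print }'
}
```

If the server hangs around 16:00, `cron_runs_in_hour 15` and `cron_runs_in_hour 16` would list the jobs that ran shortly before. To only read the root crontab without opening an editor, `crontab -l` can be used instead of `crontab -e`, and `ls /etc/cron.daily` lists the daily jobs.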

answered Jun 20, 2011 at 17:09

Matt Beckman


server_knuckle_dragger · Accepted Answer · 2011-06-20 18:34:40Z

-2

Sounds like the random crashing could be caused by faulty hardware. Have the hosting company check whether there are any errors on POST or on the server's LCD. If it is a Dell server, you might want to install OpenManage, which will tell you if any hardware is faulty. In my experience, a faulty memory DIMM can cause random server reboots. Depending on what type of hardware you are running, it should be possible for your hosting company to do a chassis swap on the server if the issue continues.

answered Jun 20, 2011 at 18:34

server_knuckle_dragger


-1 for not reading the question
– Michael Lowman
Commented Jun 20, 2011 at 18:39
An extensive hardware test has been performed by the hosting company.
– Martin Schlagnitweit
Commented Jun 21, 2011 at 5:42
