
I have an application that runs 24x7 on my system, but it is somehow being killed abruptly. I have observed this 2 or 3 times in the last 10 days.

Now I want to find out how long my application has been stopped, so that I can be notified and can track down the bug in the application. It would also help me set up a cron job.

  • Possible duplicate of superuser.com/questions/345447/… Commented May 21, 2014 at 6:57
  • Does dmesg tell you anything? You should (ideally) write your application to log any unexpected exits. If you didn't write the application, you can write a small app that constantly monitors the app in question and logs when it closed. You could even have that app restart your application when it exits. Unfortunately, a monitoring app can't tell you WHY your app exited.
    – Kinnectus
    Commented May 21, 2014 at 6:59
  • Nothing is displayed in dmesg. Also, my application is not stopping gracefully; otherwise it would display a log message on exit. Commented May 21, 2014 at 7:02
  • Did you write the application? You need more 'try-catch' statements at the stages where your app is most likely to exit: file access, network access, user interaction etc.
    – Kinnectus
    Commented May 21, 2014 at 7:43
  • Yes, I wrote this application. But the code is very bursty, and 7 to 8 other modules are included with this application. Commented May 21, 2014 at 8:53

1 Answer

I'd recommend atop with its companion tool atopsar. It records the start and stop times of processes, as well as disk usage and (via an extra service) network activity.

atop samples your processes at a regular interval (e.g. every 5 minutes) and logs the samples to a file. You can open that file afterwards and step through the history, which shows per-process details such as CPU and memory usage. This may give you hints about why your program crashed.
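
If installing atop is not an option, the same idea (sample at a regular interval, log the transitions) can be sketched in a few lines of Python. This is only a rough illustration under some assumptions: it is Linux-only because it reads /proc, and the names `is_running`, `DowntimeTracker` and `watch` are made up for this example. The measured downtime is only as accurate as the polling interval.

```python
import os
import time


def is_running(name):
    """Return True if any process in /proc has this command name (Linux only).

    Note: the kernel truncates /proc/<pid>/comm to 15 characters.
    """
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        try:
            with open(f"/proc/{pid}/comm") as f:
                if f.read().strip() == name:
                    return True
        except OSError:
            # The process exited between listdir() and open(); skip it.
            continue
    return False


class DowntimeTracker:
    """Turns a stream of up/down samples into 'stopped' and 'started' events."""

    def __init__(self):
        # Timestamp of the first sample that saw the process down, or None.
        self.down_since = None

    def update(self, running, now):
        if not running and self.down_since is None:
            self.down_since = now
            return ("stopped", 0.0)
        if running and self.down_since is not None:
            downtime = now - self.down_since
            self.down_since = None
            return ("started", downtime)
        return None  # no change since the previous sample


def watch(name, interval=300):
    """Poll forever and print a line whenever the process stops or comes back."""
    tracker = DowntimeTracker()
    while True:
        now = time.time()
        event = tracker.update(is_running(name), now)
        if event:
            kind, downtime = event
            print(f"{time.ctime(now)} {name} {kind} (down for {downtime:.0f}s)",
                  flush=True)
        time.sleep(interval)
```

Calling `watch("myapp")` (with your own process name in place of the placeholder `myapp`) and redirecting the output to a file gives you a timestamped record of each stop and restart.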

Also make sure that your /etc/security/limits.conf is properly configured so that you get a core dump. This gives you something to debug and a timestamp.
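
To check which core limit a process actually ends up with, here is a small sketch using Python's standard `resource` module. The helper names are invented for this example, and it assumes a Unix-like system. A soft limit of 0 means no core file is written.

```python
import resource


def core_dumps_enabled(soft_limit):
    """A soft limit of 0 disables core dumps; any other value allows them."""
    return soft_limit != 0


def raise_core_limit():
    """Raise the soft core limit to the hard limit and return the new pair.

    An unprivileged process may raise its soft limit up to the hard limit,
    but not beyond it; the hard limit is what limits.conf can set.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)
```

Running `core_dumps_enabled(resource.getrlimit(resource.RLIMIT_CORE)[0])` in the same session that starts your application tells you whether a crash can leave a core file at all. Where that file is written depends on the system's core pattern setting.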
