0

I have an application that runs 24x7 on my system, but it is somehow being killed abruptly. I have seen this happen 2 or 3 times in the last 10 days.

Now I want to find out how long my application has been stopped, so that I can send a notification and track down the bug in the application. This would also help me set up a cron job.

ravibhuva9955
  • Possible duplicate of *http://superuser.com/questions/345447/how-can-i-trigger-a-notification-when-a-job-process-ends* – MariusMatutiae May 21 '14 at 06:57
  • Does dmesg tell you anything? You should (ideally) write your application to log any unexpected exits. If you didn't write the application, you can write a small app that constantly monitors the app in question and logs when it closes. You could even have that app restart your application when it exits. Unfortunately, a monitoring app can't tell you WHY your app exited. – Kinnectus May 21 '14 at 06:59
  • Nothing is displayed in dmesg. Also, my application is not stopping gracefully; otherwise it would print a log message on exit. – ravibhuva9955 May 21 '14 at 07:02
  • Did you write the application? You need more 'try-catch' statements at the stages your app is most likely to exit - file access, network access, user interaction etc. – Kinnectus May 21 '14 at 07:43
  • Yes, I wrote this application. But the code is too bursty, and 7 to 8 other modules are included with this application. – ravibhuva9955 May 21 '14 at 08:53

1 Answer

1

I'd recommend atop together with its companion tool atopsar. It records the start and stop times of processes, as well as disk usage and (via an extra service) network activity.

Run as a background service, atop samples your processes at a regular interval (e.g. every 5 minutes) and writes each sample to a log file; atopsar produces reports from those logs. You can open the log afterwards and step through the history, which shows per-process details such as CPU and memory usage. This may give you hints about why your program crashed.
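
If installing atop is not an option, the same idea (check at a regular interval and record what was seen) can be sketched as a small script run from cron. This is only a minimal sketch under stated assumptions: it is Linux-only because it scans `/proc`, and the process name `myapp`, the state file location, and the helper names `is_running` and `update_state` are all hypothetical placeholders, not part of atop or of your application.

```python
#!/usr/bin/env python3
"""Cron-friendly watchdog sketch: records when a process was last seen running."""
import json
import os
import tempfile
import time


def is_running(name, proc_root="/proc"):
    """Return True if any process's command name equals `name` (Linux /proc only)."""
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return False  # no /proc available (not Linux)
    for entry in entries:
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            with open(os.path.join(proc_root, entry, "comm")) as fh:
                if fh.read().strip() == name:
                    return True
        except OSError:
            continue  # process exited between listdir() and open()
    return False


def update_state(state, running, now):
    """Update `state` in place and return the downtime in seconds (0.0 if up)."""
    if running:
        state["last_seen"] = now
        state.pop("down_since", None)
        return 0.0
    # The process died some time after it was last seen, so last_seen gives
    # an upper bound on the downtime; fall back to `now` on the very first run.
    state.setdefault("down_since", state.get("last_seen", now))
    return now - state["down_since"]


def main(name="myapp"):  # "myapp" is a placeholder process name
    state_file = os.path.join(tempfile.gettempdir(), name + "_watch.json")
    try:
        with open(state_file) as fh:
            state = json.load(fh)
    except (OSError, ValueError):
        state = {}  # first run, or unreadable state file
    running = is_running(name)
    downtime = update_state(state, running, time.time())
    with open(state_file, "w") as fh:
        json.dump(state, fh)
    if not running:
        print(f"{name} is not running; down for at most {downtime:.0f} s")


if __name__ == "__main__":
    main()
```

Run from cron every few minutes (for example `*/5 * * * * /usr/bin/python3 /path/to/watch_app.py`, where the path is a placeholder), the script prints a line only while the process is missing, and cron can mail that output as the notification. Note that `/proc/<pid>/comm` holds at most 15 characters of the command name, so a longer name must be given in its truncated form. Like any external monitor, this tells you when the process went away, not why.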

Also make sure that your /etc/security/limits.conf is properly configured (the core file size limit must not be 0) so that you get a core dump when the program crashes. This gives you something to debug and a timestamp.
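
To check which core limit a process actually inherits, one option is to query it from a launcher script. The following is a rough sketch using Python's standard `resource` module, assuming the application is started from such a script; the function name `enable_core_dumps` is made up for illustration. It can only raise the soft limit up to the hard limit, so if the hard limit is 0, limits.conf still has to be changed.

```python
import resource


def enable_core_dumps():
    """Raise the soft core-file size limit to the hard limit.

    Returns the resulting (soft, hard) pair. This affects only the current
    process and the children it starts afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    soft, hard = enable_core_dumps()
    if hard == 0:
        print("hard core limit is 0: no core dumps until limits.conf is changed")
    else:
        print("core limit (soft, hard):", soft, hard)
```

A value equal to `resource.RLIM_INFINITY` means the core file size is unlimited.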

trapicki