
In the old days, the init daemon used to disable tasks that were respawning too fast.

I discovered to my peril that Upstart just stops them. Forever.

I have found the COUNT and INTERVAL controls for when to stop a job that is respawning fast, but I cannot find how to restart that job after a specified waiting time.

  • This is for a remote server that must connect out via ssh so that I can tunnel back into it. It regularly has connection problems, and the job that runs the ssh connection keeps dying! I foolishly expected Upstart to keep the job running.

Is there a way, or do I need to do something like run a cron job to check if it is running and if not restart it?

Keith

1 Answer


Upstart lets you put a sleep in the `post-stop` stanza to delay restarting a job. From http://upstart.ubuntu.com/cookbook/#delay-respawn-of-a-job:

```
post-stop exec sleep 5
```

Now when the job fails, Upstart waits 5 seconds before starting it (and so the ssh connection) again. Combine that with a respawn limit of something like 10 times in 30 seconds (`respawn limit 10 30`) and it should keep respawning forever: with a 5-second delay it can restart at most about 7 times in any 30-second window, so it never hits the limit and never gets dropped from respawn.
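Applied to the ssh tunnel from the question, a complete job file might look like the sketch below. The job name `ssh-tunnel`, the remote user and host, the forwarded ports, the `start on` condition and the ssh options are all illustrative assumptions. It also assumes root has passwordless (key-based) ssh access to the remote host, because an Upstart job has no terminal to prompt on.

```
# /etc/init/ssh-tunnel.conf  (hypothetical job name)
description "reverse ssh tunnel (illustrative sketch)"

start on (local-filesystems and net-device-up IFACE!=lo)
stop on runlevel [!2345]

# Respawn whenever ssh exits; Upstart only gives up if the job
# respawns more than 10 times within 30 seconds.
respawn
respawn limit 10 30

# Hypothetical user, host and ports. -N: no remote command;
# BatchMode: never prompt for a password; ServerAlive*: exit once the
# server stops answering keepalives, so Upstart sees the job die.
exec /usr/bin/ssh -N -o BatchMode=yes -o ServerAliveInterval=30 -o ServerAliveCountMax=3 -R 2222:localhost:22 tunnel@example.com

# Sleep 5 seconds after every stop, which also delays each respawn.
post-stop exec sleep 5
```

Once the file is in `/etc/init/`, `sudo start ssh-tunnel` starts it and `sudo status ssh-tunnel` shows its state. Note that this simple `post-stop` sleep also runs after a manual `stop`.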

tgm4883
  • Yes, that is a good workaround, I suppose. I wonder, though, whether Upstart should have the option to specify a wait time before attempting to run the job again. I really don't like the idea of it just stopping jobs dead. I much preferred init's way of disabling a job for five minutes. – Keith Jul 09 '11 at 11:58
  • I think the use case you are talking about is pretty rare. If a job is failing there is a reason for it, and that reason should be fixed. Most people wouldn't want a job to keep trying to start and failing, as this could flood the logs and push out valuable information, or have other catastrophic outcomes. I think you are using Upstart for something it wasn't designed for, so the standard ways of restarting it might be out of the question. That said, does the network connection on that machine reconnect by itself when the connection dies? If so, you might be able to have your job start when networking starts. – tgm4883 Jul 09 '11 at 17:59
  • I'm sure you are correct that this is far from a "most people" case. As I've been hacking on Slackware and custom-built systems for the last 20 years, a lot of these concepts are foreign to me. Could you point me in the right direction for customising Ubuntu's networking start/restart? God, this makes me feel old and out of touch. I could do it on Slackware in 10 seconds. – Keith Jul 17 '11 at 14:30
  • Well, the question remains exactly what happens when your system loses connectivity. If networking also stops (e.g. `service networking status` shows it as stopped), then you could simply tie your Upstart job's `start on` and `stop on` stanzas to networking. If it doesn't stop, we may have to look at other things. – tgm4883 Jul 19 '11 at 00:56
  • this is covered in the cookbook; they recommend using a post-stop stanza: http://upstart.ubuntu.com/cookbook/#id379 – jcomeau_ictx Nov 23 '13 at 02:45
  • Thanks @jcomeau_ictx, that's exactly it. I think the answer should be updated with the link to it: http://upstart.ubuntu.com/cookbook/#delay-respawn-of-a-job. Also, there's an accepted ticket and patch for Keith's request, adding a respawn wait time before retrying: https://bugs.launchpad.net/upstart/+bug/252997 – Andre Miras Jan 15 '14 at 09:40
  • I've updated the answer. Just FYI, @AndreMiras and jcomeau_ictx, you can edit the answer yourself and the edit will be put in a queue for review. – tgm4883 Jan 16 '14 at 16:26
  • There's a more sophisticated version of this `post-stop` stanza over on [Server Fault](http://serverfault.com/a/569148/8437); it demonstrates exponential backoff and only sleeping on respawn. – Josh Kelley Sep 01 '15 at 20:22
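A minimal sketch of those two ideas (sleeping only on respawn, with exponential backoff) follows. It is not a copy of the linked Server Fault answer, and it would replace the one-line `post-stop exec sleep 5` from the answer above. It assumes `initctl status` prints `job goal/state, ...`, so the goal is still `start` while Upstart is respawning the job and `stop` after a manual `stop`. The state-file paths under `/var/run`, the 60-second "healthy run" threshold and the 300-second cap are hypothetical choices.

```
post-start script
    # Record when this run started (hypothetical state file).
    date +%s > /var/run/ssh-tunnel.started
end script

post-stop script
    # After a manual "stop" the goal is "stop"; during a respawn it is
    # still "start". Only delay in the respawn case.
    goal=$(initctl status "$UPSTART_JOB" | awk '{print $2}' | cut -d/ -f1)
    if [ "$goal" = "stop" ]; then
        exit 0
    fi

    delayfile=/var/run/ssh-tunnel.delay
    started=$(cat /var/run/ssh-tunnel.started 2>/dev/null || echo 0)
    delay=$(cat "$delayfile" 2>/dev/null || echo 5)

    # If the last run stayed up for more than a minute, reset to 5 s.
    if [ $(( $(date +%s) - started )) -gt 60 ]; then
        delay=5
    fi

    sleep "$delay"

    # Double the delay for the next failure, capped at 5 minutes.
    delay=$(( delay * 2 ))
    if [ "$delay" -gt 300 ]; then
        delay=300
    fi
    echo "$delay" > "$delayfile"
end script
```

Because the delay quickly grows beyond the 30-second window, a `respawn limit 10 30` is effectively never reached, while a connection that stays up for over a minute starts the backoff again from 5 seconds.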