
In a shell script, I do the following:

#!/bin/sh

while true; do ssh -o ExitOnForwardFailure=yes -L 8080:localhost:80 -N server; sleep 1; done &

... rest of the script, which uses the tunnel as made above ...

This keeps the tunnel open at all times: whenever the connection is lost, it is re-opened. The tunnel is used in other parts of the main script, omitted here. Those parts can handle a nonfunctional tunnel; they simply retry later on.

When the main script dies, for example due to a SIGTERM or SIGINT, I want that while loop to stop as well. There is no need to keep that tunnel open after the main script dies.

What is the common approach to do this in shell scripting? Note that I want two things:

  1. prevent re-execution of the ssh command
  2. stop and disconnect the current ongoing ssh session as soon as possible

I am not sure how to do all this in shell scripting.

Note that I am currently working in plain sh, but I can move on to bash if needed.

Pritzl
  • Since there are separate processes involved here, you cannot use a variable. But you can use temporary file(s), like the *lock files* used by many applications. – FedKad Jul 12 '19 at 09:42
  • I can see a shared lock file working to prevent the loop from re-executing, but how do I stop the ongoing ssh session? – Pritzl Jul 12 '19 at 09:59
  • Create a temporary file before the `while` loop and test its existence in the `while` loop (instead of `true`). Catch the *termination* signals in your script, and when your script terminates, delete the temporary file and `kill` the running `ssh` process. – FedKad Jul 12 '19 at 10:04
  • How can you kill the `ssh` process from within the main process? – Pritzl Jul 12 '19 at 10:07
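The lock-file idea from these comments could be sketched as follows. This is my illustration, not code from the thread: the loop keeps running only while a flag file exists, and the signal handler removes the file (preventing re-execution) and kills the current session. `sleep 2` stands in for the real ssh command so the sketch is self-contained.

```shell
#!/bin/sh
# Sketch of the lock-file approach: loop while the flag file exists;
# on termination, remove the file and kill the current session.
lock=$(mktemp)

( while [ -e "$lock" ]; do
      sleep 2 &                # stand-in for: ssh -N ... server
      echo "$!" > "$lock"      # record the current session's PID
      wait "$!"
      sleep 1
  done ) &
looppid=$!

cleanup() {
    sshpid=$(cat "$lock" 2>/dev/null)
    rm -f "$lock"                                    # 1. prevent re-execution
    [ -n "$sshpid" ] && kill "$sshpid" 2>/dev/null   # 2. stop the current session
    kill "$looppid" 2>/dev/null
    wait "$looppid" 2>/dev/null || true
}
trap 'cleanup; exit' INT TERM
# the rest of the script here
cleanup
```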

1 Answer


A somewhat general approach.

while true; do foo; sleep 1; done &
# the rest of the script here
kill -- -"$$"

The trick is that the script runs its child processes (here foo, among others) with a process group ID (PGID) equal to the PID of the shell. This propagates to grandchildren and so on. The shell itself is in this process group as well. There are exceptions (jobs in interactive shells, timeout), so this is not as general as you might want; still, with foo being ssh or a similarly simple command in a non-interactive script, the approach should work.

kill with a negative argument sends the signal to every process in the process group.
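A small check (mine, not part of the answer) that a background child really lands in the same process group as the script itself. When the script is launched as a job from an interactive shell, that shared PGID also equals $$, which is what makes kill -- -"$$" reach everything.

```shell
#!/bin/sh
# Show that a background child shares the script's process group.
sleep 5 &
bgpid=$!
my_pgid=$(ps -o pgid= -p "$$" | tr -d ' ')
bg_pgid=$(ps -o pgid= -p "$bgpid" | tr -d ' ')
echo "script PGID: $my_pgid, background PGID: $bg_pgid"
kill "$bgpid" 2>/dev/null
```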

One caveat though: a possible race condition. In general, foo may get killed before the subshell receives and handles the signal. If the delay is long enough (for whatever reason), a new foo may be spawned after kill does its job (especially without the sleep 1). Consider this improvement:

while true; do foo; sleep 1; done &
subpid=$!
# the rest of the script here
kill "$subpid"
wait "$subpid" 2>/dev/null
# at this moment we're certain the subshell is no more, new foo will not be spawned
trap '' TERM
# foo will maintain the old PGID, so…
kill -- -"$$" 2>/dev/null

The trap is here only to make the main shell exit gracefully without printing Terminated to the console.
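Since the question asks about SIGTERM/SIGINT specifically, the same cleanup can be tied to those signals with trap. This is my sketch, not part of the original answer; foo is defined as a placeholder so the script is self-contained.

```shell
#!/bin/sh
# Sketch: run the same cleanup on SIGINT/SIGTERM as on normal exit, so the
# loop and any in-flight foo die when the script itself is signalled.
foo() { sleep 1; }   # placeholder for: ssh -o ExitOnForwardFailure=yes -N ... server

cleanup() {
    kill "$subpid" 2>/dev/null
    wait "$subpid" 2>/dev/null    # now the subshell cannot respawn foo
    trap '' TERM                  # don't terminate ourselves noisily
    kill -- -"$$" 2>/dev/null || true   # sweep up a foo keeping the old PGID
}
trap 'cleanup; exit 130' INT
trap 'cleanup; exit 143' TERM

while true; do foo; sleep 1; done &
subpid=$!

# the rest of the script here

cleanup
```

On the normal path the final cleanup runs; if the script is signalled, the trap runs the same cleanup and exits with the conventional status for that signal.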


This is not a general approach for arbitrary background processes, yet it is usually a useful method for ssh in a scenario like this.

Use autossh. From its manual:

autossh is a program to start a copy of ssh and monitor it, restarting it as necessary should it die or stop passing traffic.

[…]

autossh tries to distinguish the manner of death of the ssh process it is monitoring and act appropriately. The rules are:

  1. If the ssh process exited normally (for example, someone typed exit in an interactive session), autossh exits rather than restarting;
  2. If autossh itself receives a SIGTERM, SIGINT, or a SIGKILL signal, it assumes that it was deliberately signalled, and exits after killing the child ssh process;
  3. […]
  4. […]
  5. If the child ssh process dies for any other reason, autossh will attempt to start a new one.

Therefore:

autossh … &
apid=$!
# the rest of the script here
kill "$apid"

Note that you won't be notified if the tunnel cannot be established in the first place. Since your original approach shares this possible flaw, I'm not addressing it here.
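As above, a trap can stop autossh on SIGINT/SIGTERM as well as at the normal end of the script. This is my sketch; `sleep 60` stands in for the real autossh invocation (one possible form is shown in the comment) so the sketch runs anywhere.

```shell
#!/bin/sh
# Sketch: tear the background tunnel down on INT/TERM and on normal exit.
sleep 60 &   # stand-in for: autossh -M 0 -o ExitOnForwardFailure=yes -N -L 8080:localhost:80 server
apid=$!

stop_tunnel() {
    kill "$apid" 2>/dev/null
    wait "$apid" 2>/dev/null || true
}
trap 'stop_tunnel; exit' INT TERM

# the rest of the script here

stop_tunnel
```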

Kamil Maciorowski
  • Thank you! I can see this working for `ssh` specifically, but if possible, I would like to understand how I could do this in general. That is, when there is no `autoxyz` which can help me. That said, I'm glad you brought `autossh` up, as I did not know that! – Pritzl Jul 12 '19 at 10:02
  • Regarding your last remark, I added `-o ExitOnForwardFailure=yes` to combat this. Should this be enough? – Pritzl Jul 12 '19 at 10:03
  • @Pritzl It depends. If the tunnel cannot be established, `autossh` will try again and again. But you may prefer to abort the entire script if the tunnel cannot be established on the first try. In such a case a more "tricky" approach may be desired. The general question is: how does your "rest of the script" handle temporary (or not) failure of the tunnel? – Kamil Maciorowski Jul 12 '19 at 10:08
  • Right, I get you. The rest of the script "hopes for" a working tunnel, but if the tunnel is broken for some reason, they `sleep` and retry later. So the script can handle tunnel failures. – Pritzl Jul 12 '19 at 10:11
  • @Pritzl OK I understand. `-o ExitOnForwardFailure=yes` seems reasonable. Please see the "Continued failures" section of the `autossh` manual. Your original approach restarts after 1 second and you may want to tune the behavior of `autossh` (e.g. with `AUTOSSH_POLL`). – Kamil Maciorowski Jul 12 '19 at 10:19
  • Thank you. I can definitely use this approach for `ssh` related executions. In my question, `ssh` could have been anything, so I'm also wondering about the generic case. – Pritzl Jul 12 '19 at 10:26
  • @Pritzl The answer now contains a general approach. – Kamil Maciorowski Jul 12 '19 at 18:11
  • Thank you, VERY interesting. I will try it out and let you know. Appreciated. – Pritzl Jul 12 '19 at 18:53
  • It took me some time to figure it out but I think I got it. That race condition was interesting, thank you for including that. My final thought is: assume there are multiple such `while true; do XYZ; done &` loops. Do I "really" need to fetch their PID (via `$!`) separately and do `kill "$Px"; wait "$Px"`, for every background process? Is there a more generic approach to just `kill; wait` all created background processes without having to remember their PID? (We of course end with a single `kill -- -"$$"` to make sure that sneaky processes are still killed, as you explained.) – Pritzl Jul 12 '19 at 20:10
  • @Pritzl `pkill -P $$` comes to mind. This should kill direct children of the current shell. – Kamil Maciorowski Jul 12 '19 at 20:49