We run a simple deployment script remotely using a command like ssh deployer@10.170.4.11 sudo /root/run-chef-client.sh. Today it started to hang: sshd on 10.170.4.11 waits forever even though sudo has already finished. We started sshd in debug mode and got two different kinds of log output. The following is a normal log, from a session that does not hang:
debug1: Received SIGCHLD.
debug1: session_by_pid: pid 23187
debug1: session_exit_message: session 0 channel 0 pid 23187
debug1: session_exit_message: release channel 0
Received disconnect from 10.170.4.6: 11: disconnected by user
When it hangs, we get the following:
debug1: Received SIGCHLD.
debug1: session_by_pid: pid 24209
debug1: session_exit_message: session 0 channel 0 pid 24209
debug1: session_exit_message: release channel 0
The only difference is that the final "Received disconnect" line never appears. Our understanding is that the server process is waiting for some communication from the client side and never gets it. It's hard to tell whether this is a client-side or a server-side problem.
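One way to narrow this down might be to capture the client side as well, under a deadline, so that a hung run leaves a verbose client log to compare with the sshd debug output. The following is only a minimal sketch: run_with_deadline is a hypothetical helper name, and it assumes GNU coreutils timeout is available on the client machine (timeout exits with status 124 when it has to kill the command).

```shell
#!/bin/sh
# Hypothetical helper: run a command under a deadline and report a hang.
# Assumes GNU coreutils `timeout` is installed on the client machine.
run_with_deadline() {
    deadline="$1"   # deadline in seconds
    shift           # the remaining arguments are the command to run
    timeout "$deadline" "$@"
    rc=$?
    # timeout exits with 124 when the command had to be killed at the deadline
    if [ "$rc" -eq 124 ]; then
        echo "HUNG: command did not exit within ${deadline}s" >&2
    fi
    return "$rc"
}
```

It could then be used as run_with_deadline 600 ssh -vvv deployer@10.170.4.11 sudo /root/run-chef-client.sh 2> ssh-client-debug.log, where the 600-second deadline and the log file name are arbitrary choices. If the client log ends without the client ever closing the channel, that would point at the client still waiting on something rather than at sshd.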
We tried to run sshd under strace but did not succeed, because the SUID bit on sudo was ignored in this case. So, what else should we try in order to debug or prevent these situations?