
We have 4 routers covering our house/outbuildings - one that actually routes and 3 wired to it and set as access point only.

My SSH PuTTY sessions are generally pretty rock-stable, but if I move from upstairs (RT-N66u) to downstairs (RT-AC68u) or vice versa it cuts out every single time.

The SSIDs have the same name. Both are 2.4GHz. Upstairs is N, downstairs is AC.

I'm running the latest Windows 10 on my laptop, with PuTTY 0.73. In my sshd_config on the server I have these lines:

ClientAliveInterval 5
ClientAliveCountMax 12
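
As a rough sketch of what these two server-side values imply: assuming sshd sends a probe every `ClientAliveInterval` seconds and drops the session after `ClientAliveCountMax` consecutive unanswered probes, the tolerated silence is approximately their product. The function name `client_alive_window` is hypothetical and used only for illustration.

```python
from typing import Optional


def client_alive_window(interval_s: int, count_max: int) -> Optional[int]:
    """Approximate seconds of client silence tolerated before sshd disconnects.

    Returns None when either value is 0, which is treated here as
    "no probe-based disconnect" (an assumption for this sketch).
    """
    if interval_s <= 0 or count_max <= 0:
        return None
    return interval_s * count_max


print(client_alive_window(5, 12))   # 60
print(client_alive_window(20, 5))   # 100
```

With the values above this gives about one minute, so a session that stays unresponsive for longer than that after roaming would be dropped by the server.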

In PuTTY I have

Seconds between keepalives (0 to turn off)       25
Disable Nagle's algorithm (TCP_NODELAY option)   checked
Enable TCP keepalives (SO_KEEPALIVE option)      checked

Any other salient information that I'm missing?

Is this a fixable problem, or "just the way it is"?

Thank you.

Codemonkey
  • Are those 3 additional routers _really_ "access point only"? (Some models use this term differently...) Seamless roaming requires bridging, so verify that your computer receives the same IP address when manually connecting to any of them, that it receives the same "default gateway", and that `arp -a` shows the same MAC address of the gateway. And just in case, see if those APs have any firewall enabled, even if it shouldn't have any effect in bridge mode. – u1686_grawity Nov 18 '20 at 08:03
  • Default gateway is `192.168.1.1` regardless of which one I'm connected to. I have a static IP (should have said, sorry) for the laptop set to `192.168.1.41`. I've also disabled the "random hardware address" in Windows because I thought MY mac address changing might be breaking things. No firewall settings on the routers that I know of. I'll have to check `arp -a` in a bit. – Codemonkey Nov 18 '20 at 08:08
  • I'd try temporarily setting it to DHCP just to verify. And if you leave a `ping -t` running to the server, how much packet loss do you get during roaming? Does your computer wait until the Wi-Fi connection is unusable before it decides to roam? (Your "keepalive" options make the SSH connection much more sensitive to this than it would be otherwise.) – u1686_grawity Nov 18 '20 at 08:28
  • Can you tell me what settings (keepalive etc) I should change to increase my chances? Thank you! – Codemonkey Nov 18 '20 at 09:04
  • Just tried `ping -t`. No packet loss at all on this attempt, just one slow one (900ms). SSH session disconnected though ("Network error: software caused connection abort") – Codemonkey Nov 18 '20 at 09:08
  • `arp -a` output is identical on either router. – Codemonkey Nov 18 '20 at 09:09
  • I figured an interval (on sshd) of 5 seconds with a countmax of 12 would keep trying for 1 minute, and only disconnect if it received no response within that minute. Do I have this wrong? I'm not really clear on the importance (if any) in the PuTTY keepalive as well, nor what it should be set to. Please advise! – Codemonkey Nov 18 '20 at 09:29
  • My next guess is that either you have a third-party firewall that interprets roaming as disconnection and forgets all states, or... wouldn't expect Windows itself to do that, but who knows. – u1686_grawity Nov 18 '20 at 09:33
  • All I'm using is Windows Defender on the laptop and iptables with a very basic "allow ssh, http, https" configuration on the server. – Codemonkey Nov 18 '20 at 09:36
  • Perhaps more lenient settings on the server: `ClientAliveInterval 20` and `ClientAliveCountMax 5`. (For answering add to your comment `@harrymc` for me to be notified.) – harrymc Nov 18 '20 at 09:51
  • (Regarding keepalives, in my experience, it's not that they occur too often by themselves, but rather that a single lost packet can sometimes cause the whole TCP connection to wedge for much longer than it should (until it reaches some sort of timeout and begins to retransmit), and if the client keeps trying to send during that time, that might hasten its demise... Though none of that applies if you get a "Connection aborted" error _immediately_ after roaming.) – u1686_grawity Nov 18 '20 at 11:51
  • It doesn't happen immediately @user1686. I've not timed it but I'd say it IS about a minute (i.e. my 5 x 12). The prompt becomes unresponsive immediately, of course. Changing my `ClientAliveInterval` and restarting the `sshd` service hasn't helped I'm afraid, @harrymc – Codemonkey Nov 18 '20 at 12:02
  • To clarify, do we believe that it definitely IS possible for me to roam around my house without my sessions disconnecting? I don't know what else to try, or if I should just give up! @user1686 – Codemonkey Nov 19 '20 at 13:39
  • OK, I'm pretty sure I've fixed it: https://superuser.com/a/548570/338607 – Codemonkey Nov 19 '20 at 14:03
  • It should definitely be possible in general, as Wi-Fi roaming is transparent at L2 and does not directly do anything to TCP connections. (Though it does cause a very brief moment of packet loss in both directions, which can indirectly cause TCP's ack/retransmit mechanisms to think there is a problem.) – u1686_grawity Nov 19 '20 at 14:06
  • See the comment I've just posted above yours @user1686 – Codemonkey Nov 19 '20 at 14:07
  • I've just tested it on a Linux laptop running two SSH connections -- one continuously receiving data from `mtr` and one completely idle with no keepalives of any kind. In most cases, roaming was swift (with the old AP still in range) and both connections just kept running. In one case, roaming took a few seconds and this caused the active connection to freeze up for 3-4 minutes before recovering (but it didn't die), whereas the idle connection was completely fine and responded immediately when I poked it. – u1686_grawity Nov 19 '20 at 14:07
  • Ah, so it _was_ a Windows-specific problem. You should post that as a full answer in this case. (It's not an exact duplicate since it occurs in a different situation...) – u1686_grawity Nov 19 '20 at 14:09

1 Answer

The problem for me was down to Windows' Receive Window Auto-Tuning Level setting.

Here's MS's description of it:

https://docs.microsoft.com/en-gb/troubleshoot/windows-server/networking/receive-window-auto-tuning-for-http

Open a command prompt ("run as administrator") and issue this command:

netsh int tcp set global autotuninglevel=disabled

Then restart.

This fixed it for me on my Windows 10 laptop.
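
To check the current level before changing it, or to put it back later, something like the sketch below may help. It only builds and runs the same `netsh` commands; the helper names (`show_command`, `set_command`, `run`) are hypothetical, and on anything other than Windows `run` just returns the command text instead of executing it. As far as I know `normal` is the Windows default, so `set_command("normal")` should undo the change.

```python
import platform
import subprocess
from typing import List

# Levels accepted by `netsh interface tcp set global autotuninglevel=...`
LEVELS = ("disabled", "highlyrestricted", "restricted", "normal", "experimental")


def show_command() -> List[str]:
    """Command that prints the global TCP settings, including the auto-tuning level."""
    return ["netsh", "interface", "tcp", "show", "global"]


def set_command(level: str) -> List[str]:
    """Command that sets the auto-tuning level (needs an elevated prompt to run)."""
    if level not in LEVELS:
        raise ValueError(f"unknown auto-tuning level: {level!r}")
    return ["netsh", "interface", "tcp", "set", "global", f"autotuninglevel={level}"]


def run(cmd: List[str]) -> str:
    """Run the command on Windows and return its output; elsewhere return the command text."""
    if platform.system() != "Windows":
        return " ".join(cmd)
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


if __name__ == "__main__":
    # Read-only check; look for the "Receive Window Auto-Tuning Level" line.
    print(run(show_command()))
```

Running `run(set_command("disabled"))` from an elevated session is equivalent to the command above.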

Codemonkey