I recently spent the Easter bank holiday with my parents, who live in a very rural area in the UK. They have a (terrible) ADSL internet connection, which is run over several kilometres of dodgy copper and is periodically interrupted when nearby farmers reverse their tractors into the phone lines.

I noticed that their router was repeatedly dropping the PPTP handshake and renegotiating it, effectively killing the connection. This was frustrating, so, in an attempt to avoid going insane, I told it to double the minimum acceptable SNR margin and handshake at lower speeds:

$ telnet 192.168.1.1
Trying 192.168.1.1...
Connected to 192.168.1.1.
Escape character is '^]'.
U.S. Robotics Wireless MAXg ADSL Gateway
Login: ***********
Password: 
> sh


BusyBox v1.00 (2006.02.17-20:30+0000) Built-in shell (msh)
Enter 'help' for a list of built-in commands.

# adsl configure --snr 200; exit 
Connection closed by foreign host.

This improved matters, and the thing got a (somewhat) stable, if incredibly slow, pipe to the outside world:

$ ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
64 bytes from 8.8.8.8: icmp_seq=0 ttl=55 time=3236.679 ms
64 bytes from 8.8.8.8: icmp_seq=1 ttl=55 time=3699.541 ms
...

At about this point, real life intervened and I then spent several hours playing with cats, looking at cat gifs on my phone, actually talking to my family, etc. I forgot that I'd left this ping process running, and came back about a day later to hit ctrl-c.

The summary statistics shown floored me:

--- 8.8.8.8 ping statistics ---
103074 packets transmitted, 100564 packets received, 2.4% packet loss
round-trip min/avg/max/stddev = 32.986/3034.479/3600577.732/87527.276 ms

As you can see, the maximum recorded response time for an ICMP packet making a short transatlantic hop to Google's DNS server is 3600577.732 ms. That's almost exactly an hour, and certainly far longer than ping's default timeout.
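As a quick sanity check on those figures, here is a minimal Python sketch that redoes the arithmetic from the summary lines. The one-second send interval is an assumption (it is ping's usual default; no `-i` option appears in the command shown), and the function name `summarize` is just illustrative.

```python
# Figures copied from the ping summary above.
transmitted = 103074
received = 100564
max_rtt_ms = 3600577.732


def summarize(transmitted, received, max_rtt_ms, interval_s=1.0):
    """Return (loss_percent, max_rtt_minutes, run_hours).

    interval_s is an assumption: one echo request per second.
    """
    loss_percent = 100.0 * (transmitted - received) / transmitted
    max_rtt_minutes = max_rtt_ms / 1000.0 / 60.0
    run_hours = transmitted * interval_s / 3600.0
    return loss_percent, max_rtt_minutes, run_hours


loss, rtt_min, hours = summarize(transmitted, received, max_rtt_ms)
print(f"packet loss: {loss:.1f}%")
print(f"max RTT:     {rtt_min:.2f} minutes")
print(f"run time:    {hours:.1f} hours")
```

The loss figure agrees with the 2.4% that ping reported, the maximum RTT comes out about half a second over sixty minutes, and the run time of roughly 28.6 hours fits "about a day later".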

How on Earth can this be? Is it accurate? What router will happily hold onto a packet for sixty minutes before sending it on its way? Why wasn't this packet dropped? Is it the result of an overflow from the 8-bit packet counter combined with large latencies?
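On the counter question, one thing that can be checked on paper: the sequence number field of an ICMP echo message is 16 bits wide (RFC 792), not 8. The sketch below only counts how often a counter of a given width would have wrapped over this run; whether a wrap has anything to do with the hour-long figure is not established here. The helper names are hypothetical.

```python
def wrap_count(packets_sent, counter_bits):
    """How many times a counter of this width wraps after packets_sent packets."""
    return packets_sent // (1 << counter_bits)


def last_sequence_number(packets_sent, counter_bits):
    """Sequence number of the final packet, assuming numbering starts at 0."""
    return (packets_sent - 1) % (1 << counter_bits)


packets_sent = 103074  # "packets transmitted" from the summary above

print(wrap_count(packets_sent, 16))            # 16-bit field, as in RFC 792
print(last_sequence_number(packets_sent, 16))
print(wrap_count(packets_sent, 8))             # the 8-bit counter hypothesised above
```

With a 16-bit field the counter wraps once over the whole run; a hypothetical 8-bit counter would have wrapped several hundred times.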

Finally, I would be interested to know if there is any code of conduct in the UK stating that consumer ADSL connections are expected to have less latency and better traffic management than that provided by RFC 1149 and RFC 2549 ;-).

Landak
  • a router that isn't overloaded with spam. – ratchet freak Mar 30 '16 at 10:14
  • A greedy router ;) Just gotta play this when you notice it's planning to wait an hour before forwarding the packet. https://www.youtube.com/watch?v=moSFlvxnbgk – Cestarian Apr 05 '16 at 03:13
  • Is it possible that, since you said that you left it overnight, your computer applied the +1 hour at the start of Daylight Saving Time and that messed with the calculation of a particular packet's RTT? – kenkh Apr 05 '16 at 13:10
  • @kenkh -- Nope; I'm definitely sure that the day the clocks changed wasn't that one. Besides, looking at the `ping` [source code](http://www.opensource.apple.com/source/network_cmds/network_cmds-307.0.1/ping.tproj/ping.c) (from around line 761) I can see that timezones are ignored in the subsequent calculation (`gettimeofday(nv, NULL)` returns epoch microseconds). It really did take an hour! – Landak Apr 06 '16 at 14:43
  • If you have access to the system, could you run pathping on it? It would show roughly where the slowdown is – Journeyman Geek Apr 06 '16 at 16:24
  • @Landak like kenkh, I also think it is an artifact of the DST adjustment. Ignoring the timezone is a bad idea, if the clock time of the system is actually changed instead of just adjusting the timezone for output. Besides, the source code you linked is not for the `ping` you used. You were using a very old `busybox`, which has its own builtin for `ping`. As for it not being that day: I would suggest checking the system time on the Gateway; it might just be that it is off by a few hours or days. – Adaephon Apr 08 '16 at 12:31

2 Answers

The ICMP packet and response for the ping are each 64 bytes long (per the ping output in the question), so for an hour-long ping it seems that every byte was taking almost a minute to transmit.
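The per-byte figure can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes the whole round-trip time is spread evenly over the bytes of one ICMP message, which is a simplification, and the function name is illustrative. It is evaluated for the 64-byte size shown in the question's ping output and, for comparison, for a 32-byte message.

```python
def seconds_per_byte(rtt_ms, packet_bytes):
    """Average time per byte if the whole RTT were spent on one packet's bytes."""
    return rtt_ms / 1000.0 / packet_bytes


max_rtt_ms = 3600577.732  # maximum RTT from the ping summary in the question

for size in (64, 32):
    print(f"{size} bytes: {seconds_per_byte(max_rtt_ms, size):.1f} s per byte")
```

For 64 bytes this comes to a little over 56 seconds per byte, which is where "almost a minute" comes from; for 32 bytes it would be about twice that.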

This is only explainable by a very generous error retry count (your doing?), coupled with a very slow router and a painful wait, or retries, for each and every transmitted byte.

The Internet Protocol (IP) transmits data by datagrams and tries not to send partial ones. Once transmission is started, it will wait by default for 200 milliseconds for more bytes to be added to the datagram. Past that time, the software/firmware will send whatever it has as one datagram. In the case of the hour-long ping time, the packet payload may have been as small as one byte. As long as data is still arriving, neither of the two participating sides will terminate the connection.
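The wait-for-more-bytes behaviour described above resembles Nagle's algorithm, which operating systems apply to TCP sockets; that reading is an assumption rather than something stated here, and it is a TCP-level mechanism, whereas ping itself uses ICMP. As a minimal sketch, this is how an application can ask for small TCP writes to be sent immediately, using the standard `TCP_NODELAY` socket option from Python (the helper name is hypothetical):

```python
import socket


def make_nodelay_socket():
    """Create an unconnected TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY = 1 asks the stack not to hold back small segments
    # while waiting for more data to coalesce.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


sock = make_nodelay_socket()
# A non-zero value means the option is set.
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)
sock.close()
```

The option only changes how the local stack batches outgoing TCP data; it does nothing about delays introduced by the line or the router.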

What you can do:

  • If phones, fax machines, or other devices are connected to the same phone line, check that they are protected by DSL filters. Be sure you don't put a filter on the line going to your DSL modem.
  • Try another router - once you've got yourself a bad device there's absolutely nothing you can do about it except throw it out and get something better.
  • If no other router is available, contact your ISP - they can run useful tests from their side.
  • If the ISP finds nothing, try demanding a replacement router/modem anyway.
  • If the same problem occurs with another router, get in touch with the phone company.

It can be quite complicated to locate a problematic switch: it may belong to the phone company, but the ISP may also have its own switches. Normally a problem with a switch is area-wide, which helps locate the malfunctioning switch. But in a rural area where few subscribers use that switch, this might go undetected. If some neighbors are using the same ISP, try to find out what their connection is like.

harrymc
  • I would turn 2 and 3 around. It's much easier to call the ISP first. They can do a test. If they see problems, they will send a new modem (not necessarily a router). If a new modem doesn't solve the problem, the ISP should contact the phone company. That's the way it works over here, but it may be different in the UK. – SPRBRN Apr 01 '16 at 19:15
  • I meant one should execute as many points above in parallel as possible. In my case, my ISP could decide to send over a technician, but if the problem is as trivial as unused DSL filters, it may decide to charge a fee. – harrymc Apr 01 '16 at 19:22
  • Ugh. The comments refer to numbers in a numbered list. Then, the poster of the answer edited the answer to remove the numbers, thereby making the comments seem out of place. Tsk tsk. – TOOGAM Apr 01 '16 at 21:13
  • @TOOGAM: The poster wanted to emphasize the relative unimportance of the order. – harrymc Apr 02 '16 at 06:38
  • Thank you for a helpful response -- they do already have DSL microfilters, and an unhelpful ISP who just says something like "Rural area, poor quality line, be grateful you get anything at all". The phone company basically shrug. It's a privately owned router, and I was thinking of replacing it with a "low noise" reputation one, such as the Billion range that can negotiate more pro-actively on noise margins. – Landak Apr 06 '16 at 14:45
  • @harrymc -- I'd be very interested to know how packet payload is negotiated -- namely an expanded form on the first part of your answer. _As far as I know_, I just dialled the gain up on the DSL modem, at the expense of its frequency response and hence bandwidth. Is it possible to find out easily what that datagram payload was? Where does the 200 ms default figure come from? – Landak Apr 06 '16 at 14:50
  • **200 ms**: This is built into the Windows/Linux software. I found it when analyzing why my company's product had a too-low TCP/IP throughput on small datagrams, then found the system calls that do "send immediately" on the socket. **Packet payload**: This is not negotiated, rather a too-large TCP/IP datagram will be cut into parts when it encounters a pathway where the [MTU](https://en.wikipedia.org/wiki/Maximum_transmission_unit) is too small. **DSL modem**: possible that changing parameters will somewhat compensate for bad phone-line/switch, but it should be fixed rather than compensated. – harrymc Apr 06 '16 at 15:11
  • Another tweak: disable ADSL2+ and stay with ADSL1 for stability. Whatever router you buy, make sure you can tweak SNR margins properly (some info [here](http://www.karafilis.net/snr-margin-adsl/)). Billion routers are good, as is Netgear (I prefer the models which support DD-WRT with simple installation). This [article](http://blog.internode.on.net/2009/11/29/optimising-adsl2-service-performance/) can give you more ideas - and have a look at the distance/speed diagram. – harrymc Apr 06 '16 at 16:32

I see this case as very much related to congestion management, albeit very badly managed congestion, whatever the reason may be. As I understand it, a badly managed packet buffer somewhere along the transmission path is causing this anomaly in the ICMP echo request/reply packets.

So the combination of a bad congestion management policy and a ping session left open for hours can obviously result in such a weird scenario.

More information about congestion management here.

Tamadite