I've been scratching my head over this one for months and have decided it's time to ask for help.

I seem to be encountering notable packet loss on a wifi network. It's noticeable with any live video/audio being streamed to any device on the network. Generally, the connection is fine and stable, but there are blips of total packet loss on a given device for 5-10 seconds every minute or two (these are stationary devices in a relatively isolated environment; that is, there is not much activity around them and no other known activity on the network).

I'm at a bit of a loss because MTR reports that devices on wifi have 0% loss to the router and the loss all seems to be upstream:

Host                                        Loss%   Snt   Last   Avg  Best  Wrst StDev
 1. local-router xxxxxxxxxxxxxxxxxxxxxx     0.0%   128    2.2   5.7   1.7 129.0  17.5
 2. ISP-router (off premises) xxxxxxxxx     0.8%   128    4.1   8.1   3.1 139.3  14.2
 3. xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx     0.0%   128    4.6   8.7   3.7 105.8  15.5
 4. xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx    11.7%   128   16.2  19.3  12.9 155.8  16.0
 5. xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx     0.8%   128   15.6  20.2  12.7 134.1  15.9
 6. xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx     0.8%   128   13.8  18.4  13.2 140.1  18.5
 7. xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx     1.6%   128   15.6  19.3  14.2 154.0  17.1
 8. xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx     1.6%   128   13.6  16.8  12.9 235.7  20.1
 9. xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx     1.6%   127   14.1  19.4  13.1 211.1  24.8

I see latency go up and timeouts at the same time I see packets dropping:

64 bytes from xxx: icmp_seq=105 ttl=58 time=14.048 ms
64 bytes from xxx: icmp_seq=106 ttl=58 time=14.312 ms
64 bytes from xxx: icmp_seq=107 ttl=58 time=14.323 ms
64 bytes from xxx: icmp_seq=108 ttl=58 time=135.899 ms
64 bytes from xxx: icmp_seq=109 ttl=58 time=186.013 ms
Request timeout for icmp_seq 110
Request timeout for icmp_seq 111
64 bytes from xxx: icmp_seq=112 ttl=58 time=13.927 ms
64 bytes from xxx: icmp_seq=113 ttl=58 time=14.140 ms
64 bytes from xxx: icmp_seq=114 ttl=58 time=15.410 ms
64 bytes from xxx: icmp_seq=115 ttl=58 time=15.171 ms
64 bytes from xxx: icmp_seq=116 ttl=58 time=13.993 ms
64 bytes from xxx: icmp_seq=117 ttl=58 time=14.378 ms
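
To line up timeouts with latency spikes over a longer capture, the ping log can be scanned programmatically. The following is a minimal sketch, assuming the output format shown above (`icmp_seq=N ... time=T ms` for replies and `Request timeout for icmp_seq N` for losses). The function names and the 100 ms spike threshold are illustrative choices, not part of any tool.

```python
import re

# Patterns for the two line types seen in the ping output above.
REPLY = re.compile(r"icmp_seq=(\d+) ttl=\d+ time=([\d.]+) ms")
TIMEOUT = re.compile(r"Request timeout for icmp_seq (\d+)")


def summarize_ping(lines, spike_ms=100.0):
    """Return (lost_seqs, spike_seqs) found in an iterable of ping output lines."""
    lost, spikes = [], []
    for line in lines:
        m = REPLY.search(line)
        if m:
            seq, rtt = int(m.group(1)), float(m.group(2))
            if rtt >= spike_ms:  # spike_ms is an arbitrary, hypothetical threshold
                spikes.append(seq)
            continue
        m = TIMEOUT.search(line)
        if m:
            lost.append(int(m.group(1)))
    return lost, spikes


def group_runs(seqs):
    """Collapse consecutive sequence numbers into (first, last) runs."""
    runs = []
    for s in sorted(seqs):
        if runs and s == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], s)
        else:
            runs.append((s, s))
    return runs


# A few lines copied from the log above.
sample = """\
64 bytes from xxx: icmp_seq=107 ttl=58 time=14.323 ms
64 bytes from xxx: icmp_seq=108 ttl=58 time=135.899 ms
64 bytes from xxx: icmp_seq=109 ttl=58 time=186.013 ms
Request timeout for icmp_seq 110
Request timeout for icmp_seq 111
64 bytes from xxx: icmp_seq=112 ttl=58 time=13.927 ms
""".splitlines()

lost, spikes = summarize_ping(sample)
print("loss runs:", group_runs(lost))
print("spike runs:", group_runs(spikes))
```

On the sample, the spike run (108-109) immediately precedes the loss run (110-111), which matches the pattern described above. Feeding in a log of an hour or more would show how long each burst lasts and how regularly the bursts recur.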

Finally, when I run MTR from a wired device on the network, everything seems fine:

 Host                                       Loss%   Snt   Last   Avg  Best  Wrst StDev
 1. local-router xxxxxxxxxxxxxxxxxxxxxx     0.0%   208    0.7   0.6   0.4   1.9   0.2
 2. ISP-router (off premises) xxxxxxxxx     0.0%   208    3.9   4.0   1.7  39.5   5.0
 3. xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx     0.0%   208    3.1   3.7   1.8  39.3   4.2
 4. xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx     0.0%   208   12.0  14.9  11.1  31.8   3.9
 5. xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx     0.0%   208   11.8  14.3  10.9  27.2   3.4
 6. xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx     0.0%   208   12.6  12.5  11.7  13.9   0.3
 7. xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx     0.0%   208   13.7  13.6  12.0  18.4   0.7
 8. xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx     0.0%   208   12.8  12.6  11.6  14.5   0.4
 9. xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx     0.0%   207   12.7  12.5  11.4  14.1   0.3

I've seen this behavior off and on for months. Rebooting the router and AP doesn't seem to help. As for hardware, the network is simple: an ISP-provided C3510XZ serving as the modem and router, with a UniFi U6-LR access point connected directly to it. The UniFi software reports a 99% experience on all devices I'm testing with, with signal strengths ranging from -71 to -60 dBm (edit: I've also tested right next to the AP with a -45 dBm signal and noticed identical behavior).


Generally, I'm just thrown as to why I'd be seeing different upstream behavior based on the introduction of the access point. I realize that each packet MTR sends has a different TTL, but I would think I'd see some packet loss to the local router if the wifi were the primary culprit. Regardless, any insight/suggestions would be appreciated.
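
The reasoning about per-hop loss can be checked with a toy model. This is only a sketch under stated assumptions: probes go out one at a time in hop order, 100 ms apart, one cycle per second, and every probe sent while the first link is down is lost regardless of its TTL. Real MTR timing differs, and `simulate` and its parameters are hypothetical names for illustration.

```python
def simulate(hops=9, cycles=128, probe_gap_ms=100, cycle_ms=1000, outages_ms=()):
    """Count lost probes per hop when the FIRST link drops everything during outages.

    outages_ms is an iterable of (start, end) windows in milliseconds.
    Integer milliseconds are used to avoid floating-point edge cases.
    """
    lost = [0] * hops
    for cycle in range(cycles):
        for hop in range(hops):
            sent_at = cycle * cycle_ms + hop * probe_gap_ms
            if any(start <= sent_at < end for start, end in outages_ms):
                lost[hop] += 1  # loss is charged to whichever hop was being probed
    return lost


# A 300 ms dropout on the local link, landing mid-cycle.
short = simulate(outages_ms=[(10250, 10550)])
# A 6 second dropout on the local link.
long_ = simulate(outages_ms=[(60000, 66000)])

print("short outage:", short)
print("long outage: ", long_)
```

In this model the short dropout is charged only to hops 4 to 6, and hop 1 stays at 0% even though the loss happened on the local link. The 6 second dropout costs every hop, including hop 1, the same six probes. So under these assumptions, sub-second wifi dropouts could produce loss that appears only upstream, while multi-second dropouts should also show up at the local router.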

  • Is it always the same upstream device? MTR is just pinging sequentially, so if you get packet loss for a second while you're pinging hop #4, then it would still show up as loss to hop #4, even if the packets are actually lost from your local router. – Cpt.Whale Feb 15 '23 at 17:40
  • If it's repeatable, then you might want to check with your ISP. Maybe the MTU for your wireless clients is larger than the ethernet ones, and the upstream router chokes on it for some reason (uncommon, but possible). There's not much else that could be different between client connections on the ISP side though – Cpt.Whale Feb 15 '23 at 17:45
  • @Cpt.Whale - the loss does seem to be somewhat spread across all of the upstream hops. I'm going to add this to the post, but I have noted the same behavior even when right next to the access point (-45 dBm signal strength). How would the ISP know the difference in the MTU? – si1entstill Feb 15 '23 at 17:48
  • Have you tried another wifi channel or checked for interference? Channel 11 is said to be the strongest. – harrymc Feb 15 '23 at 17:51
  • If the ISP has to shrink the MTU upstream, then it might fragment weirdly and get dropped. It's just an example of what can be different on packets from wired/wireless connections. Not necessarily the cause at all, but something I've seen cause issues with stuff like vpn clients, where stuff on the ISP side is buggy but can be fixed on your end. – Cpt.Whale Feb 15 '23 at 17:57
  • @harrymc - I've got it set to channel 2 because I see much lower noise there when scanning the area with a wifi analyzer. – si1entstill Feb 15 '23 at 18:04
  • Since you mention there's a spread, it would be better to just sit and ping each hop at the same time, and see if the packet loss increases dramatically for any hop in particular (as compared to pinging one at a time via MTR). It's always more likely that your wireless network has some trouble, so an ISP can be difficult to convince to investigate – Cpt.Whale Feb 15 '23 at 18:06
  • Are you using 2.4 GHz or 5 GHz? See [this answer](https://superuser.com/a/1206404/8672) for best channels on 2.4 GHz. – harrymc Feb 15 '23 at 18:08
  • @harrymc - 2.4 GHz for these tests, but I see the same behavior on both. – si1entstill Feb 15 '23 at 18:23
  • Check if there is some electrical device between the router and the computer that can cause a strong electrical field. The final advice is to get a stronger router. – harrymc Feb 15 '23 at 18:26
  • @Cpt.Whale - is there something I could use to test this theory? I don't see a way to configure that on the router, and it seems that general advice is not to set it on the AP. – si1entstill Feb 15 '23 at 18:27
  • If you do a packet capture on the router, you could check whether wired/wireless MTU is different. On Windows, you can do `netsh interface ipv4 show subinterfaces` to check the current MTU for each interface. You can do `ping google.com -f -l 1500` to see if that MTU (e.g. 1500) is allowed all the way through to the internet or gets fragmented. Again, MTU is not often the cause of packet loss, just an example – Cpt.Whale Feb 15 '23 at 18:40
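
One detail worth keeping in mind for the MTU test in the last comment: on Windows, `ping -l` sets the ICMP payload size, not the full packet size, so the IPv4 and ICMP headers are added on top. The arithmetic is sketched below, assuming a 20-byte IPv4 header without options; the function names are illustrative.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo request header


def max_ping_payload(mtu, ip_header=IPV4_HEADER):
    """Largest ICMP payload that fits in one unfragmented packet for a given MTU."""
    return mtu - ip_header - ICMP_HEADER


def packet_size(payload, ip_header=IPV4_HEADER):
    """Total IP packet size produced by a ping with the given payload size."""
    return payload + ip_header + ICMP_HEADER


print(max_ping_payload(1500))  # payload to use when testing a 1500-byte MTU
print(max_ping_payload(1492))  # 1492 is an MTU commonly seen on PPPoE links
print(packet_size(1500))       # what `-l 1500` actually puts on the wire
```

With these numbers, a 1500-byte payload makes a 1528-byte packet, which cannot cross a standard 1500-byte Ethernet MTU with the don't-fragment flag (`-f`) set. To test whether 1500 passes end to end, the payload would be 1472, e.g. `ping google.com -f -l 1472`, and then stepped down if that fails.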

1 Answer

a) Nothing in an Ethernet/IP frame indicates that it originated on a wifi-based network.
b) There is likely to be at least one NAT translation (here: a change of packet identity) among the hops as you progress out onto the internet.
c) I'd wager the problems are random to some degree, maybe caused by neighboring wifi networks with activity on the same or nearby channel(s).

The mtr runs you have done look like the result of runs on an "unloaded" network; if that is so, I suggest you do the same runs while those video streams are active.
The results from such runs might be more indicative of where the problem is.

Hannu
  • I tried another 5 minute test while generating significant network activity and saw absolutely no loss (I had been seeing the same loss pattern for the prior 2 hours). With other network activity, the average pings were _slightly_ higher, and there were some outlier packets with _very_ high pings (1044 to the fourth hop and 1040 to the final). Is there any reason additional activity on the devices could cause lower loss but higher ping? – si1entstill Feb 15 '23 at 18:22
  • The software that forwards packets adapts to the situation at hand; more activity might give more weight to your streaming traffic (actually transmitting it) instead of dropping it as if it were a "stray" packet. In the same situation a ping might be dropped to leave more room for more active content. Note also that many personal routers have a setting `[ ] respond to ping`, and I'd expect that to be a possible setting on more advanced gear too. – Hannu Feb 16 '23 at 18:10