I am playing around with Ubuntu 18.04 and I'm noticing a TCP performance regression that prevents me from upgrading my currently running servers, since they are very latency-sensitive.

For my particular use case, I implemented a simple TCP test program that runs as a server and waits for a certain number of clients to connect before sending a batch of messages to all of them. I then measure the time it takes to sendmsg() a fixed-size buffer to N clients.
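
For readers without access to the linked sources, the measurement described above can be sketched roughly as follows. This is not the code from io_uring.c: it is a minimal Python illustration, the helper name `time_batch_send` is hypothetical, local socket pairs stand in for remote TCP clients, and interpreter overhead means the absolute numbers are not comparable to the C program.

```python
import socket
import time


def time_batch_send(conns, payload, batch_size):
    """Return one latency sample (ns) per message.

    Each sample is the time taken to sendmsg() the same payload to every
    connection in `conns` (an assumption about what the original measures).
    """
    samples = []
    for _ in range(batch_size):
        start = time.perf_counter_ns()
        for conn in conns:
            conn.sendmsg([payload])
        samples.append(time.perf_counter_ns() - start)
    return samples


if __name__ == "__main__":
    # Stand-in for real TCP clients: local socket pairs, so the sketch runs anywhere.
    pairs = [socket.socketpair() for _ in range(3)]
    servers = [server for server, _ in pairs]
    latencies = time_batch_send(servers, b"x" * 128, batch_size=100)
    print(f"min={min(latencies)}ns max={max(latencies)}ns")
    for server, client in pairs:
        server.close()
        client.close()
```

The payload size (128 bytes), client count (3) and batch size (100) mirror the command-line values used in the runs below.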

Code sample

io_uring.c client.c

I'm running the server and the clients on two distinct machines, located in the same datacenter.

Results

Ubuntu 16.04

Version

srv01:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.3 LTS
Release:        16.04
Codename:       xenial

srv01:~$ uname -ra
Linux srv01 4.4.0-97-generic #120-Ubuntu SMP Tue Sep 19 17:28:18 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

sysctl

net.core.netdev_max_backlog = 3000
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_moderate_rcvbuf = 0
net.ipv4.tcp_no_metrics_save = 1
net.ipv4.tcp_rmem = 131072 1048576 16777216
net.ipv4.tcp_wmem = 131072 1048576 16777216

Run

srv01:~$ taskset -c 15,17,19 ./iouring --port 4040 --clients-count 3 --buffer-size 128 --batch-size 100 --sleep-ms 1 --total-messages 500000 --sockopt n
Send latency report
Min: 679ns
Mean: 1048ns
Max: 13949ns
p(0.100000) = 726ns
p(0.200000) = 734ns
p(0.500000) = 750ns
p(0.800000) = 859ns
p(0.900000) = 1338ns
p(0.990000) = 5024ns
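
A report in this format could be produced along the following lines. This is only a sketch: the nearest-rank percentile method and the integer mean are assumptions, the original program may compute them differently, and `percentile` and `latency_report` are hypothetical names.

```python
import math


def percentile(sorted_samples, q):
    """Nearest-rank percentile of an already sorted list, with 0 < q <= 1."""
    if not sorted_samples:
        raise ValueError("no samples")
    rank = max(1, math.ceil(q * len(sorted_samples)))
    return sorted_samples[rank - 1]


def latency_report(samples_ns, quantiles=(0.1, 0.2, 0.5, 0.8, 0.9, 0.99)):
    """Format min, mean, max and selected percentiles of nanosecond samples."""
    ordered = sorted(samples_ns)
    lines = [
        "Send latency report",
        f"Min: {ordered[0]}ns",
        f"Mean: {sum(ordered) // len(ordered)}ns",
        f"Max: {ordered[-1]}ns",
    ]
    lines += [f"p({q:f}) = {percentile(ordered, q)}ns" for q in quantiles]
    return "\n".join(lines)


if __name__ == "__main__":
    print(latency_report([750, 726, 859, 1338, 734, 5024, 679]))
```

When comparing two systems, the median and the tail (p(0.99)) are usually more informative than the mean, since a few slow sends can dominate it.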

Ubuntu 18.04

Version

srv02:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.3 LTS
Release:        18.04
Codename:       bionic

srv02:~$ uname -ra
Linux srv02 4.20.17-042017-generic #201903190933 SMP Tue Mar 19 13:36:11 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

sysctl

net.core.netdev_max_backlog = 3000
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_moderate_rcvbuf = 0
net.ipv4.tcp_no_metrics_save = 1
net.ipv4.tcp_rmem = 131072 1048576 16777216
net.ipv4.tcp_wmem = 131072 1048576 16777216

Run

srv02:~$ taskset -c 15,17,19 ./iouring --port 4040 --clients-count 3 --buffer-size 128 --batch-size 100 --sleep-ms 1 --total-messages 500000 --sockopt n
Send latency report
Min: 819ns
Mean: 4660ns
Max: 25061ns
p(0.100000) = 1379ns
p(0.200000) = 2660ns
p(0.500000) = 4871ns
p(0.800000) = 6559ns
p(0.900000) = 7416ns
p(0.990000) = 9444ns

As you can see, there is a significant performance drop on Ubuntu 18.04 with kernel 4.20 (I'm observing exactly the same behavior on kernels 4.15 and 4.18). I'm using the exact same machines (same hardware).

Somehow, it looks like it is related to whether TCP_NODELAY is enabled. When disabling TCP_NODELAY (no --sockopt), I get a 50th-percentile latency of around 1005ns (Ubuntu 18.04).

srv02:~$ taskset -c 15,17,19 ./iouring --port 4040 --clients-count 3 --buffer-size 128 --batch-size 100 --sleep-ms 1 --total-messages 500000
Send latency report
Min: 817ns
Mean: 1790ns
Max: 15374ns
p(0.100000) = 955ns
p(0.200000) = 964ns
p(0.500000) = 1006ns
p(0.800000) = 1601ns
p(0.900000) = 4475ns
p(0.990000) = 9516ns
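
For reference, toggling TCP_NODELAY on a socket looks roughly like this. It is a sketch under the assumption that `--sockopt n` ends up in an equivalent `setsockopt` call in the test program; the helper names are hypothetical.

```python
import socket


def set_tcp_nodelay(sock, enabled):
    """Enable or disable TCP_NODELAY (enabling it turns off Nagle's algorithm)."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1 if enabled else 0)


def get_tcp_nodelay(sock):
    """Return True if TCP_NODELAY is currently set on the socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        set_tcp_nodelay(sock, True)
        print("TCP_NODELAY enabled:", get_tcp_nodelay(sock))
```

With TCP_NODELAY set, each small write is pushed out immediately instead of being coalesced, so every sendmsg() goes through the full transmit path. That may be one reason the two modes behave so differently here.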

The same test on RHEL8 with kernel 4.18 does not show any performance difference with or without TCP_NODELAY (950ns at the 50th percentile).

Any idea what could cause this? I can provide more details if needed.

Thanks

octal
  • I would recommend trying to reduce it and simplify it as much as possible. For example, can you see the difference even using just a single client? What about running server and client(s) on the same machine, does the difference disappear then? – Elias Oct 29 '19 at 13:20
  • Have you tried the HWE kernel? It should be 5.0.0.32.89 at this time. See https://wiki.ubuntu.com/Kernel/LTSEnablementStack for installation instructions. – pim Oct 30 '19 at 07:32
  • I am having trouble getting your programs to work on my systems: I get things like `Min: -13858337ns Mean: 36896353430407ns Max: -4146775ns` and `resource temporarily unavailable` and `floating point exception`... I only have 8 CPUs and adjusted the taskset command accordingly. – Doug Smythies Nov 01 '19 at 19:00
  • @pim I did not know about the HWE kernel. I'll look more deeply into it – octal Nov 04 '19 at 12:55
  • @DougSmythies Are you getting this result on the client side or the server side? Getting that on the client side might be normal if your clock is not correctly synchronized. You might also want to increase your recv and send buffer sizes. – octal Nov 04 '19 at 12:56
  • My clocks seem to be synchronized. Yes, I probably need to increase the buffer sizes, but am not going to. In addition to the HWE kernel, and just for a test, you might want to try the most recent [mainline](https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.4-rc6/) kernel. It will not install on 16.04 (because of a stupid dependency that it doesn't actually need at all), but it will on 18.04. It will install fine on 16.04 if you compile it yourself. I have also noticed a recent drop in the performance of some other high-speed stuff, but all with more recent kernels than those referenced so far. – Doug Smythies Nov 05 '19 at 00:03

1 Answer

It seems that your 16.04 kernel is not patched for Spectre/Meltdown.

As I understand it, kernel 4.4.0-109 contains a partial patch. You are running 4.4.0-97, which has no mitigations against these side-channel attacks and therefore delivers better performance.

Whether it is wise to run such a kernel on an SMP production system is a completely different question...
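
One way to compare the two hosts is to dump the mitigation status the kernel reports through sysfs. The sketch below assumes the `/sys/devices/system/cpu/vulnerabilities` directory is present. Patched kernels expose it, but an older unpatched kernel such as 4.4.0-97 may not, in which case an empty result is returned. The function names are hypothetical.

```python
from pathlib import Path

VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def read_mitigation_status(vuln_dir=VULN_DIR):
    """Map each vulnerability name to the kernel's status string.

    Returns an empty dict when the directory does not exist.
    """
    if not vuln_dir.is_dir():
        return {}
    status = {}
    for entry in sorted(vuln_dir.iterdir()):
        try:
            status[entry.name] = entry.read_text().strip()
        except OSError:
            continue  # skip entries that cannot be read
    return status


def is_mitigated(status_line):
    """True when the kernel reports an active mitigation for this entry."""
    return status_line.startswith("Mitigation:")


if __name__ == "__main__":
    report = read_mitigation_status()
    if not report:
        print("No vulnerability information exposed by this kernel")
    for name, line in report.items():
        print(f"{name}: {line}")
```

Running this on both servers and diffing the output should show which mitigations are active on one kernel but not the other.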

Pasi Suominen
  • Good catch noticing the very old 4.4 kernel version. I did a cross-core pipe test with kernel 5.4-rc6 with and without RETPOLINE and got about a 13% performance difference (4.3 uSec per loop vs. 3.8 uSec per loop). On one core, the difference was much less, ~4%. I up-voted your answer a few days ago. – Doug Smythies Nov 05 '19 at 08:08
  • I disabled every mitigation (pti, l1tf, spectre_v1, spectre_v2, meltdown). Do you think there could still be an impact if the kernel was compiled with a retpoline-enabled compiler? – octal Nov 05 '19 at 09:04
  • I don't know. Perhaps edit your question adding exactly what you did, then I'll try it. I compiled the kernel (1000 Hz tick, I never use the 250 Hz tick versions, i.e. I use lowlatency, never generic) with the Ubuntu kernel configuration and then again with `CONFIG_RETPOLINE=y` changed to `# CONFIG_RETPOLINE is not set`. – Doug Smythies Nov 05 '19 at 14:50
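
To check whether the running kernel was built with retpoline support, as discussed in the comments above, the kernel config can be inspected. This sketch assumes the Ubuntu convention of shipping the config as `/boot/config-<release>`; other distributions may store it elsewhere, and the helper names are hypothetical.

```python
import platform
from pathlib import Path


def config_option_state(config_text, option):
    """Return the option's value (e.g. 'y' or 'm'), or None if unset or absent."""
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(f"{option}="):
            return line.split("=", 1)[1]
        if line == f"# {option} is not set":
            return None
    return None


def running_kernel_config_path():
    """Path where Ubuntu normally installs the config of the running kernel."""
    return Path(f"/boot/config-{platform.release()}")


if __name__ == "__main__":
    path = running_kernel_config_path()
    if path.is_file():
        state = config_option_state(path.read_text(), "CONFIG_RETPOLINE")
        print(f"CONFIG_RETPOLINE in {path}: {state}")
    else:
        print(f"{path} not found; the config may be stored elsewhere")
```

A value of `y` means the kernel was compiled with retpoline support. That is a build-time property, separate from whatever mitigations are switched off at boot.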