Network suddenly slow on Vero 4K+

Hi all, this is driving me nuts. I’ve previously posted about having issues with network speed (especially with large files) and got that sorted - I accidentally had both WiFi + Ethernet enabled. I switched to NFS from SMB as part of that, and everything has generally been excellent until last night, when things randomly started stuttering. I was previously seeing near gigabit speed (~950 Mbit) via iperf on my Vero 4K+. Now I’m seeing:

[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  26.2 MBytes   219 Mbits/sec    8   25.5 KBytes       
[  5]   1.00-2.00   sec  52.6 MBytes   442 Mbits/sec   27   43.8 KBytes       
[  5]   2.00-3.00   sec  35.6 MBytes   299 Mbits/sec   30   19.8 KBytes       
[  5]   3.00-4.00   sec  24.2 MBytes   203 Mbits/sec   11   25.5 KBytes       
[  5]   4.00-5.00   sec  41.5 MBytes   349 Mbits/sec   16   28.3 KBytes       
[  5]   5.00-6.00   sec  78.3 MBytes   657 Mbits/sec   32   18.4 KBytes       
[  5]   6.00-7.00   sec  54.1 MBytes   454 Mbits/sec   36   22.6 KBytes       
[  5]   7.00-8.00   sec  69.2 MBytes   581 Mbits/sec   29   45.2 KBytes       
[  5]   8.00-9.00   sec  41.6 MBytes   349 Mbits/sec   10   22.6 KBytes       
[  5]   9.00-10.00  sec  41.8 MBytes   351 Mbits/sec    7   35.4 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   465 MBytes   390 Mbits/sec  206             sender
[  5]   0.00-10.00  sec   465 MBytes   390 Mbits/sec                  receiver

Last night it was even slower, dipping down below 100 Mbit.

Things I’ve checked

1 - Used my laptop to test iperf to the NAS. I tried from a random ethernet point in my house, as well as off the switch my Vero is connected to, and finally the ethernet cable going to the Vero itself. It’s all great, I get around 930 Mbit/s on all of them.

2 - Checked ifconfig to make sure wlan isn’t showing up and nothing else looks off. Looks good to me:

eth0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC>  mtu 1500
    inet <IP>  netmask <Netmask>  broadcast <Broadcast>
    ether <MAC>  txqueuelen 1000  (Ethernet)
    RX packets 106642  bytes 8365037 (7.9 MiB)
    RX errors 0  dropped 8  overruns 0  frame 0
    TX packets 338940  bytes 510302357 (486.6 MiB)
    TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
    device interrupt 40  

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 4096
    inet 127.0.0.1  netmask 255.0.0.0
    inet6 ::1  prefixlen 128  scopeid 0x10<host>
    loop  txqueuelen 0  (Local Loopback)
    RX packets 19  bytes 6897 (6.7 KiB)
    RX errors 0  dropped 0  overruns 0  frame 0
    TX packets 19  bytes 6897 (6.7 KiB)
    TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

3 - Looked at ethtool to make sure everything looks as it should be. Yup:

Settings for eth0:
	Supported ports: [ TP MII ]
	Supported link modes:   10baseT/Half 10baseT/Full 
	                        100baseT/Half 100baseT/Full 
	                        1000baseT/Half 1000baseT/Full 
	Supported pause frame use: Symmetric Receive-only
	Supports auto-negotiation: Yes
	Supported FEC modes: Not reported
	Advertised link modes:  10baseT/Half 10baseT/Full 
	                        100baseT/Half 100baseT/Full 
	                        1000baseT/Half 1000baseT/Full 
	Advertised pause frame use: Symmetric Receive-only
	Advertised auto-negotiation: Yes
	Advertised FEC modes: Not reported
	Link partner advertised link modes:  10baseT/Half 10baseT/Full 
	                                     100baseT/Half 100baseT/Full 
	                                     1000baseT/Full 
	Link partner advertised pause frame use: Symmetric
	Link partner advertised auto-negotiation: Yes
	Link partner advertised FEC modes: Not reported
	Speed: 1000Mb/s
	Duplex: Full
	Port: MII
	PHYAD: 0
	Transceiver: external
	Auto-negotiation: on
Cannot get wake-on-lan settings: Operation not permitted
	Current message level: 0x0000003d (61)
			       drv link timer ifdown ifup
	Link detected: yes

…and, obviously, tried “turning it off and on again” a few times. I’m struggling with what it could be given everything points to the physical network being fine, which means something on the Vero is borked (and it was working great until last night). I’m wondering whether it’s something physical, eg the card dying.

Any thoughts?

Also updated to the latest and greatest everything on the Vero just to rule out anything and ran iperf again. In the first interval it gets close to, but still not quite at, the speed it should be hitting, but it’s all over the place and varies a lot every time I run it (I’m seeing swings of around +/-200 Mbit in a single set of tests). The congestion window and retransmit counts (the Retr column) also look really bad:

[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.01   sec  97.0 MBytes   810 Mbits/sec   30   53.7 KBytes       
[  5]   1.01-2.00   sec  75.0 MBytes   632 Mbits/sec   37   24.0 KBytes       
[  5]   2.00-3.00   sec  76.2 MBytes   640 Mbits/sec   29   36.8 KBytes       
[  5]   3.00-4.00   sec  75.0 MBytes   629 Mbits/sec   33   49.5 KBytes       
[  5]   4.00-5.00   sec  70.0 MBytes   587 Mbits/sec   36   18.4 KBytes       
[  5]   5.00-6.00   sec  73.8 MBytes   619 Mbits/sec   39   24.0 KBytes       
[  5]   6.00-7.00   sec  68.8 MBytes   577 Mbits/sec   35   25.5 KBytes       
[  5]   7.00-8.00   sec  61.2 MBytes   514 Mbits/sec   36   36.8 KBytes       
[  5]   8.00-9.00   sec  70.0 MBytes   587 Mbits/sec   35   21.2 KBytes       
[  5]   9.00-10.00  sec  62.5 MBytes   524 Mbits/sec   37   38.2 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   730 MBytes   612 Mbits/sec  347             sender
[  5]   0.00-10.02  sec   727 MBytes   609 Mbits/sec                  receiver
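
For comparing runs like the one above, here is a minimal Python sketch that pulls the bitrate, Retr and Cwnd columns out of pasted iperf3 text output. The regex and the `summarize` helper are my own illustration, not part of iperf3, and it assumes the sender-side format shown here (bitrate in Mbits/sec, Cwnd in KBytes).

```python
import re

# Matches sender-side interval lines such as:
# [  5]   0.00-1.01   sec  97.0 MBytes   810 Mbits/sec   30   53.7 KBytes
# Summary lines (ending in "sender"/"receiver") have no Cwnd column and are skipped.
INTERVAL = re.compile(
    r"\[\s*\d+\]\s+[\d.]+-[\d.]+\s+sec\s+[\d.]+\s+\w+\s+"
    r"([\d.]+)\s+Mbits/sec\s+(\d+)\s+([\d.]+)\s+KBytes"
)


def summarize(text):
    """Return min/max bitrate, total retransmits and smallest Cwnd for one run."""
    rows = [tuple(float(v) for v in m.groups()) for m in INTERVAL.finditer(text)]
    if not rows:
        raise ValueError("no sender-side interval lines found")
    rates = [row[0] for row in rows]
    return {
        "intervals": len(rows),
        "min_mbits": min(rates),
        "max_mbits": max(rates),
        "retransmits": int(sum(row[1] for row in rows)),
        "min_cwnd_kbytes": min(row[2] for row in rows),
    }


# First three intervals of the run above, copied verbatim.
SAMPLE = """
[  5]   0.00-1.01   sec  97.0 MBytes   810 Mbits/sec   30   53.7 KBytes
[  5]   1.01-2.00   sec  75.0 MBytes   632 Mbits/sec   37   24.0 KBytes
[  5]   2.00-3.00   sec  76.2 MBytes   640 Mbits/sec   29   36.8 KBytes
"""

print(summarize(SAMPLE))
```

A healthy run would show zero retransmits and a Cwnd in the hundreds of KBytes, so the two numbers to watch are `retransmits` and `min_cwnd_kbytes`.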

One other data point - I ran iperf the reverse way (OSMC as server, laptop and NAS as client), and it looks fine:

[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   112 MBytes   939 Mbits/sec    0    495 KBytes       
[  5]   1.00-2.00   sec   112 MBytes   943 Mbits/sec    0    495 KBytes       
[  5]   2.00-3.00   sec   112 MBytes   938 Mbits/sec    0    495 KBytes       
[  5]   3.00-4.00   sec   112 MBytes   938 Mbits/sec    0    495 KBytes       
[  5]   4.00-5.00   sec   112 MBytes   938 Mbits/sec    0    495 KBytes       
[  5]   5.00-6.00   sec   112 MBytes   939 Mbits/sec    0    495 KBytes       
[  5]   6.00-7.00   sec   112 MBytes   942 Mbits/sec    0    495 KBytes       
[  5]   7.00-8.00   sec   112 MBytes   938 Mbits/sec    0    518 KBytes       
[  5]   8.00-9.00   sec   112 MBytes   940 Mbits/sec    0    591 KBytes       
[  5]   9.00-10.00  sec   112 MBytes   941 Mbits/sec    0    591 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.09 GBytes   940 Mbits/sec    0             sender
[  5]   0.00-10.00  sec  1.09 GBytes   938 Mbits/sec                  receiver

Given all of that, I’m guessing it’s some kind of issue on the Vero where something is overwhelmed when the network is saturated and packets are dropping. MTU on eth0 is set to 1500, which should be fine. I couldn’t view the ring buffer settings with ethtool (guessing this is a limitation of the card), but at this point I’m throwing up my hands.
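
Since ethtool won’t show the ring buffers, one other place to look is the kernel’s per-interface counters under /sys/class/net/eth0/statistics. Here is a rough Python sketch, assuming a Linux box that exposes those sysfs files. `read_counters` and `delta` are names made up for this post, and the idea is only to snapshot the counters before and after an iperf3 run and see which ones moved.

```python
from pathlib import Path

# Standard Linux per-interface counters exposed under
# /sys/class/net/<iface>/statistics/
COUNTERS = (
    "tx_packets", "tx_errors", "tx_dropped",
    "rx_packets", "rx_errors", "rx_dropped",
)


def read_counters(iface="eth0", root="/sys/class/net"):
    """Read the counters for one interface into a dict of ints."""
    stats = Path(root) / iface / "statistics"
    return {name: int((stats / name).read_text()) for name in COUNTERS}


def delta(before, after):
    """Per-counter difference between two snapshots."""
    return {name: after[name] - before[name] for name in before}
```

Usage would be something like `before = read_counters()`, then run the iperf3 test, then `print(delta(before, read_counters()))`. If tx_errors and tx_dropped stay at zero while iperf3 still reports retransmits, the loss is probably happening somewhere the interface counters don’t see.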

First, please don’t remove private (LAN) IP addresses. They don’t reveal any personal information, but can help to clarify what we’re seeing. Similarly, please include the original command with any output it produces.

You haven’t described your network layout in any great detail. Are you using just one router, or is it more complex?

A useful iperf3 test would have been between the laptop and Vero4K – with and without the -R flag.

In a situation like this, try to replace the cables, or at least pull them out and re-seat them. If you can, try to clean the contacts.

“You haven’t described your network layout in any great detail. Are you using just one router, or is it more complex?”

It’s somewhat complex. In the garage I have gigabit fiber → router → 4 port unmanaged switch which then goes to several cable drops around the apartment. The drop at the Vero is an 8 port unmanaged switch, and in my office where my NAS is it’s a 4 port unmanaged switch. So, end to end, it would be something like:

Vero 4K+ → Switch → Switch → Router → Switch → NAS (all switches are gigabit, all line is cat 6)

Note that I’ve ruled out the physical stuff - I’ve tried multiple cables (same results) and cleaned the contact points on the Vero and cables with alcohol (although they looked squeaky clean). I swapped my laptop in for the Vero 4K+ on the exact same cable, I tried the laptop at multiple points in that network, and everything is fine - always (basically) gigabit speeds.

“A useful iperf3 test would have been between the laptop and Vero4K – with and without the -R flag.”

Sure, that’s kind of what I was trying to do in my second post by running the server on the Vero and hitting it from the NAS (in which case everything is OK). I just did this with the server on the Vero and the client on the laptop, both normal and reverse:

Vero ← Laptop

iperf3 -c 192.168.86.75
Connecting to host 192.168.86.75, port 5201
[  5] local 192.168.86.21 port 50601 connected to 192.168.86.75 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  53.2 MBytes   446 Mbits/sec                  
[  5]   1.00-2.00   sec   113 MBytes   947 Mbits/sec                  
[  5]   2.00-3.00   sec   112 MBytes   943 Mbits/sec                  
[  5]   3.00-4.00   sec   113 MBytes   944 Mbits/sec                  
[  5]   4.00-5.00   sec   112 MBytes   941 Mbits/sec                  
[  5]   5.00-6.00   sec   112 MBytes   942 Mbits/sec                  
[  5]   6.00-7.00   sec   112 MBytes   942 Mbits/sec                  
[  5]   7.00-8.00   sec   112 MBytes   940 Mbits/sec                  
[  5]   8.00-9.00   sec   112 MBytes   941 Mbits/sec                  
[  5]   9.00-10.00  sec   112 MBytes   942 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec  1.04 GBytes   893 Mbits/sec                  sender
[  5]   0.00-10.01  sec  1.04 GBytes   891 Mbits/sec                  receiver

Reverse

iperf3 -R -c 192.168.86.75
Connecting to host 192.168.86.75, port 5201
Reverse mode, remote host 192.168.86.75 is sending
[  5] local 192.168.86.21 port 50605 connected to 192.168.86.75 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  46.7 MBytes   392 Mbits/sec                  
[  5]   1.00-2.00   sec  32.8 MBytes   275 Mbits/sec                  
[  5]   2.00-3.00   sec  28.9 MBytes   242 Mbits/sec                  
[  5]   3.00-4.00   sec  28.4 MBytes   238 Mbits/sec                  
[  5]   4.00-5.00   sec  28.9 MBytes   242 Mbits/sec                  
[  5]   5.00-6.00   sec  24.4 MBytes   205 Mbits/sec                  
[  5]   6.00-7.00   sec  32.7 MBytes   275 Mbits/sec                  
[  5]   7.00-8.00   sec  31.4 MBytes   263 Mbits/sec                  
[  5]   8.00-9.00   sec  31.2 MBytes   262 Mbits/sec                  
[  5]   9.00-10.00  sec  25.1 MBytes   211 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   312 MBytes   261 Mbits/sec   82             sender
[  5]   0.00-10.00  sec   311 MBytes   260 Mbits/sec                  receiver

It’s also worth noting again that this just started happening suddenly without changing anything (even updating software). Up till last night, I could stream big files no problem, iperf showed great results on the vero (930-950 Mbit), and everything worked great.

EDIT: Oh geez, looks like I’m a dummy - judging from the terminal output iperf3 sends data from client->server, doesn’t it? In that case, Vero’s download looks fine, but upload is borked, although this doesn’t seem consistent with the behavior I saw (stuttering/buffering) or the bandwidth tests I’ve run (other devices in my network get around 800 Mbit down/up to the internet, the vero is doing like 350 Mbit down, 500 up).

iperf3 -c xx.xx.xx.xx sends data to the iperf3 server.
iperf3 -R -c xx.xx.xx.xx receives data from the iperf3 server.
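
To keep the two directions straight, here is a minimal Python sketch that builds both client commands and labels which end is sending. The helper names (`iperf3_command`, `sender_of`) are invented for illustration; only the `-c`, `-R` and `-t` flags are real iperf3 options, and the address is the Vero’s from this thread, assumed to be running `iperf3 -s`.

```python
import shlex


def iperf3_command(server_ip, reverse=False, seconds=10):
    """Build the argv for an iperf3 client run against server_ip."""
    argv = ["iperf3", "-c", server_ip, "-t", str(seconds)]
    if reverse:
        # -R flips the direction: the server sends, the client receives.
        argv.append("-R")
    return argv


def sender_of(client_name, server_name, reverse=False):
    """Name the host that transmits the test data for a given run."""
    return server_name if reverse else client_name


VERO = "192.168.86.75"  # assumed to be the iperf3 server, as in this thread

for reverse in (False, True):
    command = shlex.join(iperf3_command(VERO, reverse))
    print(f"{command:36} sender: {sender_of('laptop', 'Vero', reverse)}")
```

So a poor result with `-R` against a server on the Vero points at the Vero’s transmit side, and a poor result without it points at its receive side.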

Generally, it’s the receive speed that is of most concern.

Another test would be to place the V4K on a different part of the network. Test against two different iperf3 servers.

Huh, I just assumed the server would send the data, not the other way around. And yup, I understand the RX speed would be the concern, so it’s odd that the results are coming back that way given the degraded video performance I was seeing on the Vero. To be honest, I’ve already tried a lot of what you’re suggesting (testing at different points in the network, from different servers) with the same results. For example, here’s the Vero <-> NAS, running the commands on my NAS with the server on the Vero, like the run with my laptop:

iperf3 -c 192.168.86.75
Connecting to host 192.168.86.75, port 5201
[  5] local 192.168.86.35 port 46778 connected to 192.168.86.75 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   113 MBytes   944 Mbits/sec    0    441 KBytes       
[  5]   1.00-2.00   sec   113 MBytes   945 Mbits/sec    0    441 KBytes       
[  5]   2.00-3.00   sec   112 MBytes   938 Mbits/sec    0    441 KBytes       
[  5]   3.00-4.00   sec   112 MBytes   942 Mbits/sec    0    441 KBytes       
[  5]   4.00-5.00   sec   112 MBytes   937 Mbits/sec    0    557 KBytes       
[  5]   5.00-6.00   sec   112 MBytes   939 Mbits/sec    0    557 KBytes       
[  5]   6.00-7.00   sec   112 MBytes   940 Mbits/sec    0    557 KBytes       
[  5]   7.00-8.00   sec   112 MBytes   940 Mbits/sec    0    557 KBytes       
[  5]   8.00-9.00   sec   113 MBytes   944 Mbits/sec    0    557 KBytes       
[  5]   9.00-10.00  sec   112 MBytes   939 Mbits/sec    0    557 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.10 GBytes   941 Mbits/sec    0             sender
[  5]   0.00-10.00  sec  1.09 GBytes   940 Mbits/sec                  receiver

iperf3 -R -c 192.168.86.75
Connecting to host 192.168.86.75, port 5201
Reverse mode, remote host 192.168.86.75 is sending
[  5] local 192.168.86.35 port 46782 connected to 192.168.86.75 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  72.6 MBytes   609 Mbits/sec                  
[  5]   1.00-2.00   sec  72.8 MBytes   611 Mbits/sec                  
[  5]   2.00-3.00   sec  69.4 MBytes   582 Mbits/sec                  
[  5]   3.00-4.00   sec  76.4 MBytes   641 Mbits/sec                  
[  5]   4.00-5.00   sec  72.8 MBytes   611 Mbits/sec                  
[  5]   5.00-6.00   sec  69.5 MBytes   583 Mbits/sec                  
[  5]   6.00-7.00   sec  80.2 MBytes   673 Mbits/sec                  
[  5]   7.00-8.00   sec  79.2 MBytes   664 Mbits/sec                  
[  5]   8.00-9.00   sec  83.7 MBytes   703 Mbits/sec                  
[  5]   9.00-10.00  sec  83.8 MBytes   703 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.01  sec   761 MBytes   638 Mbits/sec  355             sender
[  5]   0.00-10.00  sec   760 MBytes   638 Mbits/sec                  receiver

Same results - inconsistent from OSMC → device with lots of retries and a crappy congestion window if you look at the server-side logs.

I’m getting a confused message here. You previously wrote:

So the Vero is the client and the laptop address is 192.168.86.75.

Now you say:

So in this example the server has an IP address of 192.168.86.75 – but that’s the laptop address in the previous example.

Ah, sorry, that was written before I realized that iperf3 servers actually receive from the client, not the other way around. 192.168.86.75 is the Vero IP - I edited for clarity. It’s the same results in both cases.

So, you’re now saying that your Vero4K+ is receiving data at the full ~940 Mbps but that it is sending at a lower speed. Is that correct?

That is indeed what I’m seeing. I did see degraded performance in both directions the night I ran into those stuttering issues, but I’m only able to consistently repro bad TX, not RX performance.

It’s not clear why a slower TX side would cause such a performance hit, since RX has by far the greater amount of traffic.

My best suggestion is to try to see at which point along the data path these problems are occurring. Since you’re using unmanaged switches, swap ports, reseat cables and run iperf3 with your Vero4K+ connected at various points along the network.
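
One way to keep track of that is to write down the Vero’s send rate at each attach point, working outwards from the iperf3 peer, and look for the first place it falls off. Here is a small Python sketch of the bookkeeping. The function name, the 900 Mbit/s threshold and the example numbers are all placeholders, not measurements from this thread.

```python
def first_bad_point(measurements, threshold_mbits=900.0):
    """Return the first attach point whose rate is below the threshold.

    measurements: ordered (location, Mbit/s) pairs, starting nearest the
    iperf3 peer and moving outwards along the path.
    """
    for location, mbits in measurements:
        if mbits < threshold_mbits:
            return location
    return None


# Placeholder values only, to show the shape of the data.
example = [
    ("office switch next to NAS", 940.0),
    ("router", 938.0),
    ("garage switch", 936.0),
    ("switch at the Vero drop", 610.0),
]

print(first_bad_point(example))
```

If every attach point comes back below the threshold, including the one right next to the peer, that points at the Vero itself rather than a switch or cable run.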