The PC-to-server figures look fine, so it seems that the connection from the SRW2024P switch to the server is ok, and the PC is auto-negotiating correctly when connected to the same switch.
If it’s a duplex mismatch I’d therefore expect the V4K+ to be the one running at half duplex, since it’s the one getting poor transmit figures. Running ethtool on the V4K+ should answer this one.
Just did that. It now runs Linux vero4k 3.14.29-139-osmc #1 SMP Tue Feb 19 04:09:47 UTC 2019 aarch64 GNU/Linux.
vero4k+ runs full duplex:
root@vero4k:~# ethtool eth0
Settings for eth0:
        Supported ports: [ TP MII ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Supported pause frame use: Symmetric Receive-only
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Advertised pause frame use: Symmetric Receive-only
        Advertised auto-negotiation: Yes
        Link partner advertised link modes:  10baseT/Half 10baseT/Full
                                             100baseT/Half 100baseT/Full
                                             1000baseT/Half 1000baseT/Full
        Link partner advertised pause frame use: Symmetric Receive-only
        Link partner advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: MII
        PHYAD: 0
        Transceiver: external
        Auto-negotiation: on
        Supports Wake-on: ug
        Wake-on: d
        Current message level: 0x0000003d (61)
                               drv link timer ifdown ifup
        Link detected: yes
The MTU on the NFS server is 9000:
box ~ # ip link show dev eth0
2: eth0: <BROADCAST,MULTICAST,ALLMULTI,UP,LOWER_UP> mtu 9000 qdisc htb state UP mode DEFAULT group default qlen 1000
    link/ether 74:d4:35:e7:ac:e6 brd ff:ff:ff:ff:ff:ff
The MTU on vero4k+ is 1500:
root@vero4k:~# ip link show dev eth0
2: eth0: <BROADCAST,MULTICAST,DYNAMIC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether c4:4e:ac:29:33:a4 brd ff:ff:ff:ff:ff:ff
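When comparing several hosts, it’s handy to pull just the MTU value out of the `ip link` output. A minimal sketch (the helper name `mtu_of` is my own; the sample line is the server’s output from above, and on a live box you’d pipe in `ip link show dev eth0` instead):

```shell
# Scan an `ip link` line for the "mtu" keyword and print the value after it.
mtu_of() { awk '{for (i = 1; i <= NF; i++) if ($i == "mtu") {print $(i+1); exit}}'; }

# Sample line copied from the NFS server above.
echo '2: eth0: <BROADCAST,MULTICAST,ALLMULTI,UP,LOWER_UP> mtu 9000 qdisc htb state UP' | mtu_of
```

Scanning for the `mtu` keyword rather than a fixed field position keeps it working even though the flag list between the angle brackets varies per interface.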
hm… I am not sure what this will show. If there is no traffic at all, I wouldn’t expect the Ethernet driver to crash. How would you like me to test it?
The default rsize/wsize is 1MB.
root@vero4k:~# cat /etc/fstab
# rootfs is not mounted in fstab as we do it via initramfs. Uncomment for remount (slower boot)
#/dev/vero-nand/root / ext4 defaults,noatime 0 0
#10.11.12.1:/data /data nfs defaults,auto,rsize=1048576,wsize=1048576,noatime,nodiratime,intr,cto,tcp,vers=3 0 0
10.11.12.1:/data /data nfs noauto,x-systemd.automount,noatime,nodiratime,vers=3 0 0
# mount | grep /data
systemd-1 on /data type autofs (rw,relatime,fd=29,pgrp=1,timeout=0,minproto=5,maxproto=5,direct)
10.11.12.1:/data on /data type nfs (rw,noatime,nodiratime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.11.12.1,mountvers=3,mountport=49800,mountproto=udp,local_lock=none,addr=10.11.12.1)
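Since the active fstab line no longer sets rsize/wsize, the values in the mount output are what the kernel negotiated. A quick sketch of how to extract them for checking (the `line` variable is a shortened copy of the mount output above; on a live system you’d pipe `mount | grep /data` through the same sed):

```shell
# Pull the negotiated rsize/wsize out of a `mount` output line.
line='10.11.12.1:/data on /data type nfs (rw,noatime,nodiratime,vers=3,rsize=1048576,wsize=1048576,proto=tcp)'
echo "$line" | sed -n 's/.*rsize=\([0-9]*\).*/\1/p'
echo "$line" | sed -n 's/.*wsize=\([0-9]*\).*/\1/p'
```

Both should print 1048576 here, i.e. the 1MB default.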
I installed the development updates. Let’s see if it crashes again…
You could try Kodi-based NFS access for a while. This will use libnfs.
We cannot do a frame size above 3052 (IIRC, off the top of my head).
1500 on Vero will be fine.
If you keep getting issues, I can send a debug kernel. I only recently found a cause of eth0 dying under low traffic. Only one user was affected, but he was running a DHCP-less and Avahi-less environment, and the low RX packet count caused us to reset the PHY because we thought we were not getting ACKs (a TCP-oriented patch series, for sure…).
Personally, I think we’ll just end up finding a very strange bug exposed by your network configuration. I’d prefer a hardware fault, though: the solution is easier.
I asked you if there was anything out of the ordinary about the network. Surely running 9K jumbo frames across the network qualifies as being “out of the ordinary”.
AFAICT, the only Pi that supports 9K jumbo frames is the 3B+.
I would have thought that the next step has to be removing the jumbo frames and running the server with an MTU of 1500. Then (a) re-run the iperf3 figures from the V4K+ and (b) see if the network panics still occur.
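One extra check worth doing once the server is back on an MTU of 1500: a DF-flagged ping sized to fill a 9000-byte MTU should now fail, confirming no jumbo frames survive anywhere on the path. The ICMP payload is the MTU minus the 20-byte IPv4 header and the 8-byte ICMP header (a sketch; the address is the NFS server from the posts above, and the `ping -M do` form assumes Linux iputils):

```shell
# Payload that fills a given MTU: subtract 20 (IPv4 header) + 8 (ICMP header).
MTU=9000
PAYLOAD=$((MTU - 28))
echo "$PAYLOAD"    # 8972 for a 9000-byte MTU

# On the live network (not run here): ping with Don't-Fragment set.
# ping -M do -c 3 -s "$PAYLOAD" 10.11.12.1
```

With every interface on 1500, the DF ping should report “message too long” locally rather than get a reply.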
It is funny that you can crash a Vero 4K+ just by plugging in a computer with 9K jumbo frames, even if there is no real connection between that computer and the Vero 4K+…
AFAIK the generally accepted rule is that if you’re going to use jumbo frames, then every node in the network needs to support jumbo frames. Clearly, this wasn’t the case here.
I’m a bit rusty on this stuff, but it’s unclear to me why you were still seeing those “oversized frame” messages with the NFS server’s MTU set to 1500. They are probably related to things such as ARP, broadcast and multicast traffic from other devices on the LAN, which at the time were still on a 9K MTU. That said, I had thought the switch/router would deal with any 9K frames before sending them to the Vero 4K+. Strictly speaking, though, a plain Layer 2 switch can only forward or drop an oversized frame: IP fragmentation happens at routers, and only when the Don’t Fragment bit isn’t set.
To answer your specific point:
there is always “chatter” between devices on a LAN. At the time you saw those messages, those other devices were still using a 9K MTU. With every node on the network now using an MTU of 1500, I would expect such messages to disappear.
Well, zeroconf and Avahi, to name just two protocols that constantly exchange packets between all devices. If you want to check, install tcpdump and you can see them.
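For reference, the chatter is easy to capture: Avahi/zeroconf is mDNS, which is multicast UDP on port 5353, and SSDP/UPnP discovery adds port 1900. A sketch of the capture command (the interface name `eth0` is an assumption; the snippet only prints the command, since running tcpdump needs root and a live interface):

```shell
# mDNS (Avahi/zeroconf) is multicast UDP 5353; SSDP discovery is UDP 1900.
FILTER='udp port 5353 or udp port 1900'
# -n: no name resolution, -e: print link-level (Ethernet) headers too.
echo "tcpdump -i eth0 -n -e $FILTER"
```

Run the printed command as root on the Vero and you’ll see the background traffic scroll by even with no NFS or Kodi activity.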