Vero4k+ random freezes with skb_panic()

Hi,

I have a vero4k+ that randomly freezes (during idle, but also during playback), with errors like this:

Mar  2 00:17:27 vero4k kernel: [ 3525.547042@0] skbuff: skb_over_panic: text:ffffffc001527288 len:2690 put:2690 head:ffffffc059dd5000 data:ffffffc059dd5042 tail:0xac4 end:0x680 dev:eth0
Mar  2 00:17:27 vero4k kernel: [ 3525.555113@0] BUG: failure at net/core/skbuff.c:100/skb_panic()!
Mar  2 00:17:27 vero4k kernel: [ 3525.555126@0] Kernel panic - not syncing: BUG!

and again an hour later:

Mar  2 01:17:27 vero4k kernel: [ 3568.295441@0] skbuff: skb_over_panic: text:ffffffc001527288 len:2690 put:2690 head:ffffffc00031e000 data:ffffffc00031e042 tail:0xac4 end:0x680 dev:eth0
Mar  2 01:17:27 vero4k kernel: [ 3568.303656@0] BUG: failure at net/core/skbuff.c:100/skb_panic()!
Mar  2 01:17:27 vero4k kernel: [ 3568.303697@0] Kernel panic - not syncing: BUG!
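
As a reading aid, the numbers in these panic lines can be checked by hand. skb_over_panic is raised when appending data would move an skb's tail past its end. The sketch below assumes tail and end are offsets from head, which fits the log: data minus head is 0x42 (66) bytes, and 66 + 2690 = 2756 = 0xac4. The function name decode_skb_panic is made up for this example.

```shell
# Decode the size fields of an skb_over_panic log line.
# Field names (len, tail, end) are taken verbatim from the kernel message;
# the function name is hypothetical.
decode_skb_panic() {
    line=$1
    # "len" is printed in decimal, "tail" and "end" in hexadecimal.
    len=$(printf '%s\n' "$line" | sed -n 's/.* len:\([0-9]*\) .*/\1/p')
    tail_off=$(printf '%s\n' "$line" | sed -n 's/.* tail:\(0x[0-9a-f]*\) .*/\1/p')
    end_off=$(printf '%s\n' "$line" | sed -n 's/.* end:\(0x[0-9a-f]*\) .*/\1/p')
    # Shell arithmetic understands the 0x prefix, so the offsets convert directly.
    echo "len=$len tail=$(( $tail_off )) end=$(( $end_off )) overrun=$(( $tail_off - $end_off ))"
}

decode_skb_panic 'skbuff: skb_over_panic: text:ffffffc001527288 len:2690 put:2690 head:ffffffc059dd5000 data:ffffffc059dd5042 tail:0xac4 end:0x680 dev:eth0'
# prints: len=2690 tail=2756 end=1664 overrun=1092
```

Read this way, the driver tried to place 2690 bytes into a buffer that ends at offset 1664, which is 1092 bytes too many.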

This happens quite frequently: a couple of times during playback of a movie, and apparently every day while it sits idle (I always find it frozen).

To find the error messages, today I installed rsyslog and enabled forwarding of all logs to a remote system. Before that I was not able to see what was wrong.
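
For anyone wanting to reproduce the remote logging, here is a minimal sketch of a forwarding rule generator. It relies on rsyslog's standard forwarding syntax, where a single @ means UDP and @@ means TCP. The host name loghost, the default port 514 and the function name are assumptions, not details from this post.

```shell
# Print an rsyslog rule that forwards all facilities and priorities to a remote host.
# In rsyslog's forwarding syntax "@host" sends over UDP and "@@host" over TCP.
# The function name and its defaults are hypothetical.
rsyslog_forward_rule() {
    host=$1
    port=${2:-514}
    proto=${3:-udp}
    case $proto in
        tcp) prefix='@@' ;;
        *)   prefix='@' ;;
    esac
    printf '*.* %s%s:%s\n' "$prefix" "$host" "$port"
}

rsyslog_forward_rule loghost
# prints: *.* @loghost:514
```

The printed line would go into a file under /etc/rsyslog.d/ (any name ending in .conf), followed by a restart of rsyslog.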

Any ideas how to troubleshoot this?

It is also interesting that the second time, it happened exactly one hour later (to the second)!
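
The one-hour gap can be confirmed from the syslog timestamps with a little arithmetic. The sketch below only converts HH:MM:SS to seconds and subtracts; the function name is made up. The bracketed kernel timestamps, which normally count seconds since boot, read about 3525 and 3568 in the two logs. That may mean the box rebooted in between and panicked roughly 59 minutes after each boot.

```shell
# Convert an HH:MM:SS timestamp to seconds since midnight (hypothetical helper).
hms_to_seconds() {
    h=${1%%:*}
    rest=${1#*:}
    m=${rest%%:*}
    s=${rest#*:}
    # Strip one leading zero so values such as "08" are not parsed as octal.
    echo $(( ${h#0} * 3600 + ${m#0} * 60 + ${s#0} ))
}

first=$(hms_to_seconds 00:17:27)
second=$(hms_to_seconds 01:17:27)
echo "wall-clock gap: $(( $second - $first )) s"
# prints: wall-clock gap: 3600 s
```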

Complete logs would be useful.

I got this via grab-logs -A:

https://paste.osmc.tv/efuderiror

Debug enabled logs would be more helpful.

ok, here it is: http://paste.osmc.io/gurohosuco.xml

I anonymized my media library with this:

grab-logs -A -P | sed \
  -e "s| dir '.*' due | dir 'MEDIAHIDDEN' due |g" -e "s| dir '.*' as | dir 'MEDIAHIDDEN' as |g" \
  -e "s| directory '.*' does | directory 'MEDIAHIDDEN' does |g" \
  -e "s| was found in dir .*$| was found in dir MEDIAHIDDEN|g" \
  -e "s| for item '.*', it | for item 'MEDIAHIDDEN', it |g" \
  -e "s| title search for '.*'$| title search for 'MEDIAHIDDEN'|g" \
  -e "s| GetMovieId (.*), | GetMovieId (MEDIAHIDDEN), |g" \
  -e "s| Searching for '.*' using | Searching for 'MEDIAHIDDEN' using |g"

The error seems to be network-related, possibly something to do with your choice of NFS options - though that’s just a wild guess.

My suggestion would be to go for a simpler set of fstab options for /data, such as noauto,x-systemd.automount and see if the issue disappears.

ok, set it to:

10.11.12.1:/data /data    nfs   noauto,x-systemd.automount,noatime,nodiratime,vers=3 0 0

I need to specify vers=3 or it does not mount.
Then the noatime and nodiratime are just to avoid date updates on reads.

It did not change much though:

# mount | grep /data
systemd-1 on /data type autofs (rw,relatime,fd=29,pgrp=1,timeout=0,minproto=5,maxproto=5,direct)
10.11.12.1:/data on /data type nfs (rw,noatime,nodiratime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.11.12.1,mountvers=3,mountport=45489,mountproto=udp,local_lock=none,addr=10.11.12.1)

I think the main change is that it switched from tcp to udp.

Let’s see if it still freezes…

It froze again:

Mar  2 16:25:42 vero4k kernel: [ 7165.390208@0] skbuff: skb_over_panic: text:ffffffc001527288 len:2229 put:2229 head:ffffffc057ce47c0 data:ffffffc057ce4802 tail:0x8f7 end:0x680 dev:eth0
Mar  2 16:25:42 vero4k kernel: [ 7165.398142@0] BUG: failure at net/core/skbuff.c:100/skb_panic()!
Mar  2 16:25:42 vero4k kernel: [ 7165.398152@0] Kernel panic - not syncing: BUG!

Other ideas?
Is it possible that the hardware is faulty?

Does it happen on WiFi?
Do you have Flow Control enabled on your network?

What’s the NFS server running on? And is there anything out of the ordinary about the network?

The transport protocol is TCP; it’s just the initial mount that is using UDP.
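
To make that distinction visible, the two options can be pulled out of the mount output. In nfs(5), proto= is the transport for NFS traffic itself, while mountproto= only covers the MOUNT requests made when the share is first attached. A minimal sketch follows; the function name is made up.

```shell
# Print the transport-related options from one line of `mount` output.
# The function name is hypothetical.
nfs_protocols() {
    opts=${1#*\(}     # drop everything up to the opening parenthesis
    opts=${opts%\)*}  # drop the closing parenthesis and anything after it
    printf '%s\n' "$opts" | tr ',' '\n' | grep -E '^(proto|mountproto)='
}

# Usage on a live system (assumes /data is the NFS mount point):
#   nfs_protocols "$(mount | grep ' on /data type nfs')"
```

Fed the mount line quoted earlier in the thread, this prints proto=tcp followed by mountproto=udp.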

@sam_nazarko I don’t use WiFi. This device plays 4k videos, which I guess would not be ideal for WiFi.

Regarding flow control:

# ethtool -a eth0
Pause parameters for eth0:
Autonegotiate:	on
RX:		off
TX:		off
RX negotiated:	on
TX negotiated:	on

So, it is off. This vero4k+ is connected to a gigabit ethernet port of a DIR860L router. I don’t see any options to enable flow control on this router.
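
Note that the output above has two pairs of values: the configured RX/TX settings and the negotiated ones. A small sketch that puts them side by side is below. It only parses the text format shown above, and the function name is made up.

```shell
# Summarise `ethtool -a` output read from standard input.
# Only the fields visible in the output above are parsed; the function name is hypothetical.
pause_summary() {
    awk -F':[ \t]*' '
        $1 == "RX"            { rx = $2 }
        $1 == "TX"            { tx = $2 }
        $1 == "RX negotiated" { rxn = $2 }
        $1 == "TX negotiated" { txn = $2 }
        END { printf "configured rx=%s tx=%s; negotiated rx=%s tx=%s\n", rx, tx, rxn, txn }'
}

# Usage on a live system:
#   ethtool -a eth0 | pause_summary
```

For the output above this would report configured rx=off tx=off and negotiated rx=on tx=on. If the driver supports it, ethtool -A eth0 rx on tx on changes the configured values.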

@dillthedog the NFS server is a gentoo box. Nothing fancy about it. I already have 5 RPis running with NFS-root from the same NFS server. 2 of them are running OSMC.

This vero4k+ is not using NFS-root (it is plain vanilla, as it was shipped).

I have always had random freezes on this vero4k+, since the day I received it. But now they are way too frequent: I cannot finish a movie without a freeze. When I received it, it was freezing once per week, so I thought this would be “normal” for a new device that is probably not that stable yet.

Nothing changed in the network since I received it (August 2018).

I’m surprised you waited so long to report the problem. Since it occurred from day 1, it could be a hardware issue.

Can you run an iperf3 test in both directions?

Well, I think it is common for OSMC devices to freeze from time to time. I guess it is about kodi…

My non-kodi RPis never freeze. I have 3 RPi 1Bs (yes, the oldest ones) playing 1080p movies (with omxplayer) around the clock for several years now (I run 3 private TV channels at my home). They never crashed. Not even once! But the OSMC ones (which are a lot more powerful: RPi 3B+)… freezes do happen from time to time…

Here they are:

vero4k+ as client:

root@vero4k:~# iperf3 -c box       
Connecting to host box, port 5201
[  4] local 10.11.12.85 port 56755 connected to 10.11.12.1 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  56.3 MBytes   472 Mbits/sec    8   52.3 KBytes       
[  4]   1.00-2.00   sec  54.3 MBytes   455 Mbits/sec    6   48.1 KBytes       
[  4]   2.00-3.00   sec  47.9 MBytes   402 Mbits/sec    8   63.6 KBytes       
[  4]   3.00-4.00   sec  57.0 MBytes   478 Mbits/sec    6   63.6 KBytes       
[  4]   4.00-5.00   sec  57.1 MBytes   479 Mbits/sec    7   50.9 KBytes       
[  4]   5.00-6.00   sec  52.6 MBytes   441 Mbits/sec    9   32.5 KBytes       
[  4]   6.00-7.00   sec  62.0 MBytes   520 Mbits/sec    3   52.3 KBytes       
[  4]   7.00-8.00   sec  62.9 MBytes   528 Mbits/sec    5   73.5 KBytes       
[  4]   8.00-9.00   sec  59.9 MBytes   503 Mbits/sec   10   43.8 KBytes       
[  4]   9.00-10.00  sec  59.8 MBytes   502 Mbits/sec    4   49.5 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec   570 MBytes   478 Mbits/sec   66             sender
[  4]   0.00-10.00  sec   568 MBytes   477 Mbits/sec                  receiver

iperf Done.

hm… it has a few retries…
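
To compare runs without reading every row, the Retr column can be totalled from the per-interval lines. This is a rough sketch that assumes the iperf3 text layout shown above, where interval rows end in a Cwnd value. The function name is made up.

```shell
# Sum the Retr column of iperf3 client output read from standard input.
# Only per-interval rows are counted (they end in a Cwnd value such as "52.3 KBytes");
# the summary rows ending in "sender"/"receiver" are skipped.
# The function name is hypothetical.
sum_retr() {
    awk '$NF ~ /Bytes$/ {
        for (i = 1; i < NF; i++)
            if ($i ~ /bits\/sec$/ && $(i + 1) ~ /^[0-9]+$/)
                total += $(i + 1)
    }
    END { print total + 0 }'
}

# Usage on a live system:
#   iperf3 -c box | sum_retr
```

For the run above the interval rows add up to 66, which matches the sender summary line.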

vero4k+ as server:

root@vero4k:~# iperf3 -s    
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 10.11.12.1, port 39282
[  5] local 10.11.12.85 port 5201 connected to 10.11.12.1 port 39284
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-1.00   sec  68.8 MBytes   577 Mbits/sec                  
[  5]   1.00-2.00   sec  73.7 MBytes   618 Mbits/sec                  
[  5]   2.00-3.00   sec  71.0 MBytes   596 Mbits/sec                  
[  5]   3.00-4.00   sec  73.2 MBytes   614 Mbits/sec                  
[  5]   4.00-5.00   sec  76.1 MBytes   638 Mbits/sec                  
[  5]   5.00-6.00   sec  73.9 MBytes   620 Mbits/sec                  
[  5]   6.00-7.00   sec  69.6 MBytes   583 Mbits/sec                  
[  5]   7.00-8.00   sec  77.3 MBytes   648 Mbits/sec                  
[  5]   8.00-9.00   sec  73.5 MBytes   616 Mbits/sec                  
[  5]   9.00-10.00  sec  72.1 MBytes   605 Mbits/sec                  
[  5]  10.00-10.04  sec  3.06 MBytes   660 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-10.04  sec  0.00 Bytes  0.00 bits/sec                  sender
[  5]   0.00-10.04  sec   732 MBytes   612 Mbits/sec                  receiver

The client shows this:

# iperf3 -c vero4k
Connecting to host vero4k, port 5201
[  5] local 10.11.12.1 port 39284 connected to 10.11.12.85 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.01   sec  71.6 MBytes   596 Mbits/sec    0    113 KBytes       
[  5]   1.01-2.01   sec  74.4 MBytes   621 Mbits/sec    0    146 KBytes       
[  5]   2.01-3.01   sec  70.8 MBytes   596 Mbits/sec    0    156 KBytes       
[  5]   3.01-4.01   sec  73.5 MBytes   616 Mbits/sec    0    171 KBytes       
[  5]   4.01-5.01   sec  76.3 MBytes   638 Mbits/sec    0    194 KBytes       
[  5]   5.01-6.01   sec  73.2 MBytes   615 Mbits/sec    0    208 KBytes       
[  5]   6.01-7.01   sec  70.0 MBytes   587 Mbits/sec    0    281 KBytes       
[  5]   7.01-8.01   sec  76.9 MBytes   648 Mbits/sec    0    287 KBytes       
[  5]   8.01-9.00   sec  73.1 MBytes   617 Mbits/sec    0    307 KBytes       
[  5]   9.00-10.01  sec  72.5 MBytes   604 Mbits/sec    0    344 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.01  sec   732 MBytes   614 Mbits/sec    0             sender
[  5]   0.00-10.01  sec   732 MBytes   614 Mbits/sec                  receiver

iperf Done.

No retries in this direction…

My devices do not freeze here.

The retry count looks suspicious there and you should be getting much higher speeds (in the order of 900Mbps). Is this a real Ethernet setup (i.e. no powerline adapters)?

Sam

This is interesting…
Do you use them daily? I think it is subject to the frequency of kodi use.

Yes, it is just a tiny CAT5E cable with length of 30cm to a DIR860L. Then this DIR860L is connected to a Linksys SRW2024P switch, on which the NFS server is connected.

I tried a PC which is connected directly to SRW2024P:

# iperf3 -c box
Connecting to host box, port 5201
[  5] local 10.11.12.226 port 58780 connected to 10.11.12.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   109 MBytes   917 Mbits/sec    1    734 KBytes       
[  5]   1.00-2.00   sec   105 MBytes   881 Mbits/sec    0    891 KBytes       
[  5]   2.00-3.00   sec   108 MBytes   902 Mbits/sec    1    690 KBytes       
[  5]   3.00-4.00   sec   108 MBytes   902 Mbits/sec    0    743 KBytes       
[  5]   4.00-5.00   sec   104 MBytes   870 Mbits/sec    8    393 KBytes       
[  5]   5.00-6.00   sec   105 MBytes   881 Mbits/sec    1    620 KBytes       
[  5]   6.00-7.00   sec   108 MBytes   902 Mbits/sec    0    699 KBytes       
[  5]   7.00-8.00   sec   109 MBytes   912 Mbits/sec    0    717 KBytes       
[  5]   8.00-9.00   sec   108 MBytes   902 Mbits/sec    0    751 KBytes       
[  5]   9.00-10.00  sec   109 MBytes   912 Mbits/sec    0    786 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.05 GBytes   898 Mbits/sec   11             sender
[  5]   0.00-10.04  sec  1.04 GBytes   892 Mbits/sec                  receiver

iperf Done.

The bandwidth got to 900Mbits with a few retries.

Let me bypass DIR860L and retry…

It got worse:

root@vero4k:~# iperf3 -c box
Connecting to host box, port 5201
[  4] local 10.11.12.85 port 58289 connected to 10.11.12.1 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  73.8 MBytes   618 Mbits/sec   31   55.1 KBytes       
[  4]   1.00-2.00   sec  49.7 MBytes   416 Mbits/sec    6   46.7 KBytes       
[  4]   2.00-3.00   sec  49.2 MBytes   414 Mbits/sec    9   58.0 KBytes       
[  4]   3.00-4.00   sec  53.1 MBytes   445 Mbits/sec    8   38.2 KBytes       
[  4]   4.00-5.00   sec  38.1 MBytes   319 Mbits/sec   11   49.5 KBytes       
[  4]   5.00-6.00   sec  50.9 MBytes   427 Mbits/sec   10   41.0 KBytes       
[  4]   6.00-7.00   sec  38.3 MBytes   321 Mbits/sec   11   28.3 KBytes       
[  4]   7.00-8.00   sec  38.6 MBytes   324 Mbits/sec    6   45.2 KBytes       
[  4]   8.00-9.00   sec  45.0 MBytes   378 Mbits/sec    8   38.2 KBytes       
[  4]   9.00-10.00  sec  47.9 MBytes   401 Mbits/sec    8   29.7 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec   484 MBytes   406 Mbits/sec  108             sender
[  4]   0.00-10.00  sec   481 MBytes   403 Mbits/sec                  receiver

iperf Done.

Yes, every day. I have high uptime (some units on for 30+ days)
Unfortunately Kodi doesn’t handle shares going away gracefully; so if the NFS connection drops, Kodi will freeze for a period of time.

So are you experiencing iperf issues with other devices on the network too?
There were some hardware issues with some initial Vero 4K+ devices which caused poor performance in the TX direction only. Usually these users would not be able to get speeds above 2-3Mbps. The RX direction is never affected.

Sam

Well, I don’t think so… There are many devices running against this NFS server, and a few of them are streaming videos all the time. I don’t think the retries are the issue.

I did another test: I removed the vero4k+ from its position and attached it directly to the main switch with a new cable. Same thing. Poor speeds with a few retries. The poor speeds appear in both directions. The receiving direction does not have any retries.

What makes me think this is a h/w issue, is the kernel log. Either the kernel driver is problematic, or this vero4k+ has a broken Ethernet.

Any other ideas?

I’m almost certain that this is anything but a hardware issue.

Of the few devices that had Ethernet problems, none had a problem with RX direction. If the hardware is faulty, we’d expect to see extremely poor TX performance (in the order of < 100Mbps) but perfect RX performance (900Mbps+).

  • What’s the MTU size on your network?
  • Do you see the skbuff BUG_ON without NFS mounted?
  • Do you see the skbuff BUG_ON with default rsize,wsize?
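
For the MTU question, one way to check is to read the interface MTU and then ping with fragmentation prohibited. The sketch below only computes the largest ICMP payload that fits a given MTU (20 bytes of IPv4 header plus 8 bytes of ICMP header are subtracted). The function name is made up, and box is the NFS server's host name as used in the iperf3 runs. The panic lines report len:2690 and len:2229, both larger than a standard 1500-byte MTU, which makes this worth checking.

```shell
# Largest ICMP echo payload that fits in one unfragmented IPv4 packet for a given MTU.
# 28 = 20-byte IPv4 header (no options) + 8-byte ICMP header.
# The function name is hypothetical.
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Usage on a live system (eth0 and "box" are taken from earlier in the thread):
#   mtu=$(cat /sys/class/net/eth0/mtu)
#   ping -M do -c 3 -s "$(icmp_payload_for_mtu "$mtu")" box
```

With -M do the ping fails instead of fragmenting, so a failure at the computed size would point to a smaller MTU somewhere on the path.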

The retries look like a duplex mismatch to me; but I am not a networking expert.
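
A duplex mismatch can be ruled in or out by checking what the link actually negotiated. Linux exposes this under /sys/class/net/<iface>/speed and /sys/class/net/<iface>/duplex. The sketch below just reads those two files. The function name and the optional second argument (an alternative root directory, useful for trying it out on another machine) are made up.

```shell
# Print the negotiated speed and duplex of a network interface from sysfs.
# The function name and the optional root-directory argument are hypothetical.
link_mode() {
    iface=$1
    root=${2:-/sys/class/net}
    speed=$(cat "$root/$iface/speed")
    duplex=$(cat "$root/$iface/duplex")
    echo "$iface: ${speed}Mb/s $duplex"
}

# Usage on a live system:
#   link_mode eth0
```

A healthy gigabit link would be expected to show 1000Mb/s full. Anything reporting half duplex on either end would support the mismatch theory.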

Please also update to the latest kernel, which has some Ethernet improvements I plan to release shortly.

  1. Login via the command line
  2. Edit the file /etc/apt/sources.list
  3. Add the following line: deb http://apt.osmc.tv stretch-devel main
  4. Run the following commands to update: sudo apt-get update && sudo apt-get dist-upgrade && reboot
  5. Your system should have received the update.

I also recommend you edit /etc/apt/sources.list again and remove the line that you added after updating. This will return you to the normal update channel.
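
The add and remove steps can be scripted so that the devel line is not left behind by accident. This is only a sketch: the repository line is the one quoted in step 3, while the function names and the optional file argument are made up for illustration.

```shell
# Repository line from step 3 above.
DEVEL_LINE='deb http://apt.osmc.tv stretch-devel main'

# Append the devel line to a sources list unless it is already present.
# Function names and the optional file argument are hypothetical.
add_devel_repo() {
    list=${1:-/etc/apt/sources.list}
    grep -qxF "$DEVEL_LINE" "$list" || printf '%s\n' "$DEVEL_LINE" >> "$list"
}

# Remove the devel line again, leaving every other line untouched.
remove_devel_repo() {
    list=${1:-/etc/apt/sources.list}
    tmp=$(mktemp)
    grep -vxF "$DEVEL_LINE" "$list" > "$tmp" || true   # grep exits 1 if nothing is left
    cat "$tmp" > "$list"                               # overwrite in place, keeping permissions
    rm -f "$tmp"
}

# Usage as root, mirroring steps 2 to 4 and the clean-up afterwards:
#   add_devel_repo
#   apt-get update && apt-get dist-upgrade
#   remove_devel_repo
#   reboot
```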

Sam