SMB very slow on Vero 4K+

The Problem:

Last night, I tried playing a large (~50 GB) 4K video file and performance was unwatchable. It took forever to buffer initially, and it stopped every few minutes to buffer again. I had just watched a 25-30 GB file that worked great, and my setup is pretty beefy, so I was surprised.

As part of testing whether it was a problem with the network or the player itself, I transferred the file over the network to a USB SSD mounted on my MacBook, via OSX’s native SMB implementation. It took 18 minutes to transfer the file, which by my math works out to roughly 50 MB/s. So while there’s clearly a bottleneck somewhere, that should still be plenty to stream the file (18 minutes is much less than the video’s length).

Setup:

Raspberry Pi 4 + WD My Book over USB 3

SMB mounted on the Vero 4K+ via fstab and CIFS.

Config:

//<Local reserved IP>/galahad /mnt/Galahad cifs x-systemd.automount,rw,iocharset=utf8,vers=3.0,username=<xxx>,password=<xxx>,noperm 0 0

Network (all Cat6 hardwired): Pi → Switch (office) → Switch (main hub in garage) → Switch (AV Rack) → Vero 4k+

Theoretically, it should be possible to get 100-125 MB/s over the network (even taking into account the HDD limitations, which are the biggest constraint), which should be plenty to stream big-ass 4K files. So, the potential bottlenecks are the network, disk I/O, and SMB.

Network

Quickly and easily ruled out using iperf3. Running it between the Pi and OSMC I see:

[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   109 MBytes   913 Mbits/sec                  
[  5]   1.00-2.00   sec   112 MBytes   940 Mbits/sec                  
[  5]   2.00-3.00   sec   111 MBytes   934 Mbits/sec                  
[  5]   3.00-4.00   sec   112 MBytes   939 Mbits/sec                  
[  5]   4.00-5.00   sec   112 MBytes   939 Mbits/sec                  
[  5]   5.00-6.00   sec   112 MBytes   939 Mbits/sec                  
[  5]   6.00-7.00   sec   112 MBytes   939 Mbits/sec                  
[  5]   7.00-8.00   sec   112 MBytes   939 Mbits/sec                  
[  5]   8.00-9.00   sec   112 MBytes   939 Mbits/sec                  
[  5]   9.00-10.00  sec  86.9 MBytes   729 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.02  sec  1.07 GBytes   914 Mbits/sec    1             sender
[  5]   0.00-10.00  sec  1.07 GBytes   915 Mbits/sec                  receiver

That’s pretty much saturating the Pi and Vero’s gigabit network cards, so everything looks groovy.
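
(The exact invocation isn’t shown above, but a plain run like the following, with the server on the Pi and the client on the Vero, produces output of that shape; the IP is a placeholder:)

# On the Pi: start an iperf3 server
iperf3 -s

# On the Vero: run the default 10-second test against the Pi's IP
iperf3 -c 192.168.1.10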

Disk i/o on Pi

A little bit harder to hunt down. For context, the drive is an exFAT drive that was previously used as a USB drive on a Mac, and I was too lazy to reformat it. I was concerned that the exFAT FUSE drivers were causing issues, so I ran benchmarks with dd and hdparm.

For more context, Wirecutter’s benchmarks on the drive I own were around 140 MB/s read and write (I’ve seen other folks getting closer to 180 MB/s).

In my tests, writes were pretty consistent, and much slower than 140 MB/s (I do think the drivers are the culprit), but that should still be plenty for streaming large files:

pi@raspberrypi:~ $ dd if=/dev/zero of=/mnt/galahad/test.tst bs=4096 count=100000
100000+0 records in
100000+0 records out
409600000 bytes (410 MB, 391 MiB) copied, 8.13792 s, 50.3 MB/s
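
(One caveat worth noting: without a sync flag, dd partially measures the Linux page cache rather than the disk, so the real sustained write speed may be lower still. A variant like this forces the data to disk before reporting a number:)

# conv=fdatasync issues a single fdatasync() at the end, so the timing
# includes actually flushing the data out to the disk
dd if=/dev/zero of=/mnt/galahad/test.tst bs=4096 count=100000 conv=fdatasync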

Reads varied. When the disk had to wake up, we were looking at more like 18 MB/s (not surprising), but with a warm disk, things looked good (repeated all these tests many times):

Disk warm (OS cache blown away, disk still potentially cached):

 dd if=/mnt/galahad/test.tst of=/dev/null bs=4096 count=100000
100000+0 records in
100000+0 records out
409600000 bytes (410 MB, 391 MiB) copied, 2.41268 s, 170 MB/s

Disk cold (OS cache blown away, assuming disk cache dead):

 dd if=/mnt/galahad/test.tst of=/dev/null bs=4096 count=100000
100000+0 records in
100000+0 records out
409600000 bytes (410 MB, 391 MiB) copied, 22.4917 s, 18.2 MB/s
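
(The post doesn’t say exactly how the OS cache was blown away; on Linux that’s usually done with something like:)

# Flush dirty pages, then drop the page cache, dentries, and inodes
sync
echo 3 | sudo tee /proc/sys/vm/drop_caches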

hdparm on /dev/sda1:

 Timing cached reads:   1584 MB in  2.00 seconds = 792.65 MB/sec
 Timing buffered disk reads: 586 MB in  3.01 seconds = 194.57 MB/sec
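
(That output matches hdparm’s combined timing test, presumably invoked as:)

# -T times cached reads (effectively memory bandwidth),
# -t times buffered reads from the disk itself
sudo hdparm -Tt /dev/sda1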

So, probably some performance gains to be had here, but nothing showstopping. Ruling this out.

SMB performance

So, on to SMB performance. To test this, I used dd again and just read from the mount points to /dev/null. As noted above, in a very unscientific benchmark transferring the file to my Mac, I was seeing around 50 MB/s.

To confirm this quickly, I ran dd from my MacBook:

dd if=/Volumes/galahad/test.tst of=/dev/null bs=4096 count=100000
100000+0 records in
100000+0 records out
409600000 bytes transferred in 7.012072 secs (58413547 bytes/sec)

so ~58 MB/s

On to OSMC:

dd if=/mnt/Galahad/test.tst of=/dev/null bs=4096 count=100000
100000+0 records in
100000+0 records out
409600000 bytes (410 MB, 391 MiB) copied, 34.5151 s, 11.9 MB/s

So, this looks to be the culprit. SMB is very slow on OSMC.

A fix suggested elsewhere is to enable loose caching (with some downsides, of course). I tried this, remounted, and…

dd if=/mnt/Galahad/test.tst of=/dev/null bs=4096 count=100000
100000+0 records in
100000+0 records out
409600000 bytes (410 MB, 391 MiB) copied, 28.1725 s, 14.5 MB/s

Nope.
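
(For anyone following along: loose caching is the cache=loose CIFS mount option. Dropped into the fstab line from above, it would look something like this; the IP and credentials remain placeholders:)

//<Local reserved IP>/galahad /mnt/Galahad cifs x-systemd.automount,rw,iocharset=utf8,vers=3.0,cache=loose,username=<xxx>,password=<xxx>,noperm 0 0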

Any ideas here? Since everything in my house is on Linux or Mac, I should probably just switch over to NFS (I could even run it side-by-side with SMB), but I figured I’d ask in case anyone had thoughts or it could help improve something.

Excellent write-up. Thanks for providing such detailed info.

I did notice this: a 4K block size is very small. How does it perform with 1M and 4M?

Good question. Apparently it doesn’t help a lot:

osmc@osmc:/mnt$ dd if=/mnt/Galahad/test.tst of=/dev/null bs=1M count=100000
390+1 records in
390+1 records out
409600000 bytes (410 MB, 391 MiB) copied, 24.6145 s, 16.6 MB/s

osmc@osmc:/mnt$ dd if=/mnt/Galahad/test.tst of=/dev/null bs=4M count=100000
97+1 records in
97+1 records out
409600000 bytes (410 MB, 391 MiB) copied, 26.3696 s, 15.5 MB/s

I actually noticed they were slightly faster without loose caching, up to a ‘blazing fast’ 17 MB/s :joy:

Worth knowing, nevertheless. I’m out of time for today but I’m sure others will be able to pitch in.

Couple of things to try:

  1. Run iperf for a while to make sure there aren’t any sporadic network gremlins (trying both directions):

    Upload - iperf3 -c <server> -P 3 -t 120 -O 2
    Download - iperf3 -c <server> -P 3 -t 120 -O 2 -R

  2. I only have a 100 Mbit Vero 4K so I haven’t maxed out a 1 Gbit network connection, but on all my devices I use the following mount options for SMB and haven’t noticed any problems:

    rw,username=XXXXXX,password=XXXXXXXX,nobrl,noexec,nosuid,nodev,uid=osmc,gid=osmc,iocharset=utf8,noserverino
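
(Dropped into an fstab line of the same shape as the OP’s original, those options would look roughly like this; the IP, share, and credentials are placeholders:)

//<Local reserved IP>/galahad /mnt/Galahad cifs rw,username=<xxx>,password=<xxx>,nobrl,noexec,nosuid,nodev,uid=osmc,gid=osmc,iocharset=utf8,noserverino 0 0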

Good suggestions. I ran iperf for a while; this network is super solid. I didn’t try messing around with mount options; I did, however, decide to set up NFS on my Pi. Holy guacamole:

dd if=/mnt/nfs_test/test.tst of=/dev/null bs=1M count=100000
390+1 records in
390+1 records out
409600000 bytes (410 MB, 391 MiB) copied, 3.97517 s, 103 MB/s

and, with my original block size for an apples-to-apples comparison:

dd if=/mnt/nfs_test/test.tst of=/dev/null bs=4096 count=100000
100000+0 records in
100000+0 records out
409600000 bytes (410 MB, 391 MiB) copied, 3.98514 s, 103 MB/s
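
(For anyone wanting to reproduce this, a minimal sketch of an NFS setup like the one described, assuming the nfs-kernel-server package on the Pi; the path, subnet, and IP are hypothetical:)

# On the Pi: add an export to /etc/exports, e.g.
#   /mnt/galahad 192.168.1.0/24(ro,all_squash,no_subtree_check)
# then reload the export table
sudo exportfs -ra

# On the Vero: mount the export
sudo mount -t nfs 192.168.1.10:/mnt/galahad /mnt/nfs_test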

I’m going to switch over to NFS because I’m not sure how SMB could top this (I can always run them side-by-side so other devices can connect), but I’m still a little weirded out by the performance difference.

For this one, I’d probably suggest chucking Wireshark between the devices to see if there are clues there.

I seem to recall a DNS bug in Debian Buster’s Samba a while ago, but the exact issue escapes me.

Sam

Ugh, spoke too soon. I think there was some OS-level caching going on with the NFS mount (I suspect this because with NFS, and not SMB, subsequent dd runs hit GB/s speeds, which obviously isn’t possible over gigabit Ethernet). I switched my main mount point from SMB to NFS (kept the old test mount as well), rebooted for good measure, and I’m back at 13-ish MB/s.

Sigh. To the packets we go.

Keep an eye on ifconfig to see if a device is dropping down to 100 Mbps.
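
(ifconfig alone won’t show the negotiated link speed; on these boxes something like ethtool will, assuming it’s installed:)

# Prints e.g. "Speed: 1000Mb/s" or "Speed: 100Mb/s"
sudo ethtool eth0 | grep Speed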

Oh no - palm, meet face.

So I quickly looked at ifconfig on the OSMC box and saw both wlan and Ethernet assigned IP addresses, which got my spidey senses tingling; they went into alarm mode when I opened Wireshark and saw SSH traffic on the wireless IP.

It looks like when both wlan and eth are active, for whatever reason the Vero actively prefers wireless (including osmc.local resolving to it). I shut off wireless from the GUI and, just for completeness’ sake, mounted up both the new NFS share and the SMB share and ran some benchmarks on some random files:

NFS

521+1 records in
521+1 records out
547049566 bytes (547 MB, 522 MiB) copied, 6.67231 s, 82.0 MB/s

1056+1 records in
1056+1 records out
1107725582 bytes (1.1 GB, 1.0 GiB) copied, 13.1634 s, 84.2 MB/s

SMB

536+1 records in
536+1 records out
562190197 bytes (562 MB, 536 MiB) copied, 13.8053 s, 40.7 MB/s

524+1 records in
524+1 records out
550339020 bytes (550 MB, 525 MiB) copied, 12.7441 s, 43.2 MB/s

That’s more like it! Both are more than sufficient for my needs, and I verified that file playback is instant and smooth. I was surprised at how ‘sluggish’ SMB was compared to NFS, but not surprised enough to drag my laptop to the garage to get Wireshark between the two and analyze packets. I’m just keeping NFS and calling it a day.

I was also surprised at how well wireless was handling pretty beefy 4K files; I might not have even noticed, although I imagine this will be a very nice performance boost.


If Ethernet is available, we’ll not bother with Wireless.
But if WiFi is connected and you then add Ethernet, we’ll keep both technologies active.
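
(For reference, OSMC’s network stack is ConnMan-based, so wireless can also be switched off from the command line rather than the GUI, e.g.:)

# Disable the WiFi technology entirely
connmanctl disable wifi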