OSMC may not be perfect. Let’s face it, nothing is.
But I’ll tell you what …
… I love a product where there’s actual effective, responsive, support!
Thank you, as always; also as always, at your convenience …
This doesn’t sound like a libNFS (Kodi built-in NFS) issue to me, as you are having the problem both with Kodi’s libnfs and kernel mounts. I see a pause of about a second randomly a few times during a TV episode if I use the Kodi built-in NFS client, but I do not have any issues if I use a kernel NFS mount. My NFS server is just the standard built-in (BSD) nfsd on a Mac Mini.
Can you post your exact fstab line (with IP’s changed if you prefer) including all the mount options you are using ?
This is mine for comparison, which is working perfectly for me on my Vero 2:
192.168.0.10:/Users/admin/Movies /mnt/Mac-Mini nfs noatime,noauto,x-systemd.automount,async,nfsvers=3,rsize=8192,wsize=8192,nolock,nofail,local_lock=all,soft,retrans=2,tcp 0 0
Also it would be very helpful if you would post (via the Log Uploader) a system journal and Kodi debug log taken after the issue manifests, and point out the exact time when the problem occurs.
Some additional thoughts - is flow control enabled on the switch the Vero is connected to ? (It is for me, but some switches default this to off)
Have you run an iperf test between the Vero and NFS server ? Remember that you need to run the iperf server on the Vero and the client on the NFS server to test receive speed at the Vero.
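A hedged sketch, in case iperf isn’t to hand: the same receive-speed measurement can be approximated with Python’s standard sockets. As with iperf, the receiving end would run on the Vero and the sending end on the NFS server; here both ends run over loopback purely to show the mechanics, and the port and transfer size are arbitrary assumptions.

```python
# Minimal TCP throughput probe (stand-in for iperf, loopback demo only).
import socket
import threading
import time

PORT = 50007                     # arbitrary test port (assumption)
PAYLOAD = b"x" * 65536
TOTAL_BYTES = 16 * 1024 * 1024   # 16 MiB test transfer

def server(results):
    # Receiving end ("iperf server") - on a real test this runs on the Vero.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind(("127.0.0.1", PORT))
        srv.listen(1)
        conn, _ = srv.accept()
        received = 0
        start = time.monotonic()
        with conn:
            while True:
                chunk = conn.recv(65536)
                if not chunk:
                    break
                received += len(chunk)
        elapsed = time.monotonic() - start
        results["mbps"] = received * 8 / elapsed / 1e6

results = {}
t = threading.Thread(target=server, args=(results,))
t.start()
time.sleep(0.2)  # crude wait for the listener to come up (fine for a sketch)

# Sending end ("iperf client") - on a real test this runs on the NFS server.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
    cli.connect(("127.0.0.1", PORT))
    sent = 0
    while sent < TOTAL_BYTES:
        cli.sendall(PAYLOAD)
        sent += len(PAYLOAD)
t.join()
print(f"throughput: {results['mbps']:.0f} Mbit/s")
```

Loopback numbers are meaningless for diagnosis, of course; the point is only the direction of the test (receiver on the Vero).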
What bitrate are these rips ? Are they full blu-ray rips or are they at a lower bitrate ? Do you know roughly at what bitrate you start to have problems ? A good way to test is using the “jellyfish” clips, available here:
Download the 10,20,30,40,50,60 Mbps versions of these and put them on your server, then attempt to play them - find the fastest one that plays right through without pausing or buffering. I would recommend testing with the h264 versions.
You should be able to play up to about 80Mbps over the 100Mbps connection, which is more than enough for a full rate blu-ray rip which is typically 50Mbps max.
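The arithmetic behind that figure, as a quick sanity check (the 80% efficiency number is an assumption for rough goodput after Ethernet/IP/TCP/NFS overheads, consistent with the ~80 Mbps quoted above):

```python
# Back-of-envelope headroom check for streaming over 100 Mbit/s Ethernet.
link_mbps = 100
efficiency = 0.80            # rough achievable goodput fraction (assumption)
usable_mbps = link_mbps * efficiency
bluray_peak_mbps = 50        # typical full-rate blu-ray peak, per the post
headroom = usable_mbps - bluray_peak_mbps
print(f"usable ~{usable_mbps:.0f} Mbit/s, headroom ~{headroom:.0f} Mbit/s")
```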
Finally, have you tried copying the offending file(s) to a USB stick or SD card and playing it while directly plugged into the Vero to see whether it is in fact a network issue ?
Hello @DBMandrake, thank you for the extensive and helpful reply.
This doesn’t sound like a libNFS (Kodi built in NFS) issue to me, as you are having the problem both with Kodi’s libnfs and Kernel mounts …
Yes, that’s a very good point.
Can you post your exact fstab line …
gateway.firstgrade.co.uk:/home/Multimedia /home/Multimedia nfs rw,proto=tcp,sec=sys,hard,intr,bg,noauto,x-systemd.automount 0 0
gateway.firstgrade.co.uk:/home/dmz /home/dmz nfs rw,proto=tcp,sec=sys,hard,intr,bg,noauto,x-systemd.automount 0 0
I always try to keep the NFS options used to an absolute minimum, and let the NFS protocol sort out the rest. That doesn’t always happen though, so there is definitely scope for further “tweaking” (on the client end); in particular, I see you’re specifying rsize/wsize, and effectively completely disabling NFS locking; you’re also forcing NFSv3 (which will disable a number of the otherwise automatic protocol negotiations that come with NFSv4) so that’s interesting too.
I’ll experiment with some of the options you’re using and see what (if any) difference it makes - thanks.
Also it would be very helpful if you would post (via the Log Uploader) a system journal and Kodi debug log taken after the issue manifests, and point out the exact time when the problem occurs.
Certainly - will do.
Can’t at the moment as the family is watching “The Mask” (which is playing apparently perfectly ) - I’ll see what I can arrange, hopefully tomorrow.
Some additional thoughts - is flow control enabled on the switch the Vero is connected to …
Oooooooh yes …
I always check that (good point though), and will typically disable autoneg too (as it causes vastly more problems than it ever solves) though I haven’t done that for the Vero yet (I will try).
Have you run an iperf test between the Vero and NFS server ?
No, and that’s another good suggestion - I’ll do so.
What bitrate are these rips ? Are they full blu-ray rips or are they at a lower bitrate ? …
Full rate Blu-Ray rips. We get “AnyDVD HD” to strip the protection, but otherwise it’s a pure image rip. The “Prometheus” 3D rip in particular is reported as generally around the 25-30Mb/s mark if I randomly check during playback.
As for the problems starting at any particular bitrate, that is another very good question. E.g., “The Mask” is currently playing perfectly (and reports ~4.5Mb/s at the moment). Interestingly, it’s also telling me that, about half way through, we’ve got nearly 5,000 drops and 3,500 skips, so whilst visually playing fine, it may well be having difficulty even at that relatively low bitrate.
I’ll download the jellyfish clips and (using the H264 versions) check to see at what point the playback problems become obvious.
Many thanks, you’ve given me quite a number of interesting possibilities to pursue - I’ll get back to you.
I have a feeling this may be a long reply, so maybe grab a coffee first
Before I go on, I should point out that although I tried the test kernel you mention in the Jerky playback and no sound on some videos thread, I reverted to the GA kernel (“…-27-osmc”, all updates applied including the ones which seem to have appeared today) and ran apt-get autoremove before any of these tests.
First things first - is it the network?
I copied “Prometheus” (3D Blu-Ray full-rate ISO rip; protection removed) onto an SD card (“Samsung Memory 64 GB Evo MicroSDXC UHS-I Grade 1 Class 10” to be exact, so with a manufacturer’s quoted read speed of 43Mb/s that ought to be fine with the ~25-30Mb/s I see if I check the info. randomly while playing the title - oh, and it’s a genuine card: Samsung lead the world in “Frustration free”-free packaging, and this took about 5 minutes to get into with bolt cutters and a blow-torch - it’s genuine …).
You can find the “MediaInfo” on the “main” title file (“BDMV/STREAM/00916.m2ts”) here, but so far as we’re probably concerned, the headline figure is “Maximum Overall bit rate: 40.1Mbps” - as it’s a genuine Samsung UHS-I card (snigger … bolt cutters …) even that should be well within tolerance.
Oh, and probably irrelevant, but I used it as-formatted from Samsung (single “exFAT” partition); no hookey “Optimise your flash!” type partitioning or formatting tools.
So what happened?
Well, started playing in seconds, not minutes; I’m currently about 10 minutes in and there has not been a single pause, skip, drop-out - video or audio. Perfect playback, and if I look at the info. I see 1 drop, 0 skips.
So, I could be wrong, but I’m fairly convinced we’re having issues here getting data in over the network …
Yes, it is the network …
I was very interested in the differences between our NFS configurations. So that’s the next rock to turn over. I’ll ignore libNFS and just stick with the kernel implementation, so I’ve got our “/home/Multimedia/” share (and “/home/dmz/”) mounted on “/home/Multimedia/”; “/etc/fstab” lines as above.
Now here’s a funny story …
I originally added the “Multimedia” share as an “nfs://…” source. Subsequently, I added “/home/Multimedia/” as a “local” source.
I can’t find any way to delete the “nfs://…” source via the GUI, and both obviously show up as being called “Multimedia”. So I tried pretty hard to make sure I was selecting the correct one.
(you can see where this is going, can’t you? …).
To make absolutely sure, I manually edited “sources.xml” and removed the “Multimedia” <source> entry from the “Video” block and replaced it with the <source> entry for “Multimedia” from the “Files” block.
Et voila …
So thank you for all your thoughts and hard work offering suggestions, but it seems that ultimately @sam_nazarko was right (yes, you do have libNFS problems!) and you, @DBMandrake, have kindly spent your time helping out with a PEBKAC issue …
I am so sorry about that …
So to be clear, are you saying the full bitrate blu-ray rips are now playing OK for you using kernel mode NFS, and that you were actually using libnfs all along by accident ?
And they are playing OK for you with your original nfs fstab entry which has less options than mine ?
Don’t be sorry - your testing has been invaluable, and if the above is correct you are seeing the same issue as me, except because I am playing much lower bitrate material all I am seeing is the occasional pause lasting about a second during playback.
Kodi’s libnfs client has basically no buffering, no read ahead, and I believe does synchronous transfers with relatively small buffers, all of which don’t lead to optimal performance. Any brief interruption to the smooth flow of data across the network would lead to a visible pause.
If you’re interested in the more techy side of this there is a discussion from last year on a Kodi Git issue:
It may be that there is an underlying issue with the Ethernet driver at the moment causing brief pauses that is exposed by the lack of buffering and readahead, we’re still testing to try to determine whether there is an underling network issue or whether it is purely a Kodi libnfs issue, or possibly a bit of both.
Another thing you could try that might be interesting, would be to go back to the Kodi libnfs (nfs://) method, but enable buffermode 1 in advancedsettings.xml:
<advancedsettings>
  <network>
    <readbufferfactor>4.0</readbufferfactor>
    <buffermode>1</buffermode>
    <cachemembuffersize>20971520</cachemembuffersize>
  </network>
</advancedsettings>
By default Kodi does not do any buffering of “local” file systems, and an NFS mount using libnfs (or a kernel mount for that matter) both count as local file systems. The kernel mount does its own readahead and buffering but the libnfs mount does not.
Setting buffermode to 1 tells Kodi to use a buffer for these “local” file systems as well as internet streams. I’d be interested to know whether this also solves/hides the symptoms you were seeing.
exFAT goes through FUSE, and is still a bit problematic on Linux, so achieving good playback from an exFAT formatted SD card is good to hear.
I spoke with @DBMandrake last night and we have ruled out a possible IRQ storm (we tried moving the interrupt to a different core, but symptoms remained).
I’ve made some changes to Device Tree and will have a new kernel for the morning. I have also bumped the libNFS version in our staging repository so I can get that to you shortly. If this still fails to work then I will set up an NFS server on my desktop and use Wireshark to see if I can identify any issues with the Ethernet module.
Seriously?! Honestly, I’ve not dug that far into Linux kernels to know what is handled where, but what is this Linux obsession with userland code that should be in the kernel?!
2 words - “Buffer cache” … Kernels are there to do that sort of thing; it’s their job, all day, every day. Don’t try and put the poor things out of work …
Again, I feel I should apologise. There’s so little thought, and so much assumption, to get where we are in this discussion (that’s “so little thought …” on my part, not yours).
When I added my initial NFS URI, I didn’t give it a moment’s thought - just assumed it would work. It did.
Had I given it a moment’s thought, I wouldn’t even have considered it being a userland NFS implementation. Yes, I’m aware such things exist (for starters, that’s exactly what the automounter is, but that’s for different reasons …) but … Since I didn’t see the filesystems mounted, I assumed Kodi must be running around in the background, sacrificing chickens (Tofu ones, of course …) and casting magic incantations over the entrails (still talkin’ Tofu here …) to get the automounter to handle it on demand. I prefer a little bit more control over my NFS options than that, so I added the manual mounts to fstab (then added the path to Kodi).
Didn’t know Kodi was using userland NFS until you mentioned it earlier in the thread …
But yes, if “exFAT” goes through FUSE, it’s almost miraculous it worked
There I’m no help at all, I’m afraid, as I don’t seem to be seeing those symptoms. Now I’m using a proper NFS implementation to access the shares, I’m seeing no apparent latency. If I play “Prometheus” and have the status bar showing, I don’t see any particular disparity between CPU usage; if there were any pattern there, I’d say CPU4 tended to have the highest workload, followed by CPU2 - that’s pretty consistent, but it’s a marginal difference (and checked by “Mk. I Eyeball” rather than anything more specific … or precise!).
OK thanks - I’ll make sure I run the tests @DBMandrake requested before allowing any updates; that way I will hopefully be able to give you a before and after comparison:
Yes, absolutely - of course. I’ll try that shortly; then I’ll check for any updates later, apply any you might have made available, then re-test to see whether there’s any difference.
So to be clear, I am saying that when the Muppet user (with apologies to the late, great Jim Henson …) actually uses the correct media source (kernel mounted NFS rather than yet-another-userland-filesystem …) then, were the viewer possessed of a poetic soul, they might be moved to tears by the sheer beauty of the exquisitely rendered desolate landscape at the start of “Prometheus” … so that’s a “yes”
Perfect playback; started in seconds (10, to be precise - c.f. 2-3 minutes when using libNFS); not a single noticeable drop, glitch, stutter or artifact; after 2+ hours of playback, the stats. still said 0 drops, 0 skips; …
Thanks also, @DBMandrake, for the technical background - I’ll take a look through that when I have some “spare” (ha!) time. But honestly, unless it’s useful to you for diagnostics (such as the test you asked me to run), now I know the source of the problem I’m personally entirely satisfied with the solution - don’t use libNFS! Though I appreciate you’ll need to come up with a more generally usable solution than casting magic runes at …
Oh, which reminds me - NFS options … I’ll come back to you later about those …
Again gentlemen, thank you so much for such quick, clear, helpful and effective support - superb.
Sorry, I still haven’t had the time today to sit down and do some proper tests - hopefully very shortly …
Meanwhile - NFS …
@DBMandrake, I’m curious - how did you arrive at that combination of options? I’m wondering whether that’s a carefully considered selection based on extensive NFS protocol level knowledge, or put together from Internet “How to tune your NFS!” articles (not that either source is a problem)?
“You can tune a piano but you can’t tuna fish”.
[[Bonus points if you’re old, or GG (Generation Google), enough to know a) it’s the title of an album by REO Speedwagon; more importantly, b) why I apparently randomly quoted it here ]]
I ask (NFS options, not pianos or tuna …) because I see these sort of options crop up all over the place, and they’re almost invariably re-hashes of older NFS articles going back to the days when stone axes were the latest thing in super-weapons.
So I’m genuinely interested - you may very well know more about what works under these circumstances than I, so here are some thoughts, and any comments you might have will doubtless be very interesting.
You’re specifying read and write buffer sizes of 8K.
In the dawn of pre-history, NFS hurled 512 byte chunks of data around. Fits into a UDP packet, same size as a typical sector, people didn’t (usually …) transfer massive files over NFS, so it was a relatively efficient size.
If you needed to give NFS a hint about streaming data, you might try and tell the server to chuck out a bit more at once - rsize and wsize affect the buffer size negotiation within the NFS protocol (more precisely, they set an upper limit on the transfer sizes, but …).
So “increase your ‘rsize’ and ‘wsize’!” used to be about the first words in these tuning guides.
However things have, mercifully, moved on a bit. “Advanced Format” drives have (at least) a 4K sector size, not 512 bytes; kernels have vastly more buffer space; … So you’ll probably never see an NFS server negotiating anything less than 32K these days.
That’s all very well in theory - let’s see what’s happening in practice:
osmc@Arthur:~$ nfsstat -m
/home/Multimedia from gateway.firstgrade.co.uk:/home/Multimedia
Flags: rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=18.104.22.168,local_lock=none, addr=22.214.171.124
I’m sure you recognise that magic number
So the server has offered 1MB; the kernel client on the OSMC box has said “Yup, I’ll have that please …”. You can see the effect specifying an 8K buffer size is going to have (save the mental arithmetic - 128 transfers for each 1 that would otherwise be made).
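For anyone who does want the arithmetic, here it is spelled out (the 50 GB file size is purely illustrative):

```python
import math

file_bytes = 50 * 10**9          # an illustrative ~50 GB rip
small, large = 8192, 1048576     # 8 KiB vs the negotiated 1 MiB rsize
reads_small = math.ceil(file_bytes / small)
reads_large = math.ceil(file_bytes / large)
print(f"{reads_small:,} reads at 8 KiB vs {reads_large:,} at 1 MiB "
      f"({large // small}x fewer requests)")
```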
However that’s from my NFS server - YMMV.
sync vs. async:
Didn’t even think that was available as an option any more. Async is what you want (as you’re using), but it’s probably not going to make any difference as you’re not (I assume?) going to be writing much over the share.
locking vs. no locking vs. local locking vs. …:
Yup, turning off the locking and faking it locally - no problem there. Again though I’m interested in why you think it’s necessary; if I look on our server (and remembering “Arthur”, the OSMC box, has negotiated NFSv4):
% nfsstat -s
So it’s never actually requested a lock. Ever.
(same thing, incidentally, if I run nfsstat -c on the OSMC box - zero locking activity).
OK, now to the interesting ones …
soft and tcp:
That’s an interesting combination.
I’d have thought you’d be using hard mounts, as they behave closest to a disc. Soft mounts are really a throwback to the days of NFSv1 and Sun Microsystems’ discless work-stations - they’re very much the NFS equivalent of UDP (“Yeah, it might get there …”) and are effectively completely stateless.
To that, you’re adding a degree of persistent state, by way of a TCP connection.
Again, I’m sure you’re going to have good reasons for that combination - put me out of my misery and share them, would ya?
Now all that is interesting, and a nice technical discussion (which is why I’ve gone into a bit more detail than I’d otherwise need to - in case anybody else is interested in some of the details).
Unfortunately, none of that helps at all with
OK, so now I’ve been able to grab back the “Vero 2” for long enough to run some tests!
OSMC not updated today (though I see an update is now available).
libNFS access to the “/home/Multimedia/” share.
I think this isn’t a fair test - having just (re-)added the “nfs://” URI, the bottom of the screen is telling me “Scanning movies using The Movie Database” …, so I suspect that’s knackering the performance while it re-scans … quite a lot … of media files …
I don’t want to perform any “abnormal” operation that might skew the results, so I’ll leave this post here for the moment and re-run the baseline (then run the other) tests tomorrow and edit this accordingly.
[[Edit: And so we continue after the media library re-scan has completed]].
Test 1 - Setting buffermode to 1:
mediacenter was stopped (sudo systemctl stop mediacenter), buffermode changed (by creating advancedsettings.xml in userdata, containing the information you provided; owner and group were then changed to osmc), and mediacenter re-started (sudo systemctl start mediacenter). The system was left for 5 minutes in case there were any initial background tasks the startup may have kicked off. The test ISO was then played.
So a slight improvement, but might just be subjective.
Test 3 - Resetting buffermode; updating OSMC (and rebooting …):
The advancedsettings.xml file was renamed _advancedsettings.xml, the system updated (applying updates using the GUI) and restarted. The test ISO was then played.
(I.e., Baseline: but after updating)
Test 4 - Setting buffermode to 1 (still on updated OSMC):
Both results were the same as before applying updates.
So not a lot of difference really
The mount options are actually based on some optimisation I did on the Raspberry Pi 1 in Raspbmc a couple of years ago, although in that case I was specifying UDP as the Pi 1 really struggled with TCP and using UDP was the only way to get fast enough performance to stream a full bit-rate rip.
I did not think too much about it and simply copy/pasted it to the Vero 2 as a quick test, only changing udp to tcp (to give a fair comparison to libnfs, which only uses tcp) and added the systemd mount options.
I haven’t checked on the Vero 2 but way back when I did the Pi 1 nfs optimisation (unfortunately the Raspbmc forums are now gone, my nfs thread along with it) the default rsize and wsize are actually 1024 bytes, at least for UDP. I don’t remember what it might be for TCP.
Keep in mind that with large rsize and wsize on UDP that you are essentially creating a large fragmented packet that is sent as many IP fragments as each packet can only hold a payload of about 1472 bytes on an Ethernet with 1500 MTU. So even an rsize/wsize of 8192 is many IP fragments.
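Using the ~1472-byte per-fragment payload figure above, the fragment counts work out roughly as follows (a sketch; real fragmentation also rounds fragment payloads to 8-byte multiples, which this ignores):

```python
import math

per_fragment = 1472   # usable payload per fragment on 1500-MTU Ethernet, per the post
for rw in (1024, 8192, 32768):
    frags = math.ceil(rw / per_fragment)
    print(f"rsize/wsize {rw}: ~{frags} IP fragment(s) per NFS read/write")
```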
I found on the Pi 1 with UDP that 8192 was the point of diminishing returns - beyond that you just risk increasing packet loss due to lost fragments.
I didn’t bother to change the rsize and wsize when switching UDP to TCP for the Vero 2 testing, it may well have benefited from a larger size. (Remember it was a quick test)
I think you misunderstand what soft does, it doesn’t affect the network protocol at all, it affects how the client behaves if the server becomes unreachable for any length of time. From the man page:
The problem with ‘hard’ mounts for a media share is that if the NFS server or the connection to it disappears, any application (such as Kodi) will hang forever with a completely unresponsive user interface until the server returns. Which might be never.
You will not even be able to figure out what went wrong because Kodi is completely frozen. With the ‘soft’ mount and an appropriate retrans value, after a certain number of retransmissions have been sent with no response the nfs client will return an error to the calling application instead of blocking it forever.
Thus Kodi would freeze for a while, (about 30-60 seconds with retrans=2) but would then become responsive again and report that an error occurred trying to open/read the file. In my opinion this behaviour on a media share is much preferable than a permanent freeze.
For a root file system mount you would want to use a ‘hard’ mount not ‘soft’, yes, but not for a media share which is only providing video files for you to play.
Thanks. It does indeed seem that libnfs is doing a really poor job. On the test Jellyfish clips I have, kernel nfs mounts will play up to 80Mbps without any pauses or stuttering at all.
With libnfs I can only play up to about 40-50Mbps and still see the occasional random brief pause. Whilst libnfs is never going to be as fast as a kernel nfs mount, we don’t believe that it is performing as well as it should be at the moment, investigation is ongoing.
There might be some regression in libnfs performance in Jarvis, unfortunately there was never an Isengard release for the Vero 2 so we can’t go back to compare that.
In the meantime our recommendation would be to use kernel nfs mounts - yes it’s a little bit more work to set up but it is always going to be the fastest and most reliable method of streaming from a NAS/Server.
Yes absolutely - I obviously wasn’t explaining the comparison clearly.
It’s not just about the protocol, it’s about how it’s used and what the effects of that will be. Probably shouldn’t have mentioned UDP as a comparison, but it is a relatively accurate one.
Consider: You’re a developer. Assuming you use (for example) C, when was the last time you checked the success or failure of (again, for example) a “close()” in your code?
From Sun (Soracle, Oracle, Larry’s-Private-Bank-Account, call 'em what you will), who are after all the people who invented the protocol in the first place:
Applications frequently do not check return values from soft-mounted file systems, which can make the application fail or can lead to corrupted files. If the application does check the return values, routing problems and other conditions can still confuse the application or lead to file corruption if the soft option is used.
I.e., they’re very much the NFS equivalent of UDP (“Yeah, it might get there …”) in practical use in pretty much any application. E.g., edit a file on a soft-mounted filesystem, (figuratively) pull the plug on the NFS server, write-and-quit your editor - your data are gone and you probably won’t even get a message to that effect. With a hard mount, you will know about the problem …
Also, remember that NFS (up to v3) is an intentionally stateless filesystem. When using UDP as the transport, that lack of state persists from application to server; end-to-end, if you’re using soft mounts (no guarantee an operation will complete).
Now, if you then use TCP as the transport, you are imposing an independent layer of state into that otherwise stateless system. With UDP, you’d send packets up to the retry count, then (in absence of a response) consider the data lost; with TCP, you have a direct indication of connection state and the mechanics of handling the retransmissions can be somewhat different. I’ve even seen NFS client/server combinations where, if TCP is used as the transport, any “soft” option is completely ignored, and the effective semantics are that of a hard mount.
Now, you then go on to talk about the behaviour, with Kodi hanging etc.
That, I’ll freely admit, I hadn’t considered at all. I’m not used to working with NFS shares which might not be there
Plus, of course, the only thing being written to the share (where you really want hard mounts - writing …) would be updated access times (if the mount options permit it) …
In my opinion this behaviour on a media share is much preferable than a permanent freeze.
In the light of your explanation, yes, I’d totally agree.
See? Said you’d have a good reason - thanks for sharing it
Seriously?! Good grief …
I guess someone must severely have trimmed that down for the “rPi” port - the normal Debian default is 256K for both, IIRC.
Yes, but sorry - that’s completely irrelevant
If you have, say 50Gb of data to transfer, that will always equate to a minimum of X packets on the wire, where X is 50Gb / payload size per packet. So that number of packets, X (minimum), is always going to be on the wire regardless of whether you’re using NFS with 8K sizes, NFS with 1M sizes, or just dumping it raw into a socket. Makes no difference.
What does make a difference is how much you’re adding to X by way of protocol overhead. That’s what larger read/write buffer sizes are giving you - a direct reduction in that protocol (NFS) overhead.
The main thing is that “you” (as in “The User”) can never know as much as the kernel about what it can do best. The read and write buffer sizes are negotiated mutually between client and server as to what should work best for that combination under these circumstances. Well, assuming the porting team did their job right anyway It can need tuning where client, server or both are very short on resources (still chucklin’ at your “1024 bytes” ) and you’re desperately trying to get that last bps of bandwidth, but otherwise the NFS implementations should negotiate the mutually optimal sizes all by themselves.
But, as always, YMMV
That’s not actually true, for two reasons.
One is that if you use excessively large rsize and wsize with UDP you end up with a rather large number of IP fragments - if even one fragment is lost due to packet loss the UDP packet can’t be reassembled and the whole lot needs to be sent again. The more fragments you send per packet the higher the chance of packet loss hitting one of the fragments, and the more data you have to go back and resend.
If you have zero packet loss it doesn’t matter, but if you have a small amount of packet loss (such as a wireless network) then performance can suffer greatly with an excessive number of fragments. Therefore for UDP wsize and rsize should not be set any bigger than the point of diminishing returns.
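The effect compounds quickly. If each fragment is lost independently with probability p, the whole datagram survives only if all n fragments arrive, so the effective datagram loss rate is 1 - (1-p)^n. A sketch with an assumed 1% fragment loss (23 fragments corresponds roughly to a 32 KiB rsize on 1500-MTU Ethernet):

```python
# Probability that a fragmented UDP datagram fails reassembly, assuming
# independent per-fragment loss with probability p.
def reassembly_loss(p, n):
    return 1 - (1 - p) ** n

for n in (1, 6, 23):
    print(f"{n} fragment(s): {reassembly_loss(0.01, n):.1%} effective loss")
```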
The second reason is rsize and wsize in UDP mode is the effective window size for nfs3, and just like TCP window size, if its too small to satisfy the bandwidth/delay product performance will suffer.
There is certainly some benefit to reducing protocol overhead but IMO the window size effect is more important - if the rsize/wsize is too small for the round trip time you get a lot of “dead time” where no data is being sent.
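The window-size argument is just the bandwidth-delay product: the outstanding data must cover bandwidth × RTT or the sender sits idle waiting for replies. A sketch with assumed LAN numbers (100 Mbit/s link, 2 ms round trip):

```python
# Bandwidth-delay product: bytes that must be "in flight" to keep a link busy.
def window_bytes(bandwidth_mbps, rtt_ms):
    return bandwidth_mbps * 1e6 / 8 * (rtt_ms / 1000)

need = window_bytes(100, 2)   # illustrative: 100 Mbit/s, 2 ms RTT (assumptions)
print(f"{need:.0f} bytes needed; rsize=8192 covers {8192 / need:.0%} of that")
```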
The main thing is that “you” (as in “The User”) can never know as much as the kernel about what it can do best. The read and write buffer sizes are negotiated mutually between client and server as to what should work best for that combination under these circumstances. Well, assuming the porting team did their job right anyway
I can’t agree there - manual tuning of network parameters to suit a specific network and device situation is always going to trump “defaults” in the protocol and/or kernel.
I can assure you that whatever the default was when I was doing the Raspberry Pi 1 testing, it was far from optimal, with throughput less than half of what I ended up with after I had tuned rsize and wsize. For that specific test situation 8192 was optimal in the sense that going higher only gave very tiny incremental increases in throughput at the expense of much greater risk of performance loss in the face of packet loss.
I am in no way suggesting that 8192 is a magic number for any other situation. Part of the optimisation on the Pi 1 was really to work around the very slow CPU - it is actually slow enough that the overheads of TCP itself when trying to play back video was enough to slow things down dramatically. Most of the performance benefit came from switching to UDP, which is not the case on devices with much more powerful CPU’s.
Oh absolutely, but are you really seeing fragmentation on UDP streaming data? Because you shouldn’t be - the kernel will package up your data as efficiently as possible, and it has plenty of scope to do that here. It’s not as though you’re working on a terminal app. and bumping into Nagle every packet …
Again, absolutely … but I’d assert that 8K would potentially be too small to satisfy the bandwidth. Just because someone on the “rPi” porting team decided 1K would be an appropriate default doesn’t mean it is - there I’d almost disagree with my own earlier statement about letting the kernel know best, because I suspect the porting team will just have minimised everything on the assumption no-one is going to use it for serious network throughput.
I wonder whether anyone has increased that default (and any similar ones) for later, more powerful, hardware (esp. the “rPi 3”), or whether it’s still just set to the absolute minimum possible? Interesting …
Ah, here is where we see a difference in our backgrounds
No, the kernel should always trump hand-tuned values in the (vast) majority of situations. But I would have to qualify that with a big if the porting team …
For a start, there’s quite a bit of code in the average kernel that is there specifically to tune dynamically. Variations in size (availability) of the buffer cache, read-ahead algorithms, dynamic TCP tuning (E.g. dynamic window sizing), … - all account for the instant-by-instant assessment of what will, and will not, work well, on that system, under that load, at that instant. It is simply not possible to come anywhere close to optimal performance manually, as you’re having to provide “average case” values for a (potentially highly) dynamic system. Plus, by manual tuning, you’re always running the risk of adversely impacting some other part of the system - one of my favourites (not NFS related) is when I see people having read that letting the kernel buffer things up for you is a good thing, so manually tune their buffer cache size ("'Coz that’s wut it sed in wot I red on The Internet …") to the point where they’ve got no memory left in the system for pretty much anything else (and I’ve seen qualified sysadmins, who really should know better, make mistakes like that)!
And therein lies the problem - “whatever the default” = “… if the porting team …”
Actually, again - difference in backgrounds - it’s probably not fair to criticise the porting team for that. They’ll have been making a general purpose port for a general purpose device, and particularly with the original "rPi"s I can’t imagine anything requiring high network filesystem performance being on their list of likely target use cases …
But still, that’s where the tuning should be done; inside the parts of the kernel specifically designed to do so.
IMHO - that’s given my background. Yours obviously involves trying to get the last possible erg of computing power out of a box with (very) limited resources. Different approach …
But that’s the funny thing - it is a magic number.
8K is the maximum supported block size for NFSv2 (32K for NFSv3, which was why I said earlier in the conversation you’re unlikely to find a modern NFS server negotiating anything less). That’s why it crops up so many times in these Internet “Tune your NFS!” guides (which are mostly recycled from very much older documents) …
You can set the sizes higher, but if you’re using NFSv2 that is the fundamental block size that NFS will be using. The kernel probably should ignore larger sizes anyway; some don’t, and instead try to use what’s been specified to aggregate operations (which may or may not be successful, and may well be counter-productive …).
Good grief, it’s really that slow?! Wow …
Then I applaud your efforts to get anything like XBMC/OSMC running on it!
I seriously didn’t realise it was that slow, so that using UDP actually made a difference. That explains your preference for that over TCP; personally, I can’t remember the last NIC I used (other, obviously, than on consumer items) where the CPUs were even bothered with tediousness like running an IP stack.
Wonder what the UDP/TCP trade-off is like on the more modern hardware (e.g., “Vero 2”)? You were basically having to work around some of the shortcomings of UDP (effectively trying to tune it into almost behaving like TCP in terms of “managing” the network load), because you couldn’t let the CPU do TCP for you. Hm. So:
Yes, if that’s how you’re having to drive the networking, then that is going to make a difference (more so than additional NFS protocol overhead), and because you’re not dynamically adjusting it to instantaneous load, you’re going to have to use “average” values.
You really shouldn’t be having to do the kernel’s work for it like that (I’m not saying you didn’t have to, I’m saying you shouldn’t have to) … You’re trying to fake TCP-like behaviour out of UDP by tuning what is read and written when to maximise utilisation of the wire and minimise retransmissions - that’s all the job of TCP.
Just out of interest, were you in a position to change the application (specifically whatever was playing the content), or just tune the system to work best with it? If the former, I’m wondering whether it used direct I/O, as, if not, that would probably have made your life significantly easier by freeing up (for example) 25-30Mb/s of bus bandwidth (not to mention approx. halving the memory used for buffering). Plus CPU, if kernel/user copies are done using something like Duff’s Device (I’d hope the “rPi” would use the DMAC, CAMMU or possibly even the GPU to do that sort of thing - even if it does though, you’re still going to be using bus bandwidth which would much better be used elsewhere) - for this sort of application, quite a lot of CPU probably …
Thanks - interesting discussion
I’m pretty sure the default nfs buffer sizes are just what the linux kernel/debian has chosen.
No one at RPi set the defaults to 1K.
If you set wsize and rsize to 8192 over NFS3 using UDP you will absolutely, 100% guaranteed be generating fragmented UDP packets… (6 fragments, to be exact) The wsize parameter when used with UDP specifies the payload size per UDP packet. If it is bigger than 1472 bytes you will get fragmentation. (No jumbo frame support on a Pi…)
Fire up Wireshark and take a look.
NFS over TCP is completely different because TCP allows a larger payload to be split over multiple packets instead of causing IP fragmentation. So large wsize/rsize is not a problem with TCP.
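For what it’s worth, that fragment count is easy to sanity-check with a little shell arithmetic - assuming a standard 1500-byte Ethernet MTU and ignoring the RPC headers, which only push the total higher:

```shell
#!/bin/sh
# Fragments needed to carry one 8K NFS-over-UDP write on a 1500-byte MTU.
MTU=1500
IP_HDR=20                              # IPv4 header, no options
UDP_HDR=8                              # UDP header (first fragment only)
WSIZE=8192
per_frag=$((MTU - IP_HDR))             # IP payload per fragment: 1480 bytes
datagram=$((WSIZE + UDP_HDR))          # UDP datagram on the wire: 8200 bytes
frags=$(( (datagram + per_frag - 1) / per_frag ))
echo "$frags fragments"                # -> 6 fragments
```

8200 bytes of datagram at 1480 bytes of IP payload per fragment rounds up to exactly the 6 fragments quoted above.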
I tested 1K, 2K, 4K, 8K, 16K, 32K and 64K using UDP on the Pi 1. Performance increased all the way up to 8K but levelled out beyond this, so I chose 8K in that instance not because some guides might say it, but because it gave the best performance without going past the point of diminishing returns.
As I said earlier, I’m not suggesting that this is the optimal value for other devices, nor am I suggesting it is optimal for TCP - it is not.
Don’t confuse hardware-accelerated checksums and other functions that modern network cards do in hardware with the actual TCP stack itself. Put a standard network card in any Linux box and it is still the kernel that is generating TCP acks, and that takes CPU power to do.
The network controller and USB hub in the Pi is very crude and the Pi 1 CPU is painfully slow, so yes, the extra work of having to generate TCP acks was enough to drop maximum streamable bitrate (while actually playing video) from about 50Mbps down to 30Mbps or so. It’s not just the CPU time you have to consider, but also bus contention on transfers to the USB controller and so on. It all adds up.
A Pi 2 can easily stream >50Mbps during playback using NFS over TCP and the only difference is the CPU core - they have the same USB controller, Network controller and GPU. The USB controller is still not very efficient but there is CPU power to spare in servicing the controller.
Interesting - thanks very much for the clarification.
So I wonder why @DBMandrake was seeing that? The Debian defaults are, IIRC, 256K for both (e.g., that’s what you’ll see with a Debian NFS share mounted on a Debian NFS box, with no rsize/wsize specifications or anything else affecting the choice).
I think you’re both getting TCP and UDP confused - the default rsize and wsize are different for TCP and UDP.
Yes, a “standard” network card in any “Linux” box
As I said, slightly different backgrounds. Last system software Unix work I did was some porting to a Fujitsu box running UXP/M - liquid-nitrogen-cooled everything, 256-bit-wide data bus, … NIC hardware acceleration was not limited to calculating a few checksums.
No, not getting them confused …
Yes, the sizes are different. TCP, the Debian default is 1M (NFSv4); UDP, it’s 32K (maximum allowed under NFSv3; this doesn’t apply to NFSv4 as that only supports TCP). I thought it was something like 256K for NFSv4, but it’s not; confirmed by an actual test, e.g.:
root@Tip:~# mkdir /var/tmp/foo
root@Tip:~# touch /var/tmp/foo/bar
root@Tip:~# echo '/var/tmp/foo *(rw)' >> /etc/exports
root@Tip:~# exportfs -av
exportfs: /etc/exports: Neither 'subtree_check' or 'no_subtree_check' specified for export "*:/var/tmp/foo".
Assuming default behaviour ('no_subtree_check').
NOTE: this default has changed since nfs-utils version 1.0.x
root@Tip:~# mount -t nfs -o ro,proto=udp,nfsvers=3 tip:/var/tmp/foo /mnt
root@Tip:~# nfsstat -m
/mnt from tip:/var/tmp/foo
root@Tip:~# uname -a
Linux Tip 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt20-1+deb8u3 (2016-01-17) x86_64 GNU/Linux
(see earlier in the thread for defaults using NFSv4).
That’s exactly what I’d expect to see and is in line with (up-to-date) NFS documentation.
@popcornmix has said no-one at “rPi” would have changed it to 1K (and as I mentioned earlier, I wouldn’t have held it against anyone had they done so anyway - general purpose port, general purpose hardware, …).
I’m not disputing your having seen 1K as a default - I’m sure you’re perfectly capable of establishing these things. I’m just curious as to why you were seeing a 1K default when that isn’t the Debian default (for TCP or UDP based NFS of any sensible version) and it wasn’t changed by the porting team.
Just curiosity, that’s all.