SMB writes fast, but NFS writes at only about a third of the speed.

  • I'm having a nightmare of a time here. I have an ESXi 6.5 all-in-one with one VM for my media and one VM for OMV. It's all running on a J3455B-ITX with 16 GB of RAM and four 3 TB WD Reds attached to an LSI 9211-8i. Since the LSI HBA (flashed to IT mode) is passed through to the OMV VM, that VM has 4 GB of RAM dedicated (reserved) to it. Both VMs are set to use 4 CPUs of the quad core. The MTU is set to 9000 on every NIC in ESXi as well as in the network interface settings of each VM OS (both Debian).
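

    Jumbo frames can be verified end to end between the two VMs with a non-fragmenting ping sized just under the 9000-byte MTU (the address here is the OMV VM's, the same one used in the iperf test further down):


    Code
    # 8972 = 9000 - 20 (IP header) - 8 (ICMP header); -M do forbids fragmentation
    ping -c 3 -M do -s 8972 192.168.1.13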


    The drives are set up using SnapRAID and mergerfs.


    The test is to copy a 1 GB file from my non-VM Windows 10 desktop workstation to my OMV VM. Another test is to copy a 1 GB file from my media box VM to my OMV VM.
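

    The VM-to-VM NFS copies are measured with dd against the mounted share; the write test is roughly along these lines (mount point and file name are illustrative, and conv=fdatasync is there so the result isn't just the page cache):


    Code
    # write 1 GB to the NFS mount and flush it before reporting the rate
    dd if=/dev/zero of=/mnt/omv/testfile bs=1M count=1024 conv=fdatasync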


    These are the baffling results:

    • SMB shares will write a 1 GB file at around 105-110 MB/s using Windows copy
    • SMB shares will read a 1 GB file at around 110-115 MB/s using Windows copy
    • NFS shares will write a 1 GB file at around 40-45 MB/s using dd
    • NFS shares will read a 1 GB file at around 120-125 MB/s using dd

    NFS server options are secure,async. I have tried sync,no_subtree_check,insecure,no_acl,no_root_squash,wdelay,crossmnt,fsid=1, and every combination thereof.
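

    For reference, the export currently looks something like this (path and network are illustrative):


    Code
    # /etc/exports on the OMV VM
    /export/media 192.168.1.0/24(rw,secure,async,no_subtree_check)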


    The client mounts with rsize/wsize=524288. I have tried vers=3, 4, and 4.1. I have also tried rsize/wsize of 65536 and 1048576, as well as hard,intr,sync,actimeo=0,fsc,nosharecache,nolock,noatime,nodiratime, and every combination thereof.
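

    The corresponding client-side mount is along these lines (server path and mount point are illustrative; the vers shown is just one of the versions tried):


    Code
    mount -t nfs -o rw,vers=3,rsize=524288,wsize=524288 192.168.1.13:/export/media /mnt/omv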


    I have tried creating a separate port group for the OMV NIC. I've tried the E1000E instead of the VMXNET3 driver.


    CPU and RAM loads on OMV are minimal in all cases. They are not the bottleneck.


    I've run iperf tests and they look fine:


    Code
    ------------------------------------------------------------
    Client connecting to 192.168.1.13, TCP port 5001
    TCP window size: 1.84 MByte (default)
    ------------------------------------------------------------
    [  3] local 192.168.1.11 port 55193 connected with 192.168.1.13 port 5001
    [ ID] Interval       Transfer     Bandwidth
    [  3]  0.0-10.0 sec  2.36 GBytes  2.03 Gbits/sec
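
    That run is a plain single-stream iperf2 TCP test; the commands behind it are roughly:


    Code
    # on the receiving side (192.168.1.13)
    iperf -s
    # on the sending side (192.168.1.11)
    iperf -c 192.168.1.13 -t 10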

    I just can't figure out why a separate machine on the network would have much faster writes over Samba than another VM in the same box has over NFS. Note that the OMV NFS share isn't even mounted within ESXi yet; this is straight VM to VM. When I do mount the OMV NFS share in ESXi, I get the same slow write speed.


    I've also tried OMV 2.1.



    Any help would be greatly appreciated.

  • I'm working my way through everything I can find on troubleshooting performance and I think I might have found the culprit. After reading and writing a 1 GB file, the output of mountstats gives me this:



    Notice the lines:


    Code
    WRITE:
    	977 ops (49%) 	0 retrans (0%) 	0 major timeouts
    	avg bytes sent per op: 524177	avg bytes received per op: 136
    	backlog wait: 4610.141249 	RTT: 491.918117 	total execute time: 5102.354145 (milliseconds)

    A backlog wait of 4610 ms seems excessively high. I'm looking at ways to lower it, but I'm not finding much. I found this article, but I don't have a subscription.
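

    (For reference, those per-operation RPC statistics come from the mountstats tool in nfs-utils, run on the client against the NFS mount point; the raw counters live in /proc/self/mountstats. The mount point here is illustrative.)


    Code
    mountstats /mnt/omv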

  • I've made some progress based on the high backlog wait times. I found the following tunable:


    Code
    echo -e "options sunrpc tcp_slot_table_entries=128\noptions sunrpc udp_slot_table_entries=128"> /etc/modprobe.d/sunrpc.conf


    This increases my speed by almost 25%, bringing it up to around 55-60 MB/s. I'm not sure why this works, since according to this article these values (even the udp one?) are dynamically managed by the server. mountstats now shows:


    Code
    WRITE:
    	977 ops (98%) 	0 retrans (0%) 	0 major timeouts
    	avg bytes sent per op: 524177	avg bytes received per op: 136
    	backlog wait: 3191.623337 	RTT: 321.162743 	total execute time: 3513.044012 (milliseconds)
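
    Note that the modprobe.d entry only takes effect once the sunrpc module options are re-read (in practice, after a reboot). To check or change the value on a running system, the sunrpc sysctls can be used instead, assuming they are exposed on your kernel:


    Code
    # current value
    cat /proc/sys/sunrpc/tcp_slot_table_entries
    # raise it without rebooting
    echo 128 > /proc/sys/sunrpc/tcp_slot_table_entries
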
  • One thing I have noticed is that mergerfs CPU usage is around 50% during the NFS file copy and only about 25% during the SMB file copy. During the SMB copy, smbd is at about 25% as well, so I'm not sure whether these numbers are an artefact of the CPU being virtualized within ESXi, or whether NFS file copies are simply more mergerfs-intensive.
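

    (For anyone wanting to reproduce this, pidstat from the sysstat package gives a per-process view of CPU usage during a copy; the process name below is as it appears on this box.)


    Code
    # sample mergerfs CPU usage once per second during a copy
    pidstat -u -p "$(pgrep -d, -x mergerfs)" 1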

  • One last post to close out the thread. If you are thinking of using SnapRAID as a datastore for VM storage, don't. SnapRAID is for files that don't change much, and VM files will be changing all the time. I decided to go with ZFS instead. In doing so, I've been able to drop mergerfs, as the zpool is now presented as one filesystem. In this configuration I get speeds of 140-150 MB/s over NFS. It's a much better solution. Cheers.
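

    For anyone following the same path, the ZFS side is nothing exotic; something along these lines, with the pool name, raidz layout, and device names purely illustrative (the equivalent can also be done from OMV's ZFS plugin):


    Code
    # one raidz1 vdev across the four Reds, one filesystem shared over NFS
    zpool create tank raidz1 /dev/sda /dev/sdb /dev/sdc /dev/sdd
    zfs create tank/media
    zfs set sharenfs=on tank/media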

  • NFS is generally stateless, so it's probably issuing many more commands, which greatly increases the mergerfs overhead (since most of the overhead is in the non-read/write commands). I'm unfamiliar with SMB, but it sounds like it might be issuing fewer commands to the underlying FS, which would lead to better overall performance.
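

    One way to see that in practice is to compare the server-side per-operation counters before and after a copy; nfsstat on the OMV VM breaks them down by call type:


    Code
    # per-operation counts on the NFS server side
    nfsstat -s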


    And yes, snapraid is for write once, read many kind of layouts. As is mergerfs.
