MergerFS + SMB/FTP: slow read speeds, good write speeds

  • I know the topic of MergerFS and SMB performance is not new. This is similar to this topic but in the opposite direction.
    But after reading all the threads about possible issues and trying pretty much everything, I want to see if anyone else has ideas.


    Problem: Reads via SMB and FTP are slow (around half of what they should be), while write speeds are fine and dd tests of the mergerfs array are also fine.


    The server is connected directly to my computer with a 10G link (Mellanox ConnectX-3 on both ends) and to the rest of the network with a 1G link, and it is running OMV 4 with the latest updates.
    The configuration is a standard SnapRAID + mergerfs pool like many users are running.


    iperf tests show pretty much 9.9 Gbit/s in both directions with packet sizes up to the MTU of 9000, so the network should not be an issue.
    There are hard drives and an SSD in the server. It has 16GB of RAM, an 8-core FX-8350 and an LSI 9211-8i for the drives.
    All drives are formatted with EXT4.
    The drives read and write large files at around 150-200MB/s without problems when tested with dd or accessed directly via SMB without MergerFS.
    Even when accessed via the FUSE mergerfs mount they are pretty much as fast as when accessed directly. Also fine.
    All SMB/FTP read/write tests on the client are done against a ramdisk, so there are no bottlenecks on that side.
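
    For reference, baseline checks of this kind can be done with iperf3 and dd along the following lines; the IP address and file path here are placeholders, not the ones actually used:

    Code
    # Network: run "iperf3 -s" on the server, then from the client:
    iperf3 -c 192.168.10.2          # client -> server
    iperf3 -c 192.168.10.2 -R       # server -> client

    # Raw sequential read of a single data drive, bypassing the page cache:
    dd if=/srv/dev-disk-by-label-data1/bigfile of=/dev/null bs=1M iflag=direct status=progress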


    Now the issue appears when mergerfs is combined with any kind of remote access.
    Folders sitting on the slower drive, which still manages 150-200MB/s when accessed directly on the server, only read at 50MB/s via SMB.
    A larger 10TB drive, which is a bit faster, still reads at only around 100MB/s, although it sometimes writes continuously at up to 200MB/s even with very large files and should read sequentially at those speeds too.
    Even with SMB overhead it should be faster than that. The same behaviour can be observed via FTP at the same speeds, and with both Windows and Linux clients.



    Not great, but still much faster than the ~50MB/s I get over the network, and writing has no issues getting close to these speeds.
    These tests at least show that MergerFS on its own only causes a slight slowdown when reading and makes no difference when writing.


    I had observed higher speeds than that before... SnapRAID syncs run at up to 250MB/s, and writing large files via SMB can sustain a reasonable 180MB/s after the cache is full, which is observable in iotop.
    But reading the same files back stays below 50MB/s.



    The same speeds as with dd can be achieved when the drives are accessed directly via SMB. But with MergerFS in between it drops way down and never gets over 50MB/s when reading.
    There is no additional slowdown when reading from multiple drives via MergerFS at the same time; all of them get around 50MB/s.


    Without direct_io it drops even lower, to 30-50MB/s on the faster drive.
    With direct_io enabled in MergerFS the faster HDD reads at up to 100MB/s with occasional drops, which at least feels closer to where it should be, but the other drive stays below 50MB/s.
    So the option does seem to make a difference.


    iostat while reading from sdb (the "smaller" drive, which is affected worse) via SMB gives the following:

    Code
    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               0,8%    0,0%    3,6%   42,3%    0,0%   53,3%
    
    
         r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
      410,50    0,00     51,3M      0,0k     0,00     0,00   0,0%   0,0%   22,40    0,00   9,24   128,0k     0,0k   2,39  98,0% sdb

    There are a lot of small read requests. The request size is 128k with a pretty high wait time, so this gives only 50MB/s at over 400 requests per second, while a copy made directly on the server reads at 500 r/s with >200MB/s.
    The larger drive manages to stay at around 100MB/s under this load but should still be much faster; too many small requests might be the cause. Interestingly its throughput goes up when I also read from sdb via the same MergerFS mount at the same time. Both drives are on the SAS card.
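
    Output in this format comes from iostat's extended, human-readable mode; the exact invocation is not shown here, but something along these lines (device name as an example) produces it:

    Code
    # Extended statistics for sdb, human-readable units, refreshed every second:
    iostat -x -h sdb 1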


    During a SnapRAID sync the request size is 256k at 170MB/s with <3ms wait times. This supports the theory that the read requests are too small or that MergerFS adds way too much delay.



    When reading from both drives via SMB and MergerFS at the same time, I notice that the overall throughput drops while %util goes way up, showing that MergerFS drastically increases the wait times.



    It shouldn't be a PCIe bottleneck, as the card still runs at x8, but I changed the board and slot some time ago, so it could be something related to that. lspci still shows a nice x8 link width for the SAS card and there is no slowdown when I run a dd test on the other drive at the same time. I tested different PCIe slots with the same results.
    The slowdown and increased %util only happen when both drives are accessed via SMB and MergerFS. Two drives via SMB, one of them without MergerFS: no problem. One drive via SMB, the other via dd: also no problem.
    A dd write with large blocks produces around 100 w/s, while the r/s numbers of almost 600 via SMB feel a bit too high.
    By the way, with the currently best config I can get 99% util but only 50MB/s throughput on the smaller drive with 20ms r_await, and on the other drive 100MB/s with 10ms and also a 128k rareq-sz.
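
    The negotiated link width can be double-checked with lspci; the bus address below is only an example (the real one can be found with "lspci | grep -i sas"):

    Code
    # Show PCIe link capability and current link status of the HBA:
    sudo lspci -vv -s 01:00.0 | grep -i 'LnkCap\|LnkSta'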



    The speeds can be observed in iotop too. Writes are fast, reads from the drives are slow, but only via the network and with MergerFS.
    Reading from the SSD without mergerfs I get around 350MB/s for reads and writes as expected, and cached files transfer at >900MB/s and produce the expected traffic on the network interfaces. So SMB itself seems to work fine and should not be the bottleneck here.


    I tried the usual SMB tuning options: read/write raw on and off, different transmit sizes and so on.
    Mergerfs was also tested with caching on and off and with async_read and direct_io on and off, with no real observable difference. I always remounted/rebooted after these changes.
    The current mergerfs config, which gives the best performance, is the following:
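
    For reference, a pool using the options discussed in this thread could be mounted from /etc/fstab roughly like the sketch below; the paths, policies and values are assumptions, not the exact config used here:

    Code
    # Illustrative sketch only - not the exact configuration used here:
    /srv/dev-disk-by-label-data*  /srv/mergerfs/pool  fuse.mergerfs  defaults,allow_other,use_ino,direct_io,async_read=true,category.create=mfs,minfreespace=50G  0 0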


    This gives me a more reasonable 120MB/s on the larger HDD but still a bad 50MB/s on the smaller one via MergerFS, which should not be any slower, as I am able to copy from one drive to the other continuously at over 170MB/s even via MergerFS directly on the server.


    What caught my eye was the user.mergerfs.fuse_msg_size: 32 option, which should be 256 according to the docs but is only honoured on kernels >=4.20, and Debian Stretch currently tops out at 4.19. 32 pages of 4KiB are exactly the 128k request size seen above, so it would make sense if this is related to the small read sizes. Could increasing it lower the overhead?
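
    The value can be read at runtime from the mergerfs control file via extended attributes, for example like this (the pool path is a placeholder):

    Code
    # Query the current FUSE message size (in 4KiB pages) from the mergerfs control file:
    getfattr -n user.mergerfs.fuse_msg_size /srv/mergerfs/pool/.mergerfs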


    I cannot see any possible bottlenecks. The CPU is pretty much idle, and the drives are much faster when accessed directly on the server or without mergerfs than the speeds I get when using mergerfs and SMB combined. It should be at least roughly double the speed.


    The usual tips don't seem to help here. Maybe something is fishy with the chunks read by mergerfs, as it behaves like dd with a block size that is too small, and the HDD with the slower access times seems to be affected much more?
    FTP is a little bit faster, but only slightly.


    From all the tests I made, it seems that when MergerFS is accessed via SMB or FTP the access times increase a lot, which slows everything down.

  • As nobody seemed to have a solution, I want to post the reason now:


    Yes, it was the size of the read requests made by MergerFS.
    I went full risk and upgraded the kernel from 4.19 to 5.4.0-4 from testing, which fixed all issues, as the newer kernel allows FUSE to make requests larger than 128k.
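
    For reference, on a Debian-based OMV install the usual route for this looks something like the following; it is an assumed procedure, not necessarily the exact steps used here:

    Code
    # Pull a newer kernel from Debian testing (assumed procedure):
    echo 'deb http://deb.debian.org/debian testing main' > /etc/apt/sources.list.d/testing.list
    apt update
    apt -t testing install linux-image-amd64
    reboot
    # afterwards verify with: uname -r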


    The read speed increased from <40MB/s via SMB and MergerFS to over 200MB/s, and all problems are gone.

  • Same problem here. Good write speeds but bad read speeds. I also found that rareq-sz is around 128 when reading data via SMB over mergerfs. But my OS is Ubuntu 21.04 with the 5.11.0-49 kernel...
