SSD Cache (bcache) for OpenMediaVault?

  • I wonder what exact enterprise-grade stuff will I need to get such performance lol...

    Well, the following was done with toy-grade hardware (a dual-core ARM SoC with a consumer-grade M.2 SSD connected to one of the SoC's native SATA ports and another 2.5" consumer-grade SSD attached to a Marvell 88SE9215 SATA controller). These are results in kB/s from an 'iozone -e -I -a -s 100M -r 4k -r 16k -r 512k -r 1024k -r 16384k -i 0 -i 1 -i 2' benchmark:

    Code
                                                                  random    random
                  kB  reclen    write  rewrite    read    reread    read     write
              102400       4    65688    80881   110730   118944    37860    77260
              102400      16   161094   185891   241206   257926   104111   167703
              102400     512   281322   280213   324536   364836   348267   289555
              102400    1024   285398   293523   552984   569205   542326   299000
              102400   16384   264123   286763   744096   761679   743206   312469

    This is a RAID10 made out of 2 devices to combine the benefits of mirroring with the performance of striping (mdadm --create /dev/md16 --level=10 --metadata=0.90 --raid-devices=2 --layout=f2 /dev/sda1 /dev/sdb1).


    Up to 750 MB/s sequential reads and close to 300 MB/s write performance. Reads were bottlenecked by the SSD sitting behind the Marvell controller (its throughput limit of somewhat below 400 MB/s is why the total tops out around 750 MB/s); write performance was bottlenecked by the SSDs themselves (cheap consumer crap). And the random IO numbers could be way better with a smaller RAID chunk size, since I used the defaults (see the mdstat output below and the chunk-size variant sketched after it):

    Code
    root@clearfogpro:/mnt/md16# cat /proc/mdstat 
    Personalities : [raid10] 
    md16 : active raid10 sdb1[1] sda1[0]
          117210112 blocks 512K chunks 2 far-copies [2/2] [UU]
          [=========>...........]  resync = 49.5% (58067200/117210112) finish=4.7min speed=205614K/sec
          bitmap: 1/1 pages [4KB], 65536KB chunk
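
    For illustration only: a sketch of how the same array could be created with a smaller chunk size. The 64K value is just an assumption for the example, not what was actually benchmarked:

    Code
    # hypothetical variant of the array creation above: 64K chunk size
    # instead of the 512K default (value picked for illustration only)
    mdadm --create /dev/md16 --level=10 --metadata=0.90 --chunk=64 \
          --raid-devices=2 --layout=f2 /dev/sda1 /dev/sdb1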

    BTW: I also measured while the resync was still running and got 100+/500+ MB/s write/read and almost 75% of the random IO numbers. With any x64 box, two SATA 3.0 ports that do not have to share bandwidth, and two SSDs known to exceed the SATA 3.0 throughput limit (550+ MB/s), we would be talking about 1000+ MB/s in both read and write direction, and with a smaller RAID chunk size also about impressive random IO numbers.


    That's what I meant above: storage separation. Put all the stuff that needs high performance on devices with an appropriate topology (e.g. 2 Samsung Pro 850 SSDs in such a RAID10 topology, or 3 of them, still with RAID10 and far layout). Or, when it's about huge amounts of data, simply throw in a few more disks and use an appropriate storage topology: mirrored vdevs in one large zpool (see the sketch below).
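
    A minimal sketch of the 'mirrored vdevs in one large zpool' idea, assuming four disks; pool name and device names are placeholders:

    Code
    # hypothetical pool made of two mirrored vdevs -- ZFS stripes across the
    # mirrors, so you get redundancy plus striping performance
    zpool create tank mirror /dev/sdc /dev/sdd mirror /dev/sde /dev/sdf
    zpool status tank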


    Somewhat decent x64 boxes have plenty of IO bandwidth, and if you use a bleeding-edge ZoL version you don't need that much DRAM any more and can also use somewhat weak but energy-efficient CPUs as long as they support QAT, Intel's QuickAssist Technology -- see the performance section here: https://github.com/zfsonlinux/zfs/releases/tag/zfs-0.7.0



    Boards like Gigabyte's MA10-ST0 or SuperMicro's A2SDI-H-TP4F-O rely on the 16-core C3958 Denverton with QAT support built into the PCH, allow for a sufficient amount of ECC memory, provide a couple of 10GbE ports and 12 to 16 SATA ports.


  • Thanks! In short, for the purpose of this thread, do I just need to "worry" about how fast the disks are and the interfaces they are using?

  • In short, for the purpose of this thread, do I just need to "worry" about how fast the disks are and the interfaces they are using?

    Don't think so. But as already said, it depends on the use case. If you switched to 10GbE for whatever reason and now want to see 'nice benchmark numbers' to recoup your investment, that's something you can most probably cope with best by adjusting your expectations ;)


    If you want to max out '10GbE NAS performance' it can become somewhat time consuming, since you need to max out the network first: use iperf3 as a first try and check with htop in parallel whether you're bottlenecked by single-threaded behaviour maxing out one CPU core -- if that happens, a more recent kernel and a more recent compiler that iperf3 has been built with can help. But those are just synthetic benchmark numbers, and you should always look at IRQ affinity too: /proc/interrupts -- depending on kernel version, fixed IRQ affinity might be a way better idea for maximum performance than IRQ balancing (a rough sketch follows below).
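
    A rough sketch of the checks just described; the NAS IP, the interface name eth0 and the IRQ number are placeholders:

    Code
    # on the NAS:
    iperf3 -s
    # on the client, while watching htop on both ends for a maxed-out core:
    iperf3 -c <nas-ip> -t 30

    # see which CPU cores service the NIC's interrupts:
    grep eth0 /proc/interrupts
    # pin IRQ 123 (placeholder number) to CPU1 instead of relying on IRQ balancing;
    # the value written is a hex CPU bitmask (2 = CPU1)
    echo 2 > /proc/irq/123/smp_affinity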


    Then it's about storage performance. If you have a somewhat decent x64 box you shouldn't be worried, as long as you choose an appropriate disk topology. I won't repeat the 'mirrored vdevs as one pool' thingie again (maybe one more time later ;) ), but with today's storage costs it's really worth considering a move away from traditional RAID/storage approaches (n data disks plus single or double parity).


    If you have working backups then there's nothing wrong with using RAID-0, as long as you put a checksummed fs on top (then periodic scrubs will tell you when something went wrong and you need to restore data from the backup; a sketch follows below). Important: working backups are defined by testing restores repeatedly (of course no one does this at home). With such a topology and a few 3.5" HDDs you could get NAS throughput numbers of 500 MB/s or more with any somewhat decent x64 box, as long as you did not totally f*ck up the setup (e.g. putting 8 SATA ports behind a single PCIe 2.x lane). But why? What's the purpose?
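
    A sketch of the 'RAID-0 plus checksummed fs' idea, assuming btrfs as the checksumming filesystem; device names and mount point are placeholders:

    Code
    # hypothetical example: let btrfs do the striping itself -- data striped
    # across three disks (raid0), metadata mirrored (raid1)
    mkfs.btrfs -d raid0 -m raid1 /dev/sdb /dev/sdc /dev/sdd
    mount /dev/sdb /srv/data

    # a periodic scrub detects silent corruption -- then restore the affected
    # files from the (tested!) backup
    btrfs scrub start /srv/data
    btrfs scrub status /srv/data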


    If you copy a 10GB archive in 20 seconds or in 60... does it really matter? As soon as you move a bunch of smaller files over the network, 'bandwidth' is trashed anyway (way lower -- and if you try to improve these numbers it gets really interesting with traditional approaches like RAID6/RAIDZ2; realistically you can only improve write performance with caching approaches like bcache or the alternatives -- see the sketch below).
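
    Since bcache is the topic of this thread, a minimal sketch of putting an SSD cache in front of a slow backing device; /dev/md0 and /dev/sdx are placeholders, and writeback mode is just one possible choice:

    Code
    # hypothetical setup: /dev/md0 is the slow HDD array, /dev/sdx the caching SSD
    apt-get install bcache-tools
    make-bcache -B /dev/md0      # format the backing device
    make-bcache -C /dev/sdx      # format the caching device

    # attach the cache set to the backing device
    # (cache set UUID taken from 'bcache-super-show /dev/sdx')
    echo <cache-set-uuid> > /sys/block/bcache0/bcache/attach

    # optional: writeback caching -- faster writes, but the cache SSD becomes critical
    echo writeback > /sys/block/bcache0/bcache/cache_mode

    mkfs.ext4 /dev/bcache0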


  • Thanks!


    1. I switched to 10GbE when I heard that I could get better speeds and overcome the Gigabit limit I was facing at the time. I just thought I could get those "nice benchmark numbers" when I saw others doing so. Apparently I didn't realise until much later that they were either using SSDs or at least a RAID setup!


    2. I checked using iperf3 and found that I can get about 9.5 Gbit/s. I am not sure about IRQ affinity though, gotta read up on that.


    3. Good idea to consider a newer storage topology. But how portable is it? Say, for example, the NAS goes down: how do I transplant the HDDs to a new system with as little data loss as possible?


    4. I check on my backup NAS from time to time -- is this good enough?


    5. You are right, there are so many ways a NAS can be seen as fast or slow. I guess I just need to know when to settle down and use it for its intended purpose!


    6. Should I be considering more 4TB HDDs or fewer 8TB HDDs in a ZFS setup? I am trying to plan such that I have just 8 HDDs per machine due to the chassis (or is this even the right way to think about this in the first place?)
