Storage File System Choices based on requirements - Suggestions?

    • Creating a new storage NAS:
      - Dual Xeon X5670, 6 cores / 12 threads each (12 cores, 24 threads total)
      - 24 GB DDR3-1333 ECC registered RAM
      - Norco RPC-4216 case
      - LSI SAS 6G HBA (2 cards)
      - 2 x 10G SFP+ Intel NIC (first experience with 10G and SFP+, so we will see how that works out)

      OS will be on its own SSD

      All critical information will be backed up to our current NAS, which has been running for years with no issues. It has the same specs but less storage, which is what is driving this question.

      I am trying to figure out based on the different disk arrangements, which is the best file system to use in each case.
      - On current NAS I use EXT4 for each mdadm raid volume; I have had great reliability and great performance.

      New system will have a raid 6 volume over 16TB, so not sure EXT4 will be the choice for that portion of the system.

      Attached is an Excel file explaining what I know about the storage system so far. I am looking for input on which file systems to use based on requirements and usage.
      All Raid arrays are created by OMV
      [Attachment: 2017-11-08 16_56_39-Sever Storage Overview.xlsx - Excel.png]
    • Most probably not the answer you want to hear :)

      But I would entirely drop the idea of anachronistic RAID, replace 4 of the SSDs with 4 TB HDDs, create one single large zpool out of mirrored vdevs and enjoy amazingly fast sequential and random IO performance.

      Create the pool with lz4 compression, create a new ZFS dataset below the main one for every share, and for the 'cold storage' ones switch from lz4 to gzip, ZFS's zlib-based compression (nested ZFS datasets inherit all properties from their 'parents').

      The basics: Mirrored vdevs in one large zpool
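
      A minimal dry-run sketch of the layout described in this post, assuming a pool called tank, four placeholder disk names and two made-up dataset names (none of these come from the thread, and ashift=12 is an assumption for 4K-sector drives). The script only prints the zpool/zfs commands so they can be reviewed before anything touches real disks.

```shell
#!/bin/sh
# Dry run: every command is printed, nothing is executed.
# "tank", "media", "archive" and the diskN names are hypothetical placeholders.

# Turn a flat list of disks into "mirror a b mirror c d ..." (pairs in order).
mirror_vdevs() {
    spec=""
    while [ "$#" -ge 2 ]; do
        spec="$spec mirror $1 $2"
        shift 2
    done
    # Drop the leading space before printing.
    printf '%s\n' "${spec# }"
}

plan_pool() {
    pool=$1; shift
    # -O sets lz4 on the root dataset, so child datasets inherit it.
    echo "zpool create -o ashift=12 -O compression=lz4 $pool $(mirror_vdevs "$@")"
    echo "zfs create $pool/media"
    echo "zfs create $pool/archive"
    # Cold storage: override the inherited lz4 on this one dataset only.
    echo "zfs set compression=gzip $pool/archive"
}

plan_pool tank disk1 disk2 disk3 disk4
```

      On a real system the disk names would normally be stable /dev/disk/by-id/ paths rather than the placeholders used here.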
    • Just so I understand "tkaiser"

      - You are saying create mirrored Vdevs (2 disks / Vdev) and join all into one large pool ?

      - With 14 x 4TB disks I would create, what, 7 vdev mirrors and join them into a pool? Wouldn't this be the same as raid 10, but with 7 groups of raid 1?
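
      To put numbers on the space question, here is a small sketch of the raw capacity arithmetic for 14 x 4TB disks. It ignores filesystem overhead and compression, and the usable_tb helper is made up for illustration.

```shell
#!/bin/sh
# usable_tb LAYOUT DISKS_PER_VDEV DISK_TB [VDEVS]
# Raw usable capacity in TB; no filesystem overhead, no compression.
usable_tb() {
    layout=$1; disks=$2; size=$3; vdevs=${4:-1}
    case $layout in
        # A mirror vdev stores one disk's worth of data, however many copies it has.
        mirror)       echo $(( vdevs * size )) ;;
        # RAID6 and RAIDZ2 both give up two disks per vdev to parity.
        raid6|raidz2) echo $(( vdevs * (disks - 2) * size )) ;;
        *)            echo "unknown layout: $layout" >&2; return 1 ;;
    esac
}

echo "7 x 2-way mirrors of 4 TB disks: $(usable_tb mirror 2 4 7) TB"
echo "one 14-disk RAID6/RAIDZ2 of 4 TB: $(usable_tb raidz2 14 4) TB"
```

      So in capacity terms the mirrored pool behaves like RAID10 (half of the raw space), which is 28 TB here against 48 TB for a single double-parity group over the same disks.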
    • vshaulsk wrote:

      Wouldn't this be the same as raid 10, but with 7 groups of raid 1 ?
      No, it's not the same since I'm talking about a modern attempt to solve an old problem while you're still focussed on RAID :)

      With ZFS you also get transparent file compression, can benefit from snapshots (data protection) and get self-healing even in this setup, since ZFS always provides data integrity through checksumming (with classic RAID you get this only with parity modes, so either crappy performance + data integrity or good performance -- RAID10 -- without).

      As already written: 'most probably not the answer you want to hear' ;)

      Even if it takes some time to get familiar with ZFS, plus some time for real tests on real hardware, it's IMO worth considering the switch. If you don't have this time now, better stay with what you're familiar with, since ZFS fundamentals differ from 'classic RAID' in almost all areas.

    • tkaiser - sorry, I know that ZFS vdev mirror has advantages over the old raid 10 .... I was just trying to relate it to what I know. To understand performance and disk space loss.

      I have thought about ZFS and am even thinking of using it on the couple of R710 servers running Proxmox (from what I understand, I will be able to replicate a VM from one node to the other).

      The main concern I have is the huge amount of disk space I would lose going with your suggestion. Yes, I would have better performance on the large storage pool + quicker rebuild + better utilization of the remaining space due to compression.

      However, all critical data on the large storage pool is backed up so if somehow the system fails... the data won't be lost. It will just take some time to recover and bring it back up.
      Also, based on my current raid 6 (8 x 2TB 5900RPM drives), the performance should be good enough (500+ MB/s read and 230 MB/s write). With how the system is used, there is not a lot of random file access while we are transferring large files. The new system should have even faster reads and writes. The disks are faster, and on the read side I may be limited by the old SAS 3Gb.

      Now perhaps I should use a ZFS mirrored-vdev pool for the 6 x 500 GB SSDs, since these disks will see the highest random IOPS and small-file access, and will hold all databases, NFS mounts and iSCSI LUNs. I would say the most critical data would actually be housed on this pool.

      This still leads me back to my question of which file system to use if I choose to stick with raid 6 on the large storage pool. Whether I go with 4TB or 6TB disks, I am looking at an array size of roughly 30TB to 46TB. Which file system would I use? EXT4 (currently using) is limited to 16TB from what I have read.

      Thank you !!
    • vshaulsk wrote:

      EXT4 (currently using) is limited to 16TB from what I have read.
      No, this isn't an issue any more. It was one just a few years ago with Debian/OMV because some userspace tools had a 16TB limitation: you were able to create ext4 filesystems larger than 16TB (no kernel problem), but the userspace tools needed to deal with the filesystem would have refused to touch it. But seriously: before I would go with RAID6+ext4 I would take a serious look at RAIDz2.

      If data integrity, snapshots and all the other ZFS benefits aren't worth a thought, resync/resilver times alone might justify the switch from anachronistic implementations to up-to-date approaches.

      Your 46 TB array filled with 23 TB of content on a filesystem-agnostic RAID6 will take more than twice as long for every resync or scrub (since RAID doesn't know about filesystems, or whether it deals with empty space or real data). RAIDZ will only care about the data really in use (which is 50% in this case). And why more than twice as long at 50%? Because HDDs all use ZBR (zone bit recording) and are faster on the outer tracks than on the inner ones. The less they're filled, the faster they are (this is about sequential speeds, but it matters here too).
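
      A rough back-of-the-envelope sketch of this argument, assuming hypothetical 6 TB member disks that are all read in parallel at a constant 150 MB/s (both figures are illustrative and not from this thread). A constant rate only yields the factor of two; the ZBR effect described above would widen the gap further, and a real resilver is not purely sequential.

```shell
#!/bin/sh
# scan_hours DISK_TB FILL_PERCENT MB_PER_S
# Whole hours needed to read the in-use part of one member disk.
# Integer arithmetic, rounded down; 1 TB is taken as 1,000,000 MB.
scan_hours() {
    echo $(( $1 * 1000000 * $2 / 100 / $3 / 3600 ))
}

# Block-level RAID6 resync: the whole disk is scanned, full or not.
echo "RAID6 resync, whole disk: $(scan_hours 6 100 150) h"
# ZFS scrub/resilver: only allocated data is read (50% full here).
echo "RAIDZ2 scrub at 50% fill: $(scan_hours 6 50 150) h"
```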
    • Interesting, and thank you for the information.

      Raidz2 would perhaps be an option. I would have to get all the disks at once since there is no way to expand the array from what I know.
      Now would 24 gigs of ram be enough? Should I add an SSD for caching if using ZFS?

      If I choose to go the old way of raid6, would EXT4 be the best file system or should I use Btrfs or XFS?
      I know that Btrfs is a lot newer with more modern features. I just don't know if it is robust enough and whether I would get the same performance. I have read some mixed reviews.
    • vshaulsk wrote:

      Raidz2 would perhaps be an option. I would have to get all the disks at once since there is no way to expand the array from what I know.
      Now would 24 gigs of ram be enough? Should I add an SSD for caching if using ZFS?
      Please have a look at my first link above to see how RAIDZ (pool) expansion works (and why it's more 'fun' with mirrored vdevs ;) )

      The '1 GB DRAM per TB of storage ZFS requirement' is a common misunderstanding. This rule applies only to zpools with active deduplication (since the dedup tables should remain in RAM, otherwise the machine will swap to death). Beware that while dedup is a ZFS dataset property, the dedup tables are shared by the whole pool. So if you're not interested in dedup anyway, you don't have to worry about RAM shortage with 24 GB and just 46 TB of storage. But if dedup is of interest, then the shares that could benefit from it [1] should live in a zpool of their own, since that lets you limit the memory requirements. This is not an issue with RAIDZ but would be with mirrored vdevs, since with the latter you want everything within a single pool because performance improves a lot.

      Anyway: I just wanted to nudge you a little towards thinking about a more modern approach than 'RAID since we did RAID already last decade' :)

      As already said: it takes some time to become familiar with the concepts but it's worth the effort. I would never go back to anachronistic RAID.

      Wrt other filesystems: ext4 and XFS can be considered the most robust filesystems on Linux (great read), but it always depends on the feature set you're after (I would not want to miss snapshots and checksumming any more, and with large storage pools I want to use a really mature filesystem, so again I would choose ZFS and nothing else :) ).

      [1] I have one backup system where whole OS images are stored. With deduplication active I'm able to store orders of magnitude more there than is physically available, since thanks to dedup only the differences between the whole images need more disk space when subsequent OS images are cloned. Whether dedup has any benefit really depends on the type of data you store. Usually I avoid it.
    • So I have been thinking about the information above.

      - Created an openmediavault VM and a FreeNAS VM to learn about ZFS

      - Had no problem creating a virtual environment of either vdev mirrors or raidz2
      - Setting the compression and other options is pretty straightforward as well

      One issue I ran into, which I don't see on the OMV VM that I use for testing, is the write speed.

      My regular OMV VM maxes out the current gigabit LAN when writing files through SMB. This is just one virtual disk (no raid), tested with both a Btrfs and an EXT4 file system. In either case it maxes out the gigabit LAN on both read and write (large files).

      When creating the same single large volume but using ZFS, on either OMV or FreeNAS, I am able to get read speeds which max out my LAN, but on write the data transfer is terrible. I lose anywhere from 20% to 50%.

      This behavior makes me concerned that if I use ZFS on the production server, I will lose write performance.

      My current production OMV uses 8 x 2TB (5900 RPM) Seagate drives in mdadm raid 6 for main storage. Running some tests I believe it can read about 600 MB/s and write around 230/250 MB/s, which saturates a single point-to-point gigabit LAN connection.
    • vshaulsk wrote:

      raid 6 for main storage. Running some tests I believe it can read about 600 MB/s and write around 230/250 MB/s
      So the next logical step would be to also test your ZFS locally, e.g. with iozone ('apt install iozone3'). With the ZFS storage topology above we get, depending on the number of disks, 800+ MB/s in both directions. That ends up as ~600 MB/s over NFS with 10 GbE, and after an additional vSphere layer in between, with another ZFS on top and AFP to macOS clients, it's still 350+ MB/s.
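
      One possible way to run such a local test, sketched as a dry run. The target directory /tank/test and the file size are assumptions; the size should be well above the machine's RAM so the ZFS cache cannot serve the reads. The helper only prints the iozone command line.

```shell
#!/bin/sh
# iozone_cmd DIR SIZE_GB
# Prints a sequential write/read test command; nothing is executed.
# /tank/test is a hypothetical directory on the pool under test.
iozone_cmd() {
    dir=$1; size_gb=$2
    # -e      include flush (fsync) in the timings
    # -i 0    write/rewrite test
    # -i 1    read/re-read test
    # -r 1m   1 MB record size (large sequential I/O)
    # -s Ng   test file size in GB
    # -f      location of the temporary test file
    echo "iozone -e -i 0 -i 1 -r 1m -s ${size_gb}g -f $dir/iozone.tmp"
}

# Example: 96 GB test file, i.e. twice the RAM of a 48 GB machine.
iozone_cmd /tank/test 96
```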
    • Some updates:

      Put together the new hardware:

      Case: Norco - RPC4220 (20 drives - hot swap)
      CPU: 2 x X5670
      Ram: 48 GB DDR3-1333 ECC
      HBA: 2 x LSI 6Gb SAS
      NIC: 10Gb SFP+ (currently no switch so just testing it with direct connect)

      Hard Drives:
      - 4 x Samsung 850 Pro 256 GB
      - 12 x WD RE 7200 RPM 4TB
      - 2 x 32 SSD (Was going to use for OS ... hoping I can somehow create a mirror OS disk, but if not I will just have one and maybe use clone weekly to second drive)

      I wanted to try ZFS. I thought about switching to FreeNAS for this build as ZFS is native there, but I really like all the plugins in OMV and I am way more familiar with it.
      This brings me to implementing ZFS in OMV

      Pool 1 = 2 mirrored Vdevs out of the SSD disks
      Pool 2 = 2 x raidz2 Vdevs - I think this will give me the best space utilization combined with performance

      I have played somewhat with the ZFS plugin in a VM, but I seem to get errors when trying to add more vdevs to a pool.
      Maybe I should not be using the GUI, but the CLI instead?
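
      For reference, adding a second raidz2 vdev from the CLI could look roughly like the dry-run sketch below. The pool name st2 is borrowed from the benchmark paths later in the thread and the diskN names are placeholders, so both are assumptions. The script prints the commands instead of running them.

```shell
#!/bin/sh
# plan_add_raidz2 POOL DISK...
# Prints the steps for adding one more raidz2 vdev to an existing pool.
plan_add_raidz2() {
    pool=$1; shift
    # -n makes zpool show the resulting layout without changing the pool.
    echo "zpool add -n $pool raidz2 $*"
    # The same command without -n actually adds the vdev.
    echo "zpool add $pool raidz2 $*"
    # Verify that the pool now lists both raidz2 vdevs.
    echo "zpool status $pool"
}

plan_add_raidz2 st2 disk7 disk8 disk9 disk10 disk11 disk12
```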

      What about tuning the system for performance/reliability? I know that FreeNAS has autotune, which has some benefits.

      Once I get this setup, I will run some benchmarks against my current production system and post the results.
    • Just 3 remarks:
      • Making a mirror out of 2 identical SSDs is a really bad idea, since the reasons why SSDs fail are more or less predictable (firmware errors happening at almost the same time, or wearing out at the same time). With such an SSD mirror the chance that both disks fail at almost the same moment is pretty good
      • ZoL vs 'native ZFS'? Huh? In case your CPUs or PCH support Intel's QuickAssist Technology (QAT), to my knowledge the situation with ZoL is currently better than on all other platforms (check the ZoL 0.7 release notes wrt performance and memory handling)
      • I have to admit that I have no idea about ZFS integration into OMV so far, since I'm currently using other solutions on x64 and the ZFS plugin is not available for armhf/arm64. So while I have had a lot of great experiences with ZFS, also on Linux, I can't talk about OMV integration (yet) :(
    • vshaulsk wrote:

      Thought about switching to FreeNas for this build as ZFS is native, but I really like all the plugins in OMV and I am way more familiar.
      The ZFS integration in FreeNAS has way more possibilities than the ZFS plugin in OMV. With the ZFS plugin only basic tasks are possible (e.g. creating a pool, creating a ZFS file system, creating a (single) snapshot). More sophisticated tasks like scheduled snapshots are not supported. Other tasks do not work reliably or end with an error message, like adding a vdev, as you have experienced yourself (there are several threads about this here). Most of them can be done via the CLI and/or 3rd party tools.
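
      As one example of doing this by CLI, scheduled snapshots can be approximated with a small script called from cron. The sketch below only prints the command; the dataset name tank/home and the auto- prefix are assumptions, not something the plugin or this thread prescribes.

```shell
#!/bin/sh
# snapshot_cmd DATASET STAMP
# Prints a recursive snapshot command with a timestamped name.
snapshot_cmd() {
    dataset=$1; stamp=$2
    # -r also snapshots all child datasets below DATASET.
    echo "zfs snapshot -r $dataset@auto-$stamp"
}

# A cron entry could run this once per day; here it is only printed.
snapshot_cmd tank/home "$(date +%Y-%m-%d_%H%M)"
```

      Pruning old snapshots is left out on purpose; a 3rd party tool is probably the safer route for that.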
      OMV 3.0.90 (Gray style)
      ASRock Rack C2550D4I - 16GB ECC - 6x WD RED 3TB (ZFS 2x3 Striped RaidZ1)- Fractal Design Node 304

    • There are some other drawbacks of the ZFS plugin, which should be known.
      • The drives already used for ZFS are still shown as available in several selection boxes in OMV. These drives shouldn't be touched at all.
      • In any case avoid an "Export" of the pool after creating shared folders, CIFS/SMB shares and so on, because this breaks the configuration. All shares, USB backup jobs and other things related to the ZFS pool must then be created again.
    • Yes, I saw the drives still appearing within the plugin as well... this is a major drawback of the plugin.

      What do you mean avoid "export" ? Why would I export the pool ?

      My needs:

      1) Two pools
      - Pool one with the SSDs is for user home folders, business-critical shares and files, plex, mysql, databases and some NFS/iSCSI targets for Proxmox
      - Pool two is for the large media storage used for the business (4K video and photos)

      Most of the shares are shared through CIFS for the Windows workstations.

      I need to be able to do per user/group ACLs on the sub folders within a share... I think I have figured this out if I turn on the correct attributes within ZFS
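
      The post does not say which attributes are meant. On ZFS on Linux the usual candidates are acltype=posixacl and xattr=sa, so the dry-run sketch below assumes those, together with a made-up dataset, folder and group. It prints the commands rather than running them.

```shell
#!/bin/sh
# plan_acl DATASET DIR GROUP
# Prints the steps to enable POSIX ACLs on a dataset and grant a group access.
plan_acl() {
    dataset=$1; dir=$2; group=$3
    # Enable POSIX ACLs and store extended attributes in the inode.
    echo "zfs set acltype=posixacl $dataset"
    echo "zfs set xattr=sa $dataset"
    # Grant the group access to the sub folder ...
    echo "setfacl -m g:$group:rwx $dir"
    # ... and make that the default for files created below it.
    echo "setfacl -d -m g:$group:rwx $dir"
    echo "getfacl $dir"
}

# tank/shares, the projects folder and the editors group are placeholders.
plan_acl tank/shares /tank/shares/projects editors
```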

      I also create rsnapshot, rsync and USB backups.
      I run the plex, mysql, subsonic and sensors plugins; beyond this I don't use any other plugins
    • vshaulsk wrote:

      What do you mean avoid "export" ? Why would I export the pool ?
      This is related to the "Export" button in the plugin. AFAIK a ZFS export is the equivalent of a file system unmount.

      Normally an export is necessary if all ZFS file systems are to be unmounted. It is usually useful when a pool is to be transferred from one physical machine to another or between different operating systems.
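
      For the machine-to-machine case, the CLI side of an export and import looks roughly like this dry-run sketch (the pool name tank is a placeholder; the commands are printed, not executed).

```shell
#!/bin/sh
# plan_move_pool POOL
# Prints the export/import sequence for moving a pool to another host.
plan_move_pool() {
    pool=$1
    # On the old machine: unmount all datasets and release the pool.
    echo "zpool export $pool"
    # On the new machine: list the pools that are available for import ...
    echo "zpool import"
    # ... then import the pool by name, which mounts its datasets again.
    echo "zpool import $pool"
}

plan_move_pool tank
```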
    • Finally got the system up and running.
      Had to replace all the SAS cables that went from the HBA to the SAS backplane of the Norco case. The ones that came with the HBA were no good.

      I decided to try the ZFS plugin for OMV.
      The plugin seems to work alright.
      It gives an error if you try to expand a pool, but when you exit out and look at the file system, the expansion has worked.

      Below are some comparisons using bonnie++ between my production OMV system and this new one.

      I also did some real transfers over the 10 Gb SFP+ connection, copying a 20 GB file over SMB from the workstation to the storage servers. These transfers confirmed the bonnie++ results.

      Results + commands run (the command was taken from a post)... hopefully it is the right one. I did change the ram entries, as the old system has 24GB of ram and the new one has 48GB of ram

      Old System:

      bonnie++ -u root -r 1024 -s 28672 -d /srv/dev-disk-by-label-home/test -f -b -n 1 -c 4
      bonnie++ -u root -r 1024 -s 28672 -d /srv/dev-disk-by-id-md-uuid-4af8d8bc-2ac91a1a-349b6049-3e392118/test -f -b -n 1 -c 4

      New System:

      bonnie++ -u root -r 1024 -s 51200 -d /st1/test/test1 -f -b -n 1 -c 4
      bonnie++ -u root -r 1024 -s 51200 -d /st2/test2/test2 -f -b -n 1 -c 4

      Old System:

      2 x 300 GB WD 10K RPM Raptor drives in mdadm raid 1, ext4 file system

      W=105 MB/s RW=48 MB/s R=199 MB/s

      8 x 2 TB Seagate NAS 5900 RPM drives in mdadm raid 6, ext4 file system (advanced sector size and tuned mdadm)

      W=353 MB/s RW=215 MB/s R=613 MB/s

      New System:

      4 x 256 GB Samsung Pro SSD, ZFS pool with 2 mirrored vdevs (ashift=12)

      W=634 MB/s RW=340 MB/s R=1463 MB/s

      12 x 4TB WD RE 7200 RPM enterprise drives, ZFS pool with 2 raidz2 vdevs (ashift=12)

      W=592 MB/s RW=340 MB/s R=1386 MB/s
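
      To make the comparison of the two big arrays easier to read, the figures can be turned into percentages. This sketch uses the numbers posted above and assumes that the third value on the old raid 6 line (613 MB/s) is the read result; percent_of is a made-up helper.

```shell
#!/bin/sh
# percent_of NEW OLD
# NEW expressed as an integer percentage of OLD (rounded down).
percent_of() {
    echo $(( $1 * 100 / $2 ))
}

# New 2 x raidz2 pool versus old 8-disk mdadm raid 6, MB/s from bonnie++.
echo "write:   $(percent_of 592 353)% of the old array"
echo "rewrite: $(percent_of 340 215)% of the old array"
echo "read:    $(percent_of 1386 613)% of the old array"
```

      That works out to roughly 1.6 to 2.3 times the old throughput, keeping in mind that the new pool also has more and faster disks.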
    • Thank you for the link !!!

      That was an interesting read and I will need to read through it again a couple of times.

      I guess my quick test just showed what I would expect ..... that the new system is much faster vs the old system.

      I am wondering now how my ZFS setup would compare against the traditional mdadm raid 60 + LVM + Ext4... I might just run a few passive tests to see what the output is. I just don't know if I want to wait so long for the raid to build... HAHAHA
      ZFS was so fast to put the pool together !!!
    • vshaulsk wrote:

      ZFS would compare against the tradition mdadm raid 60 + LVM + Ext4
      ext4 is normally faster than zfs because zfs is a CoW filesystem.

      Why use LVM on mdadm raid? OMV doesn't use partitions on raid. So growing an array means you just have to grow the filesystem. No need for LVM.
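
      A dry-run sketch of what growing without LVM can look like on the command line, assuming an ext4 filesystem sitting directly on /dev/md0 and a placeholder disk /dev/sdX (both hypothetical). The commands are only printed.

```shell
#!/bin/sh
# plan_grow MD_DEVICE NEW_DISK TOTAL_DISKS
# Prints the steps to add one disk to an md array and grow ext4 on top of it.
plan_grow() {
    md=$1; new_disk=$2; total=$3
    # Add the new disk to the array, then reshape onto it.
    echo "mdadm --add $md $new_disk"
    echo "mdadm --grow $md --raid-devices=$total"
    # Watch the reshape; it has to finish before the filesystem is resized.
    echo "cat /proc/mdstat"
    # Grow ext4 to fill the larger array.
    echo "resize2fs $md"
}

plan_grow /dev/md0 /dev/sdX 9
```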
      omv 4.0.16 arrakis | 64 bit | 4.14 backports kernel | omvextrasorg 4.1.2 plugins source code and issue tracker -

      Please don't PM for support... Too many PMs!