Storage File System Choices based on requirements - Suggestions?

  • This brings me back to my original dilemma of which file system/setup I should use.


    I have good experience with mdadm RAID6/ext4, as this is what I have been running for years. No issues, no trouble rebuilding the array, no trouble growing the array... no issues even when I have had to hard-reboot the system.
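
    (For readers who want to reproduce this: growing an mdadm RAID6 with ext4 on top typically boils down to something like the sketch below. The device names and the target member count are placeholders only, not taken from my setup.)

        # hypothetical devices -- adapt to your own system
        mdadm --add /dev/md0 /dev/sdh              # add the new disk as a spare
        mdadm --grow /dev/md0 --raid-devices=7     # reshape the RAID6 onto 7 members
        cat /proc/mdstat                           # wait until the reshape has finished
        resize2fs /dev/md0                         # then grow the ext4 filesystem online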


    ZFS is definitely interesting, which is why I am giving it a try. However, it is not native to OMV, and I won't use some of its features, like snapshots.
    I typically use rsync and rsnapshot for backups to other storage (soon to be my old system).


    I am still not sure that ZFS is the right choice, or whether mdadm RAID 6 with ext4, or maybe even one of the other Linux-native filesystems (btrfs or XFS), would not be better?



    Besides the typical home and media files, this system is also used for my wife's business (photos and video from weddings, events, etc.). Now that everyone has moved to 4K, the raw files are huge, which is why I built the bigger storage array and why the workstations will use a 10 Gb LAN. Four people moving files of up to 100 GB does take some time over standard gigabit.


    All the business files are backed up to secondary storage and, depending on the final product size, are also put on USB drives.



    So again, I am not sure what benefits ZFS would offer me: bitrot protection? Better data protection? Filesystem repair?

    • Official Post

    Personally, if it is used for business, I would use what you are most comfortable with. If you have had no issues, why move? I still think ext4 is the easiest filesystem to repair and recover when problems occur.


  • It might not be advisable to use BTRFS for RAID 5/6. It's not stable in the versions we have in userland.

    The latter is wrong (the bugs live in the kernel code, mostly in scrub and recovery), and the former needs an explanation: it's btrfs' own parity RAID implementation that shows recovery bugs and is therefore marked as experimental and not ready for production use.


    And some of the 'not fully ready' rating of btrfs is based on different assumptions. For example, the typical RAID write hole...


    • With anachronistic RAID this is simply accepted as an integral part of mdraid's implementation (data corruption by design). I have not yet seen a single user try to address this with mdraid plus a log device; if you really wanted to address the problem you would most probably have to 'waste' two devices in a mirror configuration for the journal, to avoid creating another single point of failure (see the sketch after this list)
    • With btrfs the existence of the parity RAID write hole is not considered 'normal behaviour' but a bug (that's why recent patches added the ability to use a log device for btrfs' own RAID5/6 as well)
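
    A rough sketch of such an mdraid log/journal device, just for illustration; the devices are placeholders and --write-journal needs a reasonably recent mdadm and kernel:

        # create a RAID6 array with a dedicated write journal to close the write hole
        # /dev/sd[b-g] and /dev/nvme0n1p1 are hypothetical devices
        mdadm --create /dev/md0 --level=6 --raid-devices=6 \
              --write-journal=/dev/nvme0n1p1 /dev/sd[b-g]
        # note: the journal device itself becomes a single point of failure unless it is mirrored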


    So in case of a power loss you get the same behaviour with mdraid and btrfs' RAID due to the RAID write hole. But while with mdraid that's regarded as 'sh*t happens, not our problem', the btrfs folks consider the same behaviour a bug and therefore call their implementation 'not ready for production use', while at the same time hundreds of thousands of hobbyists play with RAID5, which suffers from the very same problem.

  • I have good experience with mdadm RAID6/ext4, as this is what I have been running for years. No issues, no trouble rebuilding the array, no trouble growing the array... no issues even when I have had to hard-reboot the system

    Well, this should be considered the most basic requirement for any RAID, since if it does not work this way, RAID is simply not worth the effort and the wasted resources. As already said weeks ago at the beginning of this thread: choosing another implementation requires becoming quite familiar with it, which can be a time-consuming process...


    The only reason I comment on this now is that it seems to be common sense to associate mdraid with 'no issues, no trouble', and people seem to consider the implementation bug-free (this pops up especially when people start to mention bugs in the btrfs RAID code).


    No, mdraid is not bug-free, and the same applies to the usual suspects running on top of it (ext4 and XFS -- though both can still be considered the most robust filesystems available on Linux). For example, a nice mdraid RAID6 data corruption bug, which is essentially the same as the btrfs RAID5/6 recovery bugs people complain about, lived in the kernel undetected for almost 5 years!


    Such bugs exist everywhere, and that's what data protection and data integrity checking are for. And achieving the latter (integrity checking) is much easier with modern approaches than with those from the last century.

  • From experience:


    If you need more I/O from your storage, ZFS with a ZIL (SLOG) and an L2ARC cache will blow away any competition (we are speaking about 5 times faster).
    And don't be a fool: you can use a partition for each of these caches; you don't have to dedicate a whole drive to each of them.


    But it's also possible to use a cache with LVM, which might be more comfortable for you if you have more experience with LVM than with ZFS (see the sketch below).
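
    To illustrate the partition idea (a hedged sketch only; 'tank' and the device paths below are placeholders, not from this thread):

        # use two partitions of one fast SSD/NVMe instead of dedicating whole drives
        zpool add tank log   /dev/disk/by-id/nvme-FAST_SSD-part1   # SLOG for the ZIL
        zpool add tank cache /dev/disk/by-id/nvme-FAST_SSD-part2   # L2ARC read cache
        zpool status tank                                          # verify both were added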


    In any case, don't worry if the cache device crashes; you just lose performance.


    Please forget btrfs, at least for now.

  • If you need more I/O from your storage, ZFS with a ZIL (SLOG) and an L2ARC cache will blow away any competition (we are speaking about 5 times faster)

    Have you read the use case? This is almost 'cold storage' with large media files that are accessed by a few clients and not in parallel (accesses happen in parallel, but it looks to me as if the same data will not be accessed in parallel). From a performance point of view this use case would benefit most from having as much system memory as possible available as write cache (limiting ARC), and of course the ZIL won't help with performance anyway (it's about data integrity, not about increasing performance; most probably you would need to add multiple SLOG devices with very low latency to retain good ZIL performance -- BTW, with 'sync=standard' asynchronous writes bypass the ZIL entirely, so you would need to set 'sync=always' for such workloads to get ZIL protection; see the hedged example below).
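
    If someone really wanted ZIL/SLOG protection for otherwise asynchronous workloads, it would look something like this (a sketch; 'tank/media' is a placeholder dataset name):

        zfs get sync tank/media          # default is sync=standard: async writes bypass the ZIL
        zfs set sync=always tank/media   # force every write through the ZIL -- costs throughput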


    What has been 'tested' with bonnie++ is something entirely different, so I would suggest testing the actual use case first. That's what Helios LanTest is for: install the tool on at least 2 clients and run the sequential transfer tests with the '10 GbE' settings, accessing a share in parallel. It's often surprising how poor these performance numbers might look while testing the storage locally on the server looks nice and fast.


    Helios LanTest generates its test data from scratch, so it is not limited by client-side storage speed. This is important since it's pretty useless to improve the network backbone and the server's network and storage performance when everything that should happen later is bottlenecked by client storage (seen at a former customer of mine: they invested in 10 GbE and external Thunderbolt disk enclosures configured in RAID-0 mode, but forgot that the old Mac Pros they were using at every camera station aren't Thunderbolt-equipped, so this ultra-fast client-side storage was accessed over FireWire 800, effectively bottlenecking everything to below 60 MB/s).


    @vshaulsk: are your client machines running Windows or macOS?

  • The statement, "It might NOT be advisable to use BTRFS for RAID 5/6", is correct, in my opinion, for the reasons you've outlined above.

    When you were expressing your opinions you were talking about userland, which is still wrong, since the problematic code is in the kernel. That's IMPORTANT because this difference means you have to take care of the kernel version in use and not the btrfs-tools package version! That's all my comment was about...


    My above answer was not targeted at @vshaulsk, since there is not a single reason why he could/should choose btrfs at all. It's only about commenting on statements that might mislead future readers (this is a forum -- the best use of a forum is reading through answers that have already been given, so one does not have to ask the same questions again).

  • Interestingly, kernel 4.14 [...] was just released with a 6 year support window, so there's no knowing exactly when all BTRFS RAID 5/6 issues would be patched

    Copying random BS from somewhere on the Internet into a forum is not a good idea.


    4.14 is not an extended LTS (just the usual 2 years; 4.4 is currently the only recent kernel with super-long-term support), and kernel development works as follows: some kernel versions get LTS support so vendors can choose a stable kernel that is maintained for a longer time and receives mostly security and severe bug fixes while the kernel dev community moves on (currently we're on 4.15, soon on 4.16, and so on). Not only security fixes get backported from time to time, but also changes that affect stability or fix general bugs (enterprise Linux distros in particular backport a lot of stuff, so the kernel version they use becomes more or less irrelevant).


    And yes, it's still very important to understand this when dealing with btrfs, because in OMV's Debian base on x86/x64 newer kernel packages become available from time to time through the backports repo, and installing them is (unlike with ZFS) a basic requirement for using modern btrfs features (a hedged example follows below). On ARM devices kernel support works entirely differently, and there are people out there running their OMV installations with crappy, outdated 3.x kernels.
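
    On an amd64 OMV install based on Debian Stretch that would look roughly like this (a sketch; it assumes the backports repository is already enabled, and the release name changes with the Debian version):

        apt-get update
        apt-get install -t stretch-backports linux-image-amd64 linux-headers-amd64
        reboot    # the newer btrfs kernel code only becomes active after booting the new kernel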


    It's important to understand this very difference between these two modern filesystem / volume manager approaches on Linux: with btrfs everything relevant happens in the kernel, so you must take care of your kernel version, while with ZFS only the ZoL (ZFS on Linux) version is important. Now back to your wording: with ZFS you have to care about the userland package version, while with btrfs you must take care of the kernel version.
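
    A quick way to check which version matters in each case (a sketch; output fields may vary slightly between distributions):

        uname -r                              # kernel version -- this is what counts for btrfs
        btrfs --version                       # btrfs-progs userland version (far less critical)
        modinfo zfs | grep -i '^version'      # ZoL module version -- this is what counts for ZFS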


    And a final note wrt btrfs RAID code 'quality': the btrfs developers consider the RAID write hole a critical bug, while the mdraid 'community' simply doesn't give a sh*t. Without a separate log device, both RAID implementations suffer from EXACTLY the same problem, but while it's treated as 'we don't care, everything is fine' with mdraid, the btrfs folks honestly label it as a real problem.

    • Official Post

    And a final note wrt btrfs RAID code 'quality'

    This page is an interesting read (I think the source is ok :) ) - https://btrfs.wiki.kernel.org/index.php/Gotchas#Issues


  • Well, thank you for your comments and the reading material.



    I wanted to say a few things about my implementation.


    1) OMV is used for storage of the media files when the team is not working on them.
    - When one of the team members is working on a project, the files he/she needs are transferred onto the workstation, as the workstations are equipped for video/photo editing.
    - Usually the same files are not accessed at the same time, but the system may be writing and reading multiple different files simultaneously as employees work on different projects.
    - Once finished with the product, they send the project back to OMV.


    2) Plex is used for video preview/review by certain clients, which is why (rightly or wrongly) I chose decently powerful hardware for OMV.



    Now I am still debating between ZFS and going back to mdadm RAID6 + ext4... I am currently playing with ZFS, but I may go back to the OMV-native setup, as I am worried about ZFS issues as OMV updates. I really need everything to be stable, mostly bulletproof, and to have great performance.



    Using third-party software for ZFS snapshots is something I would like to avoid, as I would rather use the tools already within OMV. I don't have a lot of time or the knowledge, and if things go wrong... I will see some frustrated people looking at me.



    Comparing the bonnie++ output vs. the Helios LanTest: the Helios LanTest gives much lower results than what I see when doing network file transfers.


    Currently for testing I am using a Windows VM running on the Dell R710, as the workstations are not yet connected over 10 GbE.
    - Transferring smaller files (20 GB test file) results in write speeds to OMV of 600 MB/s to 700 MB/s... writes to the VM are much slower; I am limited by the disk speed of the VM.


    - Transferring large files (78 GB) results in write speeds to OMV of 250 MB/s to 300 MB/s... not sure if this is again caused by the VM not being able to push data quickly enough.


    However, testing with Helios showed write speeds of 150 MB/s to 200 MB/s, so not even what I was seeing in the real-world tests for a single client.


    I understand that with multiple clients at the same time the real-world results would be slower.


    Helios also shows much lower performance over Gigabit than what I actually see in real life.

  • Comparing the bonnie++ output vs. the Helios LanTest.

    Irrelevant, since bonnie++ produced numbers without meaning. So you're using Windows, obviously Windows 7 or above. Why the Explorer numbers differ from the LanTest numbers is explained here: https://www.helios.de/web/EN/support/TI/157.html (as is what the settings mean: '10 GbE' means not just larger test files but also larger block sizes, which usually results in higher throughput numbers).


    For the workflow you describe (copying back and forth instead of opening files directly on the server and making use of server-side locking) the Windows Explorer performance numbers are more appropriate than LanTest's (the LanTest numbers reflect 'application performance', i.e. what you get when you directly open/load/save data from within applications like Photoshop).


    Due to the way Windows Explorer acts, you cannot rely on this tool when it comes to further performance optimizations (since it will always show the same high numbers even with bad settings on the server), so this is an area where LanTest becomes useful again, even if the numbers themselves aren't relevant for later real-world workflow tasks.


    I really don't want you to risk anything with ZFS if you're not that familiar with it. But just saying: on servers with that huge an amount of data, traditional backup approaches become problematic, since disaster recovery or a full restore would take ages. That's why we started to implement 'full sync' systems at customers, where the whole storage is mirrored to a second box (not stupid mirroring AKA RAID, but by sending snapshots, with versioning included). That's the whole purpose of znapzend: making the creation/deletion of snapshots and sending them to remote locations as easy as possible.
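
    What znapzend automates boils down to ordinary ZFS snapshot replication; done by hand it would look roughly like this (a sketch with placeholder pool, dataset and host names):

        # initial replication: full send of the first snapshot to a second box
        zfs snapshot -r tank/business@weekly-1
        zfs send -R tank/business@weekly-1 | ssh backupbox zfs receive -F backup/business
        # later runs: incremental send of only the blocks changed since the previous snapshot
        zfs snapshot -r tank/business@weekly-2
        zfs send -R -i @weekly-1 tank/business@weekly-2 | ssh backupbox zfs receive -F backup/business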

    • Official Post

    The developers of BTRFS have marked BTRFS RAID 5/6 as unstable, the problematic code is in the recently released 4.14 kernel, and there is no knowing when the BTRFS RAID issues will be patched. This is beyond debate.
    ________________________________________


    Copying random BS from somewhere on the Internet into a forum is not a good idea.

    You keep mentioning "the forum", correct information, etc.
    If it's a question of credibility: if you're published or have properly accredited peer-reviewed work, point me to it.
    _______________________________________


    Otherwise, whether it's columns on the Internet or posts on this forum, they're all simply opinions. Interspersing a handful of facts, along with loosely correlated data, doesn't change that - they're all opinions. A well-known fact about opinions - everyone has one.
    The bottom line: this forum and other similar discourse on the Internet will never be an authoritative source in computer science. Further, even thoroughly vetted sources of information, such as peer-reviewed white papers and textbooks, contain errors.


    Again, this is beyond debate.

  • mdadm RAID 6 with ext4, or maybe even one of the other Linux-native filesystems (btrfs or XFS), would not be better?

    @flmaxey: Look, this was the first time in this thread that btrfs was mentioned. No one except you has ever talked about btrfs' own RAID implementation, since really no OMV user wants to use it (yet). It's really that simple, and if you fear others adding information to threads, the use of a forum is a bit problematic :)

    • Official Post

    @flmaxey: Look, this was the first time in this thread that btrfs was mentioned. No one except you has ever talked about btrfs' own RAID implementation, since really no OMV user wants to use it (yet). It's really that simple, and if you fear others adding information to threads, the use of a forum is a bit problematic :)


    Yep, and I chimed in with exactly two sentences and, with the exception of a single word "userland", what was said was factually correct. Clean, simple, two sentences. Thereafter, the diatribes you're known for, on this forum and others, began.


    Frankly, "TK" I really don't care if you feel the need to lecture or not - there's nothing about it that's problematic for me. In fact, from a Behavior Sciences point of view, it has become something an informal study into a specific personality type that's quite entertaining. On the other hand, while I understand some of the underlying causes, your soliloquies detract from the true purpose of this forum which is to "gently push", prod, or point (mostly) new OMV users in a productive direction. These new users, the bulk of them by the way, need basic help and maybe a command line or two - not a dissertation that goes down the rabbit holes of "opinion".


    On OMV users not wanting to use BTRFS:
    I'm an OMV user and I'm using BTRFS in the only scenario where it's reasonably safe (in my opinion): on a single disk. It allows me, with checksums and scrubs, to see if the disk is beginning to go south, and with a cron job I get an e-mailed report so I'm not caught off guard (a hedged sketch of such a job follows below). In my opinion, BTRFS is more feature-rich than ZFS, but I needed something that's usable now. I'm actually looking forward to BTRFS becoming RAID-stable but, as it seems, it's going to be a long wait.
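
    For anyone wanting to copy that setup, a minimal sketch (the mount point and mail address are placeholders; MAILTO requires a working mail transport on the box):

        # /etc/cron.d/btrfs-scrub -- hypothetical example
        MAILTO=admin@example.com
        # run a monthly scrub in the foreground (-B) so cron mails the summary,
        # then print the per-device error counters
        0 3 1 * * root btrfs scrub start -B /srv/dev-disk-by-label-data && btrfs device stats /srv/dev-disk-by-label-data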

  • brendangregg.com/ActiveBenchmarking/bonnie++.html

    Maybe I understood it wrongly, but:


    the bonnie++ test compares Fedora running under KVM vs. Solaris with a container;
    obviously the I/O will be faster with a container.


    I'm not saying Solaris doesn't kick ass, but to compare apples with apples (in this case ZFS vs. other filesystems) you must isolate all other variables, such as using the same OS, i.e. comparing Fedora with ext4 or XFS against Fedora with ZFS.


    https://www.phoronix.com/scan.…item=zfs_ext4_btrfs&num=5
