RAID or SnapRAID w/mergerFS



    • RAID or SnapRAID w/mergerFS

      First things first, I apologize for how long this will be.

      Currently my OMV3 system disk is a cheap 120GB SSD, and storage is a single RAID6 array of 7x1TB standard desktop disks. I'm in the process of purchasing 4TB WD Red NAS disks (I have 3 in hand and was looking to get a few more). I'm trying to decide whether to go with RAID or with SnapRAID/MergerFS and, if I decide on the latter, what my "best practices" would be for my use case.

      Some background and usage info.

      My network is a mixed environment (Linux/Windows, primarily Windows, with Active Directory, etc.). I have a few Shared Folders for SMB, with the bulk of the storage in 2 SMB shares. Movies, TV Shows, Pics, Music, etc. are all on one very large share. Another is used for backups of other computers; it's shared out via SMB and NFS and is also used by UrBackup. I have a few other shared folders (for Docker, PXE Boot/SFTP, and Home directories, which are SMB shared as well). But the largest share by far is the one used for Movies/Pics/TV/Music/etc. You can pretty much call it the general do-everything shared folder: it holds the largest amount of data, with everything sorted into subfolders. I'd like to keep it this way for convenience.

      CouchPotato, SickRage, etc. all download directly to a downloads folder within the primary shared folder, post-process, and once done move the files to the Movies/TV etc. locations accordingly.

      I'm currently backing up all of the Shared Folders plus some of the critical system folders to CrashPlan (small business edition so it CAN (if configured) continuously back up as files change/are created) so I'm not ENTIRELY worried about the SnapRAID not being real time.

      I'm playing around with SnapRAID/MergerFS in a VM and I'm trying to decide on the Create Policy configuration for MergerFS that makes sense for me.

      Existing Path....
      From the behavior I'm seeing so far, the second I create a shared folder, it puts that folder on a single physical disk. After that, all subsequent files created under that folder go to that same disk. Makes sense. I can see pros/cons for this behavior in my situation though. Since my one shared folder is going to be so large, most or all of that data will be on that one disk, consuming at least 50-75% of it and continuing to grow, as it's my primary shared storage. I realize this would make moving files much faster, since everything stays on the same disk. I would assume (yes, I know what happens when you assume) that once that disk hits the configured Min Free Space cutoff of the MergerFS pool, it will spill over to the next disk and replicate the folder structure?

      Pros
      It's obvious for the most part what physical disk your data is on
      Speeds up file move operations (not the biggest deal for me as I rarely move large files)
      Spins up/down single disks (again for the most part) for read/writes in a folder

      Cons (in my case)
      Can fill up a single disk pretty quickly while other disks are hardly used
      (a maybe, not sure) Possibly makes it more likely to lose more data if a single disk dies (such as one full disk failing while the others are only 10% utilized). Obviously this depends on the parity config, etc. of SnapRAID


      Anything else that is NOT Existing Path...
      Pretty self explanatory to me. Pretty much scatters the data across all the disks determined by which option you choose (random, most space, etc). The directory structure is placed across all of the disks and you don't really know exactly which physical disk specific files are on without hunting it down manually.

      Pros
      Utilizes all disks fairly equally
      Possibly less likely to lose a lot of data in disk failure as disks are less likely to be completely filled (once again, maybe, not really sure as I've never used SnapRAID)

      Cons
      File move operations of large files in the same folder could take time as it MAY need to go from disk to disk
      Could possibly need to spin up/down multiple disks at a time (not really a huge deal for me with NAS disks)
      Files are scattered across disks so it could be difficult to find specific files on physical disk (doesn't seem like a big deal)
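To make the trade-off between the two families of create policy concrete, here is a rough toy model in Python. It is not how MergerFS is implemented: the function name, the policy labels `existing_path` and `most_free`, and the disk and file sizes are all made up for illustration, and it deliberately ignores the Min Free Space cutoff and what happens once a disk fills up.

```python
def place_files(file_sizes_gb, disk_capacity_gb, n_disks, policy):
    """Return the GB used on each disk after writing files into ONE shared folder.

    Toy model only (hypothetical names, not MergerFS code):
    - "existing_path": every file follows the folder to the disk it was created on
    - "most_free":     every file goes to whichever disk currently has the most free space
    """
    used = [0] * n_disks
    home_disk = 0  # assumption: the shared folder was first created on disk 0

    for size in file_sizes_gb:
        if policy == "existing_path":
            target = home_disk
        elif policy == "most_free":
            # Ties go to the lowest-numbered disk.
            target = max(range(n_disks), key=lambda d: disk_capacity_gb - used[d])
        else:
            raise ValueError(f"unknown policy: {policy}")
        used[target] += size
    return used


# Forty 50 GB files (2000 GB total) written to a pool of four 4000 GB disks.
files = [50] * 40
existing = place_files(files, 4000, 4, "existing_path")
spread = place_files(files, 4000, 4, "most_free")
print(existing)  # [2000, 0, 0, 0]
print(spread)    # [500, 500, 500, 500]
```

The first result is the "one disk fills up while the others sit idle" case; the second is the "scattered evenly, but you no longer know which disk holds what" case.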


      Honestly I'm really torn. My main reason for wanting to use SnapRAID and MergerFS is to be able to avoid the whole requirement of having identical disks that is needed for standard RAID (well unless you want your RAID max size being your smallest disk). Obviously it's very dependent on use case scenario and personal choice.
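The disk-size point can be put into numbers. The sketch below is a simplification under stated assumptions: RAID6 is modelled as "two disks' worth of redundancy, every member truncated to the smallest disk", and SnapRAID as "all data disks count in full, provided every parity disk is at least as large as the largest data disk". The function names and the mixed-disk example are hypothetical; only the 7x1TB RAID6 matches the setup described above.

```python
def raid6_usable_tb(disks_tb):
    """Usable capacity of a RAID6 array: (n - 2) members, each limited to the smallest disk."""
    if len(disks_tb) < 4:
        raise ValueError("RAID6 needs at least 4 disks")
    return (len(disks_tb) - 2) * min(disks_tb)


def snapraid_usable_tb(data_disks_tb, parity_disks_tb):
    """Usable capacity of a SnapRAID + pooled setup: the sum of the data disks.

    Assumes each parity disk must be at least as large as the largest data disk.
    """
    if min(parity_disks_tb) < max(data_disks_tb):
        raise ValueError("parity disk smaller than the largest data disk")
    return sum(data_disks_tb)


# The existing array: 7 x 1 TB in RAID6.
print(raid6_usable_tb([1] * 7))  # 5

# Hypothetical mix of three 4 TB disks and four 1 TB disks.
print(raid6_usable_tb([4, 4, 4, 1, 1, 1, 1]))         # 5 (every disk treated as 1 TB)
print(snapraid_usable_tb([4, 1, 1, 1, 1], [4, 4]))    # 8 (two 4 TB disks used for parity)
```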

      My concerns/questions
      1. When using an Existing Path create policy, is my concern about filling up a single disk (when other disks are hardly utilized), having it fail and having recovery issues valid?
      2. If configuring to scatter data instead of using existing path, what would be the best way to backup the data? Backup just the MergerFS mountpoint? Backup the physical disks? Both (although seems like massive overkill)?
      3. Is recovery from disk failure while using SnapRAID/MergerFS difficult/time consuming compared to conventional RAID? (I still need to run through disk loss scenarios in my VM)
      4. Is data loss more/less likely from recovery of disk failure from using SnapRAID/MergerFS over conventional RAID?
      5. Should I split up my data more across several new Shared Folders and SMB/NFS Shares and use an Existing Path create policy? (downloads/Movies/TV on one, more Personal stuff like pics, music etc on another, Homes on another, etc)
      Obviously data loss isn't a HUGE deal unless it's massive. I really don't want to have to download TB(s) worth of data from CrashPlan to recover if it's avoidable.

      There are two big perks of using SnapRAID/MergerFS over standard RAID. First, the data sits directly on the physical disks, so a disk can be moved to another system intact, which is impossible in a RAID. Second, you can use multiple disk sizes with SnapRAID/MergerFS and grow without too much difficulty.

      I've seen horror stories on both sides but at least it seems a little less likely for complete failure/data loss with SnapRAID/MergerFS unlike RAID where if you lose the RAID, you're pretty much SOL. Unless of course you have a backup, which I would. But once again, the last thing I want is to restore multiple TB's of data from a cloud backup. Granted in most cases I wouldn't bother restoring the numerous video files back to the NAS since it's not exactly necessary.

      Anyways, once again I apologize for this being as long as it is. I'm just trying to get my ducks in a row before I go through with a new storage schema. It's quite a pain to make massive changes like this due to all of the different configurations that need to be updated with new locations for everything. I'd like to minimize how often I need to make these changes.

      The post was edited 1 time, last by ParadingLunatic ().

    • After doing some exhaustive research, and lots of testing in a VM, I decided to go the route of SnapRAID and MergerFS. Here's my take about the experience according to my concerns above.

      1. It seems it depends on how your data is set up. I personally don't mind the data being scattered across the drives, so I didn't bother with any of the Existing Path options. Technically this sorta bit me with Docker, which I'll explain later. I'm not positive, but (when using SnapRAID) I don't believe filling a disk, having it fail and recovering is any more of an issue than what would normally happen with a failure. Regardless, always have a good backup.
      2. I'm just backing up my data via the mergerfs. The way I see it, if I had data loss which is unrecoverable via SnapRAID, restoring the data from backup to the mergerfs would restructure and spread according to the create policy and SnapRAID would just need a resync to update what's where and the parity.
      3. From what I gather recovery from either isn't exactly a difficult process. It seems though in OMV something needs to be done at the CLI level (had to replace a few drives in my old array and recall there was a command/set of commands to get the RAID corrected)
      4. Honestly I think the likelihood of data loss in the event of a failure doesn't change much between the two; just the amount of data lost changes. From what I gather, recovery time could vary widely depending on the severity of the loss. The nice thing is that with a SnapRAID failure you only lose some of the data (or an entire disk), whereas with a RAID failure, chances are your entire array is lost. Because of this, a restore from backup takes less time.
      5. This varies widely, I would imagine. In the long run I created more Shared Folders to organize a little better, and I'm not using an Existing Path create policy. I figured I might as well distribute the load across disks. I'm not doing this for performance but more to distribute the wear and tear.
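Points 1 and 4 can be sketched as simple arithmetic. This is a deliberately crude model with hypothetical function names and made-up usage figures: it assumes that when more disks fail than there are parity disks, SnapRAID loses the contents of the failed disks and nothing else, while a conventional RAID loses the whole array. It ignores files changed since the last sync and any partial recovery.

```python
def snapraid_data_lost_gb(used_per_disk_gb, failed_disks, parity_count):
    """GB that must come back from backup after a SnapRAID disk failure (crude model)."""
    if len(failed_disks) <= parity_count:
        return 0  # parity can rebuild every failed disk
    return sum(used_per_disk_gb[d] for d in failed_disks)


def raid_data_lost_gb(total_used_gb, failed_count, redundancy):
    """GB that must come back from backup after a RAID failure (crude model)."""
    return 0 if failed_count <= redundancy else total_used_gb


# Hypothetical pool with one parity disk; data disks 0 and 1 both fail.
lopsided = [3600, 400, 400, 400]    # one nearly full disk, as with an Existing Path policy
even = [1200, 1200, 1200, 1200]     # same 4800 GB spread evenly

print(snapraid_data_lost_gb(lopsided, [0, 1], parity_count=1))  # 4000
print(snapraid_data_lost_gb(even, [0, 1], parity_count=1))      # 2400
print(snapraid_data_lost_gb(even, [0], parity_count=1))         # 0

# RAID6 (two-disk redundancy) holding the same 4800 GB, with three failed disks.
print(raid_data_lost_gb(4800, failed_count=3, redundancy=2))    # 4800
```

Under these assumptions the fill pattern only matters once failures exceed the parity count, which matches the impression that a full disk is no harder to recover than any other.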


      As for a few lessons learned

      • Not really a lesson learned but should be REALLY obvious, take a backup first.
      • If you're running Docker, you might want to consider not running it from a MergerFS share. It looks like Docker and MergerFS don't play nicely unless you remove the direct_io option, which, from what I've read, could greatly reduce performance depending on the kernel, etc. I also had problems running Docker containers from my storage disks because they were mounted with noexec, even though the fstab entries are supposed to fix this issue.
      • Moving from one storage type to another is a major headache simply due to changes in mount points (in my case from /media/uuid to /srv/mergerfsuuid). Changing all the add-ins, the configurations in plugins, the scripts and numerous other things took LOTS of time, and I'm almost positive I didn't catch them all. I PROBABLY should have created symlinks to the old locations but didn't want to leave a web of confusion for myself later.
      • Removing Shared Folders (and eventually File Systems/RAID arrays, etc) is a headache. I really wish there was a better way to list what is referenced where. Sure the Shared Folder says it's referenced, but what is referencing it. I already knew about the sneaky Home share and oddly enough it didn't trip me up.
      • If you've created NFS shares, even after deleting them (at least for me), they were still not removed from the config.xml file (and in turn, still existed in fstab). The only reason I noticed this was when configuring Emby it listed a bunch of old NFS shares being available that I had long since removed.
      I'm sure there's plenty I'm forgetting but...eh.
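For the noexec problem mentioned in the lessons above, a quick check of the mount table can save some head scratching. This is a minimal sketch that assumes the usual whitespace-separated layout of /proc/mounts (device, mount point, filesystem type, options); the function name and the sample entries are hypothetical.

```python
def noexec_mounts(mounts_text, prefix="/srv"):
    """Return mount points under `prefix` whose option list includes noexec."""
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip blank lines, comments and anything too short to be a mount entry.
        if len(fields) < 4 or fields[0].startswith("#"):
            continue
        mount_point, options = fields[1], fields[3].split(",")
        if mount_point.startswith(prefix) and "noexec" in options:
            hits.append(mount_point)
    return hits


# Hypothetical sample; on a real system read the text from /proc/mounts instead:
#     with open("/proc/mounts") as fh: sample = fh.read()
sample = """\
/dev/sda1 / ext4 rw,relatime 0 0
/dev/sdb1 /srv/dev-disk-by-label-data1 ext4 rw,noexec,relatime 0 0
/dev/sdc1 /srv/dev-disk-by-label-data2 ext4 rw,relatime 0 0
"""
print(noexec_mounts(sample))  # ['/srv/dev-disk-by-label-data1']
```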
    • ParadingLunatic wrote:

      Moving from one storage type to another is a major headache simply due to changes in mount points (in my case from /media/uuid to /srv/mergerfsuuid). Changing all the add-ins, the configurations in plugins, the scripts and numerous other things took LOTS of time, and I'm almost positive I didn't catch them all. I PROBABLY should have created symlinks to the old locations but didn't want to leave a web of confusion for myself later.
      This only affects plugins that don't use shared folders. And OMV 4.x bind mounts all shared folders to /sharedfolders/NAME. So, you shouldn't have the issue of needing to change locations in plugins/scripts that don't use shared folders. The /media to /srv change is the first such change ever in OMV. Not too many worries there.
      omv 4.1.9 arrakis | 64 bit | 4.15 proxmox kernel | omvextrasorg 4.1.10
      omv-extras.org plugins source code and issue tracker - github

      Please read this before posting a question and this and this for docker questions.
      Please don't PM for support... Too many PMs!
    • ryecoaaron wrote:

      This only affects plugins that don't use shared folders. And OMV 4.x bind mounts all shared folders to /sharedfolders/NAME. So, you shouldn't have the issue of needing to change locations in plugins/scripts that don't use shared folders. The /media to /srv change is the first such change ever in OMV. Not too many worries there.
      Yeah, I was already planning for the /media to /srv change, so I knew that was coming. I also knew that it would only affect non-plugin settings, which unfortunately is still quite a bit depending on what you're using. With Deluge, CouchPotato, SABnzbd, Sick Beard (or SickRage, whatever your poison may be), etc., there are still all of the settings you need to change in there...and in numerous places. Then there are the post-processing scripts to update, etc. Thankfully I went through just about everything prior to starting the process and documented everything that pointed to /media/uuid.
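Hunting down everything that still points at an old mount point can be partly automated. The following is only a sketch of one way to do it: the function name is made up, the demo runs against a throwaway temporary directory rather than any real application config, and `/media/uuid` is the placeholder from the posts above, not an actual path.

```python
import os
import tempfile


def find_references(root, needle):
    """Return (path, line number, line) for every line under `root` containing `needle`."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                # errors="ignore" lets binary or oddly encoded files pass through quietly.
                with open(path, "r", errors="ignore") as fh:
                    for lineno, line in enumerate(fh, 1):
                        if needle in line:
                            hits.append((path, lineno, line.strip()))
            except OSError:
                continue  # unreadable file: skip it
    return hits


# Demo against a temporary directory holding one fake settings file.
with tempfile.TemporaryDirectory() as demo_root:
    with open(os.path.join(demo_root, "settings.ini"), "w") as fh:
        fh.write("name = demo\n")
        fh.write("download_dir = /media/uuid/downloads\n")
    demo_hits = find_references(demo_root, "/media/uuid")

for path, lineno, line in demo_hits:
    print(f"{os.path.basename(path)}:{lineno}: {line}")
# settings.ini:2: download_dir = /media/uuid/downloads
```

Pointing `root` at each application's config directory and keeping the output as a checklist would cover most of the manual documentation step.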

      As for OMV 4.x and the shared folder binding. Now THAT is a nice feature being added. Should definitely make things a little more intuitive.
    • gderf wrote:

      What exactly does that do and what is its scope?
      It mounts newly created filesystems at /media instead of /srv. It doesn't change the mount point name back to the UUID though. So, you will get /media/dev-disk-by-label-files instead of /srv/dev-disk-by-label-files, not a pre-OMV 3.0.71 style mount point like /media/a2705f42-2365-43b7-aeab-39dde955c3ca. The /sharedfolders bind mounts in OMV 4.x should fix the issue and provide a consistent mount point for services/scripts.
    • I couldn't approve more of your choice to go down the snapraid/mergerfs route.

      I've run a raid array for years, and lost it a few times (before I decided to give up on it).

      It was never really because of hardware failure (the only time was a faulty SATA cable), but always because I was fiddling, panicked, and made mistakes. (Once I found a script on the internet and erased all my superblocks in one go.) I even almost lost a raid0 array!

      Then I figured RAID is only needed if one needs the data to stay available while a disk is missing. Otherwise, it basically creates way too many problems and constraints for reasonable home usage.

      I replaced everything with an SSD and rsnapshot for critical data, and mergerfs+snapraid for all the rest, and stopped worrying.

      I'm running 3 SSDs (1TB, 0.5TB, 250GB), 5x2TB HDDs, and one 4TB HDD for parity and rsnapshot storage, and I'm currently migrating it to OMV.