First things first, I apologize for how long this will be.
Currently my OMV3 system disk is a cheapo 120GB SSD, and storage is a single RAID6 of 7x1TB standard desktop disks. I'm in the process of purchasing 4TB WD Red NAS disks (I have 3 in hand and was looking to get a few more). I'm trying to decide if I want to go with RAID or with SnapRAID/MergerFS and, if I do decide on the latter, what my "best practices" would be for my use case.
Some background and usage info.
My network is a mixed environment (Linux/Windows, primarily Windows, with Active Directory etc.), so I have a few shared folders for SMB, with the bulk of the storage in 2 SMB shares. One very large share holds Movies, TV Shows, Pics, Music, etc. The other is used for backups of other computers; it's shared out via SMB and NFS and is also used by UrBackup. I have a few other shared folders (for Docker, PXE boot/SFTP, and home directories, which are SMB shared as well). But the largest by far is the one shared folder used for Movies/Pics/TV/Music/etc. You can pretty much call it the general do-everything shared folder. It contains the largest amount of data, all within one shared folder, with everything sorted by subfolders. I'd like to keep it this way for the convenience factor.
CouchPotato/SickRage, etc. all download directly to a downloads folder within the primary shared folder, post-process, and then, once done, move the files to the Movies/TV etc. location accordingly.
I'm currently backing up all of the shared folders plus some of the critical system folders to CrashPlan (small business edition, so it CAN, if configured, continuously back up as files change or are created), so I'm not ENTIRELY worried about SnapRAID not being real time.
I'm playing around with SnapRAID/MergerFS in a VM and I'm trying to decide on the Create Policy configuration for MergerFS that makes sense for me.
Existing Path....
From the behavior I'm seeing so far, the second I create a shared folder, it puts that folder on a single physical disk. After that, all subsequent files created under that folder go to that same disk. Makes sense. I can see pros and cons of this behavior in my situation, though. Since my one shared folder is going to be so large, most or all of that data will be on that one disk, consuming at least 50-75% of it and continuing to grow, as it's my primary shared storage. I realize this would make moving files etc. much faster, as everything stays on the same disk. I would assume (yes, I know what happens when you assume) that once that disk hits the configured Min Free Space cutoff of the MergerFS pool, it will dump to the next disk and replicate the folder structure?
Pros
It's obvious for the most part what physical disk your data is on
Speeds up file move operations (not the biggest deal for me as I rarely move large files)
Spins up/down single disks (again for the most part) for read/writes in a folder
Cons (in my case)
Can fill up a single disk pretty quick while other disks are hardly used
(a maybe, not sure) Possibly makes it more likely to lose more data if a single disk dies (such as one full disk failing when the others are only 10% utilized)? Obviously depends on the parity config etc. of SnapRAID
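To make the fill pattern above concrete, here is a rough toy model in Python, not MergerFS's actual code. It compares an "existing path" style policy with a "most free space" style policy on three hypothetical 4TB disks named a, b and c. The names, sizes, the 500GB cutoff and especially the fallback flag are assumptions for illustration: whether a real pool spills over to another disk or returns an out-of-space error once the cutoff is hit should be checked against the documentation for the MergerFS version in use.

```python
from dataclasses import dataclass, field


@dataclass
class Disk:
    name: str
    size_gb: int
    used_gb: int = 0
    dirs: set = field(default_factory=set)

    @property
    def free_gb(self):
        return self.size_gb - self.used_gb


def pick_disk(disks, directory, file_gb, policy, min_free_gb, fallback=True):
    """Pick the disk for a new file under a simplified create policy."""
    # Disks below the min-free-space cutoff (or too small for the file) are skipped.
    eligible = [d for d in disks if d.free_gb >= max(min_free_gb, file_gb)]
    if policy == "existing_path":
        holders = [d for d in disks if directory in d.dirs]
        eligible_holders = [d for d in eligible if directory in d.dirs]
        if eligible_holders:
            eligible = eligible_holders
        elif holders and not fallback:
            # ASSUMPTION: without a fallback the write simply fails.
            raise OSError("every disk holding this directory is below the cutoff")
    if not eligible:
        raise OSError("no disk has enough free space")
    # Among the remaining candidates, take the one with the most free space.
    return max(eligible, key=lambda d: d.free_gb)


def simulate(policy, n_files=100, file_gb=50, min_free_gb=500, fallback=True):
    """Write n_files into one directory and return used GB per disk."""
    disks = [Disk(name, 4000) for name in ("a", "b", "c")]  # hypothetical pool
    for _ in range(n_files):
        disk = pick_disk(disks, "Media", file_gb, policy, min_free_gb, fallback)
        disk.dirs.add("Media")  # the directory now exists on that disk
        disk.used_gb += file_gb
    return {d.name: d.used_gb for d in disks}


if __name__ == "__main__":
    print("existing_path:", simulate("existing_path"))
    print("most_free:    ", simulate("most_free"))
```

In this model the existing-path run fills disk a down to the cutoff before anything lands on b, and c stays empty, while the most-free-space run keeps all three disks within one file of each other.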
Anything else that is NOT Existing Path...
Pretty self explanatory to me. Pretty much scatters the data across all the disks determined by which option you choose (random, most space, etc). The directory structure is placed across all of the disks and you don't really know exactly which physical disk specific files are on without hunting it down manually.
Pros
Utilizes all disks fairly equally
Possibly less likely to lose a lot of data in a disk failure, as disks are less likely to be completely filled (once again, maybe; not really sure, as I've never used SnapRAID)
Cons
File move operations of large files in the same folder could take time as it MAY need to go from disk to disk
Could possibly need to spin up/down multiple disks at a time (not really a huge deal for me with NAS disks)
Files are scattered across disks so it could be difficult to find specific files on physical disk (doesn't seem like a big deal)
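On the "hunting it down manually" point: since each file in the pool lives as an ordinary file on one of the underlying disks, the hunt can be scripted. A minimal sketch, assuming the data disks are mounted at the hypothetical paths below (substitute the real mount points):

```python
import os


def find_on_branches(relative_path, branches):
    """Return every branch directory that physically holds relative_path."""
    return [
        branch
        for branch in branches
        if os.path.lexists(os.path.join(branch, relative_path))
    ]


if __name__ == "__main__":
    # Hypothetical mount points of the individual data disks.
    branches = ["/srv/disk1", "/srv/disk2", "/srv/disk3"]
    print(find_on_branches("Media/Movies/example.mkv", branches))
```

The path is given relative to the pool root, so the same string works against every disk. A directory can show up on several disks at once, which is why the function returns a list.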
Honestly, I'm really torn. My main reason for wanting to use SnapRAID and MergerFS is to avoid the requirement of identical disks that standard RAID has (well, unless you want your RAID's max size limited by your smallest disk). Obviously it's very dependent on use case and personal choice.
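The disk size point is easy to put into numbers. A back-of-the-envelope sketch, assuming a RAID6 that treats every member as being the size of the smallest disk, and a SnapRAID layout where the largest disks are used for parity (a parity disk has to be at least as large as the largest data disk). Filesystem overhead is ignored, and mixing the old 1TB disks with the new 4TB ones is only an example, not a plan:

```python
def raid6_usable_tb(disks_tb):
    """Usable TB of a RAID6: two disks' worth of parity, all members cut to the smallest."""
    if len(disks_tb) < 4:
        raise ValueError("RAID6 needs at least 4 disks")
    return (len(disks_tb) - 2) * min(disks_tb)


def snapraid_usable_tb(disks_tb, parity_count):
    """Usable TB with SnapRAID when the largest disks are dedicated to parity."""
    if not 1 <= parity_count < len(disks_tb):
        raise ValueError("need at least one parity disk and one data disk")
    ordered = sorted(disks_tb, reverse=True)
    # Everything after the parity disks holds data at its full size.
    return sum(ordered[parity_count:])


if __name__ == "__main__":
    mixed = [1] * 7 + [4] * 3  # seven 1TB disks plus three 4TB disks
    print("RAID6:             ", raid6_usable_tb(mixed), "TB")
    print("SnapRAID, 2 parity:", snapraid_usable_tb(mixed, 2), "TB")
    print("SnapRAID, 1 parity:", snapraid_usable_tb(mixed, 1), "TB")
```

With those ten disks the RAID6 comes out at 8TB usable, against 11TB for SnapRAID with two parity disks, because the third 4TB disk counts at full size instead of being cut down to 1TB.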
My concerns/questions
- When using an Existing Path create policy, is my concern about filling up a single disk (when other disks are hardly utilized), having it fail and having recovery issues valid?
- If configuring to scatter data instead of using existing path, what would be the best way to back up the data? Back up just the MergerFS mountpoint? Back up the physical disks? Both (although that seems like massive overkill)?
- Is recovery from disk failure while using SnapRAID/MergerFS difficult/time consuming compared to conventional RAID? (I still need to run through disk loss scenarios in my VM)
- Is data loss more or less likely when recovering from a disk failure with SnapRAID/MergerFS than with conventional RAID?
- Should I split up my data more across several new Shared Folders and SMB/NFS Shares and use an Existing Path create policy? (downloads/Movies/TV on one, more Personal stuff like pics, music etc on another, Homes on another, etc)
Obviously data loss isn't a HUGE deal unless it's massive. I really don't want to have to download TB(s) worth of data from CrashPlan to recover if it's avoidable.
Two of the biggest perks of using SnapRAID/MergerFS over standard RAID: first, the data is directly on the physical disk, so a disk can be moved to another system intact, which is impossible with RAID. Second, you can use multiple disk sizes with SnapRAID/MergerFS and grow without too much difficulty.
I've seen horror stories on both sides, but at least complete failure/data loss seems a little less likely with SnapRAID/MergerFS, unlike RAID where, if you lose the RAID, you're pretty much SOL. Unless of course you have a backup, which I would. But once again, the last thing I want is to restore multiple TBs of data from a cloud backup. Granted, in most cases I wouldn't bother restoring the numerous video files back to the NAS, since it's not exactly necessary.
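A small sketch of that trade-off in terms of how much would have to come back from the cloud backup. It assumes parity is fully synced, the parity disks themselves are healthy, and only data disks fail. As long as the number of failed disks does not exceed the number of parity disks, nothing has to be restored in either setup. Beyond that, SnapRAID loses only what was on the failed disks, while a striped RAID loses the whole array. The disk names and usage figures are made up:

```python
def restore_from_backup_tb(used_tb, failed, parity_count, striped):
    """TB to fetch from backup after the data disks named in `failed` die.

    used_tb maps each data disk to the TB stored on it. For a striped
    array the per-disk split is only bookkeeping; the total is what counts.
    """
    if len(failed) <= parity_count:
        return 0.0  # parity can rebuild everything
    if striped:
        return float(sum(used_tb.values()))  # the whole array is gone
    return float(sum(used_tb[name] for name in failed))  # only the failed disks


if __name__ == "__main__":
    concentrated = {"a": 3.5, "b": 1.5, "c": 0.0}  # existing-path style fill
    spread = {"a": 1.7, "b": 1.65, "c": 1.65}      # evenly scattered fill
    for label, layout in (("concentrated", concentrated), ("spread", spread)):
        one = restore_from_backup_tb(layout, {"a"}, 1, striped=False)
        two = restore_from_backup_tb(layout, {"a", "c"}, 1, striped=False)
        print(label, "one failure:", one, "TB; two failures:", two, "TB")
```

Under these assumptions a single failed disk costs nothing with one parity disk, no matter how full it was. How the data was spread only starts to matter once more disks fail than there is parity for.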
Anyways, once again I apologize for this being as long as it is. I'm just trying to get my ducks in a row before I go through with a new storage schema. It's quite a pain to make massive changes like this due to all of the different configurations that need to be updated with new locations for everything. I'd like to minimize how often I need to make these changes.