UnionFS and file distribution

    • OMV 4.x
    • UnionFS and file distribution

      Wondering if there is an automatic and intelligent way to manage the file load of the drives in a UnionFS array?
      By default the array is set to 'existing path, most free space', which generally makes good sense. But I have large drives in the array and am filling it with data from 2 root folders. The behaviour so far seems to be that it's just going to fill up the first drive, then move on to the next drive.

      I'm coming from FlexRaid. It had a setting "Auto Merge with Balanced Space Priority" which balanced the disk space usage. Is there a way to do that with UnionFS?
    • You will probably fill the drive up to the limit specified in the UnionFS plugin. You're probably filling one drive with video content, which consists of very large files.

      When a shared folder is created on the UnionFS mount point, we'll say Videos, using "Existing Path, Most Free Space", the folder is created on one drive. Thereafter, the first directive "Existing Path" will direct all files to that shared folder only.
      If you use the policy "Most Free Space", "Videos" will be filled with files until one of the other drives has more free space. When that happens, the folder Videos is created on the 2nd drive, is filled with files, and so on. UnionFS merges the contents of the Videos folders on the two drives and presents them as one folder.

      "Existing Path, Most Free Space" has some advantages for recovery, but if you're storing a lot of large files in a small number of folders, and want even distribution over all drives, use "Most free space". Based on what you said above about FlexRAID, "Most Free Space" is probably the closest equivalent.

      BTW: You can change policies on the fly, but the array won't "auto rebalance" existing storage. However, over time, it will balance out.
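
      The difference between the two policies described above can be sketched with a toy model. This is a simplified illustration, not mergerfs's actual code: the `Branch` class, the drive names, the sizes and the `min_free` limit are all made up for the example, and `epmfs` / `mfs` are used as short labels for "Existing Path, Most Free Space" and "Most Free Space". The out-of-space error when no eligible drive holds the folder is also an assumption of the model.

```python
from dataclasses import dataclass, field


@dataclass
class Branch:
    """One member drive of the pool (toy model, sizes in GB)."""
    name: str
    free: int
    dirs: set = field(default_factory=set)


def pick_branch(branches, folder, policy, min_free=0):
    """Return the drive a new file in `folder` would land on."""
    # Drives at or below the free-space limit are not eligible.
    candidates = [b for b in branches if b.free > min_free]
    if policy == "epmfs":
        # "Existing path": only drives that already hold the folder.
        candidates = [b for b in candidates if folder in b.dirs]
    if not candidates:
        # Assumption of this model: no eligible drive means the write fails.
        raise OSError("no space left on device")
    # "Most free space": the emptiest eligible drive wins.
    return max(candidates, key=lambda b: b.free)


def write_file(branches, folder, size, policy, min_free=0):
    """Place one file, update free space, and return the drive name."""
    target = pick_branch(branches, folder, policy, min_free)
    target.dirs.add(folder)  # under "mfs" the folder is created on demand
    target.free -= size
    return target.name


def simulate(policy, n_files=6, size=100):
    """Write n_files into "Videos" on two hypothetical 1000 GB drives."""
    # "Videos" starts out on disk1 only.
    pool = [Branch("disk1", 1000, {"Videos"}), Branch("disk2", 1000)]
    return [write_file(pool, "Videos", size, policy) for _ in range(n_files)]
```

      Under these assumptions, `simulate("epmfs")` keeps every file on disk1, while `simulate("mfs")` alternates between the two drives as their free space leapfrogs.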


    • Could you elaborate on the advantages of "Existing Path, Most Free Space" in regards to recovery?

      If I have a large media folder with numerous sub-folders, is the "Existing Path, Most Free Space" smart enough to not split the sub folders among drives?

      Also, what is the difference between "Existing Path, Most Free Space" and "Existing Path, Least Used Space"?
    • crashtest wrote:

      You will probably fill the drive up to the limit specified in the UnionFS plugin. You're probably filling one drive with video content, which consists of very large files.

      When a shared folder is created on the UnionFS mount point, we'll say Videos, using "Existing Path, Most Free Space", the folder is created on one drive. Thereafter, the first directive "Existing Path" will direct all files to that shared folder only.
      If you use the policy "Most Free Space", "Videos" will be filled with files until one of the other drives has more free space. When that happens, the folder Videos is created on the 2nd drive, is filled with files, and so on. UnionFS merges the contents of the Videos folders on the two drives and presents them as one folder.

      "Existing Path, Most Free Space" has some advantages for recovery, but if you're storing a lot of large files in a small number of folders, and want even distribution over all drives, use "Most free space". Based on what you said above about FlexRAID, "Most Free Space" is probably the closest equivalent.

      BTW: You can change policies on the fly, but the array won't "auto rebalance" existing storage. However, over time, it will balance out.
      I think I understood what you wrote, but what happens if I use the "Existing Path, Most Free Space" policy and the folder on the first drive fills up? I think UnionFS should create another folder with the same name and path on another drive and continue writing there, but that is not what happens: it says there is no more space on the disk (even though the other disk in the pool is almost empty).
    • schulcz wrote:

      I think I understood what you wrote, but what happens if I use the "Existing Path, Most Free Space" policy and the folder on the first drive fills up? I think UnionFS should create another folder with the same name and path on another drive and continue writing there, but that is not what happens: it says there is no more space on the disk (even though the other disk in the pool is almost empty).
      The part you are missing is "Existing Path." If the path does not already exist on a drive, mergerfs will not create it and won't be able to write data there.
      --
      Google is your friend and Bob's your uncle!

      OMV 4.x - ASRock Rack C2550D4I - 16GB ECC - Silverstone DS380
    • gderf wrote:

      schulcz wrote:

      I think I understood what you wrote, but what happens if I use the "Existing Path, Most Free Space" policy and the folder on the first drive fills up? I think UnionFS should create another folder with the same name and path on another drive and continue writing there, but that is not what happens: it says there is no more space on the disk (even though the other disk in the pool is almost empty).
      The part you are missing is "Existing Path." If the path does not already exist on a drive, mergerfs will not create it and won't be able to write data there.
      So should I always manually create the appropriate path and folders on the other drives so that it can write there when necessary?
    • sonofwatt wrote:

      Could you elaborate on the advantages of "Existing Path, Most Free Space" in regards to recovery?

      If I have a large media folder with numerous sub-folders, is the "Existing Path, Most Free Space" smart enough to not split the sub folders among drives?

      Also, what is the difference between "Existing Path, Most Free Space" and "Existing Path, Least Used Space"?
      "Existing Path, Most Free Space" has a cleaner, more understandable, root directory. (And if using this policy and creating a mergerfs pool from scratch, it makes sense to set up the largest shared folder first and copy data to it. Then, move on, adding shares, large to small.)

      For instance, if you create "Videos", and copy content to it, it will exist on one drive.
      We'll say the next shared folder created is "Documents". Since there is no "Existing Path" for Documents, it will go to the drive with the most free space, which at that point is the second drive, and its files are stored there. And so on.

      I'm using the "Existing Path" policy which resulted in a distribution similar to the following:

      Drive1:
      /Videos (and all of its sub-dirs)

      Drive2:
      /Backups
      /Documents
      /Users
      /Share
      /Music
      /ISOs-Images
      _________________________________________________

      The bottom line is that the contents of the above are discrete. I could move or copy one or more of the root folders above to another drive in the pool, or out of the pool altogether. Also, I could delete the drive pool and simply point a shared folder to an existing drive directory. Easy: there's no need to reconsolidate data into a single root folder as it would exist outside of a pool. It's already done.

      But note that the Videos folder on Drive1, in this instance, is roughly the same size as ALL the root folders on Drive2 combined. If the vast majority of a data store consists of huge files and the collection is expected to grow, "Existing Path, Most Free Space" doesn't work well. In this example case, if video files are continually added, Drive1 would be filled to the capacity specified by the plugin's drive limit. (I set the limit to 10% of the capacity of the largest drive, for a bit of advance warning.)
      __________________________________________________

      If you're looking for even file distribution among all the drives in the pool, similar to what a user would get with RAID5, "Most Free Space" is the policy to use. The downside is that folders and sub-dirs are distributed in such a way that reconsolidation can be difficult. If you decide to use "Most Free Space", in addition to a full backup, I'd protect the pool with an extra drive using SNAPRAID.
      __________________________________________________

      "Existing Path, Least Used Space":
      I've never used this policy, but it seems it would be the rough equivalent of "Existing Path, Most Free Space". It might have something to do with mixing drives of differing capacities but that's just idle speculation.

      There are other policies that I wouldn't use for a home NAS, but the Dev, @trapexit, may have developed options for use cases I'm unaware of. If you really want to dig into it, see the Dev's GitHub page.
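
      To see how root folders are actually spread over the member drives, as in the Drive1/Drive2 listing above, a small script can total each top-level folder per drive. This is only a sketch: the function names are made up for the example, and it has to be pointed at the individual drive mount points rather than at the pooled mount point.

```python
import os


def folder_sizes(branch_root):
    """Map each top-level folder on one drive to its total size in bytes."""
    sizes = {}
    for entry in os.scandir(branch_root):
        if not entry.is_dir(follow_symlinks=False):
            continue  # only root folders are of interest here
        total = 0
        for dirpath, _dirnames, filenames in os.walk(entry.path):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if not os.path.islink(path):
                    total += os.path.getsize(path)
        sizes[entry.name] = total
    return sizes


def pool_layout(branch_roots):
    """Return {drive_root: {folder: bytes}} for every drive in the pool."""
    return {root: folder_sizes(root) for root in branch_roots}
```

      For example, `pool_layout(["/srv/disk1", "/srv/disk2"])` (hypothetical mount points) would show at a glance whether one root folder, such as Videos, outweighs everything on the other drive.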
    • schulcz wrote:

      gderf wrote:

      schulcz wrote:

      I think I understood what you wrote, but what happens if I use the "Existing Path, Most Free Space" policy and the folder on the first drive fills up? I think UnionFS should create another folder with the same name and path on another drive and continue writing there, but that is not what happens: it says there is no more space on the disk (even though the other disk in the pool is almost empty).
      The part you are missing is "Existing Path." If the path does not already exist on a drive, mergerfs will not create it and won't be able to write data there.
      So should I always manually create the appropriate path and folders on the other drives so that it can write there when necessary?
      Either create the path beforehand or use a create policy with mergerfs that does not use existing path.
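
      Creating the path beforehand can be scripted. The following is a minimal sketch, assuming it is run against the individual drive mount points (the function name and any paths passed to it are hypothetical); it gives the new folder the same permission bits and, where allowed, the same owner as the original so that shares behave the same on every drive.

```python
import os
import shutil


def replicate_folder(rel_path, source_root, other_roots):
    """Create rel_path on each other drive, mirroring the source folder's mode."""
    source = os.path.join(source_root, rel_path)
    st = os.stat(source)
    created = []
    for root in other_roots:
        target = os.path.join(root, rel_path)
        if os.path.isdir(target):
            continue  # path already exists on this drive
        # Intermediate folders get default permissions; only the final
        # folder is matched to the source below.
        os.makedirs(target)
        shutil.copymode(source, target)  # same permission bits
        try:
            os.chown(target, st.st_uid, st.st_gid)  # same owner and group
        except PermissionError:
            pass  # not running as root: ownership stays as created
        created.append(target)
    return created
```

      Once the folder exists on more than one drive, an "existing path" policy has more than one candidate to choose from.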
    • crashtest wrote:

      But note that the Videos folder on Drive1, in this instance, is roughly the same size as ALL the root folders on Drive2 combined. If the vast majority of a data store consists of huge files and the collection is expected to grow, "Existing Path, Most Free Space" doesn't work well. In this example case, if video files are continually added, Drive1 would be filled to the capacity specified by the plugin's drive limit. (I set the limit to 10% of the capacity of the largest drive, for a bit of advance warning.)
      That's similar to my setup. 2 folders have the same amount of data as about 12 other folders. The first 2 have large files, the other 12 have a mix of sizes.

      So then unionfs doesn't just create another "/Videos" for you on another drive with free space when the current one fills up?
      What if you went ahead and manually created another Videos folder on another drive in the array? Would "Existing Path, Most Free Space" see that other folder on the other drive and continue filling files there?

      I currently have only 2 large drives in the unionfs array, but do plan on getting a 3rd of the same size to create a SnapRAID RAID5-style array.
    • sonofwatt wrote:

      So then unionfs doesn't just create another "/Videos" for you on another drive with free space when the current one fills up?
      Using "Existing Path, most free space":
      The drive with the Existing Path "/Videos" will fill to the drive limit specified in the plugin. From there, when there's no free space left, I'm guessing it "might" create a /Videos folder on a second drive but I haven't tested it.

      sonofwatt wrote:

      What if you went ahead and manually created another Videos folder on another drive in the array? Would "Existing Path, Most Free Space" see that other folder on the other drive and continue filling files there?
      I'm guessing it would (but again I haven't tested it). With two "Existing paths", the next directive is "most free space". I think I can see where you're going with that idea - rather than scrap "Existing Path", which keeps root folders sorted and clean, manually add a root folder on another drive with more space.

      Try it. If you have an existing drive imbalance, after the second /Videos folder is created, the next video file copied should be stored in the second folder where the drive has more space. (Don't forget to check the permissions on the first Videos folder and replicate them in the second.)

      sonofwatt wrote:

      I currently have only 2 large drives in the unionfs array, but do plan on getting a 3rd of the same size to create a SnapRAID RAID5-style array.
      That's an excellent idea. I've been testing SNAPRAID for a while now. I intentionally set it up on aging drives and found that it does, indeed, provide bit-rot protection.


    • crashtest wrote:

      I'm guessing it would (but again I haven't tested it). With two "Existing paths", the next directive is "most free space". I think I can see where you're going with that idea - rather than scrap "Existing Path", which keeps root folders sorted and clean, manually add a root folder on another drive with more space.

      Try it. If you have an existing drive imbalance, after the second /Videos folder is created, the next video file copied should be stored in the second folder where the drive has more space. (Don't forget to check the permissions on the first Videos folder and replicate them in the second.)
      It will be a while until the first drive is filled, but I'll try to remember to report back.

      crashtest wrote:

      I've been testing SNAPRAID for a while now. I intentionally set it up on aging drives and found that it does, indeed, provide bit-rot protection.
      That's good to hear. The initial reason for looking into OMV was the bit-rot protection with SNAPRAID.
    • I appreciated your posts. You've looked at this from an angle I hadn't thought about. The "Most Free Space" policy, which scatters storage folders over all hard drives, is a real commitment to the UnionFS plugin. The "Existing Path, Most Free Space" policy might still work well, even with out-sized Video file storage, with the manual creation of matching Video folders on all member drives. It's certainly worth testing in a VM.
      _______________________________________

      The bit-rot protection feature was part of what generated my interest in SNAPRAID as well. All implementations of effective bit-rot protection that I know of require 2 times the drive real-estate to set up. (That's CoW file systems in the rough equivalent of traditional RAID1.) With a SNAPRAID parity drive as a 4th drive, protecting 3 data disks in a drive pool, the extra drive requirement drops from 100% of the data drives (3 extra drives) to a single parity drive, which is 25% of the 4 drives in the set. That's a considerable difference. With the two disks saved + one more, there's enough storage for a full data store backup.
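
      The drive counts behind this comparison can be checked with a few lines of arithmetic. This is a sketch only: the function names are made up, "mirror" stands for the RAID1-style setup and "parity" for data drives plus dedicated parity drives, and all drives are assumed to be the same size.

```python
def drives_needed(data_drives, scheme, parity_drives=1):
    """Total drives required to protect `data_drives` drives of data."""
    if scheme == "mirror":
        return 2 * data_drives  # every data drive gets a twin
    if scheme == "parity":
        return data_drives + parity_drives
    raise ValueError(f"unknown scheme: {scheme}")


def overhead(data_drives, scheme, parity_drives=1):
    """Return the extra drives as a fraction of (data drives, all drives)."""
    total = drives_needed(data_drives, scheme, parity_drives)
    extra = total - data_drives
    return extra / data_drives, extra / total


# Three data drives: mirroring needs 6 drives, single parity needs 4.
saved = drives_needed(3, "mirror") - drives_needed(3, "parity")
```

      For three data drives, the mirror costs 3 extra drives (100% of the data drives) and single parity costs 1 (a third of the data drives, or 25% of the four drives in the set), so `saved` works out to the two disks mentioned above.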
    • Thanks for following up. This has been really helpful.
      Yeah, I hope that "Existing Path, Most Free Space" works as you've said. Worst case, in a SnapRAID array with "Most Free Space" I should be covered.
      ____________

      Yeah, I was running a FlexRAID array with around 12 x 1TB drives. Bit-rot was an ongoing problem. With backups it wasn't a fatal issue, but it was an ongoing annoyance and lots of work to maintain. I plan on eventually building this array up to RAID 6. SnapRAID's ability to add disks that already have data on them, and to expand the array, made it the only choice for me.