Fresh install of OMV 3, all files are missing from all HDDs...

  • Yesterday I did a fresh install of OMV 3 and set up SnapRAID and UnionFS on my data drives. Previously I was running Greyhole on OMV 2, so all 3 drives had been populated with data. Today I set up Plex and my other media services, and it appears that Plex cannot see any of the media on any of the drives, save for 6 TV show episodes. The same is true of the folders that are on Samba shares. When I use the ls command in the CLI, all the file names show up, but the file sizes are listed as <1 KB. Am I totally SOL, with all my files having been deleted during the setup of SnapRAID and UnionFS, or is there some trick I am missing that is simply hiding all my files? If I am hosed, what is the best route for recovering the files? I have almost 8 years' worth of data that may have been erased from all 3 redundant drives...


    Thanks

  • I just did some research, and I think your assessment that this is related to Greyhole is correct. The 2 remaining data drives still show the same amount of disk space used as they did last week, and it appears that all the zero-size files I found in the folders are symlinks. Tonight I am going to look through the drives using a live OS and see if I can find the missing files elsewhere on the drives. I will also see what a recovery scan can find. The one area where I may have hurt myself is that my SnapRAID parity drive was formerly a Greyhole data drive that I reformatted. I previously had Greyhole set to keep 2 copies of all files (across 3 drives), but it's possible I made an error somewhere and the only copy of some of the files was on that drive. I will look through things tonight and see if I can find any of the original files instead of just the symlinks.

  • My plan is to explore the contents of the disks from a separate live USB. I am not planning to boot back into OMV until I have established whether any of my files are actually left on the disks. I don't want to risk any automated jobs making file recovery harder.

  • So I just confirmed that all my files are still there. The issue was that Greyhole keeps all the real files in a hidden folder named .greyhole and only puts symlinks in the top-level, visible folders.


    Now that I know everything is safe, I am going to boot into OMV and move everything out of the hidden folder into the main top-level folders. The next question I have is how to handle the duplicate copies of many of the files. Does SnapRAID need duplicate files on the data drives, or does that just waste storage space?
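
    A minimal Python sketch of that kind of move, for anyone following along: it walks each data drive, prints every symlink whose target lives under .greyhole, and only swaps the real file into the symlink's place once APPLY is flipped to True. The mount points are made-up placeholders, so check the dry-run output against your own paths before applying anything.

    #!/usr/bin/env python3
    """Dry-run sketch: list Greyhole symlinks and optionally swap in the real files."""
    import os
    import shutil

    DRIVES = ["/media/disk1", "/media/disk2"]   # hypothetical mount points, adjust to your system
    APPLY = False                               # keep False until the printed output looks right

    for drive in DRIVES:
        for root, dirs, files in os.walk(drive):
            if ".greyhole" in dirs:
                dirs.remove(".greyhole")        # don't descend into the hidden store itself
            for name in files:
                link = os.path.join(root, name)
                if not os.path.islink(link):
                    continue
                target = os.path.realpath(link)
                if ".greyhole" not in target:
                    continue
                print(f"{link} -> {target}")
                if APPLY and os.path.isfile(target):
                    os.remove(link)             # drop the symlink...
                    shutil.move(target, link)   # ...and move the real file into its place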

  • No, SnapRAID doesn't duplicate your data. It creates parity data, which allows recovery while saving space (similar to RAID 5 vs. RAID 1). SnapRAID looks at each drive individually, so it doesn't care about duplicates.


    It's the MergerFS pool that doesn't like duplicates. When reading or changing a file, it won't know which instance to use, which might cause problems. There is a project on GitHub called mergerfs-tools which contains a dedup tool for mergerfs that would be useful here. Unfortunately, it isn't installed with the unionfs plugin, and it requires compiling and installing dependencies, which is a bit complicated:
    https://github.com/trapexit/mergerfs-tools
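
    To make the idea concrete, here is a rough Python sketch of what such a dedup pass looks for: the same relative path existing on more than one pool member. This is not the mergerfs-tools code, the mount points are placeholders, and it only reports; it deletes nothing.

    #!/usr/bin/env python3
    """Report relative paths that exist on more than one pool member."""
    import os
    from collections import defaultdict

    BRANCHES = ["/media/disk1", "/media/disk2"]   # hypothetical pool members

    seen = defaultdict(list)
    for branch in BRANCHES:
        for root, _dirs, files in os.walk(branch):
            for name in files:
                rel = os.path.relpath(os.path.join(root, name), branch)
                seen[rel].append(branch)

    for rel, branches in sorted(seen.items()):
        if len(branches) > 1:
            print(f"duplicate: {rel} on {', '.join(branches)}")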

    • Official Post

    Can't find the post, but I wasn't sure about the status of these utilities. If @trapexit says they are ready, I would think he would add them to his build automation to package them. Then they could easily be included as a dependency of the plugin.

    omv 7.0.4-2 sandworm | 64 bit | 6.5 proxmox kernel

    plugins :: omvextrasorg 7.0 | kvm 7.0.10 | compose 7.1.2 | k8s 7.0-6 | cputemp 7.0 | mergerfs 7.0.3


    omv-extras.org plugins source code and issue tracker - github


    Please try ctrl-shift-R and read this before posting a question.

    Please put your OMV system details in your signature.
    Please don't PM for support... Too many PMs!

  • They are a separate project. I don't plan on including them in the main mergerfs package. Usually they only require downloading and having Python installed. No installation necessary.


    I guess I could make a deb package but it'd be a low priority relative to the main project.

    • Official Post

    I guess I could make a deb package but it'd be a low priority relative to the main project.

    I hadn't looked at what kind of code they are. Since they don't need to be compiled, I can create a package with them included.


  • pjoyce42 >> Not sure how much data we are talking about here, but first things first, and it might help with the duplicates: go and get a nice external drive that can fit all your data. I recently needed to configure my server, which BTW is still in shambles on the bench :) , but since I did not have any usable drives for backup, I went out and bought a 5TB USB 3 Seagate drive and copied my data there.
    #1: I now have an extra copy of the files.
    #2: While I moved the data, I deduped it greatly.


    Now, if you have more data than can fit on one external drive, you might want to figure out a better strategy.
    One way to do it in your case, since you said you have 3 data drives that were used in a redundant setup, is to designate one drive for backup use: clean it up, format it, and begin copying the data to it, deduping as you go. Once you have one full copy of the data, format the second drive and copy all the data to it from the first one.


    Now that you have 2 copies, set aside one of those drives and format the third one. Add both of them (one with data, one clean) to OMV and make sure you can clearly identify the drives; when I format and partition, I usually add a label to the filesystem.
    Create a SnapRAID pool using the clean drive as parity and the data drive as data.
    Once everything is up and running, you can empty the backup drive, add it to SnapRAID,
    and you are ready to build a MergerFS pool.
    Of course, you can also just get a fourth drive, build out a proper SnapRAID array with 3 disks (1 parity and 2 data),
    copy all the data to the properly protected array, and then just add the backup disk as a 3rd data disk. SnapRAID is flexible like that. Just make sure you do not introduce duplicates on multiple drives.
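
    A rough Python sketch of that copy-and-dedupe pass (the drive paths are placeholders; try it on a small test folder first): it copies each file to the backup drive and skips anything that is already there under the same relative path.

    #!/usr/bin/env python3
    """Copy files from the old data drives to a backup drive, skipping paths already present."""
    import os
    import shutil

    SOURCES = ["/media/old-disk1", "/media/old-disk2"]   # hypothetical old data drives
    BACKUP = "/media/backup"                             # hypothetical backup drive

    for src in SOURCES:
        for root, _dirs, files in os.walk(src):
            for name in files:
                src_file = os.path.join(root, name)
                rel = os.path.relpath(src_file, src)
                dst_file = os.path.join(BACKUP, rel)
                if os.path.exists(dst_file):
                    continue                             # already copied from another drive: dedupe as you go
                os.makedirs(os.path.dirname(dst_file), exist_ok=True)
                shutil.copy2(src_file, dst_file)
                print(f"copied {rel}")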

    omv 3.0.56 erasmus | 64 bit | 4.7 backport kernel
    SM-SC846 (24 bay) | H8DME-2 | 2x AMD Opteron hex-core 2431 @ 2.4 GHz | 49 GB RAM
    PSU: Silencer 760 Watt ATX Power Supply
    IPMI | 3x SAT2-MV8 PCI-X | 4 NIC: 2x Realtek + 1 Intel Pro dual-port PCIe card
    OS on 2x 120 SSD in RAID 1
    DATA: 3x 3TB | 4x 2TB | 2x 1TB

  • No, SnapRAID doesn't duplicate your data. It creates parity data, which allows recovery while saving space (similar to RAID 5 vs. RAID 1). SnapRAID looks at each drive individually, so it doesn't care about duplicates.


    It's the MergerFS pool that doesn't like duplicates. When reading or changing a file, it won't know which instance to use, which might cause problems. There is a project on GitHub called mergerfs-tools which contains a dedup tool for mergerfs that would be useful here. Unfortunately, it isn't installed with the unionfs plugin, and it requires compiling and installing dependencies, which is a bit complicated:
    https://github.com/trapexit/mergerfs-tools

    I did manage to find that late last night, and I was planning to try to install it sometime this week. So far I have not made changes to any files, so mergerfs has not run into any trouble. It is good to know that SnapRAID is not bothered by the duplicates.



    I hadn't looked at what kind of code they are. Since they don't need to be compiled, I can create a package with them included.

    That would be awesome to have as a package available from the control panel. Since it is Python, I will probably just install it manually to try to get my files cleaned up sooner. @trapexit, is the mergerfs.dedup tool stable to use?

    • Official Post

    That would be awesome to have as a package available from the control panel.

    That isn't what I meant. I was just going to create a package (not a plugin) to add the Python files to the OMV box when the unionfilesystems plugin is installed. You would still need to run them from the command line.


  • Yes. Everything there is functional. Maybe not as featureful as some may want, but they work.


    Btw... mergerfs doesn't have issues with duplicates. Humans do. It couldn't care less and does precisely what is dictated by the policies. If you're using the defaults, you may end up with stale data on the drives not selected by the policy, but otherwise it works as expected. I have duplicates of lots of data, and many people place multiple copies of their data across drives (local and remote) for protection. If you're manipulating data then yes... only one file is changed. I've considered changing that, but too many people would likely mistake it for RAID1-like behavior, which it really wouldn't be.
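
    A toy Python illustration of why an unselected copy goes stale (this is not mergerfs code; the branch paths and file name are hypothetical): a first-found style search picks one branch for a given path, so reads and writes through the pool only ever touch that one instance.

    #!/usr/bin/env python3
    """Toy model of a first-found search over two branches."""
    import os

    BRANCHES = ["/media/disk1", "/media/disk2"]   # search order matters

    def first_found(rel_path):
        """Return the first branch that holds rel_path, similar in spirit to a first-found policy."""
        for branch in BRANCHES:
            if os.path.exists(os.path.join(branch, rel_path)):
                return branch
        return None

    rel = "Movies/example.mkv"                    # hypothetical file that exists on both drives
    branch = first_found(rel)
    if branch:
        print(f"reads and writes for {rel} go to {branch};")
        print("a second copy on another branch is never updated, i.e. stale data")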

  • That isn't what I meant. I was just going to create a package (not a plugin) to add the Python files to the OMV box when the unionfilesystems plugin is installed. You would still need to run them from the command line.

    That's a fair point, and I have no issues with running the tools from the command line.

  • Yes. Everything there is functional. Maybe not as featureful as some may want, but they work.


    Btw... mergerfs doesn't have issues with duplicates. Humans do. It couldn't care less and does precisely what is dictated by the policies. If you're using the defaults, you may end up with stale data on the drives not selected by the policy, but otherwise it works as expected. I have duplicates of lots of data, and many people place multiple copies of their data across drives (local and remote) for protection. If you're manipulating data then yes... only one file is changed. I've considered changing that, but too many people would likely mistake it for RAID1-like behavior, which it really wouldn't be.

    Thank you for the clarification. For me, the point of dedup is to try to free up some storage space. Due to the way Greyhole works, my files are only partially duplicated across the 2 data drives. Some files are on both drives, while other items are split across the 2 drives at the folder level (i.e. the same folder structure exists on both drives, but the video file is on drive A while the metadata file is on drive B). I agree with you that mergerfs should not care about duplicate files. It does what it says on the box, and that's a good thing.
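
    For that kind of partially duplicated layout, a cautious approach (sketched below in Python with placeholder mount points; it deletes nothing) is to hash both copies of every path that exists on both drives and only treat byte-identical pairs as candidates for removal; paths that exist on just one drive are the folder-level splits and should be left alone.

    #!/usr/bin/env python3
    """Flag relative paths present on both drives and check whether the contents match."""
    import hashlib
    import os

    DISK_A = "/media/disk1"   # hypothetical data drive A
    DISK_B = "/media/disk2"   # hypothetical data drive B

    def sha256(path, bufsize=1 << 20):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            while chunk := f.read(bufsize):
                h.update(chunk)
        return h.hexdigest()

    for root, _dirs, files in os.walk(DISK_A):
        for name in files:
            a = os.path.join(root, name)
            rel = os.path.relpath(a, DISK_A)
            b = os.path.join(DISK_B, rel)
            if not os.path.isfile(b):
                continue                  # split at the folder level, not a duplicate
            if sha256(a) == sha256(b):
                print(f"identical on both drives: {rel}")
            else:
                print(f"same path, different content: {rel}")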
