Shared folder missing after update

  • Hardware:

    - CM4 8GB

    - Radxa TACO mainboard

    - 5x mixed HDD/SSD connected to mainboard


    Setup:

    - OMV7 (installed using the RPi install script when it was OMV6, then did an in-place upgrade to OMV7)

    - SD card as boot drive

    - MD RAID6 with 4x drives (mixed SSD/HDD - sorry!) - EXT4, 4TB total

    - Multiple shared folders in the base MD RAID folder

    - SMB shares for network access

    - rsync backup to a 5th drive, which has shared folders set up similarly to the main RAID

    - compose plugin installed, which runs Portainer, which runs Nextcloud

    - remote mount plugin installed, which I use to back up a local HA instance.

    - OMV backup plugin installed, which occasionally takes a dd image of the SD card to a shared folder.

    - Node-RED installed (using their standard install script) - this monitors OMV in the background and sends stats to HA

    I *think* that is everything of importance. I'm happy to also hear about the poor choices I made for my system setup - but things evolve over time and this is what I have now :P . Also, it's been awesome so far.


    What happened:

    System has been rock solid for a number of years.

    Recently did an update through the OMV interface (probably well overdue, since the server just sits there happily serving).

    OMV web interface wouldn't come back after the update.

    Did lots of googling and ran a few commands to restart nginx and PHP (I had previously used these after the upgrade from OMV6 to 7 stopped the web interface from coming up) - and everything came back up.
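
    For reference, and from memory, the commands were something along these lines (the exact PHP version on your install may differ):

        # restart the web interface services (PHP version is an assumption)
        sudo systemctl restart php8.2-fpm
        sudo systemctl restart nginx
        # or run OMV's built-in recovery tool
        sudo omv-firstaid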

    Unfortunately one of the shared folders is now completely missing. I noticed that one of my SMB shares was showing an error in Windows. It was working perfectly just before the update, as I was using it 10 minutes earlier.

    The file system is still there, and all the other shared folders work, but one shared folder is gone.

    In the OMV web interface, under 'shared folders' it shows up without any warnings, and in the SMB shares tab it is still there, but when I try to edit the shared folder (e.g. open ACLs) it says the folder doesn't exist: 'No such file or directory'.

    Connecting through WinSCP confirms that this shared folder is simply gone from the system - it used to be in the main folder of the RAID array along with all the other shared folders.


    I have backups and they are fine - so this isn't about getting my data back, but I really would like to know what happened here.

    I haven't done any recovery yet, until I can establish what happened and where the data is - unfortunately I have no idea where to start looking. Any help would be appreciated!


    I've used OMV for many, many years and do intend to continue, but a sudden folder disappearance like this has me spooked.


    Thanks in advance.

  • OK, so it's clear the update had nothing to do with this. It looks like a RAID array that has not-very-gracefully failed. I've checked the log files and found the following after the update/reboot:

    In summary, it has deleted one shared folder completely and around half the data from my other folders... not bad for a quick reboot!


    This brings up quite a few questions, if anyone can help answer them:


    Does this mean that fsck has kicked in and just deleted half my RAID array, or is this an MD-related recovery mode?

    Should the RAID array not be marked as degraded immediately on the reboot, rather than deleting everything?

    Can anything be done in mdadm to attempt a repair?
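
    If it helps anyone diagnose this, I can post the output of the usual commands - I've been looking at things along these lines (md0 and the sdX names are just what they happen to be on my system):

        cat /proc/mdstat                    # overall array state and any rebuild/resync activity
        sudo mdadm --detail /dev/md0        # array status, failed/removed members
        sudo mdadm --examine /dev/sd[abcd]  # per-member superblock info
        dmesg | grep -i -E 'md[0-9]|ext4'   # kernel messages around the reboot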


    I've done some research and it seems like raid6check could help me repair the data (as noted in the original post above, this isn't about recovery in itself, since I have backups, but it's nice to know it's possible). From my research though, Debian no longer supports raid6check. Does anyone know if this is the case, or if I'm misunderstanding? If so, it'd be useful to have a warning somewhere when setting up RAID6 that the mdadm raid6check tool isn't available.
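
    From what I've read so far, even without raid6check the kernel's built-in md scrub can at least report parity mismatches - something like this (md0 assumed; I'd only run the read-only 'check' action, not 'repair', until I understand what happened):

        # kick off a read-only consistency check of the array
        echo check | sudo tee /sys/block/md0/md/sync_action
        cat /proc/mdstat                       # watch progress
        cat /sys/block/md0/md/mismatch_cnt     # mismatch count reported once it finishes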


    I suppose the big question is, can this be prevented in future?

    I did lots of (possibly poor?) research and determined that RAID6 would be a good fit for me. Avoiding the scenario of drives failing and being left with no home server whilst I'm not at home or available to fix things (WAF considered) was important. Recovery/rebuild times weren't important, just uptime. All drives have SMART checks enabled with notifications, it's on a (DIY...) UPS that has never missed a beat, I check the array status regularly, and I get notified by HA if it is degraded. This has happened once in the last 2 years on an older drive, which mdadm recovered from easily.
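
    For completeness, mdadm itself can also mail out on array events as a second notification channel - a minimal sketch (the address is a placeholder, and OMV's own notification settings normally take care of this):

        # in /etc/mdadm/mdadm.conf (OMV manages this file, so treat this as illustrative)
        MAILADDR admin@example.com
        # fire a test event for each array to confirm alerts actually arrive
        sudo mdadm --monitor --scan --test --oneshot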


    If it's just a symptom of my 'irregular' setup then I can understand it, but if this could have happened to any MD RAID6, then maybe I need to move to x86 and ZFS? At which point, I'd unfortunately have to at least test TrueNAS :( .


    Any thoughts on the above would be greatly appreciated, alongside any assistance in attempting a recovery!


    P.S. Big thanks to the OMV community for getting me this far with my 'setup'. Sorry for the long post.

  • RAID details below:

  • I would suspect your mixed SSD/HDD array is probably the root cause of the problem. The general recommendations/rules for RAID stability are to use drives of the same size, same make/manufacturer, same model, and same firmware, and to make sure that the drives are RAID rated; almost as important is that the HDDs should be CMR and not SMR drives. SMR is, as a rule, horrible for RAID use (horrible for anything that requires random read/erase/write activity, actually). I know it isn't always possible to check all those boxes, particularly if you have to replace a drive that is a few years old, but the mixed drives do mean you are likely breaking all the "rules" (maybe with the exception of the manufacturer rule and, if you're lucky, the CMR/SMR rule).


    I have been running RAID 5 and RAID 1 in my system for 15 years and have never had an issue like yours. I do however try to cycle to new drives every 5 to 7 years, with a complete data migration or backup restore to a new array. My only current issue is that one drive in my RAID 5 developed 8 reallocated sectors about 3 years ago, at about half of its 5 to 7 year active service life, but it has been stable at that same count ever since, so I am not rushing to replace the drive when I will probably be doing another full migration within the next year or so.


    I also have multiple RAID 6 arrays at the office, one of which (a single-chassis sixty-drive array - yes sixty, as in 60) we replaced a few years ago after it ran for 10 to 12 years on the original drives (not one drive failure in the lifespan of the 60 drives, which I thought was very impressive for that particular array), not to mention all the small 4 or 5 drive arrays I placed in systems I used to build and sell for video editing, up until about 15 years ago when I worked as a pro A/V dealer.


    When the recommendations are followed, RAID can be very reliable. It's not a substitute for a backup though.


    As for RAID levels, a 4-drive RAID 6 makes no sense really. You lose the capacity of 2 drives to parity, so you have 50% usable capacity. If you are ok with 50% capacity, 2 RAID 1 arrays (one from the HDDs and one from the SSDs), or stand-alone drives doing scheduled rsync mirroring, would have made more sense to me. You are still at 50% capacity, but you are not breaking as many of those RAID rules (I don't know how many you are or would be breaking for sure, since you never mentioned the make, model, and firmware versions of the drives).
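
    As a rough sketch of the scheduled rsync mirroring idea (the paths, schedule, and labels here are only placeholders - OMV's own scheduled jobs or rsync tasks can do the same thing from the UI):

        # /etc/cron.d/mirror-to-backup - nightly one-way mirror of the master drive to its twin
        # --delete makes the destination an exact mirror, so this is redundancy, not a versioned backup
        0 2 * * * root rsync -a --delete /srv/dev-disk-by-label-master/ /srv/dev-disk-by-label-mirror/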


    If you want to use the drives as one volume, you could pool them with mergerfs (or pool the masters if doing the rsync approach). I personally was never a fan of mergerfs, so I prefer to address my RAID 5 and RAID 1 as individual volumes, but many people use it.
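
    For anyone curious what pooling looks like in practice, a mergerfs pool is essentially just an fstab entry along these lines (the paths and options are illustrative only - the openmediavault mergerfs plugin normally writes this for you):

        # /etc/fstab - present two data mounts as one pooled volume
        /srv/disk1:/srv/disk2  /srv/pool  fuse.mergerfs  defaults,allow_other,category.create=mfs  0  0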


  • Thanks for the input, BernH.


    Yes, I have all the RAID 'sins' here.

    All desktop-grade storage...

    2x SMR Seagate Barracuda 2TB 2.5"

    1x SMR Seagate Barracuda 4TB 2.5"

    1x Crucial MX500 SSD 4TB 2.5"


    None of the drives are that old (in terms of operating hours - the actual age is not clear). The oldest is 4 years (the 4TB HDD), the 2TB HDDs are 3 years old, and the SSD is less than 2 years old.


    The array started because I had a lot of 2TB and 4TB 2.5" HDDs. It turns out the 4TB HDDs never lasted more than a few years, so these have all faded away - gracefully though!


    My plan was to (within the next few years) have a complete SSD MX500 array, but Crucial recently stopped production of this 4TB range, so I was waiting for a reason to change and hadn't decided where to go. I guess this is the 'bump' I needed to make a change.


    I was aware that there were risks, hence backups are safe, checked, and working (one of them is actually on a 2-disk RAID mirror with the above 4TB HDDs), but I honestly didn't expect that one day it would almost all disappear without any warnings - I wouldn't even have noticed for a while if one of the shared folders hadn't completely disappeared.


    Any thoughts on the reason why this would happen? Other than the fact that I broke (all) the 'rules', that is - it would be useful to know.


    Thanks.

  • The breaking of the "rules" is the only reason I can suggest really, but of those rules, my suspicion would point toward one or two potential culprits: namely the SMR drives or the HDD/SSD mix. SMR would be my number one suspect by far.


    Here is the rationale for the "rules":


    RAID-rated drives are the recommended norm for a couple of reasons: they often have firmware tweaks that make them work more reliably in a RAID setup, and they are designed for 24/7 usage with a higher MTBF.


    SMR drives are bad because the data tracks overlap, meaning a change to one track requires erasing and re-writing adjacent tracks (this is known to cause problems in RAID arrays, and there have been similar strange data-loss reports on the forum that were flagged as being caused by this).
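
    As an aside, host-managed/host-aware SMR drives will identify themselves to the OS, but drive-managed SMR (which is what most desktop drives like the 2.5" Barracudas use) usually reports as a conventional drive, so checking the model number against the manufacturer's CMR/SMR lists is the more reliable route. For what the tools can show:

        lsblk -o NAME,MODEL,ZONED    # "host-managed"/"host-aware" = SMR; drive-managed SMR often shows "none"
        sudo smartctl -i /dev/sda    # model and firmware to look up against the vendor's CMR/SMR list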


    SSDs and HDDs operate at vastly different speeds. This difference could cause problems in a RAID array because the drives have the potential to slip out of unison during operations.


    I have seen arrays made from desktop drives before, and from different makes/models. While not ideal, it's usually ok when not in 24/7 usage. I've also seen arrays made from different speed drives, and that is usually ok since the faster one will just slow down, but it is still a point to take note of. Even though you broke the "rules" on these two points, they are the least serious with regard to strange data loss like you report.

  • Thanks again, BernH.


    I'd still be interested to hear if anyone has had a similar story, but I would consider this a lesson learned, and fortunately not in a bad way.


    I've just taken the hit and ordered a batch of second-hand IronWolf drives. I have a W10 PC that's about to go EOL, so I think I'll go x86 and maybe give ZFS a run this time around.


    Long may the journey continue!

  • There have been other posts on the forum where SMR drives were deduced or reported to have caused data-loss problems on RAIDs. You may find them with a bit of digging.


    I considered ZFS when setting up my current build, but I opted not to use it for 4 reasons, the first two of which are related:


    1) ZFS recommends ECC RAM, which I do not have. It isn't a necessity and ZFS can run with non-ECC RAM, but since ZFS relies on RAM for buffering and caching, it is a recommendation.


    2) ZFS uses a lot of RAM by default with a recommendation of 1GB of RAM for every TB of storage, and I didn't want to commit that much RAM to it.


    3) Familiarity. I have been running MD with XFS filesystems reliably for many years, so didn't feel an urgent need to change it.


    4) ZFS used to have issues with Docker when I was considering it, but I believe that has since been corrected.


    All that said, ZFS looks very attractive and there are people running ZFS with non-ECC RAM. I'm sure they can attest to its reliability if they see this thread, or you can contact them or create a new post to ask about it.

  • Official Post:

    1) ZFS recommends ECC RAM, which I do not have. It isn't a necessity and ZFS can run with non-ECC RAM, but since ZFS relies on RAM for buffering and caching, it is a recommendation.


    2) ZFS uses a lot of RAM by default with a recommendation of 1GB of RAM for every TB of storage, and I didn't want to commit that much RAM to it.

    This has been discussed several times on the forum. It's actually a misconception that ZFS has special RAM requirements beyond what other systems require. The difference is whether or not you use deduplication, and 99.9% of OMV users will never use deduplication.


    By the way, crashtest posted a document on the wiki last night for using the openmediavault-zfs plugin. That might address point 3 you mentioned, and perhaps point 4. I haven't had a chance to read this document yet. https://wiki.omv-extras.org/do…?id=omv7:omv7_plugins:zfs

  • This has been discussed several times on the forum. It's actually a misconception that ZFS has special RAM requirements beyond what other systems require. The difference is whether or not you use deduplication, and 99.9% of OMV users will never use deduplication.


    By the way, crashtest posted a document on the wiki last night for using the openmediavault-zfs plugin. That might address point 3 you mentioned, and perhaps point 4. I haven't had a chance to read this document yet. https://wiki.omv-extras.org/do…?id=omv7:omv7_plugins:zfs

    I was only noting the recommendations from the ZFS project. I did say they were recommendations and not necessities, and a little digging for information will also reveal how to reduce the RAM usage. I also did say that I believed the Docker issue has been addressed, and that I made these choices when I was assembling my current build, which is now about 5 years old.
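
    For reference, the usual way to cap that RAM usage is to limit the ARC size via a module parameter - a minimal sketch (the 4GiB figure is just an example value):

        # /etc/modprobe.d/zfs.conf - cap the ZFS ARC at 4GiB (value is in bytes)
        options zfs zfs_arc_max=4294967296
        # you may also need to update the initramfs for the boot-time setting to take effect
        # or apply it live until the next reboot:
        echo 4294967296 | sudo tee /sys/module/zfs/parameters/zfs_arc_max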


    I would consider it again on my next major change, but it still stands that I am more familiar with what I currently have and have been running for 15 years.
