OMV6 - encrypted drives keep randomly locking themselves?

  • My setup is now OMV 6 running on an 8GB Raspberry Pi 4, connected to a 4-drive bay that pools them together via mergerfs. Each of the 4 drives is encrypted via the LUKS plugin, and each time I boot it I need to unlock each one and then start up the mergerfs pool.


    Since upgrading to OMV 6, I've got this weird problem where the 4 drives will randomly be locked again. I'll notice because I'll try and access one of the things I have running in a docker container, and it just won't work. Then I go into OMV, and see that all 4 drives are locked. Once I unlock them and restart the mergerfs pool, and restart docker, it all works fine again, but it's very annoying and never happened on OMV 5.


    Any ideas as to what could cause my drives to lock themselves up at random times like that?

    • Official Post

    They are USB. They probably go to sleep; LUKS sees that as a detach and locks.
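If USB autosuspend is the trigger, it can be checked and disabled from the command line. A rough sketch (the sysfs paths vary per device, and this does not persist across reboots):

```shell
# Show the current runtime power-management policy for each USB device.
# "auto" means the kernel may suspend the device when idle; "on" means never.
for f in /sys/bus/usb/devices/*/power/control; do
    printf '%s: %s\n' "$f" "$(cat "$f")"
done

# Disable autosuspend for all USB devices until the next reboot.
for f in /sys/bus/usb/devices/*/power/control; do
    echo on | sudo tee "$f" >/dev/null
done
```

To make this survive a reboot, a udev rule or the kernel parameter usbcore.autosuspend=-1 would be needed.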

    omv 7.0.4-2 sandworm | 64 bit | 6.5 proxmox kernel

    plugins :: omvextrasorg 7.0 | kvm 7.0.10 | compose 7.1.2 | k8s 7.0-6 | cputemp 7.0 | mergerfs 7.0.3


    omv-extras.org plugins source code and issue tracker - github


    Please try ctrl-shift-R and read this before posting a question.

    Please put your OMV system details in your signature.
    Please don't PM for support... Too many PMs!

  • Is there any way to verify and/or fix this? I played around with the settings of my USB enclosure (and found the manual here), but it looks like the only time the enclosure triggers sleep mode is when it detects the computer it's plugged into (RPi) being off. Still, I turned off that sleep mode sync anyway, but it still keeps happening.

    Is there anything on OMV's side that could be triggering LUKS to think the drives are detaching?

    • Official Post

    Is there anything on OMV's side that could be triggering LUKS to think the drives are detaching?

    Nope. The LUKS plugin creates the container and is able to lock/unlock on demand. There is no service watching luks or doing anything with it. This is all LUKS/RPi/Linux/usb sata controller.
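For what it's worth, the lock/unlock the plugin performs maps onto plain cryptsetup calls underneath; a minimal sketch, with example device and mapper names:

```shell
# Unlock: prompts for the passphrase and creates /dev/mapper/data1.
# (/dev/sda and "data1" are example names, not from this setup.)
sudo cryptsetup open /dev/sda data1

# Lock: removes the mapping. This fails while the filesystem is still
# mounted, so unmount (and stop the mergerfs pool) first.
sudo cryptsetup close data1
```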


  • Nope. The LUKS plugin creates the container and is able to lock/unlock on demand. There is no service watching luks or doing anything with it. This is all LUKS/RPi/Linux/usb sata controller.

    Ah okay - so there's nothing other than my HDD bay that could communicate any message that LUKS might interpret as a drive being detached, not them spinning down through OMV settings or anything like that? Just want to make sure to try and narrow things down, and figure out why this didn't ever happen until I upgraded to OMV6...

    • Official Post

    Just want to make sure to try and narrow things down, and figure out why this didn't ever happen until I upgraded to OMV6...

    Before looking at "OMV6" as the potential culprit, remember that OMV is an application that is controlling some aspects of a new Linux Kernel and Userland. If you really want to look into it, the issue is likely with Debian 11.
    _______________________________________________________

    There's another way of looking at this. Do you really "need" LUKS? LUKS is protection from the physical removal (i.e., theft) of hard drives. In the vast majority of cases, server data is not compromised by physical theft. The vast majority of data compromises are done "over the network", after drives are unlocked.


    Assuming that you want to keep LUKS, perhaps a workaround to keep your USB drive enclosure from going to sleep is in order. Have you thought about using Scheduled Tasks to issue a touch command to each drive, every 15 minutes or so?
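A minimal sketch of such a task as a cron entry, assuming the pool members are mounted under /srv; the glob pattern and the keep-alive file name are placeholders, and in OMV this would normally be entered through the Scheduled Tasks UI rather than a cron file:

```shell
# /etc/cron.d/keep-drives-awake (example): touch a file on every pooled
# member disk every 15 minutes so the enclosure sees activity.
*/15 * * * * root for d in /srv/dev-disk-by-uuid-*/; do touch "$d/.keepalive"; done
```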

    • Official Post

    so there's nothing other than my HDD bay that could communicate any message that LUKS might interpret as a drive being detached, not them spinning down through OMV settings or anything like that?

    Correct.


  • Before looking at "OMV6" as the potential culprit, remember that OMV is an application that is controlling some aspects of a new Linux Kernel and Userland. If you really want to look into it, the issue is likely with Debian 11.
    _______________________________________________________

    There's another way of looking at this. Do you really "need" LUKS? LUKS is protection from the physical removal (i.e., theft) of hard drives. In the vast majority of cases, server data is not compromised by physical theft. The vast majority of data compromises are done "over the network", after drives are unlocked.


    Assuming that you want to keep LUKS, perhaps a workaround to keep your USB drive enclosure from going to sleep is in order. Have you thought about using Scheduled Tasks to issue a touch command to each drive, every 15 minutes or so?

    Oh yeah, makes sense, and I'm not trying to blame OMV6! I was just hoping to use that to narrow down the source of the issue. It could definitely be something with Debian 11; I'll search around and see if anyone else has run into sleepy drives with it like this.

    As far as your other question goes, unfortunately the answer is that I probably don't need LUKS, but at the moment I have no way out: I don't have any spare drives big enough to hold these files while I decrypt the drives, so I'm stuck for now.

    Good call about a scheduled task, for now I'll set that up real quick and then see how things go. Thank you for the idea!

  • Assuming that you want to keep LUKS, perhaps a workaround to keep your USB drive enclosure from going to sleep is in order. Have you thought about using Scheduled Tasks to issue a touch command to each drive, every 15 minutes or so?

    So after setting up some scheduled tasks last Wednesday when I posted my previous reply, it seemed to be working well until today, when I found my drives all locked again. I think the scheduled tasks were continuing though, since the file I set to be touched every 15 minutes was still there at /srv/dev-disk-by-uuid... with a recent modified date, somehow even though the drive was locked.


    I don't suppose seeing that the drives still lock even with constant activity helps narrow anything down, does it? It's probably just a Debian 11 issue, so I should live with it as best I can?

  • Maybe do the touch every 10 minutes? If you're OK with drives spinning (I am) try 5 minutes.

    Just curious, are you using a UPS?

    I'm good with drives spinning - at least if I can figure this out, I can then see how much I can scale back the drive activity while still keeping them decrypted. Just changed it down to 5 minutes, so we'll see how that works! Definitely one day, though, I'll get hold of some more storage and start moving things off to get these drives unencrypted 🤦‍♂️.


    Nope not using a UPS - think it might be dips in power causing me to lose the drives? I think I have one laying around, might just need a new battery, but I could give that a shot too.

    • Official Post

    Nope not using a UPS - think it might be dips in power causing me to lose the drives?

    It's possible. There are a great many power line effects that we, as customers, don't see. The question is, what is the tolerance of the device? I've seen this first hand. I had a commercial server and a workstation online at the same time, sitting side by side. I saw a very short but noticeable power hit where the workstation reset and started rebooting. The server was completely unaffected. I believe the difference was due to a much higher quality power supply in the server.

    There are power transients that are fast enough that we don't see them with the naked eye, but they may affect our devices. (Especially those tiny little switcher power cubes that have the bare minimum of components.) If your drive enclosure PS is more sensitive than your RPI PS, the RPI may be seeing a disconnect from the drives. Admittedly, this is a guess.

    If you can, I believe your best path would be a large drive that would let you get out of LUKS. Maybe a self powered external drive that's independent of the enclosure. (Which is a single point of failure.) You need backup in any case. You're taking a risk without it.

  • It's possible. There are a great many power line effects that we, as customers, don't see. The question is, what is the tolerance of the device? I've seen this first hand. I had a commercial server and a workstation online at the same time, sitting side by side. I saw a very short but noticeable power hit where the workstation reset and started rebooting. The server was completely unaffected. I believe the difference was due to a much higher quality power supply in the server.

    There are power transients that are fast enough that we don't see them with the naked eye, but they may affect our devices. (Especially those tiny little switcher power cubes that have the bare minimum of components.) If your drive enclosure PS is more sensitive than your RPI PS, the RPI may be seeing a disconnect from the drives. Admittedly, this is a guess.

    If you can, I believe your best path would be a large drive that would let you get out of LUKS. Maybe a self powered external drive that's independent of the enclosure. (Which is a single point of failure.) You need backup in any case. You're taking a risk without it.

    The plot thickens, I think I just found another detail that might narrow things down! I previously had SnapRAID syncing periodically with this snapraid-aio-script, and I just set it up again on OMV6. I started running it, and its first step was to run snapraid diff to see how many changed files I had. I have a ton since it's the first time it's syncing in a long time, and the file changes it's finding fly up the terminal for a while before they suddenly stop. My SSH session gets interrupted, and when I navigate to the OMV web UI it sometimes takes a bit of refreshing to come up, as if it just restarted. Then the drives are encrypted again.


    Is it possible that under load, it all just restarts, and maybe that's been happening every so often all this time? And maybe that ties into something new with Debian 11 perhaps?

    • Official Post

    Reboot the R-PI. Wait for the lock again. (Maybe encourage it with disk I/O, if that's what causes the lock.)
    Then, in an SSH session, run dmesg --time-format=iso.
    dmesg is the kernel log. Scroll through the output and look for errors.
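To avoid scrolling through everything, dmesg can also filter by priority; a couple of sketches (the grep pattern is just a guess at keywords relevant to a USB detach):

```shell
# Only warnings and errors, with ISO timestamps.
dmesg --time-format=iso --level=err,warn

# Look specifically for USB detach/reset events around the lock time.
dmesg --time-format=iso | grep -iE 'usb|disconnect|reset|offline'
```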

  • Okay yeah, I can now confirm that disk I/O (at least the kind caused by SnapRAID) reliably causes the drives to lock again every time. I took a peek at that log, but nothing super obvious jumped out at me. There were a couple of lines my terminal highlighted in red that showed up a few times:

    Code
    2023-01-31T14:19:20,424709-05:00 sd 0:0:0:1: [sdb] No Caching mode page found
    2023-01-31T14:19:20,424729-05:00 sd 0:0:0:1: [sdb] Assuming drive cache: write through

    I hate to dump the full log on you, but here's the entire output as well, just in case: https://pastebin.com/yFjtA5Mh

    • Official Post

    The log is a few minutes long. Did the drives lock during this period?

    Are you storing Dockers and images on a MergerFS array? Dockers shouldn't be housed in the MergerFS mount point. Dockers use a version of overlayfs that is not entirely compatible with MergerFS. Further, Dockers are constantly manipulating container files (not unlike the activity that's normal for a boot drive).

    The issue can be worked around by installing Dockers on a single drive's mount point. If you're using Most Free Space, your drives will balance out over time but, after installing to a single Drive's mount point, note that you can't use the "Balance Tool" (in the MergerFS plugin).
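For reference, relocating Docker's storage to a single drive's mount point comes down to the daemon's data-root setting; a sketch with an example path (in OMV the same option is normally set through the plugin's UI rather than by editing the file by hand):

```shell
# Example only: point Docker's storage at one member drive's mount point.
sudo mkdir -p /srv/dev-disk-by-label-disk1/docker
printf '{\n  "data-root": "/srv/dev-disk-by-label-disk1/docker"\n}\n' | \
    sudo tee /etc/docker/daemon.json
sudo systemctl restart docker
```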

  • The log is a few minutes long. Did the drives lock during this period?

    Are you storing Dockers and images on a MergerFS array? Dockers shouldn't be housed in the MergerFS mount point. Dockers use a version of overlayfs that is not entirely compatible with MergerFS. Further, Dockers are constantly manipulating container files (not unlike the activity that's normal for a boot drive).

    The issue can be worked around by installing Dockers on a single drive's mount point. If you're using Most Free Space, your drives will balance out over time but, after installing to a single Drive's mount point, note that you can't use the "Balance Tool" (in the MergerFS plugin).

    I believe they did lock during this period - but to be honest, I'm not sure if that's all that happens, or if the whole system restarts. It might be that the drives end up locked because the system restarted, so I only have the logs from that point on.


    Good question about Docker and MergerFS - sort of. At one point when I moved off of my limited storage internal microsd card, I first tried moving docker and its images to my mergerfs pool but that gave me issues for all the reasons you described. So I just moved it all onto a single drive, but a drive that's still part of the mergerfs pool. So that should still be safe, right? Good to know about not using "Balance Tool", didn't realize that but don't think I've used it at all fortunately.

    • Official Post

    Before digging any further into this, are you booting from an SD-card? If so, what brand of SD-card are you using, how old is it, and do you have a backup? Odd behavior is a hallmark of a marginal SD-card.
    ________________________________________________

    So I just moved it all onto a single drive, but a drive that's still part of the mergerfs pool. So that should still be safe, right?

    That should be OK. Again if you're using Most Free Space, over time given normal additions and deletes, data should balance around what Docker is using on the single drive.

    However, the possibility that the whole system is restarting under load is what I find concerning. This is why I wanted to look at the output of dmesg. On the other hand, if there's a kernel freeze / panic, I'm not sure what happens to the dmesg log.
    ________________________________________________


    Reboot again, wait for a LUKS lock event and run the following:

    journalctl -k >kernel.txt

    That will dump a text file into the root user's folder. Move it to one of your shares and post it online.
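If the box really is restarting under load, the current boot's log won't contain the crash; the previous boot's kernel messages may (this assumes journald's persistent storage is enabled, which is not the default on every install):

```shell
# List the boots recorded in the journal; unexpected entries at the times
# the drives locked would suggest the system is restarting.
journalctl --list-boots

# Kernel messages from the boot before the current one.
journalctl -k -b -1 > kernel-previous-boot.txt
```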
