Posts by go_niko

    Try

    omv-salt deploy run collectd fstab mergerfs monit quota

    Thank you, that seems to have worked! I'll add it to my script.


    Try

    systemctl restart docker.service

    This did indeed seem to restart Docker, but when I pop into Portainer, some containers are running while others are marked "exited", so I still need to go into each stack manually, hit stop, and then start the whole stack to get them going. Any other way to automate that piece?
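
    In the meantime, a rough one-liner I'm thinking of adding to the script, assuming the containers just need a plain start rather than the whole stack being recreated:

    Code
    # start every container currently sitting in the "exited" state
    docker start $(docker ps -aq --filter "status=exited")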

    So I've been having some issues with my drives locking themselves again lately (I had asked about that in this thread), and I'm thinking the solution will eventually be to just remove my LUKS disk encryption and simply have encrypted containers for the particular things I'd like to secure.


    That said, I don't yet have a way to back up everything from these drives so I can format them. In the meantime, I just want to run a bash script as a cron job that attempts to unlock each of my encrypted drives with a hardcoded password, and, if any of them do end up getting unlocked (meaning they had randomly locked themselves again), the script also restarts the mergerfs pool afterwards, since things don't seem to work until I do.


    I only describe all of the above to head off anyone yelling at me that hardcoding the password and automating this script takes away all the security benefit of disk encryption in the first place.


    Anyway, I have the first part working and my bash script unlocks my drives. Now for the next part, how do I restart my mergerfs pool from a bash script?
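
    For context, here's roughly what the working half of the script looks like; the device names and the hardcoded password are simplified placeholders, and the mergerfs restart is the bit I still need to fill in:

    Code
    #!/bin/bash
    # placeholder device names for my four encrypted drives
    DRIVES="/dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1"
    PASSWORD="my-hardcoded-passphrase"   # yes, I know...

    unlocked_any=0
    for dev in $DRIVES; do
        name="crypt-$(basename "$dev")"
        # only try to unlock drives whose mapping isn't already open
        if [ ! -e "/dev/mapper/$name" ]; then
            printf '%s' "$PASSWORD" | cryptsetup luksOpen "$dev" "$name" --key-file=- && unlocked_any=1
        fi
    done

    if [ "$unlocked_any" -eq 1 ]; then
        # TODO: restart the mergerfs pool here -- this is the part I'm asking about
        :
    fi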


    While we're at it, is there a way to stop and start each Portainer stack too? I know how to do it per-container, but it'd be cool to start the entire docker-compose stack.
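
    To be clear, what I'd love to be able to script is basically this, assuming the stack exists as a compose project in a directory somewhere (I'm not sure where Portainer actually keeps its compose files, so the path is just an example, and swap in docker-compose if that's the version installed):

    Code
    cd /path/to/my-stack        # hypothetical stack directory
    docker compose down         # stop and remove the stack's containers
    docker compose up -d        # recreate and start the whole stack in the background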

    Update to this issue: I still haven't procured a drive big enough to transition away from encrypted drives, but I did switch my boot drive over from an SD card to an SSD.


    For the most part everything seems to work better and feel snappier, and I haven't had the issue with drives randomly locking quite as much... it still happens from time to time, unfortunately, so I suppose the next possible source of issues to check would be the HDD bay?


    Also, in case it helps locate the source of the issue, a strange thing happens each time I go through the decrypting process. After unlocking each drive, just to be safe I go over to the mergerfs tab, click on my pool, and hit the button to restart the pool, and then I immediately get thrown to the OMV "Software Failure" page, telling me that the requested page was not found, and to click the left mouse button to continue. After that, everything seems to work fine, but it does seem something fishy is going on.

    Hmmm okay, yeah maybe I'll just start working on that since that'll be the long-term plan anyway.


    Logistics-wise, would the strategy be to get a drive at least as big as my biggest drive, have all the files copy over via rsync, format the original drive, copy the files back to it, and repeat with each of the other 3 drives?
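
    In other words, something like this per drive, if that sounds right (the uuid paths are placeholders, and I'd want someone to sanity-check the rsync flags before I trust them with real data):

    Code
    # copy everything from the old (encrypted) drive onto the new big drive
    rsync -aHAX --info=progress2 /srv/dev-disk-by-uuid-OLD/ /srv/dev-disk-by-uuid-NEW/drive1/

    # ...remove LUKS and re-create the filesystem on the old drive...

    # then copy it all back onto the freshly formatted drive
    rsync -aHAX --info=progress2 /srv/dev-disk-by-uuid-NEW/drive1/ /srv/dev-disk-by-uuid-OLD/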

    It's not dicey. If you want to boot from a different SD card, as a test, the destination card must be identical in size or larger. However, if you clone to a larger card and decide to stay with it, the same rule applies: when cloning from that larger card, the destination must be identical in size or larger. (This is why having at least two identical cards is a good idea.)
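
    As an illustration, on a Linux box the clone boils down to something like this (the device names are only examples, so verify them before running anything):

    Code
    # image the existing card to a file...
    dd if=/dev/mmcblk0 of=omv-boot.img bs=4M status=progress
    # ...then write that image to the new card, which must be the same size or larger
    dd if=omv-boot.img of=/dev/sdX bs=4M status=progress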

    Ahh okay makes sense! I don't know why I was thinking they needed to be exactly the same size - just needs to be the same or bigger, that makes sense. I'll probably pick up another identical card then.


    Besides that though, any other ideas on troubleshooting this? Or is that pretty much my best bet for now?

    Another question then... if going between different SD cards can be so dicey, what's the best way for me to try booting off of a different one? Do I just have to try and find the same exact make and model as my current one, or is there a different way to flash between them that won't get thrown off by a single bit difference?

    Also is there anything at all I may be able to try in the meantime, or does it seem like my boot SD card is the only thing that could be responsible? I definitely don't want to, but I'm thinking I might just try going back to OMV 5 to see if the issue was indeed with OMV 6, or if that was just coincidence. It would be great not to have everything breaking every few hours 🤦‍♂️

    Ahh, didn't realize that! That's probably what's happening here then... I just tried with win32diskimager and that actually worked, buuut then when I tried to boot my Raspberry Pi from it, nothing would come up. Once I put the original SD card in with the same image, it did come up again.


    So that's a bummer; I can't give it a proper try to see if the SD card is really at fault for my drives constantly locking themselves. I'll keep poking around, in case I can find anything else that might be causing it to happen.

    I pulled my SD card out and found that I'm currently using a 32 GB class 10 Samsung SDHC card. I followed those directions to test it, and it came back fine with no errors. I also found I had a separate Sandisk 32 GB card, and when testing that it also came up without any errors, but alas, each time I try to write my backup image to it with Rufus, it fails.


    Yeah, the capture is the tough part; I don't know how to catch the crash when it happens... even though I can reliably cause it with Snapraid, it's still a bit random as to when exactly it happens. Is there anything else you can think of trying before I need to go out and buy a new SD card? Or maybe a better way to capture a log of what happens during the crash?
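
    One idea I had for the logging side, in case it's the right track: make the journal persistent so the kernel log from just before the crash/reboot survives, then pull the previous boot's messages afterwards (I think journald also needs a restart, or a reboot, for the change to take effect):

    Code
    # in /etc/systemd/journald.conf, under [Journal]:
    Storage=persistent

    # then, after the next crash/reboot:
    journalctl -k -b -1    # kernel messages from the previous boot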

    Okay, good to hear as far as my mergerfs setup goes. Good question about my SD card though - yes, I'm booting from a class 10 SDHC microSD card, and I've had it for a while. I can't check exactly what kind it is at the moment, but I will when I get a chance. I might see if there's something I can use to check its health. But I do have it backed up if I end up needing a new one.
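
    If nobody has a better suggestion, I was going to try a plain read-only badblocks pass from another machine with the card unmounted (the device name is just an example):

    Code
    # non-destructive, read-only scan of the whole card
    sudo badblocks -sv /dev/sdX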


    Here's my kernel.txt file, after the most recent crash. Hopefully I got the log soon enough after it happened to have some useful information on there. This time the crash happened organically; I didn't invoke it by running that snapraid script.

    The log is a few minutes long. Did the drives lock during this period?

    Are you storing Dockers and images on a MergerFS array? Dockers shouldn't be housed in the MergerFS mount point. Dockers use a version of overlayfs that is not entirely compatible with MergerFS. Further, Dockers are constantly manipulating container files (not unlike the activity that's normal for a boot drive).

    The issue can be worked around by installing Dockers on a single drive's mount point. If you're using Most Free Space, your drives will balance out over time, but note that after installing to a single drive's mount point you can't use the "Balance Tool" (in the MergerFS plugin).
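
    For reference, pointing Docker at a single drive comes down to its data-root setting, i.e. something along these lines in /etc/docker/daemon.json (the uuid path is only an example):

    Code
    {
      "data-root": "/srv/dev-disk-by-uuid-xxxx/docker"
    }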

    I believe they did lock during this period - but to be honest, I'm not sure if that's all that happens, or if the whole system restarts. It might be that the drives end up locked because the system restarted, so I only have the logs from that point on.


    Good question about Docker and MergerFS - sort of. At one point when I moved off of my limited storage internal microsd card, I first tried moving docker and its images to my mergerfs pool but that gave me issues for all the reasons you described. So I just moved it all onto a single drive, but a drive that's still part of the mergerfs pool. So that should still be safe, right? Good to know about not using "Balance Tool", didn't realize that but don't think I've used it at all fortunately.

    Okay yeah, I can now confirm that the disk I/O (at least caused by Snapraid) indeed reliably causes the drives to lock again every time. I took a peek at that log, but nothing super obvious jumped out at me. There were a couple lines my terminal highlighted in red, that showed up a few times:

    Code
    2023-01-31T14:19:20,424709-05:00 sd 0:0:0:1: [sdb] No Caching mode page found
    2023-01-31T14:19:20,424729-05:00 sd 0:0:0:1: [sdb] Assuming drive cache: write through

    I hate to dump the full log on you, but here's the entire output as well, just in case: https://pastebin.com/yFjtA5Mh

    It's possible. There are a great many power line effects that we, as customers, don't see. The question is, what is the tolerance of the device? I've seen this first hand. I had a commercial server and a workstation online at the same time, sitting side by side. I saw a very short but noticeable power hit where the workstation reset and started rebooting. The server was completely unaffected. I believe the difference was due to a much higher quality power supply in the server.

    There are power transients that are fast enough that we don't see them with the naked eye, but they may affect our devices. (Especially those tiny little switcher power cubes that have the bare minimum of components.) If your drive enclosure PS is more sensitive than your RPI PS, the RPI may be seeing a disconnect from the drives. Admittedly, this is a guess.

    If you can, I believe your best path would be a large drive that would let you get out of LUKS. Maybe a self powered external drive that's independent of the enclosure. (Which is a single point of failure.) You need backup in any case. You're taking a risk without it.

    The plot thickens: I think I just found another detail that might narrow things down! I previously had snapraid syncing periodically with this snapraid-aio-script, and I just set it up again on OMV6. I started running it, and its first step was to run snapraid diff to see how many changed files I had. I have a ton, since it's the first time it's syncing in a long time, and the file changes it finds fly up the terminal for a while before they suddenly stop. My SSH session gets interrupted, and when I navigate to the OMV webui it sometimes takes a bit of refreshing to come up, as if it had just restarted. Then the drives are locked again.


    Is it possible that under load, it all just restarts, and maybe that's been happening every so often all this time? And maybe that ties into something new with Debian 11 perhaps?

    Maybe do the touch every 10 minutes? If you're OK with drives spinning (I am) try 5 minutes.

    Just curious, are you using a UPS?

    I'm good with drives spinning - at least if I can figure this out, I can then see how much I can scale back the drive activity while still keeping them unlocked. I just changed it down to 5 minutes, so we'll see how that works! One day, though, I'll definitely get hold of some more storage and start moving things off so I can get these drives unencrypted 🤦‍♂️.


    Nope not using a UPS - think it might be dips in power causing me to lose the drives? I think I have one laying around, might just need a new battery, but I could give that a shot too.

    Assuming that you want to keep LUKS, perhaps a workaround to keep your USB drive enclosure from going to sleep is in order. Have you thought about using Scheduled Tasks to issue a touch command to each drive, every 15 minutes or so?

    So after setting up some scheduled tasks last Wednesday when I posted my previous reply, it seemed to be working well until today, when I found my drives all locked again. I think the scheduled tasks were continuing though, since the file I set to be touched every 15 minutes was still there at /srv/dev-disk-by-uuid... with a recent modified date, somehow even though the drive was locked.


    I don't suppose seeing that the drives still lock even with constant activity helps narrow anything down, does it? It's probably just a Debian 11 issue, so I should live with it as best I can?

    Before looking at "OMV6" as the potential culprit, remember that OMV is an application that is controlling some aspects of a new Linux Kernel and Userland. If you really want to look into it, the issue is likely with Debian 11.
    _______________________________________________________

    There's another way of looking at this. Do you really "need" LUKS? LUKS is protection from the physical removal (i.e. theft) of hard drives. In the vast majority of cases, server data is not compromised by physical theft. The vast majority of data compromises are done "over the network", after drives are unlocked.


    Assuming that you want to keep LUKS, perhaps a workaround to keep your USB drive enclosure from going to sleep is in order. Have you thought about using Scheduled Tasks to issue a touch command to each drive, every 15 minutes or so?

    Oh yeah, makes sense, and I'm not trying to blame OMV6! I was just hoping to use that to narrow down the source of the issue. That makes sense though; it could definitely be something with Debian 11. I'll search around and see if anyone else has run into sleepy drives like this with it.

    As far as your other question goes, unfortunately the answer is that I probably don't need LUKS, but at the moment I have no way out, since I don't have any spare drives big enough to move these files onto while I remove encryption from the drives, so I think I'm stuck for now.

    Good call about a scheduled task, for now I'll set that up real quick and then see how things go. Thank you for the idea!
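
    What I'm planning to set up is just a touch against a small file on each drive every 15 minutes, something like this (the uuid paths and file name are placeholders):

    Code
    # one line per drive, so each one sees a little bit of activity
    touch /srv/dev-disk-by-uuid-xxxx/.keepalive
    touch /srv/dev-disk-by-uuid-yyyy/.keepalive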

    Nope. The LUKS plugin creates the container and is able to lock/unlock on demand. There is no service watching LUKS or doing anything with it. This is all LUKS/RPi/Linux/USB SATA controller.

    Ah okay - so there's nothing other than my HDD bay that could communicate any message that LUKS might interpret as a drive being detached, not them spinning down through OMV settings or anything like that? Just want to make sure to try and narrow things down, and figure out why this didn't ever happen until I upgraded to OMV6...

    Is there any way to verify and/or fix this? I played around with the settings of my USB enclosure (and found the manual here), but it looks like the only time the enclosure triggers sleep mode is when it detects the computer it's plugged into (RPi) being off. Still, I turned off that sleep mode sync anyway, but it still keeps happening.

    Is there anything on OMV's side that could be triggering LUKS to think the drives are detaching?
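
    In the meantime, I'll keep an eye on the kernel log around the times the drives lock, to see whether the USB link itself is dropping; this is roughly what I plan to check (not sure it's the best filter):

    Code
    # look for USB disconnect / reset messages around the time the drives lock
    journalctl -k | grep -iE "usb.*(disconnect|reset)"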