Heavy disk I/O causes LUKS volume to unmount (not an OMV issue, mitigated)

  • Apparently whenever my weekly SnapRAID sync job configured via the SnapRAID plugin runs, it causes one LUKS volume on a USB HDD to unmount.

    The LUKS volume in question (A) is part of a mergerFS pool with another, smaller drive (B). SnapRAID is set up to sync both data drives (A & B) to a third parity drive (C).


    I found this thread describing pretty much the same issue with no apparent solution. According to info in this thread, the issue might be that the drive goes to sleep or gets disconnected because of disk I/O from the SnapRAID sync. But since the drive seems to work reliably in regular operation (frequent disk I/O), this doesn't seem very plausible to me.


    Similar to their issue, I consistently get the software issue page from OMV when I manually reload the mergerFS pool after a reboot (once the LUKS volumes have been manually unlocked). After simply reloading the page, however, the pool works as expected and shows up correctly in the OMV webGUI.



    However, I found these log entries:


    journalctl -o short-precise -k

    These seem to point to disconnect issues on that drive (sde = A, the drive causing problems). I'm also a bit confused about the errors for sdc, since my mounts at the moment are sda, sdb, sdd and sde.

    LUKS is set up to decrypt sdb, sdd and sde. I'm not sure why one of the drives wasn't automatically mounted as sdc in the first place. After a reboot, the drives mount as sda, sdb, sdc and sdd as expected, so I currently regard this as a side effect of the frequent disconnects and remounts.



    I have OMV6 installed on a small x64 PC with 3 externally powered USB HDDs connected and a SATA SSD as boot drive. A few weeks ago, I scrapped my old OMV6 (software) setup due to boot drive failure and started from scratch (complete reinstall, no restoration of backup) with the exact same hardware setup (except a new boot drive). This wasn't an issue before with pretty much the identical setup.



    Does anybody have an idea what the issue here could be?

  • Update 1: I connected all 3 USB HDDs through an (unpowered) USB hub to one USB port on the PC. When running SnapRAID check, I got I/O errors and all 3 USB HDDs vanished from the disk page in the OMV webGUI. This showed up in the journal:

    kernel.txt


    This is causing me serious worry:

    Code
    Okt 27 12:23:31 OMV kernel: hub 2-4:1.0: over-current condition


    I'm connecting all 3 USB drives to the other 3 ports and retesting; maybe the port in question is simply defective.


    With all 3 drives connected to different ports (except the one where I got the over-current condition), SnapRAID check now runs for a lot longer than before, though I still get errors, just much less frequently:


    Code
    Okt 27 12:48:07 OMV kernel: mce: CMCI storm detected: switching to poll mode
    Okt 27 12:50:25 OMV kernel: usb 2-1: reset SuperSpeed USB device number 2 using xhci_hcd
    Okt 27 12:50:25 OMV kernel: sd 2:0:0:0: [sdc] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
    Okt 27 12:50:25 OMV kernel: sd 2:0:0:0: [sdc] tag#0 CDB: Read(16) 88 00 00 00 00 00 d1 dd 58 00 00 00 02 00 00 00
    Okt 27 12:50:25 OMV kernel: I/O error, dev sdc, sector 3520944128 op 0x0:(READ) flags 0x80700 phys_seg 64 prio class 2
    Okt 27 12:54:26 OMV kernel: usb 2-1: reset SuperSpeed USB device number 2 using xhci_hcd
    Okt 27 12:54:26 OMV kernel: sd 2:0:0:0: [sdc] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
    Okt 27 12:54:26 OMV kernel: sd 2:0:0:0: [sdc] tag#0 CDB: Read(16) 88 00 00 00 00 00 b7 4d 0a 00 00 00 02 00 00 00
    Okt 27 12:54:26 OMV kernel: I/O error, dev sdc, sector 3075279360 op 0x0:(READ) flags 0x80700 phys_seg 64 prio class 2
    Okt 27 12:54:55 OMV kernel: perf: interrupt took too long (2513 > 2500), lowering kernel.perf_event_max_sample_rate to 79500
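
    The sector in each "I/O error" line can be cross-checked against the CDB logged just before it. A minimal sketch, assuming the standard READ(16) layout (opcode 0x88, 8-byte big-endian LBA in bytes 2-9, 4-byte transfer length in bytes 10-13); the function name decode_read16 is made up for this example:

```python
def decode_read16(cdb_hex: str) -> dict:
    """Decode a SCSI READ(16) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)  # whitespace between bytes is ignored
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) CDB (opcode 0x88)")
    return {
        "lba": int.from_bytes(cdb[2:10], "big"),      # bytes 2-9: logical block address
        "blocks": int.from_bytes(cdb[10:14], "big"),  # bytes 10-13: transfer length
    }


# CDBs copied from the two failed reads in the journal excerpt above
for cdb in (
    "88 00 00 00 00 00 d1 dd 58 00 00 00 02 00 00 00",
    "88 00 00 00 00 00 b7 4d 0a 00 00 00 02 00 00 00",
):
    print(decode_read16(cdb))
# {'lba': 3520944128, 'blocks': 512}
# {'lba': 3075279360, 'blocks': 512}
```

    Both LBAs equal the sector numbers in the kernel's "I/O error" lines, so the failures are ordinary 512-block reads (256 KiB each, if the drive reports 512-byte logical blocks) and not some unusual command.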

    _______________________________________________________________________________________________________________________

    The journey continues.


    Now I got a USB reset and an I/O error on a different disk. This makes me think that the issue isn't the one drive; the earlier failures were probably just a symptom of that specific drive getting a huge I/O spike before the other drives when SnapRAID runs a check:

    Code
    Okt 27 13:09:55 OMV kernel: mce: CMCI storm subsided: switching to interrupt mode
    Okt 27 13:11:42 OMV kernel: usb 2-5: reset SuperSpeed USB device number 4 using xhci_hcd
    Okt 27 13:11:43 OMV kernel: sd 4:0:0:0: [sdd] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
    Okt 27 13:11:43 OMV kernel: sd 4:0:0:0: [sdd] tag#0 CDB: Read(16) 88 00 00 00 00 00 1c 36 2e 00 00 00 02 00 00 00
    Okt 27 13:11:43 OMV kernel: I/O error, dev sdd, sector 473312768 op 0x0:(READ) flags 0x80700 phys_seg 50 prio class 2
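
    To see whether the errors really spread across drives, the resets and I/O errors can be tallied per USB port and per block device. A rough sketch, assuming the message formats shown in the excerpts above; the regular expressions and the summarize helper are illustrative and may need adjusting for other kernel versions:

```python
import re
from collections import Counter

# Patterns modelled on the kernel messages quoted in this thread
RESET_RE = re.compile(r"usb ([\d.-]+): reset .*USB device number \d+")
IOERR_RE = re.compile(r"I/O error, dev (\w+), sector \d+")

SAMPLE = """\
Okt 27 12:50:25 OMV kernel: usb 2-1: reset SuperSpeed USB device number 2 using xhci_hcd
Okt 27 12:50:25 OMV kernel: I/O error, dev sdc, sector 3520944128 op 0x0:(READ) flags 0x80700 phys_seg 64 prio class 2
Okt 27 12:54:26 OMV kernel: usb 2-1: reset SuperSpeed USB device number 2 using xhci_hcd
Okt 27 12:54:26 OMV kernel: I/O error, dev sdc, sector 3075279360 op 0x0:(READ) flags 0x80700 phys_seg 64 prio class 2
Okt 27 13:11:42 OMV kernel: usb 2-5: reset SuperSpeed USB device number 4 using xhci_hcd
Okt 27 13:11:43 OMV kernel: I/O error, dev sdd, sector 473312768 op 0x0:(READ) flags 0x80700 phys_seg 50 prio class 2
"""


def summarize(lines):
    """Count USB resets per port path and I/O errors per block device."""
    resets, io_errors = Counter(), Counter()
    for line in lines:
        reset = RESET_RE.search(line)
        if reset:
            resets[reset.group(1)] += 1
            continue
        io_error = IOERR_RE.search(line)
        if io_error:
            io_errors[io_error.group(1)] += 1
    return resets, io_errors


resets, io_errors = summarize(SAMPLE.splitlines())
print(dict(resets))     # {'2-1': 2, '2-5': 1}
print(dict(io_errors))  # {'sdc': 2, 'sdd': 1}
```

    On a live system the same function could be fed the lines of journalctl -o short-precise -k instead of the pasted sample.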


    I was thinking that maybe the storage driver, in particular UAS, plays a role here, but this is my current USB HDD setup via lsusb -t:

    Code
    /:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/7p, 5000M
        |__ Port 1: Dev 2, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
        |__ Port 2: Dev 3, If 0, Class=Mass Storage, Driver=uas, 5000M
        |__ Port 5: Dev 4, If 0, Class=Mass Storage, Driver=usb-storage, 5000M

    Ironically, the only drive not throwing any errors (so far, that is) is the one with the UAS driver.
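
    The driver per port can also be pulled out of the lsusb -t output programmatically, which makes it easier to compare against the port numbers in the reset messages. A small sketch, assuming the line format shown above; storage_drivers is a hypothetical helper, and it keys on the port number alone, which is only unambiguous for a flat tree like this one:

```python
import re

# Matches device lines such as:
# |__ Port 2: Dev 3, If 0, Class=Mass Storage, Driver=uas, 5000M
LINE_RE = re.compile(r"Port (\d+): Dev \d+, If \d+, Class=([^,]+), Driver=([^,]+),")

SAMPLE = """\
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/7p, 5000M
    |__ Port 1: Dev 2, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
    |__ Port 2: Dev 3, If 0, Class=Mass Storage, Driver=uas, 5000M
    |__ Port 5: Dev 4, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
"""


def storage_drivers(lsusb_tree: str) -> dict:
    """Map port number -> driver for every Mass Storage interface."""
    drivers = {}
    for line in lsusb_tree.splitlines():
        match = LINE_RE.search(line)
        if match and match.group(2) == "Mass Storage":
            drivers[int(match.group(1))] = match.group(3)
    return drivers


print(storage_drivers(SAMPLE))
# {1: 'usb-storage', 2: 'uas', 5: 'usb-storage'}
```

    Ports 1 and 5 on bus 2 correspond to the "usb 2-1" and "usb 2-5" devices in the reset messages, so both drives that logged errors so far are the ones bound to usb-storage.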

    • Official post

    Seems like a motherboard issue causing a usb disconnect that locks the drive. Snapraid is probably just causing lots of activity that is causing the disconnect. All I can think is you should try a different kernel like the proxmox 6.2 or 5.19 kernel.

    omv 7.0.5-1 sandworm | 64 bit | 6.8 proxmox kernel

    plugins :: omvextrasorg 7.0 | kvm 7.0.13 | compose 7.2 | k8s 7.1.0-3 | cputemp 7.0.1 | mergerfs 7.0.4 | scripts 7.0.1


    omv-extras.org plugins source code and issue tracker - github - changelogs


    Please try ctrl-shift-R and read this before posting a question.

    Please put your OMV system details in your signature.
    Please don't PM for support... Too many PMs!

  • Quote: "Seems like a motherboard issue causing a usb disconnect that locks the drive. Snapraid is probably just causing lots of activity that is causing the disconnect. All I can think is you should try a different kernel like the proxmox 6.2 or 5.19 kernel."

    Thank you for the info. I really do hope that this is fixable, or at least mitigable, via the software route. I'll "abuse" the SnapRAID check for increased I/O for a while longer and monitor the log. So far the CMCI storms, USB resets and I/O errors don't cause the disks to show up as disconnected, and the LUKS volumes are still mounted.

    If that happens again, I'll try upgrading to the proxmox kernel, retest and report back.

    • Official post

    Quote: "I really do hope that this is fixable or at least mitigable via the software route"

    I don't think there is anything I can do from the snapraid or luks plugin side to help. If usb is disconnecting due to activity, that feels like a hardware issue and/or driver bug in the kernel. I could be wrong but I don't have any other ideas.

  • Quote: "I don't think there is anything I can do from the snapraid or luks plugin side to help. If usb is disconnecting due to activity, that feels like a hardware issue and/or driver bug in the kernel. I could be wrong but I don't have any other ideas."

    Oh, I didn't think so. I meant that I hope another kernel can somehow mitigate these issues, which seem to be hardware-related. This is very clearly not an issue caused by the LUKS or SnapRAID plugin or OMV in general.

  • SignedOne

    Changed the title of the thread from "SnapRAID Sync causes LUKS volume to unmount" to "SnapRAID Sync causes LUKS volume to unmount (not an OMV issue)".
  • Unfortunately, the issue persists even on the 6.2 proxmox kernel. I don't know if it's simply bad luck or it somehow corrected itself before, but now even "regular" high I/O, e.g. writing a backup to a USB HDD, results in resets, I/O errors and the vanishing of all USB HDDs.

    • Official post

    I would try an older kernel too. If a buggy change was made, it might be in many kernels after the change.

  • Switched to proxmox 5.15; thank you for your continued support, despite this being a generic Linux issue. Is it alright if I keep updating this thread, or would you prefer me to post somewhere else, since this isn't really an OMV issue?


    That alone didn't help, but in my search for the root of these issues, I disabled USB autosuspend via kernel parameters in /etc/default/grub.

    Now I get these messages consistently whenever a reset and an I/O error happen:


    Code
    Okt 27 21:33:51 OMV kernel: usb 2-5: Disable of device-initiated U1 failed.
    Okt 27 21:33:56 OMV kernel: usb 2-5: Disable of device-initiated U2 failed.
    Okt 27 21:33:56 OMV kernel: usb 2-5: reset SuperSpeed USB device number 4 using xhci_hcd
    Okt 27 21:33:56 OMV kernel: sd 4:0:0:0: [sdc] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=10s
    Okt 27 21:33:56 OMV kernel: sd 4:0:0:0: [sdc] tag#0 CDB: Read(16) 88 00 00 00 00 06 31 aa 57 f0 00 00 08 00 00 00
    Okt 27 21:33:56 OMV kernel: blk_update_request: I/O error, dev sdc, sector 26603050992 op 0x0:(READ) flags 0x80700 phys_seg 78 prio class 0

    It looks like the USB HDDs ignore the autosuspend setting, and this may be either the cause or a symptom of the issue. I found a resource suggesting adding "usbcore.quirks=VID:PID:k" as a kernel parameter to try to force the disabling of autosuspend; I'll try that next.
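
    For several drives the parameter takes a comma-separated list with one VID:PID:flags entry per device. As far as the kernel's parameter documentation goes, the k flag stands for USB_QUIRK_NO_LPM, i.e. it switches off Link Power Management (the U1/U2 states named in the messages above) for that device. A minimal sketch for assembling the string; the helper name and the IDs are placeholders, and the real IDs come from lsusb:

```python
import re

ID_RE = re.compile(r"^[0-9a-fA-F]{4}:[0-9a-fA-F]{4}$")


def usbcore_quirks(ids, flags="k"):
    """Build a usbcore.quirks= kernel parameter from 'VID:PID' strings."""
    entries = []
    for dev_id in ids:
        if not ID_RE.match(dev_id):
            raise ValueError(f"expected VID:PID as 4+4 hex digits, got {dev_id!r}")
        entries.append(f"{dev_id.lower()}:{flags}")
    return "usbcore.quirks=" + ",".join(entries)


# Placeholder IDs for illustration only, not the drives from this thread
print(usbcore_quirks(["1234:5678", "ABCD:EF01"]))
# usbcore.quirks=1234:5678:k,abcd:ef01:k
```

    On a Debian-based system the resulting string would typically be appended to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by update-grub and a reboot.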


    _______________________________________


    The resets and I/O errors still happen after adding "usbcore.quirks=VID:PID:k" for all USB HDDs to the kernel parameters, but the operations on the disks don't get interrupted and the disks don't get unmounted. SMART values all seem to be OK, and the system seems to be reasonably stable now at least. Not ideal in any way, and I'm still confused about what brought these issues on in the first place.
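
    Whether the parameters actually reached the running kernel can be checked against /proc/cmdline after the reboot. A short sketch; the example command line is invented, and usbcore.autosuspend=-1 is only an assumption about how autosuspend was disabled here:

```python
def usbcore_params(cmdline: str) -> dict:
    """Return all usbcore.* parameters found in a kernel command line."""
    params = {}
    for token in cmdline.split():
        if token.startswith("usbcore.") and "=" in token:
            key, value = token.split("=", 1)
            params[key] = value
    return params


# Invented example; on a real system pass the contents of /proc/cmdline
example = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro quiet usbcore.autosuspend=-1 usbcore.quirks=1234:5678:k"
print(usbcore_params(example))
# {'usbcore.autosuspend': '-1', 'usbcore.quirks': '1234:5678:k'}
```

    An empty result after a reboot would mean the GRUB change was never applied, which is worth ruling out before judging whether the quirk helps.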

    • Official post

    Quote: "Is it alright if I keep updating this thread or would you prefer me to post somewhere else, since this isn't really an OMV issue?"

    I find it interesting and others could benefit as well. So please do.


    Quote: "I'm still confused on what brought these issues on in the first place."

    Hard to say. Most Linux drivers are written generically and some motherboards have buggy implementations of hardware.

  • Quote: "Hard to say. Most Linux drivers are written generically and some motherboards have buggy implementations of hardware."

    That makes sense, but apart from a recent reinstall and the switch from an internal NVMe SSD to a SATA SSD, nothing about my setup really changed. Same software, same architecture (USB HDDs, mergerFS and SnapRAID on top of LUKS-encrypted drives), same cables, same power supplies, same HDDs.


    I didn't monitor the log before the reinstall, and even on the new setup the system at least didn't lock up the drives. This all manifested in the last two weeks.



    Currently, though, I'm happy to report that the system is stable. The resets and I/O errors get logged frequently and are somewhat concerning, but the drives and LUKS volumes stay mounted despite some heavy-ish I/O activity in the last 24 hours.


    I might try to go back to a 6.x kernel later on, since the usbcore.quirks=VID:PID:k hack seems to have been the key. The one other difference is that the CMCI storms for some reason disappeared on the 5.15 kernel, or at least don't get reported anymore.

  • SignedOne

    Changed the title of the thread from "SnapRAID Sync causes LUKS volume to unmount (not an OMV issue)" to "Heavy disk I/O causes LUKS volume to unmount (not an OMV issue, mitigated)".
  • I also have a lot of USB HDD issues. I bought a Seagate Deskdrive which sometimes worked on the USB 2.0 port, but very likely failed on USB 3.0.


    But lately I see issues on USB 2.0 as well. Then I shucked the drive and bought an Inoteck USB 3.0 SATA adapter. Same problem here.


    My personal issue is that I don't know how to monitor or debug all these situations. I just see EOF errors while copying files via my docker container "cloud commander", and I see I/O errors in dmesg.


    In summary: all these USB drives work perfectly on my Windows machine, but rarely to never work on my Intel board running openmediavault/Debian with kernel 6.1 (and also 5.19).


    Edit: I stopped plugging the external drives into the server to move data. I am using a notebook wired to the local network and transfer the data via shared folders. I get better speed, no interruptions and no data corruption.

    cpu Intel(R) Core(TM) i5-10400 CPU @ 2.90GHz
    omv 6.9.13-1 (Shaitan)

    kernel 6.1.0-0.deb11.11-amd64

    Edited once, last by godfuture
