Inactive disks go "offline" and are unable to be mounted until they are power cycled

  • I'm having an incredibly odd issue. If a drive isn't accessed for a relatively short period of time (less than an hour), it goes offline and cannot be mounted again until it is power cycled. The affected drives still show up under Disks and SMART, but their status is shown as unknown.


    All my drives are connected to an LSI 9305-24i. I've also tried a 9300-8i with an HBA expander, various cables, and different ports; everything I can think of to eliminate a hardware issue. The problem also seems to be isolated to SAS drives.


    Drives that are being read from or written to seem to stay online without issue.



    Update, ~45 minutes later:


    This is the SMART data from two identical drives in the same enclosure, right next to each other. One is part of a MergerFS pool; the other is not part of a pool or otherwise referenced in the system.


    Once they are power cycled, either by a reboot or just a hot plug, the process repeats. I'd love to get to the bottom of this.
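

    Next time one drops out, I plan to check whether the kernel has marked the device offline before I pull the drive (using /dev/sdt here only as an example node):

        # Query the kernel's view of the SCSI device; "offline" means it has
        # stopped talking to the drive
        cat /sys/block/sdt/device/state

        # Sometimes the device can be revived without a power cycle (this may
        # fail if the drive itself has powered down)
        echo running | sudo tee /sys/block/sdt/device/state

        # The kernel log usually records why the device was offlined
        sudo dmesg | grep -i -e sdt -e mpt3sas | tail -n 20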

  • Sure thing. This is the same drive as /dev/sdt above.


  • Here you go, this is immediately after a hot plug.


  • OK, that didn't give us any new insight, other than that your drive is in pristine shape. ;)


    I'm beginning to suspect that the sleep issue has something to do either with the drive's firmware (unlikely, but possible) or with OMV's settings related to APM. APM should be set to either 254 or 'disabled'. Can you check?
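

    If it's easier, APM can also be queried from the shell. Keep in mind it's an ATA feature, so a pure SAS drive may simply report it as unsupported, which could itself be a clue:

        # 254 means maximum performance (minimal power saving);
        # "APM_level = off" means APM is disabled outright
        sudo hdparm -B /dev/sdt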

  • Have you tried editing the disk?


    Try changing the spindown time to some other value and then reverting it to disabled, so the web GUI is forced to apply the change.


    If it shows disabled, nothing is written to the disk until you change it to another value and apply, then revert it to disabled and apply again.
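

    (If I remember right, OMV applies this setting through hdparm under the hood; the CLI equivalent would be something like the line below, though whether it reaches SAS drives through the HBA is another question.)

        # Assumes OMV's spindown maps to hdparm -S: a value of 0 disables the
        # standby (spindown) timer, while e.g. 241 would be about 30 minutes
        sudo hdparm -S 0 /dev/sdt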

  • I've never enabled power management on any of my drives. I'll try cycling the setting as raulfg3 suggested, but I'm not holding my breath, since some of these drives have been in the system longer than this issue has existed.



    To give more insight: as you can see in the original post, this issue isn't isolated to a single drive; at least six are affected. There are eight SAS drives in the system, two matched triplets and a matched pair. The really odd part is that one drive from the pair and one drive from a triplet behave fine; the only difference is that those two are mounted, part of a MergerFS pool, and are read from or written to often.



    It seems like read/write activity is what's keeping them alive. I tried pooling the 3TB triplet, but it wasn't actively in use and it dropped out overnight. I also have a SATA drive that sits idle and does not exhibit this behavior.

  • Have you tried editing the disk?


    Try changing the spindown time to some other value and then reverting it to disabled, so the web GUI is forced to apply the change.


    If it shows disabled, nothing is written to the disk until you change it to another value and apply, then revert it to disabled and apply again.

    Disks still dropped out after trying this.

  • It's possible you've mentioned it already, but how long does it take for your drives to shut down and become inaccessible?


    I wonder if a really short SMART monitoring interval (say, 2 or 3 minutes) would keep them alive?
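

    A quick and dirty way to test that theory without touching the monitoring config (a rough sketch; the device list and interval are just placeholders):

        # Poll the drives every 3 minutes so they never sit idle; smartctl -i
        # issues an INQUIRY/IDENTIFY to the drive itself (run as root)
        while true; do
            for dev in /dev/sdt /dev/sdu /dev/sdv; do
                smartctl -i "$dev" > /dev/null
            done
            sleep 180
        done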


    This really is weird...


    I'm thinking that since this started after the upgrade to OMV 8, it must be something that changed between Debian 12 and 13 in its SCSI/SAS support. That's just a guess, of course. Maybe something in the current kernel?
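

    One concrete thing that might be worth comparing between the two versions (again, just a guess): whether runtime power management is enabled for the SCSI devices. If a newer kernel or udev rule flips it to "auto", idle devices are allowed to suspend:

        # "on" = runtime PM disabled for the device, "auto" = kernel may
        # suspend it when idle
        cat /sys/block/sdt/device/power/control

        # Force runtime PM off for a single device, as a test
        echo on | sudo tee /sys/block/sdt/device/power/control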

  • Less than an hour; maybe somewhere around 30 minutes, give or take. I've restored an OMV7 backup, and the drives have not yet dropped out after 4+ hours.


    It's relatively easy for me to change between 7 and 8 with backups, so if someone has a test protocol they'd like me to run, I'm willing.

  • Apparently there's a utility called sdparm, which is similar to hdparm but geared toward SCSI/SAS drives. It can be used to control a drive's internal power management and sleep timers, which seem to be more comprehensive than what SATA drives usually support.
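

    Something like this, though I haven't tested it myself and the exact field acronyms vary between sdparm versions, so treat it as a sketch:

        # Dump the Power Condition mode page (0x1a), which holds a SAS
        # drive's idle and standby timers
        sudo sdparm --page=po --long /dev/sdt

        # Hypothetical follow-up: if the listing shows a standby timer
        # enabled, clear it and persist the change with --save, using the
        # field acronym exactly as printed above, e.g.:
        # sudo sdparm --clear=STANDBY_Z --save /dev/sdt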


    SAS host adapter cards have their own power management settings as well.


    It would be worth looking into your BIOS/UEFI settings for anything related to PCIe link power management, or into the SAS adapter's firmware settings if you can access them.
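

    The PCIe link power management (ASPM) state can at least be read from the running system:

        # The kernel's global ASPM policy
        cat /sys/module/pcie_aspm/parameters/policy

        # Link capabilities/state of the LSI HBA (Broadcom/LSI vendor ID is
        # 1000); look for "ASPM" in the LnkCtl line
        sudo lspci -vv -d 1000: | grep -i -e LnkCap -e LnkCtl -e ASPM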


    I'm completely groping in the dark here, just trying to offer some more ideas...

  • Since reverting to OMV7, the system has been stable and no drives have dropped out.


    The only change was the upgrade from OMV7 to OMV8; no other hardware or machine settings were touched. So I think you're right that something in Trixie has changed at a fundamental level.


    Anyone on the OMV development team want to guide me through some troubleshooting steps?

  • I don't suppose anyone has some next steps for me, do they? I'd like to get back to OMV8, but if this is going to be a persistent issue I'm afraid I'll be stuck on 7 for at least one release cycle.

  • I know; I was hoping to hear from one of the devs on it. I don't need either of those utilities to maintain a stable system on OMV7, so arguably I shouldn't need them on OMV8 either.
