OMV freezing at random

  • Hi there,


    I recently upgraded from OMV 2 to OMV 4. The upgrade went fine, and I migrated from a RAID setup to Union Filesystem (all data backed up to external drives). Since the upgrade, every now and then, I lose connectivity to the filesystem/folders (I can't access them from media players or a networked PC). I've seen this happen once while watching media (the movie slowed down and then froze), but more often than not the issue occurs at random times and at different intervals (sometimes it can be a few days, other times it happens on successive days).


    When this occurs I seem to still be able to access the GUI, and there's no obvious spiking of RAM or CPU reported, but trying to access disk details in the GUI grinds to a halt. My media server (HP n40l, with 6 x 4TB HDD) has one disk with a reported SMART error (22 reallocated sectors), which needs to be swapped out and replaced. I've had multiple disk failures over the years, and have never experienced the issue I've described.


    From personal experience, does anyone have any suggestions on what the issue could be? I had no problems when running OMV 2 (but I also wasn't using Union FS then, either). Which logs would I need to provide in order to analyse the situation further, and could I please get some guidance on how to extract the relevant logs?


    Thanks in advance for any and all support!
    Brian

  • I have an N54L with 6 drives, but only 4 are part of mergerfs (unionfs). I take it 4 of the drives are in the bays; how are the other two connected? What mergerfs option are you using: EP, MFS or LFS? What's your boot device?

    The fact that one drive is showing some errors could be the problem.

  • Thanks for your reply geaves. Drive 5 is in the ODD drive bay and Drive 6 is connected via the external eSata port. All worked well in the old RAID5 array.


    MergerFS is now set to MFS. I say now, as when I first transitioned to UFS my first transfer all but filled one drive (6 GB free), but I then switched to MFS and the load is now being spread across all the other drives.


    Boot drive is a 120 GB SSD connected to the onboard USB. Again, I’ve always used this configuration, though it was previously a mechanical drive. When I rebuilt the system I switched to an SSD. It later developed a SMART error, so I cloned the disk to a new SSD, which is error free. The Flashmemory plug-in is in use.

  • Drive 5 is in the ODD drive bay and Drive 6 is connected via the external eSata port.

    Same set up, except the ODD drive is used for rsync and the eSata is used for docker configs, but I use a Flash Drive to boot from the internal USB.

    I also moved from RAID 5 to mergerfs and snapraid, and at times I can have as many as four streams running without an issue, so I'm guessing that your problem is hardware related: either the drive showing errors or some intermittent hardware. The only log that might display something is the syslog, where you would be looking for I/O errors, but again it's guesswork and a process of elimination.
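
    As a rough illustration of looking through the syslog for I/O errors, here is a minimal Python sketch. The list of patterns is an assumption (a few kernel messages commonly seen when a disk or SATA link misbehaves), not an exhaustive or OMV-specific set, and /var/log/syslog is assumed to be the log location; reading it may need root.

```python
import re

# Assumed patterns: typical kernel messages for disk or SATA link trouble.
# Adjust to whatever actually shows up in your own log.
IO_ERROR_PATTERN = re.compile(
    r"I/O error|hard resetting link|failed command|exception Emask",
    re.IGNORECASE,
)


def find_io_errors(lines):
    """Return the lines that match any of the suspect patterns."""
    return [line.rstrip("\n") for line in lines if IO_ERROR_PATTERN.search(line)]


def scan_syslog(path="/var/log/syslog"):
    """Scan a log file (default path is an assumption) for suspect lines."""
    with open(path, errors="replace") as handle:
        return find_io_errors(handle)
```

    Printing each entry returned by scan_syslog() shortly after a freeze should show whether one device name (sdb, ata3 and so on) keeps coming up.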

  • I'm beginning to think it's hardware related. Access to the UFS froze again yesterday; I rebooted as usual, but during the boot-up process I received a couple of error notifications that 2 of the 6 drives hadn't been mounted (Status failed Service mountpoint_srv_dev-disk-by-label-WD4TBX is not a mountpoint). I can't see the drives in the GUI, nor via a PuTTY terminal. I'm not really sure what to do or try next. The drive with errors is typically one of the 4 that are still mounted!
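
    For checking from a terminal which data drives are actually mounted, a small Python sketch follows. It assumes the data disks are expected under paths such as /srv/dev-disk-by-label-WD4TBX (inferred from the service name in the notification above); the list of paths to check is something you would supply yourself.

```python
import os


def unmounted(paths, is_mount=os.path.ismount):
    """Return the paths from `paths` that are not currently mount points.

    `is_mount` is injectable so the function can be tried without real disks.
    """
    return [path for path in paths if not is_mount(path)]
```

    Calling unmounted(["/srv/dev-disk-by-label-WD4TBX"]) returns an empty list when that drive is mounted, and the path itself when it is not.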

  • received a couple of error notifications that 2 of the 6 drives hadn't been mounted (Status failed Service mountpoint_srv_dev-disk-by-label-WD4TBX is not a mountpoint).

    Are those the drives connected to the ODD and eSata? If they are, it was one of the reasons why I only used the 4 bays for data use, but I have been investigating a short SMART test error on the drive connected to the eSata. Because of that I have been looking at something like this; whilst it is somewhat OTT, they are not expensive and can be found on eBay.

    Do you have any other drives you can test in those ports? Disconnect all drives and boot with a spare drive connected to each of those ports in turn, or even boot with another live distro to test those ports. If your problem is related to the ODD and eSata it might be the cables.

  • I forgot to make a note of the drive serial numbers before I opened up the NAS to check. Schoolboy error. Though why, I don’t know, I disconnected all the drives, gave the innards and connectors a blast with some compressed air, reseated everything, rebooted, and the drives are back again. It’s odd: the server hasn’t been moved or bumped (to have caused any cables/drives to dislodge) and this had never happened before. Anyway, glad it all seems to be working OK now. Just the original issue to solve still. Thanks for your help!

  • So, the day after my last update I had a recurrence of 1 drive disappearing/disconnecting. I first received a notification that the 'flags had been changed' and then the 'mountpoint failed' message.


    This time I checked which drive it was: the drive with the SMART error, which was connected in slot 1 of the onboard caddies (not the ODD or eSATA). I went through the process of disconnecting and reconnecting all the drives, but this time I moved that drive to slot 2.


    Since rebooting 6 days ago I haven't had any recurrence of drives disconnecting, or of the original issue of the server/union filesystem becoming unavailable. I hope I haven't jinxed anything, and I continue to touch wood. I won't profess to understand enough about the hardware to even hazard a guess as to what could have been the issue, but it seems odd that the same drive in 2 different slots could have caused such an issue. The drive still shows the same number of impacted sectors (waiting for the replacement drive to arrive) but so far so good ...

  • I went through the process of disconnecting and reconnecting all the drives, but this time I moved that drive to slot 2.

    That's interesting. If I remember correctly, slot 1 in the Microservers is initially the boot drive. Whilst you can update the BIOS with a BIOS hack, which one would expect to negate that for slot 1, I'm wondering if there is something hardware related to the RAID backplane.


    The other thing I do for myself is keep an Excel spreadsheet with all the drive information (size, serial no., make, model, and where each drive is connected); it just saves trying to locate a faulty drive.
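
    One way to start such a spreadsheet is to let the system list the drives itself. The Python sketch below is only an illustration: it assumes lsblk from util-linux with JSON output (-J) is available, and the function names are made up for this example. Where each drive is physically connected (bay, ODD, eSata) is not recorded by lsblk and would still need adding by hand.

```python
import csv
import io
import json
import subprocess

COLUMNS = ["name", "size", "model", "serial"]


def read_lsblk():
    """Run lsblk for whole disks only (-d) and return its JSON text."""
    result = subprocess.run(
        ["lsblk", "-J", "-d", "-o", "NAME,SIZE,MODEL,SERIAL,TYPE"],
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stdout


def parse_lsblk(json_text):
    """Turn lsblk JSON into one dict per physical disk, skipping other types."""
    devices = json.loads(json_text).get("blockdevices", [])
    return [
        {column: device.get(column) for column in COLUMNS}
        for device in devices
        if device.get("type") == "disk"
    ]


def to_csv(rows):
    """Render the rows as CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

    Writing to_csv(parse_lsblk(read_lsblk())) to a file before opening the case gives a record of serial numbers to compare against the labels on the drives.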
