OMV Nightly Hang

  • Hi Everyone,


    I recently built a new server for OMV and have been experiencing nightly hangs. The server becomes unresponsive and needs a manual reboot to come back up. I've swapped out several components and performed a fresh install of OMV each time. The hangs don't seem to happen if I reboot the server in the afternoon or evening. For example, I rebooted the server around 19:30 yesterday and didn't see the behavior. My server setup is as follows:


    CPU: AMD Ryzen 5 1600
    Motherboard: B450I Gaming Plus AC (also tried an ASRock mobo)
    Memory: G.SKILL FORTIS Series 16GB (8GB x 2) (also tried Crucial and Patriot memory)
    SSD Drive: ADATA SU800 M.2 2280
    SATA Drives: Hitachi HDS72303 x 4
    USB Drive: WD My Passport 2599


    Last time the hang happened I saw the following logs:


    daemon.log

    Code
    Apr 10 06:26:13 booknas systemd[1]: Starting Daily apt upgrade and clean activities...
    Apr 10 06:26:14 booknas systemd[1]: Started Daily apt upgrade and clean activities.
    Apr 10 06:26:14 booknas systemd[1]: apt-daily-upgrade.timer: Adding 22min 22.439752s random time.
    Apr 10 06:26:14 booknas systemd[1]: apt-daily-upgrade.timer: Adding 40min 16.960270s random time.


    debug

    Code
    Apr 10 05:21:26 booknas rrdcached[1118]: started new journal /var/lib/rrdcached/journal/rrd.journal.1554891686.354734
    Apr 10 05:21:26 booknas rrdcached[1118]: removing old journal /var/lib/rrdcached/journal/rrd.journal.1554884486.354732
    Apr 10 06:21:26 booknas rrdcached[1118]: flushing old values
    Apr 10 06:21:26 booknas rrdcached[1118]: rotating journals
    Apr 10 06:21:26 booknas rrdcached[1118]: started new journal /var/lib/rrdcached/journal/rrd.journal.1554895286.354742
    Apr 10 06:21:26 booknas rrdcached[1118]: removing old journal /var/lib/rrdcached/journal/rrd.journal.1554888086.354727

    syslog

    Code
    Apr 10 06:21:26 booknas rrdcached[1118]: flushing old values
    Apr 10 06:21:26 booknas rrdcached[1118]: rotating journals
    Apr 10 06:21:26 booknas rrdcached[1118]: started new journal /var/lib/rrdcached/journal/rrd.journal.1554895286.354742
    Apr 10 06:21:26 booknas rrdcached[1118]: removing old journal /var/lib/rrdcached/journal/rrd.journal.1554888086.354727
    Apr 10 06:25:01 booknas CRON[23598]: (root) CMD (test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily ))
    Apr 10 06:26:13 booknas systemd[1]: Starting Daily apt upgrade and clean activities...
    Apr 10 06:26:14 booknas systemd[1]: Started Daily apt upgrade and clean activities.
    Apr 10 06:26:14 booknas systemd[1]: apt-daily-upgrade.timer: Adding 22min 22.439752s random time.
    Apr 10 06:26:14 booknas systemd[1]: apt-daily-upgrade.timer: Adding 40min 16.960270s random time.
    Apr 10 06:30:01 booknas CRON[23795]: (root) CMD (/usr/sbin/omv-mkrrdgraph >/dev/null 2>&1)
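
One way to narrow down when the machine actually froze is to look for the last line written before a long silence in the log. The sketch below is only an illustration: the function name `find_gaps` and the 30-minute threshold are made up for this example, and the year 2019 is an assumption (classic syslog timestamps carry no year; the rrdcached journal names above look like Unix timestamps from April 2019).

```python
from datetime import datetime, timedelta


def find_gaps(lines, year=2019, threshold=timedelta(minutes=30)):
    """Return (line_before_gap, line_after_gap, gap) for every silence >= threshold.

    `year` is an assumption: traditional syslog timestamps do not include one.
    """
    gaps = []
    prev_time, prev_line = None, None
    for raw in lines:
        line = raw.strip()
        try:
            # The first 15 characters hold the timestamp, e.g. "Apr 10 06:26:13".
            stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # not a timestamped log line, skip it
        if prev_time is not None and stamp - prev_time >= threshold:
            gaps.append((prev_line, line, stamp - prev_time))
        prev_time, prev_line = stamp, line
    return gaps


# Three lines taken from the debug log quoted in this post.
sample = [
    "Apr 10 05:21:26 booknas rrdcached[1118]: removing old journal /var/lib/rrdcached/journal/rrd.journal.1554884486.354732",
    "Apr 10 06:21:26 booknas rrdcached[1118]: flushing old values",
    "Apr 10 06:21:26 booknas rrdcached[1118]: rotating journals",
]

for before, after, gap in find_gaps(sample):
    print(f"silence of {gap} after: {before}")
```

On an idle NAS an hourly gap like the one in this sample is normal, since rrdcached only rotates its journal once an hour here. When running this over a full log (for example `find_gaps(open("/var/log/syslog"))`), the threshold would need to be longer than the usual quiet interval so that only the hang itself stands out.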


    Does anyone have any ideas on where else to look or something else to try?


    Thanks!

    • Official post

    Fully updated OMV 4?
    Did you try the proxmox kernel that you can install from the Kernel tab of omv-extras?
    Did you run memtest to make sure the memory is ok?

    omv 7.0-32 sandworm | 64 bit | 6.5 proxmox kernel

    plugins :: omvextrasorg 7.0 | kvm 7.0.9 | compose 7.0.9 | cputemp 7.0 | mergerfs 7.0.3


    omv-extras.org plugins source code and issue tracker - github


    Please try ctrl-shift-R and read this before posting a question.

    Please put your OMV system details in your signature.
    Please don't PM for support... Too many PMs!

  • I am running 4.1.21 which I believe is the latest version.


    I haven't run memtest on this latest memory, but I'll do that to double-check.


    I also haven't tried the proxmox kernel, what's different about that kernel?

    • Official post

    what's different about that kernel?

    It is Ubuntu 18.04's 4.15 LTS kernel. Seems a bit more stable than the backports kernel in Debian Stretch.

  • Fully updated OMV 4?
    Did you try the proxmox kernel that you can install from the Kernel tab of omv-extras?
    Did you run memtest to make sure the memory is ok?

    I ran memtest, switched to the proxmox kernel, and was finally able to finish making changes. Sadly, the issue reoccurred. I did have a terminal open and got the following error:


    Code
    Message from syslogd@booknas at Apr 16 02:59:28 ...
    
    
     kernel:[209987.536806] watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/u64:0:23400]
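
The bracketed number in a kernel message is the time since boot in seconds, and the trailing `[name:pid]` identifies the task that was on the CPU (here a kernel worker thread). The following is a rough sketch of pulling those fields apart and estimating when the machine was last booted. The regular expression and the helper name `parse_soft_lockup` are hypothetical, written only to fit the single message above, and the year 2019 is an assumption because the syslogd banner does not include one.

```python
import re
from datetime import datetime, timedelta

# Pattern written to fit the one message quoted above; other kernels may word it differently.
LOCKUP_RE = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\] watchdog: BUG: soft lockup - "
    r"CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! \[(?P<comm>.+):(?P<pid>\d+)\]"
)


def parse_soft_lockup(line):
    """Return the fields of a soft lockup message as a dict, or None if no match."""
    m = LOCKUP_RE.search(line)
    if m is None:
        return None
    return {
        "uptime": timedelta(seconds=float(m["uptime"])),  # time since boot
        "cpu": int(m["cpu"]),
        "stuck_seconds": int(m["secs"]),
        "task": m["comm"],
        "pid": int(m["pid"]),
    }


line = " kernel:[209987.536806] watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/u64:0:23400]"
info = parse_soft_lockup(line)

# Wall-clock time from the syslogd banner; the year is an assumption.
logged_at = datetime(2019, 4, 16, 2, 59, 28)
booted_at = logged_at - info["uptime"]

print("task:", info["task"], "pid:", info["pid"])
print("uptime at lockup:", info["uptime"])
print("approximate boot time:", booted_at)
```

Under those assumptions the machine had been up for a little under two and a half days when the lockup was logged, which puts the previous boot in the afternoon of Apr 13.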

    Would this be due to a bad CPU or unrelated?

    • Official post

    Would this be due to a bad CPU or unrelated?

    It seems to be a hardware stability problem. Is the cpu overclocked? Have you updated to the latest bios? Maybe the power supply is having issues under high load? Hard to say.

  • It seems to be a hardware stability problem. Is the cpu overclocked? Have you updated to the latest bios? Maybe the power supply is having issues under high load? Hard to say.

    Turns out it was the power supply! I wasn't overclocking, and I don't think it had to do with high load. I hooked up a Kill A Watt to the machine; the most I saw it pull was 85 watts, and the supply is rated for 450 watts. I ended up buying a higher-end power supply and the problem went away. It's strange that the hang only happened under a specific set of conditions. I would expect a power supply issue to be random.
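
To put the two figures above side by side, here is a quick calculation of how lightly the supply was loaded. The helper name `load_fraction` is just for illustration. A plug-in meter reads AC draw at the wall, so the DC load on the supply itself would be somewhat lower than this, and such a meter may not show very short spikes.

```python
def load_fraction(measured_watts, rated_watts):
    """Measured draw as a fraction of the supply's rated output."""
    return measured_watts / rated_watts


# 85 W peak at the wall against a 450 W rating, both figures from the post above.
peak = load_fraction(85, 450)
print(f"peak draw is {peak:.1%} of the rated output")
```

At under a fifth of the rating, overload looks unlikely, which fits the observation that the failure was not tied to high load.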


    Either way, I hope this thread helps people out if they see the same thing in the future.


    Thanks for your help!
