Raspberry Pi 4 Watchdog Not Working?

  • I am running OMV 5 on my RPi 4 (4 GB), installed per the setup guide.


    I am running into random crashes, mostly when I use Portainer to start or stop a container stack.
    I wanted to verify that the watchdog timer is running, so that it can reboot the system after a hang (the hang leaves the system totally unresponsive; it does not even answer a ping).
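
    A minimal sketch of such a check, assuming a systemd-based install where the daemon runs as watchdog.service (the unit name that appears later in this thread). The helper name check_watchdog is made up for illustration; it only reports whether the device node exists and whether systemd considers the daemon active.

```shell
#!/bin/sh
# Print a short status report for a watchdog device node and the watchdog daemon.
# Usage: check_watchdog [device-path]   (check_watchdog is a hypothetical helper name)
check_watchdog() {
    dev="${1:-/dev/watchdog}"

    # The kernel driver exposes a device node when a watchdog is available.
    if [ -e "$dev" ]; then
        echo "device present: $dev"
    else
        echo "device missing: $dev"
    fi

    # Ask systemd whether the daemon is running (skipped if systemctl is absent).
    if command -v systemctl >/dev/null 2>&1; then
        if systemctl is-active --quiet watchdog.service 2>/dev/null; then
            echo "watchdog.service: active"
        else
            echo "watchdog.service: not active"
        fi
    fi
}

check_watchdog
```

    A present device node together with an inactive service would suggest that the hardware is there but nothing is feeding or arming it.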


    Below is what is in the watchdog config:
    /etc/watchdog.conf
    # This file is auto-generated by openmediavault (https://www.openmediavault.org)
    # WARNING: Do not edit this file, your changes will get lost.
    watchdog-device = /dev/watchdog
    # This greatly decreases the chance that watchdog won't be scheduled before
    # your machine is really loaded
    realtime = yes
    priority = 1


    The watchdog is not reacting to the hang, and I am forced to manually power cycle the unit to recover.


    What does 'realtime = yes' mean, and is there a way to configure the watchdog, given that the file says OMV will overwrite any changes?



    -------------------------------------------------------------------------------------------------------------------------------------------
    EDIT:
    It seems my watchdog has issues, which is the problem.


    Attempted to start the watchdog via ssh, here is the output:


    Job for watchdog.service failed because the control process exited with error code.
    See "systemctl status watchdog.service" and "journalctl -xe" for details.


    This is the output of 'systemctl status watchdog.service'


    This is the output of 'journalctl -xe'

  • I've been researching this all day.
    Apparently the fault is due to Raspbian not shipping with the 'softdog' module, which stock Debian does ship.
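
    One way to confirm this on a given system is to ask modinfo whether the kernel can find the module. This is only a sketch: module_available is a hypothetical helper, and a driver that is compiled into the kernel rather than built as a module may be reported as "not found" by older versions of modinfo.

```shell
#!/bin/sh
# Return success if modinfo can find the named kernel module.
# (module_available is a hypothetical helper name.)
module_available() {
    modinfo "$1" >/dev/null 2>&1
}

# Report on softdog, the module the watchdog service is configured to load by default.
for mod in softdog; do
    if module_available "$mod"; then
        echo "$mod: available"
    else
        echo "$mod: not found"
    fi
done
```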


    Essentially all the answers I found, across several threads, amounted to: "This is not OMV's problem and therefore it should be brought up with the people that develop Raspbian. OMV devs will not dedicate time to fix it."


    SoftDog is a software watchdog. The Raspberry Pi has a hardware watchdog that can very easily be enabled, so I can understand why softdog was never implemented. Enabling the watchdog provides the reboot on system hang I was hoping for.


    Solution:
    Edit the file /etc/systemd/system.conf and set the following options:


    RuntimeWatchdogSec=10 (the maximum value here is 15; I set mine to 14)
    ShutdownWatchdogSec=10min
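
    As an alternative to editing /etc/systemd/system.conf directly, the same two options can live in a drop-in file. The sketch below is not what the post did; it assumes the drop-in directory /etc/systemd/system.conf.d is honoured by the installed systemd version, and write_watchdog_dropin is a hypothetical helper name. Both options belong to the [Manager] section.

```shell
#!/bin/sh
# Write a systemd manager drop-in that enables the hardware watchdog.
# Usage: write_watchdog_dropin <directory> [runtime-seconds] [shutdown-timeout]
# (write_watchdog_dropin is a hypothetical helper name.)
write_watchdog_dropin() {
    dir="$1"
    runtime="${2:-14}"       # the post reports a limit of 15 s on the Pi
    shutdown="${3:-10min}"

    mkdir -p "$dir" || return 1
    cat > "$dir/watchdog.conf" <<EOF
[Manager]
RuntimeWatchdogSec=$runtime
ShutdownWatchdogSec=$shutdown
EOF
}

# Preview in a scratch directory; nothing under /etc is touched.
preview_dir=$(mktemp -d)
write_watchdog_dropin "$preview_dir"
cat "$preview_dir/watchdog.conf"
rm -rf "$preview_dir"
```

    On a real system the function would be run as root with /etc/systemd/system.conf.d as the directory, followed by a reboot so that systemd picks up the new settings.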


    Then I verified with a forkbomb test
    : (){ :|:& };:


    After a few seconds, I saw the desired effect. I had two command prompts open: one with an SSH session and another running a persistent ping test. The pings never stopped being answered, but the SSH session was kicked. After a few seconds I tried to SSH into the device again and it worked like a charm. (Previously, every attempt at enabling the watchdog ended with a hard power cycle, physically unplugging the Pi 4 after the fork-bomb test. During those tests the pings kept working, just as they do with the watchdog active, but SSH connections always timed out. The fact that I can now SSH in after a fork bomb seems to indicate this is working fine.)

    Edited once, last by RFBomb. Reason: Additional Details

    • Official post

    Essentially all the answers I found, across several threads, amounted to: "This is not OMV's problem and therefore it should be brought up with the people that develop Raspbian. OMV devs will not dedicate time to fix it."

    Maybe you should submit a pull request with your change to this file - https://github.com/openmediava…ploy/watchdog/default.sls



  • I wasn't trying to be snarky; I was just summarizing the various responses I found. That said, thank you for linking to that file. I kept seeing the 'auto-generated by OpenMediaVault' header, but was never able to find out where OMV actually keeps track of that and pushes the changes out. (I also don't know the interval, so I'm not sure if or when my changes would be wiped out.)


    I've been continuing to experiment with it, though, and found that although the systemd approach from my original reply was mostly functional, it did not give a full reboot. It got the OS back, but did not restart OMV or Portainer.


    The following changes are what allowed me to actually work with watchdog as expected, restoring the full system after a hang.


    Disable the changes to systemd I noted in my original post -- they cause systemd to use /dev/watchdog and prevent the watchdog daemon from accessing it. (Though I suppose you could set the daemon to use /dev/watchdog0 if you wanted to.)


    Additional lines to /etc/watchdog.conf
    watchdog-timeout=14
    max-load-1 = 24


    Changes to /etc/default/watchdog
    watchdog_options="softboot" -- This calls for a full reboot of the system. Same as if power was cycled (without actually cycling power).


    watchdog_module="bcm2835_wdt" -- Changed from 'softdog' to look at the hardware device on the RPI.





    I will try to submit that change over on GitHub. Once again, thanks for linking me to it; I was scratching my head over whether my changes would actually be erased or if it was just a warning.



    Edit: I submitted it as an 'Issue'. (I don't know how to perform a 'pull request', and when I tried, it was for a whole branch of changes and not the single one I'm recommending.)

    Edited once, last by RFBomb. Reason: Additional notes

    • Official post

    If someone wants to use a different watchdog module simply do the following:


    Add OMV_WATCHDOG_WATCHDOGMODULE="foobar" to /etc/default/openmediavault and execute omv-salt deploy run watchdog.


    See https://github.com/openmediava…/watchdog/default.sls#L25 for more environment variables.


    P.S.: This is the OMV way to customize and override defaults that are under control of OMV.
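
    The override described in this post can be scripted so that repeated runs do not pile up duplicate lines. This is a sketch only: set_default is a hypothetical helper, it assumes simple values without '|' or '&' characters, and the demo below works on a scratch file rather than on /etc/default/openmediavault.

```shell
#!/bin/sh
# Set KEY="VALUE" in an environment-style file, replacing an existing
# assignment or appending a new one.
# Usage: set_default <file> <key> <value>   (set_default is a hypothetical helper name)
set_default() {
    file="$1"; key="$2"; value="$3"
    touch "$file" || return 1
    if grep -q "^${key}=" "$file"; then
        # Rewrite the existing assignment via a temp file (portable; no sed -i).
        tmp=$(mktemp) || return 1
        sed "s|^${key}=.*|${key}=\"${value}\"|" "$file" > "$tmp" && cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        printf '%s="%s"\n' "$key" "$value" >> "$file"
    fi
}

# Demo on a scratch file: the second call replaces the first value.
scratch=$(mktemp)
set_default "$scratch" OMV_WATCHDOG_WATCHDOGMODULE softdog
set_default "$scratch" OMV_WATCHDOG_WATCHDOGMODULE bcm2835_wdt
cat "$scratch"
rm -f "$scratch"
```

    On the Pi, the equivalent would be to run set_default as root against /etc/default/openmediavault and then execute omv-salt deploy run watchdog, as described in the post.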

  • That is great, I wish I knew about that.
    But unfortunately, it still does not fully resolve the issue for the Raspberry Pi.


    The WatchDog Daemon runs a default timer of 60s if the conf file does not specify otherwise. The Pi's hardware has a max limit of 15s watchdog timer.
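
    The mismatch can be checked mechanically. The sketch below reads watchdog-timeout from a watchdog.conf-style file, falls back to the 60 s default stated in this post when the option is absent, and compares the result with a hardware limit (15 s by default, per this post). The function names are made up for illustration.

```shell
#!/bin/sh
# Print the watchdog-timeout configured in a watchdog.conf-style file,
# or 60 (the daemon default stated in the post) when the option is absent.
configured_timeout() {
    value=$(sed -n 's/^[[:space:]]*watchdog-timeout[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" 2>/dev/null | tail -n 1)
    echo "${value:-60}"
}

# Succeed if the configured timeout fits within the hardware limit (default 15 s).
timeout_fits() {
    [ "$(configured_timeout "$1")" -le "${2:-15}" ]
}

# Report on the live config only if it exists and is readable.
conf=/etc/watchdog.conf
if [ -r "$conf" ]; then
    if timeout_fits "$conf"; then
        echo "timeout $(configured_timeout "$conf")s fits the hardware limit"
    else
        echo "timeout $(configured_timeout "$conf")s exceeds the hardware limit"
    fi
fi
```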


    I propose adding a new parameter to the base salt file to have a custom timer.


    {% set watchdog_timeout = salt['pillar.get']('default:OMV_WATCHDOG_WATCHDOGTIMER', '60') %}


    then under 'realtime' option, specify:
    watchdog-timeout= {{ watchdog_timeout }}


    Then, a user would be able to customize their watchdog timer for their hardware with ease, using the method you suggest:
    Add / Modify the lines in /etc/default/openmediavault
    OMV_WATCHDOG_WATCHDOGMODULE="bcm2835_wdt"
    OMV_WATCHDOG_WATCHDOGTIMER="14"
    then execute omv-salt deploy run watchdog
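
    To illustrate what the proposed template change would produce, the following sketch emulates the rendering in plain shell. OMV_WATCHDOG_WATCHDOGTIMER is the variable proposed in this post, not an existing OMV option, and the shell fallback only approximates the pillar.get default (it also treats an empty value as unset).

```shell
#!/bin/sh
# Emulate how the proposed template line would render into watchdog.conf.
# OMV_WATCHDOG_WATCHDOGTIMER is the variable proposed in the post (hypothetical).
render_watchdog_conf() {
    timeout="${OMV_WATCHDOG_WATCHDOGTIMER:-60}"   # fallback mirrors the proposed default of 60
    cat <<EOF
watchdog-device = /dev/watchdog
realtime = yes
priority = 1
watchdog-timeout = $timeout
EOF
}

# Default rendering, then a rendering with the Pi-friendly value set.
render_watchdog_conf
( OMV_WATCHDOG_WATCHDOGTIMER=14; render_watchdog_conf )
```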


    Without the timer setting, the watchdog daemon will still fail to start on the Pi.
    Unfortunately, after some additional testing I noticed that the watchdog daemon does not start on my device unless I start it manually, and I have not found out why. My workaround is going to be a cron job that starts it once OMV finishes booting.
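
    A sketch of what such a cron job could look like, assuming a Debian-style cron that supports @reboot entries in /etc/cron.d. The 120-second delay is an arbitrary placeholder, and watchdog_cron_entry is a hypothetical helper that only prints the line.

```shell
#!/bin/sh
# Print a cron.d-style entry that starts the watchdog daemon some seconds after boot.
# Usage: watchdog_cron_entry [delay-seconds]   (hypothetical helper name)
watchdog_cron_entry() {
    delay="${1:-120}"   # assumed delay; tune to how long OMV takes to finish booting
    printf '@reboot root sleep %s && systemctl start watchdog.service\n' "$delay"
}

watchdog_cron_entry
```

    The printed line could be saved as root to a file under /etc/cron.d. Before relying on cron, it may also be worth checking with systemctl is-enabled watchdog.service whether the unit is enabled at boot at all.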
