I am running OMV 5 on a Raspberry Pi 4 (4 GB), installed per the setup guide.
I am experiencing random crashes, mostly when I use Portainer to start or stop a container stack.
I wanted to verify that the watchdog timer is running, so that it can reboot the system after a hang. When the system hangs it becomes completely unresponsive; it does not even answer a ping.
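As a first check that the watchdog daemon is running at all, here is a minimal Python sketch. It is an assumed diagnostic, not an OMV or watchdog-package tool: it scans /proc for a process whose short command name is exactly "watchdog", which is what the daemon binary (usually /usr/sbin/watchdog on Debian) reports. The function name daemon_is_running is hypothetical, and a positive result only shows that the daemon process exists, not that the hardware timer is armed.

```python
import os


def daemon_is_running(name: str = "watchdog", proc_root: str = "/proc") -> bool:
    """Return True if any process under proc_root has a comm name equal to `name`.

    Each numeric directory in /proc is a process; /proc/<pid>/comm holds its
    short command name (the kernel truncates it to 15 characters).
    """
    try:
        entries = os.listdir(proc_root)
    except FileNotFoundError:
        return False  # no /proc here (e.g. not a Linux system)
    for entry in entries:
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        comm_path = os.path.join(proc_root, entry, "comm")
        try:
            with open(comm_path, encoding="utf-8") as f:
                if f.read().strip() == name:
                    return True
        except OSError:
            continue  # process exited meanwhile, or comm is unreadable
    return False


if __name__ == "__main__":
    print("watchdog daemon running:", daemon_is_running())
```

An exact name match is used on purpose, so kernel threads with similar names (for example "watchdogd") are not counted as the daemon.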
This is the content of /etc/watchdog.conf:
# This file is auto-generated by openmediavault (https://www.openmediavault.org)
# WARNING: Do not edit this file, your changes will get lost.
watchdog-device = /dev/watchdog
# This greatly decreases the chance that watchdog won't be scheduled before
# your machine is really loaded
realtime = yes
priority = 1
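To see what the generated file actually sets, and whether the configured device node exists, a small Python sketch could look like the one below. It assumes the simple "option = value" format shown above; parse_watchdog_conf is a hypothetical helper, not part of OMV or the watchdog package. It only checks that the device path exists and never opens it, because opening /dev/watchdog starts the timer.

```python
import os


def parse_watchdog_conf(text: str) -> dict:
    """Parse watchdog.conf-style text into a dict of option -> value.

    Lines look like 'option = value'; blank lines and '#' comments are
    ignored. If an option appears more than once the last value wins, which
    is a simplification: the real daemon accepts some options repeatedly.
    """
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        options[key.strip()] = value.strip()
    return options


if __name__ == "__main__":
    conf_path = "/etc/watchdog.conf"
    if os.path.exists(conf_path):
        with open(conf_path, encoding="utf-8") as f:
            conf = parse_watchdog_conf(f.read())
        device = conf.get("watchdog-device")
        print("configured device:", device)
        # Existence check only; opening the device would arm the watchdog.
        print("device node exists:", bool(device) and os.path.exists(device))
```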
The watchdog does not react to the hang, and I have to power cycle the unit manually to recover.
What does 'realtime = yes' mean? And is there a way to configure the watchdog, given that the file header says OMV will overwrite any changes?
-------------------------------------------------------------------------------------------------------------------------------------------
EDIT:
It turns out the watchdog service itself is failing, which is the problem.
I tried to start the watchdog over SSH; here is the output:
Job for watchdog.service failed because the control process exited with error code.
See "systemctl status watchdog.service" and "journalctl -xe" for details.
This is the output of 'systemctl status watchdog.service':
watchdog.service - watchdog daemon
Loaded: loaded (/lib/systemd/system/watchdog.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Sun 2019-12-22 13:09:49 GMT; 29s ago
Process: 7656 ExecStartPre=/bin/sh -c [ -z "${watchdog_module}" ] || [ "${watchdog_module}" = "none" ] || /sbin/modpro
Process: 7658 ExecStopPost=/bin/sh -c [ $run_wd_keepalive != 1 ] || false (code=exited, status=0/SUCCESS)
Dec 22 13:09:49 raspberrypi systemd[1]: Starting watchdog daemon...
Dec 22 13:09:49 raspberrypi sh[7656]: modprobe: FATAL: Module softdog not found in directory /lib/modules/4.19.75-v7l+
Dec 22 13:09:49 raspberrypi systemd[1]: watchdog.service: Control process exited, code=exited, status=1/FAILURE
Dec 22 13:09:49 raspberrypi systemd[1]: watchdog.service: Failed with result 'exit-code'.
Dec 22 13:09:49 raspberrypi systemd[1]: Failed to start watchdog daemon.
Dec 22 13:09:49 raspberrypi systemd[1]: watchdog.service: Triggering OnFailure= dependencies.
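The modprobe error says softdog is missing for the running kernel (4.19.75-v7l+). One rough way to double-check that is to walk /lib/modules/<release> looking for the module file, sketched here in Python. This rests on assumptions: find_module_files is a hypothetical helper, and it only looks for a file named softdog.ko or a compressed softdog.ko.* variant. modprobe itself resolves modules through the modules.dep index generated by depmod, so a file can exist and still fail to load if that index is out of date.

```python
import os
import platform


def find_module_files(module: str, release: str, modules_root: str = "/lib/modules") -> list:
    """Return sorted paths under <modules_root>/<release> whose file name is the
    module object, e.g. softdog.ko or a compressed variant like softdog.ko.xz."""
    base = os.path.join(modules_root, release)
    matches = []
    # os.walk yields nothing (no exception) if base does not exist.
    for dirpath, _dirnames, filenames in os.walk(base):
        for name in filenames:
            if name == module + ".ko" or name.startswith(module + ".ko."):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


if __name__ == "__main__":
    release = platform.release()  # same value as `uname -r` on Linux
    found = find_module_files("softdog", release)
    print(found if found else f"softdog not found under /lib/modules/{release}")
```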
This is the output of 'journalctl -xe':
--
-- A start job for unit wd_keepalive.service has begun execution.
--
-- The job identifier is 1507.
Dec 22 13:07:25 raspberrypi sh[6662]: modprobe: FATAL: Module softdog not found in directory /lib/modules/4.19.75-v7l+
Dec 22 13:07:25 raspberrypi systemd[1]: wd_keepalive.service: Control process exited, code=exited, status=1/FAILURE
-- Subject: Unit process exited
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- An ExecStartPre= process belonging to unit wd_keepalive.service has exited.
--
-- The process' exit code is 'exited' and its exit status is 1.
Dec 22 13:07:25 raspberrypi systemd[1]: wd_keepalive.service: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- The unit wd_keepalive.service has entered the 'failed' state with result 'exit-code'.
Dec 22 13:07:25 raspberrypi systemd[1]: Failed to start watchdog keepalive daemon.
-- Subject: A start job for unit wd_keepalive.service has failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit wd_keepalive.service has finished with a failure.
--
-- The job identifier is 1507 and the job result is failed.
Dec 22 13:07:27 raspberrypi monit[1483]: 'filesystem_srv_dev-disk-by-id-usb-SABRENT_SABRENT_DB9876543214E-0-0-part1' spa
Dec 22 13:07:57 raspberrypi monit[1483]: 'filesystem_srv_dev-disk-by-id-usb-SABRENT_SABRENT_DB9876543214E-0-0-part1' spa