OMV irregular reboots

  • Hello,


    I use OMV5 (on an i5-4590) and get irregular reboots when I transfer large amounts of data (i.e. over a longer period of time).

    It doesn't matter how I transfer the data (directly on the machine, via a backup service, or SMB).
    The reboots do not occur every time; when they do, it is usually after a few hours. In the current case, a large initial backup was started via BorgBackup at about 4:00am, and there was a reboot between 8:00am and 8:16am.


    The restarts are a big burden for me, especially because I also use LUKS encryption and my disks are not available after a restart until I decrypt them manually.

    However, I can't find or isolate the cause. :(

    I have attached the syslog, but I can't find anything remarkable in it.
    syslog_reboot.txt


    Can someone help me?

  • I have attached the syslog, but I can't find anything remarkable in it.

    syslog_reboot.txt


    Can someone help me?

    The log covers the timeframe 6 AM to 8 AM (2 hours since logrotate) of January 28th. Does it even contain the timeframe in which a reboot occurred? If not, please attach /var/log/syslog.1 and also tell us in which timeframe the reboot occurred.


    Typical causes are the watchdog service (if the watchdog thinks your system is stalled), kernel panics, or cron jobs.

  • Ah, yes. Watchdog and kernel panic would both leave a hint in the syslog. You should check the other log files as well: /var/log/messages, /var/log/daemon.log, /var/log/kern.log, /var/log/auth.log, etc., for this specific timeframe. Are messages to root being forwarded?

  • .... Are messages to root being forwarded?

    Hmmm, I can't quite follow you. Can you explain in more detail?



    I think the reboot was at 8:09am (according to syslog).

    But it's strange that Netdata (Docker) had already stopped saving data at 08:03:58... :/

    Code: syslog
    Jan 28 08:09:00 openmediavault systemd[1]: Starting Clean php session files...
    Jan 28 08:09:00 openmediavault systemd[1]: phpsessionclean.service: Succeeded.
    Jan 28 08:09:00 openmediavault systemd[1]: Started Clean php session files.
    Jan 28 08:09:01 openmediavault CRON[3451]: (root) CMD (  [ -x /usr/lib/php/sessionclean ] && if [ ! -d /run/systemd/system ]; then /usr/lib/php/sessionclean; fi)
    Jan 28 08:15:45 openmediavault systemd-modules-load[286]: Inserted module 'coretemp'
    Jan 28 08:15:45 openmediavault blkmapd[304]: open pipe file /run/rpc_pipefs/nfs/blocklayout failed: No such file or directory
    Jan 28 08:15:45 openmediavault systemd[1]: Started Create Static Device Nodes in /dev.
    Jan 28 08:15:45 openmediavault systemd[1]: Starting udev Kernel Device Manager...



    I checked all the logs in /var/log/, but found nothing around that time. :|

    Code: /var/log/messages
    Jan 28 07:22:07 openmediavault kernel: [26117.068905] usb 2-14.5: SerialNumber: DE2251222
    Jan 28 07:22:07 openmediavault kernel: [26117.076635] cdc_acm 2-14.5:1.0: ttyACM0: USB ACM device
    Jan 28 08:15:45 openmediavault kernel: [    0.000000] Linux version 5.9.0-0.bpo.5-amd64 (debian-kernel@lists.debian.org) (gcc-8 (Debian 8.3.0-6) 8.3.0, GNU ld (GNU Binutils for Debian) 2.31.1) #1 SMP Debian 5.9.15-1~bpo10+1 (2020-12-31)
    Jan 28 08:15:45 openmediavault kernel: [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-5.9.0-0.bpo.5-amd64 root=UUID=ad0e5196-e518-4c68-8ec2-77c3d964c6fa ro rootflags=subvol=@ quiet

    Code: /var/log/kern.log
    Jan 28 07:22:07 openmediavault kernel: [26117.076635] cdc_acm 2-14.5:1.0: ttyACM0: USB ACM device
    Jan 28 08:15:45 openmediavault kernel: [    0.000000] Linux version 5.9.0-0.bpo.5-amd64 (debian-kernel@lists.debian.org) (gcc-8 (Debian 8.3.0-6) 8.3.0, GNU ld (GNU Binutils for Debian) 2.31.1) #1 SMP Debian 5.9.15-1~bpo10+1 (2020-12-31)
    Jan 28 08:15:45 openmediavault kernel: [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-5.9.0-0.bpo.5-amd64 root=UUID=ad0e5196-e518-4c68-8ec2-77c3d964c6fa ro rootflags=subvol=@ quiet
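
The excerpts above show the pattern worth looking for: the last kernel message before the crash, followed directly by the "Linux version" line of the next boot. A minimal Python sketch that finds such gaps in a kern.log-style file could look like this. The function names and the `year` parameter are my own assumptions (classic syslog timestamps carry no year), and it assumes the "[    0.000000] Linux version" line is the first line logged after a boot, which holds for kern.log and messages here but not necessarily for syslog.

```python
import re
from datetime import datetime

# First kernel line of a fresh boot, as seen in the kern.log excerpt above.
BOOT_MARKER = re.compile(r"kernel: \[\s*0\.000000\] Linux version")
# Classic syslog timestamp, e.g. "Jan 28 08:15:45" (no year in the log).
TIMESTAMP = re.compile(r"^([A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) ")


def parse_ts(line, year):
    """Return the line's timestamp as a datetime, or None if it has none."""
    match = TIMESTAMP.match(line)
    if not match:
        return None
    return datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")


def find_reboots(lines, year):
    """Return (last_line_before_boot, boot_time) pairs for each boot marker."""
    reboots = []
    previous = None
    for line in lines:
        ts = parse_ts(line, year)
        if ts is None:
            continue
        if BOOT_MARKER.search(line) and previous is not None:
            reboots.append((previous, ts))
        previous = ts
    return reboots


# Shortened lines from the kern.log excerpt in this thread.
SAMPLE = """\
Jan 28 07:22:07 openmediavault kernel: [26117.076635] cdc_acm 2-14.5:1.0: ttyACM0: USB ACM device
Jan 28 08:15:45 openmediavault kernel: [    0.000000] Linux version 5.9.0-0.bpo.5-amd64
Jan 28 08:15:45 openmediavault kernel: [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-5.9.0-0.bpo.5-amd64
"""

if __name__ == "__main__":
    # The year is an assumption; pass whatever year the log was written in.
    for last_seen, booted in find_reboots(SAMPLE.splitlines(), year=2021):
        print(f"last message {last_seen}, next boot {booted}, gap {booted - last_seen}")
```

A long gap with nothing logged before the boot marker only narrows down the window; it does not say whether the cause was a watchdog, a panic, or a power problem.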
  • So last night BorgBackup started its job again and ran the backup.

    This time there was no reboot, but at 7:30am the transfer stopped and the CPU iowait went to nearly 100% (cpu0+cpu1).



    I checked which processes were waiting, to find the reason for the high iowait:

    (Unsurprisingly, it was the backup process (or the disk being backed up?) that had a problem.)



    So I checked the syslog and found a kernel bug:

    Jan 29 07:30:48 openmediavault kernel: [43413.087560] BUG: kernel NULL pointer dereference, address: 0000000000000f30

    *See the attachment syslog.txt


    Any idea what could have caused this?

    And could it be related to the restart the day before?
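
Finding a line like the "BUG: kernel NULL pointer dereference" above by eye is easy to miss in a large syslog. A small Python sketch for scanning a log for kernel fault messages might look like the following. The pattern list is an assumption on my part and certainly not exhaustive; only the BUG line is taken from this thread.

```python
import re

# Non-exhaustive, assumed list of kernel messages that usually deserve a look.
FAULT_PATTERNS = [
    ("BUG", re.compile(r"kernel: .*\bBUG:")),
    ("Oops", re.compile(r"kernel: .*\bOops\b")),
    ("panic", re.compile(r"kernel: .*Kernel panic")),
    ("btrfs", re.compile(r"kernel: .*BTRFS (error|warning|critical)")),
]


def scan_for_faults(lines):
    """Return (line_number, label, text) for each line matching a pattern."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        for label, pattern in FAULT_PATTERNS:
            if pattern.search(line):
                hits.append((lineno, label, line.rstrip()))
                break  # one label per line is enough
    return hits


# Two lines quoted from this thread: one harmless, one fault.
SAMPLE = """\
Jan 28 07:22:07 openmediavault kernel: [26117.076635] cdc_acm 2-14.5:1.0: ttyACM0: USB ACM device
Jan 29 07:30:48 openmediavault kernel: [43413.087560] BUG: kernel NULL pointer dereference, address: 0000000000000f30
"""

if __name__ == "__main__":
    for lineno, label, text in scan_for_faults(SAMPLE.splitlines()):
        print(f"{lineno}: [{label}] {text}")
```

The lines following a hit (the call trace) are what show which subsystem was involved, so it is worth reading the log from the reported line number onward.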

  • hmmm, i can't quite follow you. can you explain in more detail?

    Sure. Daemons which might restart the system tend to send a message to root to announce their actions. So what I was asking is whether you get those emails, i.e. whether you have configured the system to forward them.

  • So last night BorgBackup started its job again and ran the backup.

    This time there was no reboot, but at 7:30am the transfer stopped and the CPU iowait went to nearly 100% (cpu0+cpu1).

    disk.png cpu.png

    http://veithen.io/2013/11/18/iowait-linux.html


    Well, do you use a watchdog service? (systemctl status watchdog) If anything happens that looks like a stall, it might provoke the watchdog service to restart the system. Can you check which verbosity level the watchdog service has (/etc/watchdog.conf)?


    Hm. The trace shows that there is a checksum issue with the btrfs file system. That might point to a memory problem or to disk corruption. I'd check the file system and the SMART status. PS: I have also seen spontaneous restarts when there was an issue with the RAM. In those cases no trace was left behind in the logs either. One can install memtest86 in Debian and test the RAM too.
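
One way to follow up on the file system check is to look at the per-device error counters that `btrfs device stats <mountpoint>` prints (run as root). The Python sketch below parses that output and reports only the non-zero counters. It assumes the usual `[/dev/...].counter_name  value` line format; the device name and values in the sample are purely illustrative and not taken from the system in this thread.

```python
import re

# Assumed line format of `btrfs device stats`, e.g.
# "[/dev/sda].corruption_errs  0"
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")


def nonzero_counters(stats_text):
    """Return {device: {counter: value}} for every counter greater than zero."""
    problems = {}
    for line in stats_text.splitlines():
        match = STAT_LINE.match(line.strip())
        if not match:
            continue  # skip blank or unexpected lines
        value = int(match.group("value"))
        if value:
            problems.setdefault(match.group("dev"), {})[match.group("counter")] = value
    return problems


# Illustrative sample only (hypothetical device and values).
SAMPLE = """\
[/dev/sda].write_io_errs    0
[/dev/sda].read_io_errs     0
[/dev/sda].flush_io_errs    0
[/dev/sda].corruption_errs  3
[/dev/sda].generation_errs  0
"""

if __name__ == "__main__":
    for device, counters in nonzero_counters(SAMPLE).items():
        for name, value in counters.items():
            print(f"{device}: {name} = {value}")
```

Non-zero corruption counters alone do not tell you whether the disk or the RAM is at fault, so this complements the SMART check and the memory test rather than replacing them.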

  • Hm. I'm not well versed in Docker. But the reason I asked whether you use KVM (and the behaviour may be similar in Docker) is that I also had random reboots. The machine would stay up for 2 or 3 days and then suddenly reboot, as if the power had become unstable. There were no logs whatsoever, as if the power supply or some other power-delivery component had failed and caused a cold restart.


    However, the problem turned out to be that I had not created a "bridge" device in the OMV GUI for the assigned network card. Once I created this device with the NIC I intended to assign to my virtual machines, and then referenced it as "br0" in Virtual Machine Manager, all the problems went away: no more sporadic, unexplained crashes.


    If you have IPv6 enabled, maybe try IPv4 only. That also caused trouble for me at times.


    Maybe verify this is not causing your issues.

  • Hi

    A big thank you for your answers and help :)


    Sure. Daemons which might restart the system tend to send a message to root to announce their actions. So what I was asking is whether you get those emails, i.e. whether you have configured the system to forward them.

    Now I understand :)
    I have, and on the last reboot (checksum issue with the btrfs file system) I got an email saying the drive is corrupted. But with the other reboots I never got such an email.


    http://veithen.io/2013/11/18/iowait-linux.html


    Well, do you use a watchdog service? (systemctl status watchdog) If anything happens that looks like a stall, it might provoke the watchdog service to restart the system. Can you check which verbosity level the watchdog service has (/etc/watchdog.conf)?


    Hm. The trace shows that there is a checksum issue with the btrfs file system. That might point to a memory problem or to disk corruption. I'd check the file system and the SMART status. PS: I have also seen spontaneous restarts when there was an issue with the RAM. In those cases no trace was left behind in the logs either. One can install memtest86 in Debian and test the RAM too.

    I haven't had any reboots lately (the system has been running for 11 days). However, the nightly backups are also much smaller than the initial backup. So I think the watchdog service is OK.

    I have two BTRFS disks in my system, and the crashes have occurred during (large) read/write tasks on both disks.

    I think it is unlikely that both disks are damaged (or at least I hope so); the SMART status of both is also good.



    I also think the RAM is likely the problem. I use two "old" 8GB modules, bought in 2013/2014. It's possible that they have reached the end of their life. I will try to test them ;)
