OMV randomly unreachable

  • Hi guys, I'm facing a very complex problem with my new OMV build: at a random time after boot (anywhere from 1 to 48 h), the server becomes unreachable (no HTTP, SSH, ping, etc.).

    I'm running OMV 5.x with several Docker containers (Traefik, Home Assistant, Nextcloud, fail2ban, MySQL and a few others).

    My server specs are: Ryzen 1700X on an ASRock X370 Pro4 motherboard, 8 GB RAM, a 240 GB NVMe as the OS disk, and a few spare drives (both USB and SATA) as storage disks (no RAID configured, only SnapRAID).

    Note that no GPU is installed, so I can't access the console when the server is unreachable! The rest of the hardware is oversized: average CPU load is about 2 % and RAM usage is always around 4 GB, with 0 % of the 8 GB swap in use.

    I enabled kernel logging and persistent journal logging, but I can't find any meaningful info in my /var/log files (it seems that syslog, the journal and kern.log simply stop being written when the crash happens, and no line is truncated).

    After uninstalling the flash memory plugin (2 days ago), the problem seems to happen more often (4 crashes in the last 48 h), but maybe it's a coincidence.

    Any tips?

  • A small update: 5 days of uptime and still running...

    The problem seems to be a kernel bug with 1st gen Ryzen (both 4.x and 5.x kernels are affected). Not all C-states are supported, so switching off C6 and limiting the CPU to C1, C2 and C3 may solve the freezes.
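
    If you want to see which idle states the kernel itself exposes, here is a minimal Python sketch that reads the standard Linux cpuidle sysfs directory. The function name list_idle_states is just something I made up for illustration. Keep in mind this only shows what the cpuidle driver reports; the state names there may not map one to one onto the C6 option in the BIOS, so treat it as a quick look rather than a diagnosis.

```python
from pathlib import Path


def list_idle_states(cpu_dir="/sys/devices/system/cpu/cpu0/cpuidle"):
    """Return (state_dir, name, disabled) for each idle state under cpu_dir.

    Assumes the usual cpuidle sysfs layout: one "stateN" directory per
    idle state, each with a "name" file and, usually, a "disable" file
    (0 = enabled, anything else = disabled).
    """
    base = Path(cpu_dir)
    if not base.is_dir():
        # No cpuidle directory (e.g. cpuidle not available): nothing to list.
        return []

    states = []
    # Sort numerically so state10 comes after state9.
    for state in sorted(base.glob("state[0-9]*"), key=lambda p: int(p.name[5:])):
        name = (state / "name").read_text().strip()
        disable_file = state / "disable"
        disabled = disable_file.exists() and disable_file.read_text().strip() != "0"
        states.append((state.name, name, disabled))
    return states
```

    Calling list_idle_states() with no argument reads the states of cpu0; pass another cpuN/cpuidle path to check a different core.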

    So, 1st gen Ryzen and random freezes? Try this (Google helps):

    1) enable kernel logging and persistent journal logging

    2) using journalctl, search for CPU soft lockup stack traces just before each reboot

    3) if you find a match, disable the C6 state in the motherboard BIOS or with a script (GitHub)
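
    For step 2, a rough Python sketch of the search could look like this. It assumes the soft lockup line has the usual kernel watchdog wording ("BUG: soft lockup - CPU#N stuck for Ns!"); the function name find_soft_lockups and the script name used below are only placeholders of mine, not part of any tool.

```python
import re
import sys

# Matches the kernel watchdog message, for example:
#   watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [swapper/3:0]
# Older kernels prefix it with "NMI watchdog:", which this also matches.
SOFT_LOCKUP = re.compile(r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s")


def find_soft_lockups(lines):
    """Return (line_number, cpu, seconds) for every soft lockup message."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = SOFT_LOCKUP.search(line)
        if match:
            hits.append((number, int(match.group(1)), int(match.group(2))))
    return hits


# Only runs when a log file path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as log:
        for number, cpu, seconds in find_soft_lockups(log):
            print(f"line {number}: CPU#{cpu} stuck for {seconds}s")
```

    With persistent logging enabled, you can dump the kernel messages of the previous boot with "journalctl -k -b -1 > previous_boot.log" and then run "python3 find_lockups.py previous_boot.log". No output means no soft lockup was logged for that boot, which doesn't rule out a freeze that happened before anything could be written.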

    Just my little suggestion :)

  • Not sure if it's related or not. I'm running OMV on a Rock64 SBC with 4 GB RAM.

    I started noticing the issue today. During the short time my board stayed online, I noticed that Portainer kept stopping. I quickly uninstalled it and then checked docker ps and docker stats, both of which showed no containers running. I uninstalled Docker, and OMV hasn't become unreachable since.

    Will report back any findings.

