"Communications with the UPS ups_localhost are lost" issue for current plugin: temporary workaround or final solution? And remote monitoring?

  • Hi wonderful people,
    I recently upgraded my UPS to a CyberPower CP1500PFCLCD, one of the most popular UPS models this year because it is affordable and outputs a sine wave (I won't debate the pros and cons of sine wave here).
    I use it only for my Openmediavault baby, so I connected it directly via USB. It works, I am happy-ish.
    I am sharing this for two reasons:
    a) to temporarily help those who, like me, are going bananas and NUTs (pun intended) with this issue (see below), and
    b) to start a thread that will hopefully lead to a more stable, persistent solution.


    So... I installed the official Openmediavault UPS NUT plugin, configured it as per various threads here (see settings picture below), and everything seems to be working fine... except...
    ISSUES:
    - I cannot monitor the status remotely.
    - The plugin simply fails to shut down the server on loss of power (explained below), which makes it moot to have it running.


    The server loses its connection to the UPS randomly and for no apparent reason, anywhere between 5 and 75 minutes after I manually stop/restart the service, because of the dreaded (and apparently well-known) "Data stale" issue.
    - DRAMA: I get an email notification about this every 5 minutes, so if I am not at home to restart the service manually, I spam myself with HUNDREDS of emails. I need/want all the other notifications, and for most of them I can choose, but I cannot disable the ones for the UPS (not in the list yet (hint, hint)).
    - Of course, with the UPS disconnected, there is no point in running the plugin or the monitoring at all, as the server will neither receive nor issue the "graceful shutdown" command.
    I looked everywhere and found only solutions for OMV2 and OMV3 that are outdated and no longer apply, and I attempted every trick in the book (e.g. changing the upsd.conf values for maxstartdelay and maxretry, both of which are ignored), but nada...
    - Monitoring: no software for Windows or Linux is able to establish a connection from the local network to the OMV server on its local address and port 3493 (or any other port). I checked with the firewall port open and with no firewall at all, over both TCP and UDP.
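
For anyone wanting to reproduce the remote monitoring check, here is a minimal sketch of a TCP probe. It assumes bash (for its built-in /dev/tcp pseudo-device) and coreutils `timeout` on the machine doing the probing; the function name `port_open` and the address in the example are made up for illustration. Running it on the server against 127.0.0.1 and then from a client against the server's LAN address may show whether the port answers only locally.

```shell
#!/usr/bin/env bash
# port_open HOST PORT [TIMEOUT_SECONDS]   (hypothetical helper, not part of NUT)
# Returns 0 if a TCP connection to HOST:PORT can be opened, non-zero otherwise.
port_open() {
    local host="$1" port="$2" wait="${3:-3}"
    # In the child shell, $0 is the host and $1 is the port.
    # Opening /dev/tcp/HOST/PORT makes bash attempt a TCP connection.
    timeout "$wait" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Example (hypothetical LAN address of the OMV server):
# if port_open 192.168.1.50 3493; then echo "reachable"; else echo "not reachable"; fi
```

This only tests whether something accepts a connection on the port; it does not speak the NUT protocol.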


    TEMPORARY SOLUTION:
    I finally had enough NUTs, so I decided to create a shell script and add it to cron (on reboot). It runs continuously, checks the UPS link status, and restarts the service if the link is down. This way I still get an email notification every time it disconnects and reconnects, but not every five minutes for hours: I get 20 to 30 notices, not 300.
    I also noticed that since I started this script I am getting almost no failure notices, possibly because of the harassing nature of the script itself.
    Basically this script, launched on reboot, runs in the background and checks every 10 seconds whether the UPS link is up, using the exit status of the "upsc xyz" command (upsc comes pre-installed with OMV).
    The command returns an exit status of 0 (link up) or 1 (link down), which you can see with echo $?, hence the idea of using it to trigger a driver/daemon restart with "upsdrvctl start".
    To launch it silently, without output, I use a second shell script, which is also what I use in the OMV task scheduler to run it on reboot.
    I created two shell script files in root's home:
    - touch /root/checkups.sh
    - touch /root/upsfix.sh
    and made them executable by the root user and group only.
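
The body of checkups.sh is not shown in this post, so here is a minimal sketch of what the description above amounts to. It is a reconstruction under assumptions, not the original script: the UPS name "ups@localhost", the function names `check_once` and `watch_ups`, and the `--run` switch are all made up for illustration.

```shell
#!/bin/bash
# checkups.sh - hypothetical reconstruction of the watchdog described above.
# Assumption: the UPS is addressed as "ups@localhost"; replace it with your own.
UPS_NAME="${UPS_NAME:-ups@localhost}"
# Seconds between checks (the post uses 10).
CHECK_INTERVAL="${CHECK_INTERVAL:-10}"

# One check: upsc exits 0 when the link is up and non-zero when it is down.
# If the link is down, ask upsdrvctl to start the driver again.
check_once() {
    if ! upsc "$UPS_NAME" > /dev/null 2>&1; then
        upsdrvctl start
    fi
}

# Check forever, sleeping between rounds.
watch_ups() {
    while true; do
        check_once
        sleep "$CHECK_INTERVAL"
    done
}

# Loop only when asked to, so the functions can also be sourced and tried by hand.
if [ "${1:-}" = "--run" ]; then
    watch_ups
fi
```

With this sketch the launcher would have to call "/root/checkups.sh --run"; without the switch the script defines the functions and exits.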


    _______________________________________________

    Code: upsfix.sh
    #!/bin/bash
    # Launcher: run the checker with all of its output discarded.
    /root/checkups.sh > /dev/null 2>&1


    Then, in the OMV control panel under System \ Scheduled Jobs (cron), I created a task:
    - At reboot, as root, execute "sleep 300 && /root/upsfix.sh"


    One can change the check loop interval from 10 seconds to whatever, though IMHO I wouldn't go any lower than 5 seconds or any higher than 300 (5 minutes).
    5 minutes (sleep 300) is also how long I make the launcher wait after reboot before starting the script, just to make sure everything is loaded and up and running first.


    SO: with this I temporarily fixed the SPAM and now everything seems to be working fine. To test, I pulled the power to the UPS while it was reporting the UPS link as disconnected; 10 seconds later it reconnected and the plugin issued the shutdown.


    Question to you wonderful people: can we do any better than this workaround?


    Please feel free to kill, mock and denigrate my coding skills; I'd actually really love it if you can do better and show me how.
    It is always a good time to learn new things!


    You guys have a wonderful week !

  • I think adding the line "if failed host 127.0.0.1 port 3493 type TCP then restart" to /etc/monit/conf.d/openmediavault-nut.conf will do it. Can you check that?



    Code
    check process nut-server with matching upsd
    group nut
    start program = "/bin/systemctl start nut-server"
    stop program = "/bin/systemctl stop nut-server"
    mode active
    if failed host 127.0.0.1 port 3493 type TCP then restart

    After that you need to run systemctl restart monit.

  • Hi Volker,
    I am back from training today.
    I ran OMV-update and got openmediavault-nut 5.0.3 today.
    Then I saw the comment above and added the line.
    I ran "systemctl restart monit" as directed.


    Now time will tell: even with my workaround checking every 5 seconds, I still got random stops/recoveries every 3 to 4 hours or so, so only 6 or 7 extra notifications per day instead of 300, which is good.
    That is also after updating to openmediavault-nut 5.0.2. By the way, very elegant use of monit: I would have done that after your suggestion, if you had not already done it right away in 5.0.2 :) .
    I have noticed that since the openmediavault-nut 5.0.2/5.0.3 updates I am occasionally getting a different error message.


    Quick question: should I stop my workaround script to see if your fix worked, or should I leave them working in tandem? I think it is a tad overkill to run your 5.0.3 fix, the code fix above and my workaround script all at the same time. Which should I stop, if any?


    PS: thank you so much for taking care of this so quickly and with official updates: I am really honored.
    Thanks again!

  • I am having a similar issue. Not sure if it is the same problem. I regularly get notifications that the connection to the UPS is lost and then that it is regained. It keeps looping like this on an ongoing basis. The UPS itself functions fine and shuts down the system at low battery.


    Here is sample nut log content (it keeps looping through this):

    Feb 1 15:57:46 -NAS1 upsmon[1006]: Poll UPS [ups] failed - Data stale
    Feb 1 15:57:47 -NAS1 upsd[1003]: UPS [ups] data is no longer stale
    Feb 1 15:57:47 -NAS1 upsd[1003]: Can't connect to UPS [ups] (usbhid-ups-ups): No such file or directory
    Feb 1 15:57:51 -NAS1 upsmon[1006]: Poll UPS [ups] failed - Driver not connected
    Feb 1 15:57:53 -NAS1 upsd[1003]: Connected to UPS [ups]: usbhid-ups-ups
    Feb 1 15:57:56 -NAS1 upsmon[1006]: Communications with UPS ups established
    Feb 1 15:57:56 -NAS1 upssched[16234]: Executing command: notify
    Feb 1 15:57:56 -NAS1 upssched-cmd: Communications with the UPS ups are established
    Feb 1 19:09:30 -NAS1 upsd[1003]: Data for UPS [ups] is stale - check driver
    Feb 1 19:09:32 -NAS1 upsmon[1006]: Poll UPS [ups] failed - Data stale
    Feb 1 19:09:32 -NAS1 upsmon[1006]: Communications with UPS ups lost
    Feb 1 19:09:32 -NAS1 upssched[4867]: Executing command: notify
    Feb 1 19:09:32 -NAS1 upssched-cmd: Communications with the UPS ups are lost
    Feb 1 19:09:37 -NAS1 upsmon[1006]: Poll UPS [ups] failed - Data stale


    Here is sample syslog content (it keeps looping through this):

    Feb 1 19:10:32 -NAS1 upsmon[1006]: Poll UPS [ups] failed - Data stale
    Feb 1 19:10:32 -NAS1 monit[957]: 'nut-upsc-ups' status failed (1) -- Init SSL without certificate database#012Error: Data stale
    Feb 1 19:10:32 -NAS1 postfix/smtpd[4951]: connect from localhost.localdomain[127.0.0.1]
    Feb 1 19:10:32 -NAS1 postfix/smtpd[4951]: E966526B: client=localhost.localdomain[127.0.0.1]
    Feb 1 19:10:32 -NAS1 postfix/cleanup[4881]: E966526B: message-id=<1612224632.cc2f46d6d9238e8b@\-NAS1.Local1>
    Feb 1 19:10:32 -NAS1 postfix/qmgr[1052]: E966526B: from=<>, size=1088, nrcpt=2 (queue active)
    Feb 1 19:10:32 -NAS1 postfix/smtpd[4951]: disconnect from localhost.localdomain[127.0.0.1] ehlo=1 mail=1 rcpt=1 data=1 quit=1 commands=5
    Feb 1 19:10:32 -NAS1 monit[957]: 'nut-upsc-ups' trying to restart
    Feb 1 19:10:32 -NAS1 monit[957]: 'nut-upsc-ups' start: '/usr/sbin/upsdrvctl start'
    Feb 1 19:10:32 -NAS1 upsd[1003]: UPS [ups] data is no longer stale
    Feb 1 19:10:32 -NAS1 usbhid-ups[16220]: Signal 15: exiting
    Feb 1 19:10:32 -NAS1 upsd[1003]: Can't connect to UPS [ups] (usbhid-ups-ups): No such file or directory
    Feb 1 19:10:32 -NAS1 postfix/pipe[4883]: E966526B: to=<openmediavault-notification@localhost.localdomain>, relay=omvnotificationfilter, delay=0.02, delays=0/0/0/0.02, dsn=2.0.0, status=sent (delivered via omvnotificationfilter service)
    Feb 1 19:10:34 -NAS1 postfix/smtp[4884]: E966526B: replace: header Subject: Monitoring restart -- Status failed nut-upsc-ups: Subject: [-NAS1.Local1] Monitoring restart -- Status failed nut-upsc-ups
    Feb 1 19:10:35 -NAS1 postfix/smtp[4884]: E966526B: to=<>, relay=smtp..com[]:, delay=2.7, delays=0/0/1.7/1, dsn=2.0.0, status=sent (250 OK , completed)
    Feb 1 19:10:35 -NAS1 postfix/qmgr[1052]: E966526B: removed
    Feb 1 19:10:37 -NAS1 upsmon[1006]: Poll UPS [ups] failed - Driver not connected
    Feb 1 19:10:38 -NAS1 collectd[1062]: nut plugin: Connection to (localhost, 3493) established.
    Feb 1 19:10:38 -NAS1 collectd[1062]: nut plugin: Connection is unsecured (no SSL).
    Feb 1 19:10:38 -NAS1 collectd[1062]: nut plugin: nut_read: upscli_list_start (ups) failed: Driver not connected
    Feb 1 19:10:38 -NAS1 collectd[1062]: read-function of plugin `nut/ups@localhost:3493' failed. Will suspend it for 80.000 seconds.
    Feb 1 19:10:38 -NAS1 usbhid-ups[4965]: Startup successful
    Feb 1 19:10:38 -NAS1 upsd[1003]: Connected to UPS [ups]: usbhid-ups-ups
    Feb 1 19:10:42 -NAS1 upsmon[1006]: Communications with UPS ups established
    Feb 1 19:10:42 -NAS1 upssched[4979]: Executing command: notify


    What is making it stop and restart? Thanks for the help!


    OMV5 - 5.5.23-1 (Usul)

    Kernel - Linux 5.9.0-0.bpo.5-amd64

    omv-nut 5.1.1-1 plugin

    UPS - CyberPower 825VA-AVR
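
To see how often the link actually drops, a log like the one above can be summarised per hour. This is a minimal sketch and assumes classic syslog timestamps ("Feb 1 19:09:32 host ...") and the upsmon wording shown in the sample; the function name `count_ups_drops` and the default log path are assumptions for illustration.

```shell
#!/usr/bin/env bash
# count_ups_drops [LOGFILE]   (hypothetical helper)
# Prints "<count> <month> <day> <hour>:00" for every hour in which upsmon
# logged a lost connection, in the order the hours first appear in the log.
count_ups_drops() {
    local logfile="${1:-/var/log/syslog}"
    awk '
        /upsmon\[[0-9]+\]: Communications with UPS .* lost/ {
            split($3, t, ":")                  # $3 is HH:MM:SS
            key = $1 " " $2 " " t[1] ":00"     # e.g. "Feb 1 19:00"
            if (!(key in n)) order[++k] = key  # remember first-seen order
            n[key]++
        }
        END { for (i = 1; i <= k; i++) print n[order[i]], order[i] }
    ' "$logfile"
}
```

Only the upsmon "lost" lines are counted, so the matching upssched-cmd notification lines do not double the numbers.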
