Monitoring restart -- Connection failed nginx

  • This is becoming annoyingly frequent: several times a day. I don't get it; until about a week ago I had never seen this occur, not once, and I don't know why it started when it did.


    Since monit first fails to access the web GUI via 127.0.0.1 (localhost) but then succeeds, I wonder if there's some kind of timeout issue with the loopback interface. Or does nginx really keep crashing and getting restarted? I'm not sure how to tell.


    But a solution is needed...
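    To tell the two cases apart, a few diagnostic commands can help. This is only a sketch, assuming a systemd-based OMV install with the standard log locations (your paths may differ):

```shell
# Has systemd itself restarted nginx, and when did it last start?
systemctl show nginx -p NRestarts -p ActiveEnterTimestamp

# monit's log records every failed connection test and every restart
# it triggered (default log path; yours may differ):
grep nginx /var/log/monit.log | tail -n 20

# nginx's own error log around the time of the alerts:
tail -n 50 /var/log/nginx/error.log
```

    If monit's log shows "failed"/"succeeded" pairs but the nginx error log is quiet at those times, the checks are timing out rather than nginx actually crashing.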

  • Thanks, macom, but not really. I understand that Volker suspects "some issues" with nginx, also pointing at the code snippet issuing the alerts, but I don't know how to debug the problem on the nginx side. Also, I don't think I've messed with the nginx installation at all. The only "messing around" I did was upgrading from OMV4 to OMV5 instead of installing from scratch...

  • I googled a bit about nginx and localhost issues in general... and came across this link.

    It suddenly occurred to me that I recently tweaked my network settings and enabled IPv6 in the process, and I'm fairly sure that's when nginx/monit started complaining to me.

    Will look into this further as soon as I have time.
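    One quick thing to check along those lines (a sketch; ports 80/443 are the nginx defaults and may differ on your system): how "localhost" resolves now that IPv6 is enabled, and whether nginx is listening on the loopback address monit probes.

```shell
# With IPv6 enabled, "localhost" may now resolve to ::1 first:
getent hosts localhost

# Is anything listening on the web GUI ports? monit probes the
# literal 127.0.0.1, so a v6-only listener would not answer it.
ss -tln | grep -E ':(80|443)\b' || echo "nothing listening on 80/443 here"
```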

  • I believe I found a solution. Well, at least a workaround.


    I edited the file /srv/salt/omv/deploy/monit/services/files/nginx.j2 containing:

    Code
    {%- if not webadmin_config.forcesslonly | to_bool %}
    if failed host 127.0.0.1 port {{ webadmin_config.port }} protocol http timeout 15 seconds for 2 times within 3 cycles then restart
    {% endif -%}
    {%- if webadmin_config.enablessl | to_bool %}
    if failed host 127.0.0.1 port {{ webadmin_config.sslport }} type tcpssl protocol http timeout 15 seconds for 2 times within 3 cycles then restart
    {% endif -%}


    and changed this part: timeout 15 seconds for 2 times within 3 cycles

    to this: timeout 25 seconds for 3 times within 4 cycles

    There are two occurrences, one for HTTP access to the GUI and one for HTTPS (SSL/TLS). I changed both to remain identical to each other.


    (You could choose higher numbers, but it would then take even longer for nginx to be restarted if it does hang. On the other hand, that would actually highlight a problem with nginx itself...)
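    For a sense of scale: assuming monit's common 30-second poll cycle (set by the "set daemon" line in /etc/monit/monitrc; the interval here is an assumption, check your own config), the worst-case time before the edited rule triggers a restart can be estimated as:

```shell
CYCLE=30    # seconds between monit checks (assumed "set daemon 30")
TIMEOUT=25  # per-check connection timeout from the edited rule
WINDOW=4    # "within 4 cycles"

# Upper bound: every check in the window runs to its full timeout.
echo $(( WINDOW * CYCLE + TIMEOUT ))  # prints 145 (seconds)
```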

    After that, I ran omv-salt deploy run nginx and let it do its magic. This will permanently update the file /etc/monit/conf.d/openmediavault-nginx.conf to reflect the new monit/nginx configuration.
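    To confirm the deploy actually wrote the new thresholds into the generated file (paths as given above; run on the OMV host):

```shell
# The regenerated monit rule should now show the new values:
grep "within" /etc/monit/conf.d/openmediavault-nginx.conf

# Have monit re-read its configuration:
monit reload
```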


    So far, so good... no nginx alert emails yet. Hope I didn't jinx it just now. :)


  • cubemin: Unfortunately this did not work either. Ran into an email wall again yesterday. Undoing the changes. :-(


    EDIT: Too quick, too early: trying to "undo" the changes, I saw that they had already been overwritten with the original values. I ran updates earlier this week; that's probably why. Changed it again and will continue testing...


    Sorry for the noise!

  • Thanks for the feedback - hmm, I have to see if my edit got reverted too. So far, though, no more nginx emails for me...

    On the plus side, the change is easy enough to make that it can be repeated after updates as needed.


    EDIT: My changes have not been reverted. Did you make sure to edit nginx.j2 and run omv-salt deploy run nginx?

  • Most certainly. Did it again today. But I remember that the updates this week contained an update of OMV itself. Maybe that's why. I ran those manually (using apt-get) because of the source-list / teamviewer problems...

  • OK, so the jury's still out on whether the nginx fix works for you or not.

    It does, but it gets overwritten by an OMV update. I only get the notifications occasionally: first a "failed", followed immediately by a "succeeded". I tried your suggestion and it does work, but if there's an OMV update the change gets overwritten.

    Raid is not a backup! Would you go skydiving without a parachute?

  • It does, but it gets overwritten by an OMV update. I only get the notifications occasionally: first a "failed", followed immediately by a "succeeded". I tried your suggestion and it does work, but if there's an OMV update the change gets overwritten.

    Gotcha. The change hasn't been overwritten on my system yet, although I could've sworn I've had OMV updates since then.

    But I'm glad it works - for me and others - so it will do until there's a permanent solution (I should probably submit this to GitHub or something)...

  • Gotcha. The change hasn't been overwritten on my system yet

    Interesting. I actually checked mine after the change, and again after your #31, and noticed mine had reverted; but as my notifications are sporadic it's never bothered me.


  • OK, I kept receiving those notifications, but I've finally managed to get rid of them.


    The short version: votdev was right. It was a configuration problem.


    The longer version:

    - I started tracking when the connection failures occurred and realized they did in fact occur regularly: every second Friday, at pretty much the same time in the middle of the night.

    - I then started checking the logs and realized that, immediately prior to the issue occurring, I had rsync jobs failing.

    - Looking into them, I quickly realized what was wrong: I replaced my server some months ago and demoted the former server to a backup medium. At the same time I changed the old server's IP and assigned its old IP to the new server. I did not change the rsync job, though.

    - In effect, my new OMV server tried to rsync to itself, with login information for a user that did not exist, and with no rsync server running (that only runs on the old server).

    - I've corrected the configuration and restored the standard monit configuration more than three weeks ago.

    - No problems since and given the above, I'd be surprised they'd return.
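    A quick way to catch this kind of misconfiguration before a job runs unattended is to probe the destination by hand. A sketch, with a placeholder address standing in for the backup server's IP:

```shell
# Listing the daemon's modules shows whether an rsync daemon is
# actually answering at the target address (192.168.1.50 is an example):
rsync --timeout=10 rsync://192.168.1.50/ \
  && echo "rsync daemon is answering" \
  || echo "no rsync daemon at that address - check the job's target"
```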
