Monitoring restart -- Connection failed nginx

cubemin · March 15, 2021

This is becoming annoyingly frequent; several times a day. I don't get it - up until about a week ago I'd never seen this occur, not once. I don't know why it started when it did.

Since monit fails to access the web GUI via 127.0.0.1 (localhost), but then succeeds, I wonder if there's a timeout issue with the loopback interface of some kind. Or does nginx really keep crashing and getting restarted? I'm not sure how to tell.
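One way to tell a slow loopback response apart from a real nginx crash is to time the same kind of request monit makes. What follows is only a sketch, not part of OMV or monit: `probe` and `sample` are made-up helper names, and the URL and port in the usage note are assumptions to replace with your own web GUI port. It treats any connection error, timeout, or HTTP error status as a failure, which only approximates monit's `protocol http` test.

```python
import http.client
import time
import urllib.request


def probe(url, timeout=15.0):
    """Send one HTTP GET and return (ok, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read(1)  # make sure the server actually starts answering
            ok = True
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts and HTTP 4xx/5xx (HTTPError).
        ok = False
    return ok, time.monotonic() - start


def sample(url, count, interval, timeout=15.0):
    """Probe `count` times, sleeping `interval` seconds between probes."""
    results = []
    for i in range(count):
        results.append(probe(url, timeout))
        if i < count - 1:
            time.sleep(interval)
    return results
```

For example, `sample("http://127.0.0.1:80/", count=20, interval=30)` (port 80 assumed) returns a list of `(ok, seconds)` pairs. Failures with elapsed times close to the timeout would point to slow responses, while near-instant failures would suggest nginx was not listening at all.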

But a solution is needed...

macom · March 15, 2021

This might give a hint:

I'm receiving constant emails with Connection failed nginx followed by Connection succeeded nginx some 30 secs later

m4tt0 · March 15, 2021

Thanks, macom, but not really. I understand that Volker suspects "some issues" with nginx, also judging by his pointing at the code snippet that issues the alerts, but I don't know how to debug the problem on the nginx side. Also, I don't think I've messed around with the nginx installation at all. The only "messing around" I did was upgrading from OMV4 to OMV5 instead of installing from scratch...

cubemin · March 16, 2021

I went to google a bit on nginx and localhost issues in general... and came across this link.

It suddenly occurred to me that I recently tweaked my network settings and enabled IPv6 in the process, and I'm fairly sure that's when nginx/monit started complaining to me.

Will look into this further as soon as I have time.

cubemin · March 20, 2021

Today I set out to permanently disable IPv6 on my OMV system in the (vague) hopes that it will put an end to the nginx/monit failure notifications.

See this link which should do the trick. Time will tell if I'm successful...

sonofwatt · March 25, 2021

Please post back if that fixes it for you.
In my case I don't have IPv6 enabled on my router/gateway, but this is worth a try on the server if it works for you.

cubemin · March 25, 2021

Sad to say it didn't. I was wrong. IPv6 is not the cause, it seems.

Can we get someone like votdev or ryecoaaron to chime in here?

cubemin · April 11, 2021

I believe I found a solution. Well, at least a workaround.

I edited the file /srv/salt/omv/deploy/monit/services/files/nginx.j2 containing:

Code

{%- if not webadmin_config.forcesslonly | to_bool %}
    if failed host 127.0.0.1 port {{ webadmin_config.port }} protocol http timeout 15 seconds for 2 times within 3 cycles then restart
{% endif -%}
{%- if webadmin_config.enablessl | to_bool %}
    if failed host 127.0.0.1 port {{ webadmin_config.sslport }} type tcpssl protocol http timeout 15 seconds for 2 times within 3 cycles then restart
{% endif -%}

and changed this part: timeout 15 seconds for 2 times within 3 cycles

to this: timeout 25 seconds for 3 times within 4 cycles

There are two occurrences, one for HTTP access to the GUI and one for HTTPS (SSL/TLS). I changed both to remain identical to each other.

(You could choose higher numbers, but it would then take even longer for nginx to be restarted if it does hang. On the other hand, that would actually highlight a problem with nginx itself...)
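To put rough numbers on that trade-off, here is a back-of-the-envelope sketch. It assumes a monit cycle of 30 seconds (a guess based on the "some 30 secs later" emails, so check the `set daemon` value on your own system), that monit sleeps one full cycle between checks, and that every failing check blocks for its full timeout. The function name is made up for illustration.

```python
def worst_case_restart_delay(failures_needed, cycle_seconds, timeout_seconds):
    """Rough upper bound on the time between nginx hanging and monit restarting it.

    Assumes nginx hangs right after a successful check, so each of the
    `failures_needed` consecutive failures costs one sleep cycle plus one
    full timeout before monit gives up on the connection.
    """
    return failures_needed * (cycle_seconds + timeout_seconds)


# ASSUMPTION: 30-second monit cycle; adjust to your own configuration.
CYCLE = 30

original = worst_case_restart_delay(2, CYCLE, 15)  # "timeout 15 ... 2 times within 3 cycles"
relaxed = worst_case_restart_delay(3, CYCLE, 25)   # "timeout 25 ... 3 times within 4 cycles"
print(original, relaxed)
```

Under those assumptions the original rule restarts a hung nginx after at most about 90 seconds and the relaxed one after about 165 seconds.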

After that, I ran omv-salt deploy run nginx and let it do its magic. This will permanently update the file /etc/monit/conf.d/openmediavault-nginx.conf to reflect the new monit/nginx configuration.

So far, so good... no nginx alert emails yet. Hope I didn't jinx it just now.

m4tt0 · April 11, 2021

Thanks, cubemin. Just copied your approach. You've got a second tester...

m4tt0 · April 17, 2021

cubemin: ~~Unfortunately this did not work either~~. Ran into an email wall again yesterday. Undoing the changes.

EDIT: Too quick, too early: Trying to "undo" the changes, I saw that they had been overwritten by the original values. I ran updates earlier this week. That's probably why. Changed it again and will continue testing...

Sorry for the noise!

cubemin · April 17, 2021

Thanks for the feedback - hmm, I have to see if my edit got reverted too. So far, though, no more nginx emails for me...

On the plus side, the change is easy enough to make that it can be repeated after updates as needed.

EDIT: My changes have not been reverted. Did you make sure to edit nginx.j2 and run omv-salt deploy run nginx?
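A quick way to check whether the deployed monit file still carries the relaxed values is to parse it. This is only a sketch: `find_rules` and `is_patched` are made-up helpers, the regular expression is written against the two rule lines shown in the template above, and the file path in the usage note is the one mentioned earlier in this thread.

```python
import re

# Matches e.g. "... port 80 protocol http timeout 25 seconds for 3 times within 4 cycles ..."
RULE = re.compile(
    r"port\s+(\d+)\b.*?timeout\s+(\d+)\s+seconds\s+for\s+(\d+)\s+times\s+within\s+(\d+)\s+cycles"
)


def find_rules(text):
    """Return a list of (port, timeout, times, cycles) tuples found in the text."""
    return [tuple(int(g) for g in m.groups()) for m in RULE.finditer(text)]


def is_patched(text, expected=(25, 3, 4)):
    """True if at least one rule exists and every rule uses the expected values."""
    rules = find_rules(text)
    return bool(rules) and all(rule[1:] == expected for rule in rules)
```

For example, `is_patched(open("/etc/monit/conf.d/openmediavault-nginx.conf").read())` should return `False` if the rules are back at the original `15 / 2 / 3` values.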

m4tt0 · April 17, 2021

Most certainly. Did it again today. But I remember that the updates this week contained an update of OMV itself. Maybe that's why. I ran those manually (using apt-get) because of the source-list / teamviewer problems...

cubemin · April 17, 2021

OK, so the jury's still out on whether the nginx fix works for you or not - I'm fairly confident by now that it does for me.

captainkrunch · April 19, 2021

I also made the changes to nginx.j2 some hours ago, and so far no stupid notification mails have popped up, so I guess it did the trick. This was really annoying over the last few days because I got up to 20 mails per day with these notifications.

Thanks cubemin!

geaves · April 19, 2021

Quote from cubemin

OK, so the jury's still out on whether the nginx fix works for you or not

It does, but it gets overwritten by an OMV update. I only get the notifications occasionally, first a "failed" followed immediately by a "succeeded". I tried your suggestion and it works, but any OMV update overwrites the change.

gewgaw · April 19, 2021

@cubemin Thank you. I started receiving this over the weekend and have implemented the workaround to see if this helps.

cubemin · April 19, 2021

Quote from geaves

It does, but it gets overwritten by an OMV update. I only get the notifications occasionally, first a "failed" followed immediately by a "succeeded". I tried your suggestion and it works, but any OMV update overwrites the change.

Gotcha. The change hasn't been overwritten on my system yet, although I could've sworn I've had OMV updates since then.

But I'm glad it works - for me and others - so it will do until there's a permanent solution (I probably should submit this to Github or something)...

geaves · April 19, 2021

Quote from cubemin

Gotcha. The change hasn't been overwritten on my system yet

Interesting, I actually checked mine after the change and again from your #31 and noticed mine had reverted, but as my notifications are sporadic it's never bothered me.

m4tt0 · April 20, 2021

Quote from cubemin

...(I probably should submit this to Github or something)...

Yep. Please do. It seems you've collected enough evidence for votdev to consider...

m4tt0 · June 9, 2021

OK, I kept receiving those notifications, but I've finally managed to get rid of them.

The short version: votdev was right. It was a configuration problem.

The longer version:

- I started tracking when the connection failures occurred and realized they did in fact occur regularly, every second Friday at pretty much the same time in the middle of the night.

- I then started checking the logs and realized that immediately before each occurrence, I had rsync jobs failing.

- Looking into them, I quickly realized what was wrong: I replaced my server some months ago and demoted the former server to a backup medium. At the same time I changed the old server's IP and assigned its former IP to the new server. I did not update the rsync job, though.

- In effect, my new OMV server tried to rsync to itself, with login information for a user that did not exist and with no rsync server running (it only runs on the old server).

- I corrected the configuration and restored the standard monit configuration more than three weeks ago.

- No problems since, and given the above, I'd be surprised if they returned.
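The first step in that list, spotting a schedule behind the alerts, can be sketched roughly like this. The helper name `summarize` is made up, and the timestamps in the usage note are hypothetical placeholders, not values from my logs; you would fill in the times from your own alert emails or monit log.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def summarize(alert_times):
    """Return (weekday counts, gaps in whole days) for a list of datetimes."""
    times = sorted(alert_times)
    weekdays = Counter(WEEKDAYS[t.weekday()] for t in times)
    # Round to the nearest day so a few minutes of jitter don't turn 14 into 13.
    gaps = [round((b - a).total_seconds() / 86400)
            for a, b in zip(times, times[1:])]
    return weekdays, gaps
```

With hypothetical alert times such as `datetime(2021, 4, 30, 3, 0)`, `datetime(2021, 5, 14, 3, 2)` and `datetime(2021, 5, 28, 3, 1)`, all alerts land on a Friday with 14-day gaps, which is the kind of pattern that points to a scheduled job rather than a random nginx hiccup.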
