No access to web gui of multiple docker containers behind docker socket proxy and traefik v2

  • Dear all

    I am scratching my head not sure what is going on.

    For about 10 days I have been unable to reach the web GUI of any of my Docker containers that sit on the external network of my Traefik v2 container, which itself is linked to a Docker Socket Proxy. The loss of access came around the same time I lost the container management GUI in Cockpit, which is provided as part of the OMV Extras plugin. The affected containers are Linuxserver Nextcloud and a Duplicacy container. I get timeout errors on all of their web GUIs. Command-line access is fine.


    Details of my configuration:

    OMV 5 latest version. Docker latest version (reinstalled via the OMV Extras tab in the OMV GUI)

    Running latest kernel

    Docker networks set up with IPv4 but not IPv6 support (has never been an issue)

    Internet is IPv6 enabled

    DNS and certificates for the Docker containers are managed via Traefik v2 with Cloudflare DNS


    A couple of things I checked/tried:

    1) Tried to access through multiple different machines on local network including Windows, Android and Linux - no success

    2) Checked ping and nslookup on the host machine, on clients and also in the Nextcloud container itself. All return responses with both IPv4 and IPv6 addresses.

    3) Checked ALL logs of all containers. No strange errors. All are running, all have the right networks etc (and all containers on traefik which are exposed to the web are on the same network).

    4) Checked ifconfig on host machine and docker0 appears normal.

    5) Adjusted the docker-compose file to use an API Token instead of an API Key for the Cloudflare DNS/Traefik setup (as more recently recommended), even though the previous configuration worked.

    6) Added privileged: true to docker socket proxy container (based on https://github.com/Tecnativa/docker-socket-proxy) just in case.

    7) Tried the fix relating to libseccomp here https://docs.linuxserver.io/fa…ges-based-on-ubuntu-focal

    8) Rebooted and restarted/recreated all containers and made sure latest versions used.

    9) Tried using earlier kernel versions

    10) Installed portainer via the web gui for OMV and everything is shown in green for containers. Can see the containers running, can access command line of containers but cannot access via web.....

    11) Portainer itself, which is not accessed via Traefik is accessible via the web.

    12) On Cloudflare I see no requests reaching my domain, no visits, nothing.


    One thing I notice (although this might be normal) is that if I do "docker ps" no port is shown for docker socket proxy even though the correct port is exposed in the docker compose file. Also, if I check which ports the host is listening on, it has multiple entries for docker-pr with the container ports for traefik listening on the standard ports.


    Ok, now I really have run out of ideas and it is so frustrating. The system was working perfectly for months.


    Any help would be appreciated as I have reached the limits of my troubleshooting skills...


    Thanks

  • To add one comment: while troubleshooting and changing the API KEY to an API TOKEN for the Cloudflare environment variables in the Traefik Docker file, I had made the mistake of leaving my email in place and using the Zone ID Token as the CF DNS API Token (I had set the token up so long ago for the Cloudflare-DDNS Docker image that I had forgotten the token ID is shown only during creation). I have now copied the correct token ID to the Docker Compose file of Traefik, deleted the acme.json file and forced a container recreation. Still, as I had done a few restarts in between and the staging server was not being used, I might have run into Let's Encrypt rate limits, which might complicate troubleshooting.


    Anyway, just to say that my Docker Compose file for Traefik should now at least use the recommended API Token authentication (with two tokens: the DNS change token (ID generated upon original creation) and the Zone ID token (from the Cloudflare Overview tab)).


    The log files for the Traefik container, with this new setup show no errors (as before under the old set up..).

  • Thanks for the response. That is the issue: I cannot access the Traefik dashboard to check anything, but I was not using my local IP. I will try to check what you suggested. I believe my acme.json file is no longer working as before, as there is no entry under certificates. I need to see if I can dig out a backup of the file. In trying to troubleshoot I could have screwed this up, but it might have been screwed before. I noticed I need to enable ACME logging in my Traefik config file, which I will do now to see what errors it generates. I also noticed I have a Watchtower error regarding pulling images (it pulls the image, but not with the first method it tries).

  • Update: The Traefik dashboard shows no errors (using the IP address of the local machine). I have also now managed to get the acme.json file working using the API Token approach rather than the API Key. I set logging to debug for Traefik and it checked all domains and subdomains for updates, reported none needed, and my acme.json file now shows a valid certificate for my main domain and subdomains. This all looks normal. Still, even after waiting a while, I cannot access the containers over the internet (from inside my LAN), when I could always do this before.


    The cloudflare API Token is working and the DNS challenge works in Traefik so I cannot think it is a DNS issue on the host machine. I am lost as I said before.

  • - Can you access Traefik via your public IP address? At the very least you should get an SSL error (if you have a certificate).

    - You can try activating the Traefik access log to check whether the connection actually reaches Traefik.

    - You can try setting up a basic container behind Traefik to rule out a problem with your other services.

    - Did you update your Traefik container? If so, have you checked the Traefik v2 minor migration documentation?

    omv 5.5.23 usul | 64 bit | 5.10.0.bpo3 kernel | omvextrasorg 5.5.1

  • On the first point, how could I do this without exposing an insecure dashboard to the internet? Would I need to open, for example, the standard dashboard port on my router firewall, or is there a way to go over the normal ports 80 and 443 in Traefik? Thanks again for all your help.

  • Assuming your Traefik instance is running with an SSL certificate from Let's Encrypt (for your Cloudflare DNS entry) and your ports 80 and 443 are open, just enter your public IP address in your web browser. You should receive an error message akin to the one in the attachment.

    This would mean your Traefik instance is running, receiving the connection and checking your SSL certificate (which fails because it is only valid for your DNS entry and we tried to connect directly by IP).
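
The same check can be run without a browser as a plain TCP probe of the ports Traefik should be listening on. This is only a minimal sketch: `port_is_reachable` is a hypothetical helper name, and any address you pass in is your own, not one taken from this thread. A timeout here would point at routing, NAT or a firewall rather than at Traefik or the certificate.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable networks.
        return False
```

For example, `port_is_reachable("203.0.113.10", 443)` (a placeholder address from the documentation range) returns `False` when nothing answers. Test from outside the LAN as well if possible, since some routers do not loop connections to the public address back inside.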

  • Ok, so the public IP address approach does not work. Timeout.


    I checked the migration logs. All good. I have not changed my Traefik version as I use 2.2.2-rc4


    I played around with a few things. I changed the DNS in cloudflare to proxy, not just DNS. Then I got 522 errors. So my request goes out to the net, goes ok via cloudflare and then no request is received back from my origin server. I also cannot curl my domain. I get timeouts. All I have done these last weeks is apply the standard OMV updates, nothing more. Has a firewall issue suddenly arisen? I have yet to try the access logs. So lost and so much time wasted....you think things are finally stable and then this....

  • I noticed my public IPv4 address is not what my router says is its IPv4 address. When I use the address the router reports, it connects with a 404 page not found error. Which at least is a response :-)


    I think my ISP told me this some time ago, I just forgot. However, what I find interesting is that my cloudflare-ddns is also reporting the same public IPv4 address. I am starting to wonder if this is what is causing the problem...I will try and replace the DNS with my real IPv4 address as a test.

  • JACKPOT: oh my god this has caused me a headache. For whatever reason my cloudflare-ddns is not providing to cloudflare the correct IPv4 address. My containers are now back up and running but of course I now need to find out why this is....but god, at least I narrowed it down. Thanks for all your help and suggestions. I will report back once I found the issue with the docker cloudflare-ddns container.
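
A mismatch like this can be spotted by comparing what the domain resolves to with the address the router reports. The sketch below assumes the Cloudflare record is set to DNS-only; with proxying enabled the name resolves to Cloudflare's own addresses and the comparison says nothing about the origin. The function names are hypothetical, and the hostname and address are whatever you supply.

```python
import socket


def ipv4_addresses(hostname: str) -> set:
    """Return the IPv4 addresses the system resolver gives for hostname."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def record_matches(hostname: str, expected_ip: str) -> bool:
    """True if expected_ip is among the addresses hostname resolves to."""
    return expected_ip in ipv4_addresses(hostname)
```

Calling `record_matches` with your domain and the WAN address shown in the router UI gives a quick yes/no on whether the DDNS updater has published the address that actually accepts inbound connections.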

  • Ok, I have it working again. I switched to a different image for cloudflare-ddns where you can generate the IPv4 address from a custom command. Then I remembered I have a fritz.box address provided by the router company which always tracks my real IPv4 address. Using echo and a rather convoluted Linux command I could get echo to output the IP address behind the fritz.box address, which is my real IPv4 address. Now it seems to work. Let us wait until the address changes, but I am hopeful for now.


    Will mark as resolved. Thanks

  • newbie7800

    Added the Label resolved
  • In the past yes, but not with this setup. I cannot recall now why I gave that approach up, but it did not work as intended. Now, with a modified "ping" command of my fritz box web address, I can feed this to the docker cloudflare-ddns compose file and it is working well. I realised, though, that the command could be done without echo, and it is still working after the IP address was already changed once by my ISP. So all stable. Will be looking at moving to IPv6 though...
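
The exact command is not shown in the thread. As an illustration of the same idea (resolve the router vendor's dynamic hostname and hand the result to the DDNS updater), here is a Python sketch rather than the ping approach described. The function names are hypothetical. It also rejects private and carrier-grade NAT addresses, so an obviously unroutable value is never published.

```python
import ipaddress
import socket


def first_global_ipv4(candidates):
    """Return the first candidate that is a globally routable IPv4 address, else None."""
    for text in candidates:
        try:
            addr = ipaddress.ip_address(text)
        except ValueError:
            continue  # not an IP address at all
        if addr.version == 4 and addr.is_global:
            return str(addr)
    return None


def resolve_public_ipv4(hostname):
    """Resolve hostname with the system resolver and return a global IPv4, or None."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    return first_global_ipv4(info[4][0] for info in infos)
```

`resolve_public_ipv4` would be called with the router's dynamic hostname. Whether the DDNS image accepts the output of such a script depends on the image, so check its documentation before wiring it in.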
