MACVLAN Issues

  • So lately I've been struggling to get my containers to connect to my network over MACVLAN.
    This problem is also being discussed in this thread, but I wanted to start a dedicated one to raise awareness of the topic.


    When I set up a container with MACVLAN, it basically seems to work, but it is not reachable from other devices.
    A good example is any GUI the container might have. This is the case for Pi-hole and IoBroker, as discussed in the thread linked above.
    A temporary fix is to log in to the container via the console and ping the device you are trying to access the GUI from.
    After this the GUI can be accessed as usual, until after a while it stops responding and the process needs to be repeated.


    Another issue that seems to be related is that with IoBroker (used for home automation) I keep losing the connection to my MQTT devices. They work again when I ping from inside the container, go into the GUI and restart the relevant adapter from there.


    This was not the case a week ago, but I don't remember which update might have caused this behavior, as I was trying to fix another issue at the time and have since reinstalled the whole system.
    So the issue persists after a reinstall and so far seems to be happening on Raspberry Pis.


    TLDR:
    The GUIs of containers attached to the network with MACVLAN are no longer reachable.


    So the question is: does anybody know what is handling the MACVLAN driver? Is it Linux/Raspbian, OMV or Docker?
    And does somebody maybe even know a solution to this? :)

  • I did some digging on the internet.
    Might be on to something.


    Most likely it is the Linux kernel version that causes the ARP issue with macvlan. I would assume people have the same issue with 4.19.97, so updating the kernel to 5.4.14 should help. But it looks like the latest Raspbian Buster is still on 4.19.97, so we might need to wait until they release an update.
    https://amp.reddit.com/r/archl…rp_and_lts_kernels_41994/


    However, I have a theory in mind to explain why. Will update once confirmed.

  • Update:


    This is what I found (please forgive me if I am using the wrong terms, but the flow should be correct):


    The reason is that the kernel version Raspbian currently ships has a bug in its macvlan driver support that seems to break broadcast handling for macvlan.
    Thus, when you ping the container IP from your workstation, the workstation has no idea which MAC address is associated with that IP.
    Pi-hole still works as a DNS server because its IP is configured on the router, and the router is the first hop when the workstation queries DNS. The router knows the MAC address of the Pi-hole and can therefore forward the DNS query to it. So Pi-hole still does its job; only the GUI is unreachable from other machines.
    The issue with IoBroker is the same as with any GUI, since the MQTT devices don't have an ARP entry for your IoBroker.


    When we ping from the container to our workstation, the first ping takes a while, and it seems to make the workstation cache the ARP entry locally. The connection keeps working because of that.
    Whenever we switch networks, or after some time, the cached entry expires, so we need to repeat the ping.
    One can verify this by running arp -a on the workstation before and after pinging from the macvlan container.
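
The before/after check can also be scripted. Below is a minimal sketch, assuming you paste or pipe the text printed by arp -a into it. The function name find_arp_entry and the sample addresses are made up for illustration, and the sample only mimics the Windows column layout.

```python
import re


def find_arp_entry(arp_output, ip):
    """Return the first line of `arp -a` output that mentions `ip`, or None."""
    for line in arp_output.splitlines():
        # Split on whitespace and parentheses so that the macOS/Linux
        # form "(192.168.1.50)" matches as well as the bare Windows form.
        tokens = re.split(r"[\s()]+", line.strip())
        if ip in tokens:
            return line.strip()
    return None


# Illustrative sample only; the addresses are invented.
SAMPLE = """\
Interface: 192.168.1.10 --- 0x5
  Internet Address      Physical Address      Type
  192.168.1.1           aa-bb-cc-dd-ee-ff     dynamic
  192.168.1.50          02-42-c0-a8-01-32     dynamic
"""

entry = find_arp_entry(SAMPLE, "192.168.1.50")
```

If the function returns None before the ping from the container and a line afterwards, that would match the behavior described above. Matching whole tokens rather than substrings avoids confusing 192.168.1.5 with 192.168.1.50.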


    The current, upgraded workaround ;) is to create a static ARP entry on your workstation. (So we don't need to keep pinging the workstation again and again.)
    (I only did it on my Windows 10 machine since I mainly use that. I am sure you can find equivalents for Linux or Mac, though maybe not for Android or iOS.)


    Here are the steps:


    1. Edit and recreate your container with a random static unicast MAC address in the network section. (The container's MAC address needs to stay constant.)


    2. Note: Container IP should be static as well.


    3. Once your container is up, run the following commands as administrator/root in CMD/terminal to create a static ARP entry on your workstation.


    Windows: (arp -s didn't work for me on Windows, so use the commands below instead)
    arp -a (check all your ARP entries)
    netsh interface show interface (note down the connected interface name, mine is Wi-Fi for example)
    netsh interface ip add neighbors <your connected interface name> <your.container.ip> <your.container.mac_address>
    arp -a (your container IP should now show up as a static entry, like below)

    Internet Address      Physical Address               Type
    <your.container.ip>   <your.container.mac_address>   static


    Mac:
    arp -a
    sudo arp -s <your.container.ip> <your.container.mac_address>
    arp -a


    4. This keeps communication from the workstation to the macvlan container working. On Windows, the entry even survives a restart of the workstation.
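
For step 1, a random unicast MAC address can be generated rather than typed by hand. This is a minimal sketch with a hypothetical function name, random_unicast_mac. Setting the "locally administered" bit is my own assumption on top of the steps above, added so the address does not collide with a vendor-assigned one.

```python
import random


def random_unicast_mac(rng=None):
    """Return a random locally administered unicast MAC like 'xx:xx:xx:xx:xx:xx'."""
    rng = rng or random
    first = rng.randrange(256)
    # Bit 0 of the first octet must be 0 for unicast;
    # bit 1 set to 1 marks the address as locally administered.
    first = (first | 0x02) & 0xFE
    rest = [rng.randrange(256) for _ in range(5)]
    return ":".join(f"{octet:02x}" for octet in [first] + rest)


mac = random_unicast_mac()
```

Generate the address once and write it into the container configuration. It has to stay the same across container recreations, otherwise the static ARP entry on the workstation points at a stale MAC.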


    But if you have IoT devices that need to talk to the container directly, you might need to figure out a way to create the static ARP entry on them.
    Or, if you know how to use a Dockerfile to build your own image, you can use IoBroker as the base image and add a cron job that pings your IoT devices every 5 minutes (assuming they all have static IPs). (I know this is a crude idea, but it will save you from repeating the manual ping.)
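
The periodic ping could look roughly like the sketch below. It assumes a Linux ping (iputils) and Python are available inside the image, which I have not checked for the IoBroker image. The host list and the function names are placeholders.

```python
import subprocess

# Hypothetical static IPs of the IoT devices; replace with your own.
IOT_HOSTS = ["192.168.1.60", "192.168.1.61"]


def build_ping_command(host, timeout_s=2):
    # -c 1: send a single echo request
    # -W:   seconds to wait for a reply (Linux iputils ping)
    return ["ping", "-c", "1", "-W", str(timeout_s), host]


def ping_all(hosts, run=subprocess.run):
    """Ping every host once and return {host: True if it replied}."""
    results = {}
    for host in hosts:
        proc = run(
            build_ping_command(host),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        results[host] = proc.returncode == 0
    return results
```

Calling ping_all(IOT_HOSTS) at the bottom of the script and scheduling that script every five minutes would replace the manual ping. Whether the reply matters is secondary here: the point is that the outgoing ping lets each device learn the container's MAC address.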


    Mystery solved? :P


    Btw, I submitted an issue to raspberrypi/linux github. :)

  • @limac WOW thanks for putting in all that effort, you are amazing.
    This is exactly why I wanted to pin down the problem, so that we can open an issue at the right place :)


    Oddly, IoBroker seems to behave differently from Pi-hole (by now I have the Pi-hole container running as well).
    When I ping from inside my IoBroker container to any machine, the container becomes reachable from every other machine as well.
    So this was an easy fix, as there is a ping adapter for IoBroker that is usually used to check whether certain devices are online. I just set it up to ping my PC every minute, and so far everything has been working for a whole day without issues :).
    (This also worked despite my PC being turned off for the last ~20 hours)


    EDIT:
    Just checked again and the ping adapter seems to be scanning the whole network, so that might be why :D

  • I had the same problem and ran sudo rpi-update, which brought the kernel to 4.19.102-v7l+.
    Seems like the problem is now gone.

    Cool! That is a clean solution.


    As for me, I would rather stay on the stable kernel version, since something might break with the bleeding-edge/test version. I'll just play it safe and wait until 102 comes out as stable. It shouldn't take long, I guess.


    @ozboss IoStack sounds fun.

  • Great work. I was about to revert to OMV4, as my Tvheadend container was having similar issues to my Pi-hole as well. Any idea how long it's likely to be until the stable build picks up the commit?


    Cheers

  • You saved me! Thank you OMV community :)
    Indeed, upgrading the firmware solved the issue.


    I didn't understand what was wrong. I was running a lot of tests before putting all of this on the RPi, so I was very confused!

  • Just type sudo rpi-update in the terminal. When it's done, run sudo reboot to reboot the system. After that, the updated kernel should be loaded!

    Hi, I'm on OMV 4 and this command is not found. I also don't have any newer updates available when I issue omv-update. Is there a way to do a custom kernel upgrade or something like that?

  • The OMV ISO image might not be based on Raspbian

    Yes the image is based on Armbian.


    Although I'm not sure anymore that the kernel is really the problem...
    I installed OMV on an Intel x64 pc, as I was having all sorts of issues with my current setup on my Pi.
    So now I'm running OMV5 with kernel 5.4.0 (amd64) and I experience the same behaviour...

  • I think that is related to @limac's findings in his post 4. This post would also confirm that there could be an issue related to the ARP cache.

    Sorry I don't understand what you are trying to tell me.
    Yes, the problem seems to be that the ARP entry is not cached properly, and so far we thought that the current Raspbian kernel was the problem.
    But now that I have the same issue on a completely different architecture with a different kernel version, this seems to indicate that it is not necessarily the case...

  • Sorry I don't understand what you are trying to tell me.

    I meant that @limac seems to have found an issue related to the ARP cache. Also, in the other link, the user had to wait a few hours for this to resolve itself.


    The fact that this can be replicated on both amd64 and the Pi suggests that it may have nothing to do with the kernel. The fact that the new Pi kernel resolves it is a bonus.


    My test machine is currently shut down, but one option I have thought of testing is to install the Proxmox kernel
