Apply config loop / failure - bonded network interfaces

  • Hi All,


    Hoping someone can spot the problem here, I've run out of ideas!


    Since updating to OMV4, I've had persistent problems with my bonded network interface. I thought I'd fixed them, but noticed again today that I'm stuck in an "apply config" loop. With OMV3, it was working fine for aeons (the OMV3 build was done with the bonded interfaces from the start).


    There are three interfaces on the machine: eth0 is alone on one subnet, eth1 & eth2 are intended to be an 802.3ad bonded pair on another subnet. My switch (Zyxel GS1910) has 802.3ad LACP support enabled for the relevant two ports. I've also tried a fixed bond on the switch with balance-alb on the machine and that doesn't fix the problem, so it's not looking like a LACP issue.


    Every time I tell OMV to apply the config, it fails - see the error text below - and the interface is left not working: even if ifconfig shows it as UP, it doesn't respond to ping, and local services like nut and smbd/nmbd that are bound to that interface report that it's not accessible. /etc/network/interfaces looks OK to me. If I then reboot the machine, the interface comes up, but OMV still shows the config as needing to be applied. Enter the "apply config" loop...
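
    For anyone who wants to check the same things, these are roughly the commands I run after each failed apply (a sketch - bond0 is my interface name, adjust to suit):

    cat /proc/net/bonding/bond0        # the bonding driver's own view: mode, slaves, LACP partner details
    ip addr show bond0                 # is the static address actually on the bond?
    ip route                           # is the default route where it should be?
    journalctl -u networking.service   # this is where the ifup errors end up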


    Does anyone have any ideas?


    Screenshots attached show the OMV interface settings.



    Here's the config:


    root@dougal:/etc/network# more /etc/network/interfaces
    # Include additional interface stanzas.
    source-directory interfaces.d


    # The loopback network interface
    auto lo
    iface lo inet loopback


    # eth0 network interface
    auto eth0
    allow-hotplug eth0
    iface eth0 inet static
    address 192.168.8.254
    gateway 192.168.8.1
    netmask 255.255.255.0
    dns-nameservers 192.168.8.1
    dns-search mydomain.net
    iface eth0 inet6 manual
    pre-down ip -6 addr flush dev $IFACE


    # bond0 network interface
    auto bond0
    iface bond0 inet static
    address 192.168.11.22
    gateway 192.168.11.1
    netmask 255.255.254.0
    dns-nameservers 192.168.11.1
    dns-search mydomain.net
    bond-slaves eth1 eth2
    bond-mode 4
    bond-miimon 100
    bond-downdelay 200
    bond-updelay 200
    iface bond0 inet6 manual
    pre-down ip -6 addr flush dev $IFACE


    journalctl -xe shows nothing of interest, just services whingeing that they can't find a network interface.


    root@dougal:/etc/network# systemctl status networking.service
    ● networking.service - Raise network interfaces
    Loaded: loaded (/lib/systemd/system/networking.service; enabled; vendor preset: enabled)
    Active: failed (Result: exit-code) since Sun 2018-06-03 17:40:20 BST; 2min 16s ago
    Docs: man:interfaces(5)
    Process: 2245 ExecStart=/sbin/ifup -a --read-environment (code=exited, status=1/FAILURE)
    Process: 2241 ExecStartPre=/bin/sh -c [ "$CONFIGURE_INTERFACES" != "no" ] && [ -n "$(ifquery --read-environment --list --exclude=lo)" ] && udevadm settle (code=exi
    Main PID: 2245 (code=exited, status=1/FAILURE)


    Jun 03 17:40:19 dougal systemd[1]: Starting Raise network interfaces...
    Jun 03 17:40:19 dougal ifup[2245]: sh: echo: I/O error
    Jun 03 17:40:20 dougal ifup[2245]: RTNETLINK answers: File exists
    Jun 03 17:40:20 dougal ifup[2245]: ifup: failed to bring up bond0
    Jun 03 17:40:20 dougal systemd[1]: networking.service: Main process exited, code=exited, status=1/FAILURE
    Jun 03 17:40:20 dougal systemd[1]: Failed to start Raise network interfaces.
    Jun 03 17:40:20 dougal systemd[1]: networking.service: Unit entered failed state.
    Jun 03 17:40:20 dougal systemd[1]: networking.service: Failed with result 'exit-code'.


    IFCONFIG BEFORE REBOOT:
    root@dougal:/etc/network# ifconfig
    eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
    inet 192.168.8.254 netmask 255.255.255.0 broadcast 192.168.8.255
    ether d4:3d:7e:34:a7:16 txqueuelen 1000 (Ethernet)
    RX packets 4617 bytes 1096970 (1.0 MiB)
    RX errors 0 dropped 2 overruns 0 frame 0
    TX packets 6357 bytes 7367515 (7.0 MiB)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0


    lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
    inet 127.0.0.1 netmask 255.0.0.0
    inet6 ::1 prefixlen 128 scopeid 0x10<host>
    loop txqueuelen 1000 (Local Loopback)
    RX packets 762 bytes 97399 (95.1 KiB)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 762 bytes 97399 (95.1 KiB)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0


    IFCONFIG AFTER REBOOT:
    bond0: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST> mtu 1500
    inet 192.168.11.22 netmask 255.255.254.0 broadcast 192.168.11.255
    ether 00:15:17:d3:5f:14 txqueuelen 1000 (Ethernet)
    RX packets 69553 bytes 14195346 (13.5 MiB)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 65806 bytes 18205876 (17.3 MiB)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0


    eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
    inet 192.168.8.254 netmask 255.255.255.0 broadcast 192.168.8.255
    ether d4:3d:7e:34:a7:16 txqueuelen 1000 (Ethernet)
    RX packets 2250 bytes 174222 (170.1 KiB)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 1339 bytes 114101 (111.4 KiB)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0


    eth1: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST> mtu 1500
    ether 00:15:17:d3:5f:14 txqueuelen 1000 (Ethernet)
    RX packets 3308 bytes 309103 (301.8 KiB)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 61361 bytes 17797720 (16.9 MiB)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
    device interrupt 16 memory 0xf7ca0000-f7cc0000


    eth2: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST> mtu 1500
    ether 00:15:17:d3:5f:14 txqueuelen 1000 (Ethernet)
    RX packets 66245 bytes 13886243 (13.2 MiB)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 4445 bytes 408156 (398.5 KiB)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
    device interrupt 17 memory 0xf7c40000-f7c60000


    lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
    inet 127.0.0.1 netmask 255.0.0.0
    inet6 ::1 prefixlen 128 scopeid 0x10<host>
    loop txqueuelen 1000 (Local Loopback)
    RX packets 2890 bytes 306387 (299.2 KiB)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 2890 bytes 306387 (299.2 KiB)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0


    ... interesting to note that eth1 & eth2 now have their own entries, which weren't present before the reboot.
    /etc/network/interfaces is unchanged after reboot.


    Any help greatly appreciated!


    Thanks,


    Jeff

  • I've been looking into this some more, and it seems to be a mixture of bad behaviour from the underlying Debian OS, the way OMV writes the /etc/network/interfaces file, and the way the network is restarted when OMV calls systemctl. I've even tried a clean rebuild of OMV4, and the first thing I did was configure the network. Same result (note that the fresh install has changed the interface names):


    Jun 08 18:48:15 dougal systemd-udevd[3180]: Could not generate persistent MAC address for bond0: No such file or directory
    Jun 08 18:48:15 dougal kernel: Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
    Jun 08 18:48:15 dougal kernel: bond0: Enslaving enp1s0f0 as a backup interface with a down link
    Jun 08 18:48:16 dougal kernel: bond0: Enslaving enp1s0f1 as a backup interface with a down link
    Jun 08 18:48:16 dougal avahi-daemon[436]: Joining mDNS multicast group on interface bond0.IPv4 with address 192.168.11.22.
    Jun 08 18:48:16 dougal avahi-daemon[436]: New relevant interface bond0.IPv4 for mDNS.
    Jun 08 18:48:16 dougal avahi-daemon[436]: Registering new address record for 192.168.11.22 on bond0.IPv4.
    Jun 08 18:48:16 dougal ifup[3028]: RTNETLINK answers: File exists
    Jun 08 18:48:16 dougal ifup[3028]: ifup: failed to bring up bond0
    Jun 08 18:48:16 dougal systemd[1]: networking.service: Main process exited, code=exited, status=1/FAILURE
    Jun 08 18:48:16 dougal systemd[1]: Failed to start Raise network interfaces.
    -- Subject: Unit networking.service has failed


    My switch shows that the two ports from this machine have created a LACP aggregation group, so it all seems to be working at that level.


    What I note, comparing the interface config files here to what I can find about bonding elsewhere:
    (1) OMV doesn't put configuration blocks for the slaves in /etc/network/interfaces. Some references include these:
    auto enp1s0f0
    iface enp1s0f0 inet manual
    bond-master bond0
    and similarly for enp1s0f1


    (2) It seems to be advisable not to list the bond-slaves in the bond0 definition; that way the bond interface can come up even if the slaves are slower to appear, avoiding an unnecessary error report - i.e., omit the "bond-slaves ..." line (see the sketch after point 3 below).


    (3) When OMV tries to restart the networking service, systemctl calls "ifup bond0" and gets the error message "ifup ... RTNETLINK answers: File exists" - this is, I believe, because systemctl failed to take the interface down properly first. That seems to be a Debian issue and I can't find a way to fix it. OMV isn't doing anything wrong here, but will get hit by this problem every time.
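
    Pulling points (1) and (2) together, this is roughly the layout those references describe - a sketch only, using the interface names from my fresh install, and I haven't proven that it cures the loop:

    # slaves declare their master; no bond-slaves line on the bond itself
    auto enp1s0f0
    iface enp1s0f0 inet manual
    bond-master bond0

    auto enp1s0f1
    iface enp1s0f1 inet manual
    bond-master bond0

    auto bond0
    iface bond0 inet static
    address 192.168.11.22
    netmask 255.255.254.0
    gateway 192.168.11.1
    bond-mode 4
    bond-miimon 100
    bond-downdelay 200
    bond-updelay 200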


    So I'm a little wiser but no closer to a solution. And now I have to restore my config too! I guess I'd best get a beer and settle in for an hour or two...
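
    PS: for point (3), a manual teardown along these lines might clear the half-configured bond before re-running ifup - again just a sketch, and I haven't confirmed it breaks the loop:

    ifdown bond0 || true            # may fail if ifupdown has lost track of the state
    ip addr flush dev bond0         # drop any stale addresses
    ip link set bond0 down
    ip link delete bond0            # remove the bond device entirely
    systemctl restart networking    # ifup -a should now be able to recreate bond0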

  • It gets worse, though... I removed that bond0 interface through the OMV GUI and applied the config, then created a simple single interface using enp1s0f0. Similar error! Rebooted, and it's still giving the error, and I'm back in the "need to apply config" loop. Oh, for goodness' sake! :cursing:


    I can only assume that either OMV or Debian is doing something here that screws up the definition of the interfaces in some way that applying a config change can't overcome. I know there are some other files involved beyond /etc/network/interfaces, so maybe it's time to go searching. This I really could do without, though; my bond was working fine until Deb9/OMV4 came along.
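
    For the record, these are the other places I intend to look - a rough list, nothing authoritative:

    ls /etc/network/interfaces.d/           # extra stanzas pulled in by the source-directory line
    ls /etc/systemd/network/                # any systemd .link/.netdev/.network files lying around
    ls /etc/udev/rules.d/                   # persistent-net / interface renaming rules
    cat /etc/default/networking             # CONFIGURE_INTERFACES etc. used by networking.service
    lsmod | grep bonding                    # is the bonding module even loaded?
    cat /proc/net/bonding/bond0             # the kernel's idea of the bond, if the device exists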

  • This is officially bizarre. I rebuilt again, set up one network interface via omv-firstaid and then a second through the GUI. As soon as I try to apply the config for the second interface from the GUI, I get the errors from the networking service. Nothing more complex than that: add one interface via the CLI, a second via the GUI, and voila! the error appears.


    It seems to make no difference whether the two interfaces have gateways specified or not, or whether they're in different subnets or not; everything I've tried fails in the same way.


    Some further testing: I spun up a fresh virtual machine with two interfaces, installed OMV from the ISO, logged into the GUI and tried to configure the second interface. Same error! So it appears that it's currently not possible to set up an OMV4 box with more than one network interface... BIG issue!


    Come on, OMV developers... there have been loads of views of this thread but no comments at all. Does everyone really not have a clue what's going on here?


  • Ok, latest diagnosis, easier now I'm on a throwaway VM rather than my poor server!


    If I set both interfaces to DHCP rather than static addressing, I can get them both to work at the same time. I can even create a bond, as long as it's using DHCP. That's half a solution - at least it gets the system online; it's clearly not satisfactory for a key server to have to use DHCP for its addresses, though.
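
    For reference, the DHCP variant that does come up cleanly looks roughly like this in /etc/network/interfaces (a sketch - the VM's interface names here are assumptions):

    auto bond0
    iface bond0 inet dhcp
    bond-slaves enp1s0f0 enp1s0f1
    bond-mode 4
    bond-miimon 100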


    What on earth is wrong with Debian's networking, though, I've no idea.

  • I'm currently embroiled in all of these shenanigans as well. I never had any problems with OMV 3.x, but since moving to 4.x the network interface manipulation for bonding has been a nightmare.

  • I also have this issue! I'm thinking of moving to OMV 3.x now that I've read your reply... You clearly know a ton more than I do, so if you haven't figured it out then I'm almost lost. I didn't want to start a thread for fear of the mods saying this issue has already been addressed, but configuring the network has been a pain so far with my DIY NAS. I followed every step and troubleshot everything; I even started configuring the admin page through Ethernet, and yet this NAS distro does not get the simplest config done.


    I've attached a few screenshots of the error messages.

  • I don't really recall what the exact issue was, but I ended up downgrading OMV by a whole 1.x version and that solved it; I really wasn't seeing the light.


  • Tkaiser, it's for performance.
    Our whole post-production network is trunked in LACP mode:
    1 OMV NAS with 15 x 1 TB SSDs in RAID and an Intel 4-port gigabit network card;
    5 workstations with dual NICs teamed in LACP.


    We haven't made the jump to 10GbE yet because the hardware is not yet obsolete, and replacing the switch and network NICs would cost quite a bit.


    Whatever the case, bond mode 4 suffers from a bug, and for a NAS... that's quite a big deal ;)

  • for performance

    How should a bond help here with just 5 workstations? Have you ever used, for example, Helios LanTest to check performance against your OMV box with and without bonding?


    We haven't made the jump to 10GbE yet because the hardware is not yet obsolete, and replacing the switch and network NICs would cost quite a bit

    Well, maybe we'll see a price drop with NBase-T soon. Unlike bonding/LACP, NBase-T even in its slowest 2.5GbE form really does result in 2.5 times faster networking in a NAS environment, especially if you invest in a switch with one or two 10GbE ports and upgrade the server (Aquantia or Tehuti 10GbE cards are not that expensive and are also NBase-T capable).

  • Tell me what you need and I will tell you how to get rid of it ;)
    Yes, I know there are plenty of faster network solutions out there, but our problem here has nothing to do with that:


    How do you create a static IP LACP bond in OpenMediaVault without a crash from the networking service?

  • How do you create a static IP LACP bond in OpenMediaVault without a crash from the networking service?

    I do NOT do this since it makes absolutely no sense for my installations. LACP does NOT increase network performance unless you have tens or hundreds of clients, since all you get is better distribution over several links (if you choose the algorithms wisely, which is something I rarely see at customer sites), but the link speed does NOT increase.


    A bond made out of two 1 Gbit/sec NICs is NOT providing 2 Gbit/sec throughput.
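
    To make the point concrete, here is a rough illustration of the default layer2 transmit hash used in 802.3ad mode (a sketch based on my reading of the kernel bonding docs; the MAC bytes below are invented). Every frame between one given client and the server carries the same source/destination MAC pair, so it always hashes to the same slave and never uses more than one link:

    src=0x14; dst=0xa7; slaves=2              # last byte of source and destination MAC, made up
    echo $(( (src ^ dst) % slaves ))          # always the same slave index for this MAC pair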

  • Again, that's not the topic. We are only asking for a solution to the static IP issue. OMV is a NAS; having a bug in this area is pretty critical for a NETWORK attached storage device.


    Tkaiser, I know you think that LACP is crap, but answering like this is like me asking for the best paint formula for a car and you telling me that this car model is ugly and that I'd better not paint it.

  • Tkaiser, I know you think that LACP is crap

    LACP is not crap. It can provide redundancy and, on some rare occasions with a lot of clients, also increase the overall performance of a heavily utilized server by distributing the load between the available links, without increasing the link speed itself. It's not suitable for transforming 1-n links into one link of higher bandwidth, even though the vast majority of shitty tutorials / blog posts on the Internet and general recommendations claim this.


    So good luck with further suffering and fiddling around with something that does not work as expected. BTW: I asked for benchmarks above for a reason, since every time I've come across this misunderstanding, the administrator in question, while wasting a lot of his time configuring stuff that doesn't work as intended, never actually tested whether what he thinks should happen really happens.

  • The static IP configuration in Debian is IMO completely buggy. I didn't get it working, but the configuration generated by OMV seems to be OK. So I have no idea where the problem comes from.

    OK Votdev, I'll wait for OMV 5 with the DHCP workaround.
    Thank you for the clear answer.


    Best.

  • I recently encountered a similar issue when setting up OMV on my HC2. DHCP always worked fine, but configuring a static IP (through the OMV GUI or via /etc/network/interfaces) only ever resulted in a temporarily working setup. The system would accept the static IP and be reachable, but once the network connection was lost for whatever reason (cable unplugged, switch reset, etc.), OMV was unreachable after the connection came back up. (And I tried different scenarios, installs, etc.) In the end I gave up and set DHCP with a static route on my router.


    Not so long ago I found the solution in a reply from tkaiser somewhere: configure the network via nmtui. I only had to tick the "Automatically connect" checkbox for my device (after having configured it via OMV)... it has been working ever since with a fixed IP.
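
    In case it helps anyone, roughly what those nmtui steps amount to with nmcli (a sketch - the connection name and addresses are placeholders for my setup):

    nmcli connection modify "Wired connection 1" \
        ipv4.method manual \
        ipv4.addresses 192.168.1.10/24 \
        ipv4.gateway 192.168.1.1 \
        ipv4.dns 192.168.1.1 \
        connection.autoconnect yes
    nmcli connection up "Wired connection 1"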


    (My 5 cts)

  • answering like this is like me asking for the best paint formula for a car and you telling me that this car model is ugly and that I'd better not paint it

    It's an attempt to prevent you (or, more precisely, other readers stumbling across this thread) from wasting your time and others' dealing with stuff that doesn't work as intended (you can't increase single-connection throughput with LACP; that only works between two Linux hosts in a point-to-point topology using balance-rr or balance-alb, which isn't suitable for the general NAS use case).

    In the end I gave up and set DHCP with a static route on my router

    Which is the way to go even in small networks; centralized address management is key to success. But your issue is due to the vast majority of OMV ARM images relying on Network Manager instead of Debian's outdated ifupdown mechanism, which is currently the only one OMV handles correctly (and not even with all its variations, like /etc/network/interfaces.d).
