Apply config loop / failure - bonded network interfaces


      Hi All,

      Hoping someone can spot the problem here, I've run out of ideas!

      Since updating to OMV4, I've had persistent problems with my bonded network interface. I thought I'd fixed them, but noticed again today that I'm stuck in an "apply config" loop. With OMV3 it worked fine for aeons (the OMV3 build was done with the bonded interfaces from the start).

      There are three interfaces on the machine: eth0 is alone on one subnet; eth1 & eth2 are intended to be an 802.3ad bonded pair on another subnet. My switch (Zyxel GS1910) has 802.3ad LACP support enabled for the relevant two ports. I've also tried a fixed bond on the switch with balance-alb on the machine, and that doesn't fix the problem, so it doesn't look like a LACP issue.

      Every time I tell OMV to apply the config, it fails - see the error text below - and the interface doesn't work: even if ifconfig shows it as UP, it doesn't respond to ping, and local services like nut & smbd/nmbd that are bound to that interface report that it's not accessible. /etc/network/interfaces looks OK to me. If I then reboot the machine, the interface comes up, but OMV still shows the config needing to be applied. Enter the "apply config" loop...
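      One way to sanity-check the bond from the CLI is the kernel's own status file. A minimal sketch, run here against a captured sample so it works anywhere; on the server itself you would read /proc/net/bonding/bond0 directly (the sample text and the eth2-down state are illustrative, not taken from this machine):

```shell
# Sample standing in for /proc/net/bonding/bond0 so the snippet runs
# without root or a real bond; on the box, cat the file directly.
sample='Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
MII Status: up

Slave Interface: eth1
MII Status: up

Slave Interface: eth2
MII Status: down'

# Report any slave whose MII link is down - a non-zero count means trouble.
printf '%s\n' "$sample" | awk '
  /^Slave Interface:/ { slave=$3 }
  slave && /^MII Status: down/ { print slave " link down"; n++ }
  END { print n+0 " slave(s) down" }'
```

With the sample above this prints `eth2 link down` and `1 slave(s) down`; a healthy bond reports `0 slave(s) down`.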

      Does anyone have any ideas?

      Screenshots attached show the OMV interface settings.


      Here's the config:

      root@dougal:/etc/network# more /etc/network/interfaces
      # Include additional interface stanzas.
      source-directory interfaces.d

      # The loopback network interface
      auto lo
      iface lo inet loopback

      # eth0 network interface
      auto eth0
      allow-hotplug eth0
      iface eth0 inet static
      address 192.168.8.254
      gateway 192.168.8.1
      netmask 255.255.255.0
      dns-nameservers 192.168.8.1
      dns-search mydomain.net
      iface eth0 inet6 manual
      pre-down ip -6 addr flush dev $IFACE

      # bond0 network interface
      auto bond0
      iface bond0 inet static
      address 192.168.11.22
      gateway 192.168.11.1
      netmask 255.255.254.0
      dns-nameservers 192.168.11.1
      dns-search mydomain.net
      bond-slaves eth1 eth2
      bond-mode 4
      bond-miimon 100
      bond-downdelay 200
      bond-updelay 200
      iface bond0 inet6 manual
      pre-down ip -6 addr flush dev $IFACE


      journalctl -xe shows nothing of interest, just services whingeing that they can't find a network interface.

      root@dougal:/etc/network# systemctl status networking.service
      ● networking.service - Raise network interfaces
      Loaded: loaded (/lib/systemd/system/networking.service; enabled; vendor preset: enabled)
      Active: failed (Result: exit-code) since Sun 2018-06-03 17:40:20 BST; 2min 16s ago
      Docs: man:interfaces(5)
      Process: 2245 ExecStart=/sbin/ifup -a --read-environment (code=exited, status=1/FAILURE)
      Process: 2241 ExecStartPre=/bin/sh -c [ "$CONFIGURE_INTERFACES" != "no" ] && [ -n "$(ifquery --read-environment --list --exclude=lo)" ] && udevadm settle (code=exi
      Main PID: 2245 (code=exited, status=1/FAILURE)

      Jun 03 17:40:19 dougal systemd[1]: Starting Raise network interfaces...
      Jun 03 17:40:19 dougal ifup[2245]: sh: echo: I/O error
      Jun 03 17:40:20 dougal ifup[2245]: RTNETLINK answers: File exists
      Jun 03 17:40:20 dougal ifup[2245]: ifup: failed to bring up bond0
      Jun 03 17:40:20 dougal systemd[1]: networking.service: Main process exited, code=exited, status=1/FAILURE
      Jun 03 17:40:20 dougal systemd[1]: Failed to start Raise network interfaces.
      Jun 03 17:40:20 dougal systemd[1]: networking.service: Unit entered failed state.
      Jun 03 17:40:20 dougal systemd[1]: networking.service: Failed with result 'exit-code'.
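      In principle the loop can be broken by hand: tear bond0 down cleanly before re-applying, so ifup is not re-adding state that already exists. A hedged sketch only - the command sequence is an assumption, not a confirmed fix; DRY_RUN=1 (the default here) just prints each command, since the real ones need root:

```shell
#!/bin/sh
# Sketch of a manual teardown before re-applying the OMV config.
# DRY_RUN=1 (default) only prints each command; set DRY_RUN=0 on the
# real box, as root, to actually execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

run ifdown bond0             # let ifupdown forget its state for bond0
run ip addr flush dev bond0  # drop any address left behind
run ip link set bond0 down   # take the master link down
run ifup bond0               # re-raise it from /etc/network/interfaces
```

The idea is that "RTNETLINK answers: File exists" is the kernel refusing to add an address that is still configured, so flushing the stale state first should let ifup succeed.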


      IFCONFIG BEFORE REBOOT:
      root@dougal:/etc/network# ifconfig
      eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
      inet 192.168.8.254 netmask 255.255.255.0 broadcast 192.168.8.255
      ether d4:3d:7e:34:a7:16 txqueuelen 1000 (Ethernet)
      RX packets 4617 bytes 1096970 (1.0 MiB)
      RX errors 0 dropped 2 overruns 0 frame 0
      TX packets 6357 bytes 7367515 (7.0 MiB)
      TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

      lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
      inet 127.0.0.1 netmask 255.0.0.0
      inet6 ::1 prefixlen 128 scopeid 0x10<host>
      loop txqueuelen 1000 (Local Loopback)
      RX packets 762 bytes 97399 (95.1 KiB)
      RX errors 0 dropped 0 overruns 0 frame 0
      TX packets 762 bytes 97399 (95.1 KiB)
      TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0


      IFCONFIG AFTER REBOOT:
      bond0: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST> mtu 1500
      inet 192.168.11.22 netmask 255.255.254.0 broadcast 192.168.11.255
      ether 00:15:17:d3:5f:14 txqueuelen 1000 (Ethernet)
      RX packets 69553 bytes 14195346 (13.5 MiB)
      RX errors 0 dropped 0 overruns 0 frame 0
      TX packets 65806 bytes 18205876 (17.3 MiB)
      TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

      eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
      inet 192.168.8.254 netmask 255.255.255.0 broadcast 192.168.8.255
      ether d4:3d:7e:34:a7:16 txqueuelen 1000 (Ethernet)
      RX packets 2250 bytes 174222 (170.1 KiB)
      RX errors 0 dropped 0 overruns 0 frame 0
      TX packets 1339 bytes 114101 (111.4 KiB)
      TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

      eth1: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST> mtu 1500
      ether 00:15:17:d3:5f:14 txqueuelen 1000 (Ethernet)
      RX packets 3308 bytes 309103 (301.8 KiB)
      RX errors 0 dropped 0 overruns 0 frame 0
      TX packets 61361 bytes 17797720 (16.9 MiB)
      TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
      device interrupt 16 memory 0xf7ca0000-f7cc0000

      eth2: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST> mtu 1500
      ether 00:15:17:d3:5f:14 txqueuelen 1000 (Ethernet)
      RX packets 66245 bytes 13886243 (13.2 MiB)
      RX errors 0 dropped 0 overruns 0 frame 0
      TX packets 4445 bytes 408156 (398.5 KiB)
      TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
      device interrupt 17 memory 0xf7c40000-f7c60000

      lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
      inet 127.0.0.1 netmask 255.0.0.0
      inet6 ::1 prefixlen 128 scopeid 0x10<host>
      loop txqueuelen 1000 (Local Loopback)
      RX packets 2890 bytes 306387 (299.2 KiB)
      RX errors 0 dropped 0 overruns 0 frame 0
      TX packets 2890 bytes 306387 (299.2 KiB)
      TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

      ... interesting to note that eth1 & eth2 have their own entries now, not present before reboot.
      /etc/network/interfaces is unchanged after reboot.

      Any help greatly appreciated!

      Thanks,

      Jeff
      Images
      • bond0 omv.PNG

    • I've been looking into this some more and it seems to be a mixture of bad behaviour in the underlying Debian OS, the way OMV writes the /etc/network/interfaces file, and the way the network is restarted when OMV calls systemctl. I've even tried a clean rebuild of OMV4, and the first thing I did was configure the network. Same result (note that the fresh install has changed the interface names):

      Jun 08 18:48:15 dougal systemd-udevd[3180]: Could not generate persistent MAC address for bond0: No such file or directory
      Jun 08 18:48:15 dougal kernel: Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
      Jun 08 18:48:15 dougal kernel: bond0: Enslaving enp1s0f0 as a backup interface with a down link
      Jun 08 18:48:16 dougal kernel: bond0: Enslaving enp1s0f1 as a backup interface with a down link
      Jun 08 18:48:16 dougal avahi-daemon[436]: Joining mDNS multicast group on interface bond0.IPv4 with address 192.168.11.22.
      Jun 08 18:48:16 dougal avahi-daemon[436]: New relevant interface bond0.IPv4 for mDNS.
      Jun 08 18:48:16 dougal avahi-daemon[436]: Registering new address record for 192.168.11.22 on bond0.IPv4.
      Jun 08 18:48:16 dougal ifup[3028]: RTNETLINK answers: File exists
      Jun 08 18:48:16 dougal ifup[3028]: ifup: failed to bring up bond0
      Jun 08 18:48:16 dougal systemd[1]: networking.service: Main process exited, code=exited, status=1/FAILURE
      Jun 08 18:48:16 dougal systemd[1]: Failed to start Raise network interfaces.
      -- Subject: Unit networking.service has failed

      My switch shows that the two ports from this machine have created a LACP aggregation group, so it all seems to be working at that level.

      What I note, comparing the interface config files here to what I can find about bonding elsewhere:
      (1) OMV doesn't put configuration blocks for the slaves in /etc/network/interfaces. Some references include these:
      auto enp1s0f0
      iface enp1s0f0 inet manual
      bond-master bond0
      and similarly for enp1s0f1

      (2) It seems advisable not to list the slaves in the bond0 definition; that way, the bond interface can come up even if the slaves are slower to appear, avoiding an unnecessary error report - i.e., omit the "bond-slaves ..." line.

      (3) When OMV tries to restart the networking service, systemctl calls "ifup bond0" and gets the error message "ifup ... RTNETLINK answers: File exists" - this is, I believe, because systemctl failed to take the interface down properly first. That seems to be a Debian issue and I can't find a way to fix it. OMV isn't doing anything wrong here, but will get hit by this problem every time.
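      Taken together, points (1) and (2) would produce roughly the following layout - a sketch only, reusing this box's interface names and addresses from above, not something OMV itself generates: per-slave stanzas carrying bond-master, and no bond-slaves line on the master.

```
# Hypothetical /etc/network/interfaces layout per the references above -
# shown for comparison only, untested here.
auto enp1s0f0
iface enp1s0f0 inet manual
    bond-master bond0

auto enp1s0f1
iface enp1s0f1 inet manual
    bond-master bond0

auto bond0
iface bond0 inet static
    address 192.168.11.22
    netmask 255.255.254.0
    gateway 192.168.11.1
    bond-mode 4
    bond-miimon 100
```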

      So I'm a little wiser but no closer to a solution. And now I have to restore my config too! I guess I'd best get a beer and settle in for an hour or two...
    • It gets worse, though... I removed that bond0 interface through the OMV GUI and applied the config, then created a simple single interface using enp1s0f0. Similar error! Rebooted; it's still giving the error and I'm back in the "need to apply config" loop. Oh, for goodness' sake!

      I can only assume that either OMV or Debian is doing something that screws up the interface definitions in a way that applying a config change can't overcome. I know there are other files involved beyond /etc/network/interfaces, so maybe it's time to go searching. This I could really do without, though; my bond was working fine until Deb9/OMV4 came along.


    • This is officially bizarre. I rebuilt again, set up one network interface via omv-firstaid and then a second through the GUI. As soon as I try to apply the config for the second interface from the GUI, I get the errors from the networking service. Nothing more complex than that: add one interface via the CLI, a second via the GUI, and voilà: the error.

      It seems to make no difference whether the two interfaces have gateways specified or not, or whether they're in different subnets; everything I've tried fails the same way.

      Some further testing: I spun up a fresh virtual machine with two interfaces, installed OMV from the ISO, logged into the GUI and tried to configure the second interface. Same error! So it appears it's currently not possible to set up an OMV4 box with more than one network interface... BIG issue!

      Come on, OMV developers... there have been loads of views of this thread but no comments at all. Does everyone really not have a clue what's going on here?


    • Ok, latest diagnosis, easier now I'm on a throwaway VM rather than my poor server!

      If I set both interfaces to DHCP rather than static addressing, I can get them both to work at the same time. I can even create a bond, as long as it's using DHCP. That's half a solution - at least it gets the system online; it's clearly not satisfactory for a key server to have to use DHCP for its addresses, though.
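      As a stop-gap under that constraint, one option would be to keep the OMV side on DHCP but pin the address at the DHCP server via a reservation for the bond's MAC, so the box still gets a stable address. A sketch only, assuming the DHCP server is ISC dhcpd (the MAC is taken from the ifconfig output earlier in the thread):

```
# Hypothetical ISC dhcpd host reservation - assumes the router/DHCP server
# runs dhcpd and that 00:15:17:d3:5f:14 is the bond's MAC.
host dougal-bond0 {
  hardware ethernet 00:15:17:d3:5f:14;
  fixed-address 192.168.11.22;
}
```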

      What on earth is wrong with Debian's networking, though, I've no idea.
    • jefft wrote:

      Ok, latest diagnosis, easier now I'm on a throwaway VM rather than my poor server!

      If I set both interfaces to DHCP rather than static addressing, I can get them both to work at the same time. I can even create a bond, as long as it's using DHCP. That's half a solution - at least it gets the system online; it's clearly not satisfactory for a key server to have to use DHCP for its addresses, though.

      What on earth is wrong with Debian's networking, though, I've no idea.
      I'm currently embroiled in all of these shenanigans as well. I never had any problems with OMV 3.x, but since moving to 4.x, network interface manipulation for bonding has been a nightmare.
    • I also have this issue! I'm thinking of moving back to OMV 3.x now that I've read your reply... you clearly know a ton more than I do, and if you haven't figured it out then I'm almost lost. I didn't want to start a thread for fear of the mods saying this issue has already been addressed, but configuring the network has been a pain so far with my DIY NAS. I've followed every step and troubleshot everything - I even started configuring the admin page over Ethernet - and yet this NAS distro can't get the simplest config done.

      I've attached a few screenshots of my error messages.
      Images
      • Captura de pantalla (319).png
      • Captura de pantalla (318).png
      • Captura de pantalla (317).png
      =============== Technomancer/Landfill Waste Consultant/EE ==========
      Always a student...