Latest omv/debian updates break collectd/network?

    This site uses cookies. By continuing to browse this site, you are agreeing to our Cookie Policy.

    • Latest omv/debian updates break collectd/network?

      In short, after today's update (it was combined web update of over 20 lib*, new kernel and omv-extras + OMV 3.0.65), collectd keeps hanging/cannot start.

      Also there is severe delay in booting (according to log due to network trouble - failure to get ip form dhcp?? - nothing changed on my dhcp, all other machines wired or wireless work fine on my network).

      I have pretty vanilla OMV with omv-extras/plex/nut/torrent/sensors plugins - nothing fancy was done to the machine otherwise (hardware or software-wise)

      Here's the boot log:

      Source Code

      1. Network hiccup part:
      2. 2017-03-08T11:40:28+0100 nas dhclient[1999]: Listening on LPF/eth2/94:de:80:db:45:9b
      3. 2017-03-08T11:40:28+0100 nas sh[1953]: Listening on LPF/eth2/94:de:80:db:45:9b
      4. 2017-03-08T11:40:28+0100 nas sh[1953]: Sending on LPF/eth2/94:de:80:db:45:9b
      5. 2017-03-08T11:40:28+0100 nas sh[1953]: Sending on Socket/fallback
      6. 2017-03-08T11:40:28+0100 nas sh[1953]: DHCPREQUEST on eth2 to 255.255.255.255 port 67
      7. 2017-03-08T11:40:28+0100 nas dhclient[1999]: Sending on LPF/eth2/94:de:80:db:45:9b
      8. 2017-03-08T11:40:28+0100 nas dhclient[1999]: Sending on Socket/fallback
      9. 2017-03-08T11:40:28+0100 nas dhclient[1999]: DHCPREQUEST on eth2 to 255.255.255.255 port 67
      10. 2017-03-08T11:40:28+0100 nas systemd[1]: Started Network Time Synchronization.
      11. 2017-03-08T11:40:28+0100 nas systemd[1]: Reached target System Time Synchronized.
      12. 2017-03-08T11:40:28+0100 nas systemd[1]: Started folder2ram systemd service.
      13. 2017-03-08T11:40:30+0100 nas kernel: r8169 0000:02:00.0 eth2: link up
      14. 2017-03-08T11:40:33+0100 nas dhclient[1999]: DHCPREQUEST on eth2 to 255.255.255.255 port 67
      15. 2017-03-08T11:40:33+0100 nas sh[1953]: DHCPREQUEST on eth2 to 255.255.255.255 port 67
      16. 2017-03-08T11:40:33+0100 nas dhclient[1999]: DHCPACK from 10.1.1.1
      17. 2017-03-08T11:40:33+0100 nas sh[1953]: DHCPACK from 10.1.1.1
      18. 2017-03-08T11:45:28+0100 nas systemd[1]: networking.service: Start operation timed out. Terminating.
      19. 2017-03-08T11:45:28+0100 nas systemd[1]: Failed to start Raise network interfaces.
      20. 2017-03-08T11:45:28+0100 nas systemd[1]: networking.service: Unit entered failed state.
      21. 2017-03-08T11:45:28+0100 nas systemd[1]: networking.service: Failed with result 'timeout'.
      22. 2017-03-08T11:45:28+0100 nas systemd[1]: Reached target Network.
      23. 2017-03-08T11:45:28+0100 nas systemd[1]: Reached target Network is Online.
      24. 2017-03-08T11:45:28+0100 nas systemd[1]: Starting LSB: RPC portmapper replacement...
      25. 2017-03-08T11:45:28+0100 nas rpcbind[2302]: Starting rpcbind daemon....
      26. 2017-03-08T11:45:28+0100 nas systemd[1]: Started LSB: RPC portmapper replacement.
      27. 2017-03-08T11:45:28+0100 nas systemd[1]: Reached target RPC Port Mapper.
      28. 2017-03-08T11:45:28+0100 nas systemd[1]: Starting LSB: NFS support files common to client and server...
      29. 2017-03-08T11:45:28+0100 nas rpc.statd[2322]: Version 1.2.8 starting
      30. 2017-03-08T11:45:28+0100 nas sm-notify[2323]: Version 1.2.8 starting
      31. Monit/collectd failure:
      32. 2017-03-08T11:45:48+0100 nas collectd[2570]: Init SSL without certificate database
      33. 2017-03-08T11:45:48+0100 nas collectd[2570]: nut plugin: Connection to (localhost, 3493) established.
      34. 2017-03-08T11:45:58+0100 nas monit[2521]: 'nas' Monit 5.20.0 started
      35. 2017-03-08T11:45:58+0100 nas monit[2521]: 'collectd' process is not running
      36. 2017-03-08T11:45:58+0100 nas monit[2521]: 'collectd' trying to restart
      37. 2017-03-08T11:45:58+0100 nas monit[2521]: 'collectd' start: '/bin/systemctl start collectd'
      38. 2017-03-08T11:46:29+0100 nas monit[2521]: 'collectd' failed to start (exit status 0) -- no output
      39. 2017-03-08T11:46:29+0100 nas smbd[2701]: [2017/03/08 11:46:29.328812, 2] ../source3/smbd/server.c:443(remove_child_pid)
      40. 2017-03-08T11:46:29+0100 nas smbd[2701]: Could not find child 3023 -- ignoring
      41. 2017-03-08T11:46:59+0100 nas monit[2521]: 'collectd' process is not running
      42. 2017-03-08T11:46:59+0100 nas monit[2521]: 'collectd' trying to restart
      43. 2017-03-08T11:46:59+0100 nas monit[2521]: 'collectd' start: '/bin/systemctl start collectd'
      44. 2017-03-08T11:47:29+0100 nas smbd[2701]: [2017/03/08 11:47:29.382413, 2] ../source3/smbd/server.c:443(remove_child_pid)
      45. 2017-03-08T11:47:29+0100 nas smbd[2701]: Could not find child 3041 -- ignoring
      46. 2017-03-08T11:47:29+0100 nas monit[2521]: 'collectd' failed to start (exit status 0) -- no output
      47. 2017-03-08T11:47:59+0100 nas monit[2521]: 'collectd' process is not running
      48. 2017-03-08T11:47:59+0100 nas monit[2521]: 'collectd' trying to restart
      49. 2017-03-08T11:47:59+0100 nas monit[2521]: 'collectd' start: '/bin/systemctl start collectd'
      50. 2017-03-08T11:48:29+0100 nas smbd[2701]: [2017/03/08 11:48:29.393184, 2] ../source3/smbd/server.c:443(remove_child_pid)
      51. 2017-03-08T11:48:29+0100 nas smbd[2701]: Could not find child 3057 -- ignoring
      52. 2017-03-08T11:48:29+0100 nas monit[2521]: 'collectd' failed to start (exit status 0) -- no output
      53. 2017-03-08T11:49:00+0100 nas monit[2521]: 'collectd' process is not running
      54. 2017-03-08T11:49:00+0100 nas monit[2521]: 'collectd' trying to restart
      55. 2017-03-08T11:49:00+0100 nas monit[2521]: 'collectd' start: '/bin/systemctl start collectd'
      56. 2017-03-08T11:49:29+0100 nas smbd[2701]: [2017/03/08 11:49:29.424774, 2] ../source3/smbd/server.c:443(remove_child_pid)
      57. 2017-03-08T11:49:29+0100 nas smbd[2701]: Could not find child 3068 -- ignoring
      58. 2017-03-08T11:49:30+0100 nas monit[2521]: 'collectd' failed to start (exit status 0) -- no output
      59. 2017-03-08T11:50:00+0100 nas monit[2521]: 'collectd' process is not running
      60. 2017-03-08T11:50:00+0100 nas monit[2521]: 'collectd' trying to restart
      61. 2017-03-08T11:50:00+0100 nas monit[2521]: 'collectd' start: '/bin/systemctl start collectd'
      62. 2017-03-08T11:50:16+0100 nas openmediavault-webgui[3077]: Authorized login from 10.1.1.70 [username=admin, user-agent=Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36]
      63. 2017-03-08T11:50:28+0100 nas smbd[3120]: [2017/03/08 11:50:28.047805, 1] ../lib/param/loadparm.c:1638(lpcfg_do_global_parameter)
      64. 2017-03-08T11:50:28+0100 nas smbd[3120]: WARNING: The "null passwords" option is deprecated
      65. 2017-03-08T11:50:29+0100 nas smbd[2701]: [2017/03/08 11:50:29.427590, 2] ../source3/smbd/server.c:443(remove_child_pid)
      66. 2017-03-08T11:50:29+0100 nas smbd[2701]: Could not find child 3122 -- ignoring
      67. 2017-03-08T11:50:30+0100 nas monit[2521]: 'collectd' failed to start (exit status 0) -- no output
      Display All
      And the syslog (part relevant to collectd):

      Source Code

      1. Mar 8 11:45:28 nas collectd[2371]: plugin_load: plugin "nut" successfully loaded.
      2. Mar 8 11:45:28 nas collectd[2371]: plugin_load: plugin "sensors" successfully loaded.
      3. Mar 8 11:45:28 nas collectd[2570]: plugin_load: plugin "syslog" successfully loaded.
      4. Mar 8 11:45:28 nas collectd[2570]: plugin_load: plugin "rrdcached" successfully loaded.
      5. Mar 8 11:45:28 nas collectd[2570]: plugin_load: plugin "unixsock" successfully loaded.
      6. Mar 8 11:45:28 nas collectd[2570]: plugin_load: plugin "cpu" successfully loaded.
      7. Mar 8 11:45:28 nas collectd[2570]: plugin_load: plugin "df" successfully loaded.
      8. Mar 8 11:45:28 nas collectd[2570]: plugin_load: plugin "interface" successfully loaded.
      9. Mar 8 11:45:28 nas collectd[2570]: plugin_load: plugin "load" successfully loaded.
      10. Mar 8 11:45:28 nas collectd[2570]: plugin_load: plugin "memory" successfully loaded.
      11. Mar 8 11:45:28 nas collectd[2570]: plugin_load: plugin "nut" successfully loaded.
      12. Mar 8 11:45:28 nas collectd[2570]: plugin_load: plugin "sensors" successfully loaded.
      13. Mar 8 11:45:28 nas collectd[2570]: Systemd detected, trying to signal readyness.
      14. Mar 8 11:45:28 nas collectd[2570]: Initialization complete, entering read-loop.
      15. Mar 8 11:45:28 nas collectd[2570]: nut plugin: nut_read_one: upscli_connect (localhost, 3493) failed: Connection failure: Cannot assign requested address
      16. Mar 8 11:45:28 nas collectd[2570]: read-function of plugin `nut' failed. Will suspend it for 20.000 seconds.
      17. Mar 8 11:45:48 nas collectd[2570]: Init SSL without certificate database
      18. Mar 8 11:45:48 nas collectd[2570]: nut plugin: Connection to (localhost, 3493) established.
      19. Mar 8 11:45:58 nas monit[2521]: 'collectd' process is not running
      20. Mar 8 11:45:58 nas monit[2521]: 'collectd' trying to restart
      21. Mar 8 11:45:58 nas monit[2521]: 'collectd' start: '/bin/systemctl start collectd'
      22. Mar 8 11:45:59 nas postfix/smtp[2995]: DFE946455: replace: header Subject: monit alert -- Does not exist collectd: Subject: [nas.necto.loc] monit alert -- Does not exist collectd
      23. Mar 8 11:46:29 nas monit[2521]: 'collectd' failed to start (exit status 0) -- no output
      24. Mar 8 11:46:29 nas postfix/smtp[2995]: 28D3B647A: replace: header Subject: monit alert -- Execution failed collectd: Subject: [nas.necto.loc] monit alert -- Execution failed collectd
      25. Mar 8 11:46:59 nas monit[2521]: 'collectd' process is not running
      26. Mar 8 11:46:59 nas monit[2521]: 'collectd' trying to restart
      27. Mar 8 11:46:59 nas monit[2521]: 'collectd' start: '/bin/systemctl start collectd'
      28. Mar 8 11:47:29 nas monit[2521]: 'collectd' failed to start (exit status 0) -- no output
      29. Mar 8 11:47:59 nas monit[2521]: 'collectd' process is not running
      30. Mar 8 11:47:59 nas monit[2521]: 'collectd' trying to restart
      31. Mar 8 11:47:59 nas monit[2521]: 'collectd' start: '/bin/systemctl start collectd'
      32. Mar 8 11:48:29 nas monit[2521]: 'collectd' failed to start (exit status 0) -- no output
      33. Mar 8 11:49:00 nas monit[2521]: 'collectd' process is not running
      34. Mar 8 11:49:00 nas monit[2521]: 'collectd' trying to restart
      35. Mar 8 11:49:00 nas monit[2521]: 'collectd' start: '/bin/systemctl start collectd'
      36. Mar 8 11:49:30 nas monit[2521]: 'collectd' failed to start (exit status 0) -- no output
      37. Mar 8 11:50:00 nas monit[2521]: 'collectd' process is not running
      38. Mar 8 11:50:00 nas monit[2521]: 'collectd' trying to restart
      39. Mar 8 11:50:00 nas monit[2521]: 'collectd' start: '/bin/systemctl start collectd'
      Display All

      Could it be that nut somehow kills collectd?

      I will try to set IP manually & disable nut to see if anything changes
      OMV erasmus 3.0.90 | omv-extras 3.4.29 | Xeon E3 1225 | 8GB ECC RAM | 2x3 TB WD Red | 1.5 TB WD Green | 1TB HGST
    • (Doubleposting because of 10K character limit)

      I managed to disable UPS/nut - without webui complaining (collectd still fails to start but machine boots normally now)

      Any attempted change of System/Network fails with this error:

      Source Code

      1. Error #0:
      2. exception 'OMV\ExecException' with message 'Failed to execute command 'export PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin; export LANG=C; systemctl start networking 2>&1' with exit code '1': Job for networking.service failed because the control process exited with error code.
      3. See "systemctl status networking.service" and "journalctl -xe" for details.' in /usr/share/php/openmediavault/system/process.inc:175
      4. Stack trace:
      5. #0 /usr/share/php/openmediavault/system/systemctl.inc(83): OMV\System\Process->execute(Array, 1)
      6. #1 /usr/share/php/openmediavault/system/systemctl.inc(140): OMV\System\SystemCtl->exec('start', NULL, false)
      7. #2 /usr/share/openmediavault/engined/module/networking.inc(44): OMV\System\SystemCtl->start()
      8. #3 /usr/share/openmediavault/engined/rpc/config.inc(189): OMVModuleNetworking->startService()
      9. #4 [internal function]: OMVRpcServiceConfig->applyChanges(Array, Array)
      10. #5 /usr/share/php/openmediavault/rpc/serviceabstract.inc(124): call_user_func_array(Array, Array)
      11. #6 /usr/share/php/openmediavault/rpc/serviceabstract.inc(150): OMV\Rpc\ServiceAbstract->callMethod('applyChanges', Array, Array)
      12. #7 /usr/share/php/openmediavault/rpc/serviceabstract.inc(528): OMV\Rpc\ServiceAbstract->OMV\Rpc\{closure}('/tmp/bgstatusds...', '/tmp/bgoutputQP...')
      13. #8 /usr/share/php/openmediavault/rpc/serviceabstract.inc(151): OMV\Rpc\ServiceAbstract->execBgProc(Object(Closure))
      14. #9 /usr/share/openmediavault/engined/rpc/config.inc(208): OMV\Rpc\ServiceAbstract->callMethodBg('applyChanges', Array, Array)
      15. #10 [internal function]: OMVRpcServiceConfig->applyChangesBg(Array, Array)
      16. #11 /usr/share/php/openmediavault/rpc/serviceabstract.inc(124): call_user_func_array(Array, Array)
      17. #12 /usr/share/php/openmediavault/rpc/rpc.inc(86): OMV\Rpc\ServiceAbstract->callMethod('applyChangesBg', Array, Array)
      18. #13 /usr/sbin/omv-engined(536): OMV\Rpc\Rpc::call('Config', 'applyChangesBg', Array, Array, 1)
      19. #14 {main}
      Display All

      UPDATE:

      I enabled nut again / network remains stuck at DHCP (still can't change it due to aforementioned error) - I am greeted by similar (monit) error this time:


      Source Code

      1. Error #0:
      2. exception 'OMV\ExecException' with message 'Failed to execute command 'export PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin; export LANG=C; monit summary 2>&1' with exit code '1': Cannot create socket to [localhost]:2812 -- Connection refused' in /usr/share/php/openmediavault/system/process.inc:175
      3. Stack trace:
      4. #0 /usr/share/php/openmediavault/system/monit.inc(142): OMV\System\Process->execute(Array)
      5. #1 /usr/share/php/openmediavault/system/monit.inc(171): OMV\System\Monit->status()
      6. #2 /usr/share/openmediavault/engined/module/nut.inc(105): OMV\System\Monit->isRunning()
      7. #3 /usr/share/openmediavault/engined/rpc/services.inc(56): OMVModuleNetworkUPSTools->getStatus()
      8. #4 [internal function]: OMVRpcServiceServices->getStatus(Array, Array)
      9. #5 /usr/share/php/openmediavault/rpc/serviceabstract.inc(124): call_user_func_array(Array, Array)
      10. #6 /usr/share/php/openmediavault/rpc/rpc.inc(86): OMV\Rpc\ServiceAbstract->callMethod('getStatus', Array, Array)
      11. #7 /usr/sbin/omv-engined(536): OMV\Rpc\Rpc::call('Services', 'getStatus', Array, Array, 1)
      12. #8 {main}
      Display All
      OMV erasmus 3.0.90 | omv-extras 3.4.29 | Xeon E3 1225 | 8GB ECC RAM | 2x3 TB WD Red | 1.5 TB WD Green | 1TB HGST

      The post was edited 1 time, last by QSx ().

    • Well, still got trouble here - I think the latest kernel is the culprit or something that has to do with my NIC drivers

      Since eth2 error keep appearing in log i tried omv-firstaid and then manually setting the network

      Again it failed - i got this error (if it helps)

      Source Code

      1. root@nas:~# systemctl status networking.service
      2. ● networking.service - Raise network interfaces
      3. Loaded: loaded (/lib/systemd/system/networking.service; enabled; vendor preset: enabled)
      4. Drop-In: /run/systemd/generator/networking.service.d
      5. └─50-insserv.conf-$network.conf
      6. Active: failed (Result: exit-code) since Wed 2017-03-08 21:19:49 CET; 20s ago
      7. Docs: man:interfaces(5)
      8. Main PID: 4276 (code=exited, status=1/FAILURE)
      9. Mar 08 21:19:49 nas systemd[1]: Starting Raise network interfaces...
      10. Mar 08 21:19:49 nas ifup[4276]: RTNETLINK answers: File exists
      11. Mar 08 21:19:49 nas ifup[4276]: Failed to bring up eth2.
      12. Mar 08 21:19:49 nas systemd[1]: networking.service: Main process exited, code=exited, status=1/FAILURE
      13. Mar 08 21:19:49 nas systemd[1]: Failed to start Raise network interfaces.
      14. Mar 08 21:19:49 nas systemd[1]: networking.service: Unit entered failed state.
      15. Mar 08 21:19:49 nas systemd[1]: networking.service: Failed with result 'exit-code'.
      Display All

      At this point, I'm considering rolling back on previous kernel (is it possible) or slapping another NIC in that machine until whatever got borked gets fixed...

      For starters, how would I roll back on today update? Booting old kernel from bootmenu only hangs at "Loading folder2ram systemd service..."

      UPDATE: Okay, I disabled onboard chip and slapped on PCI NIC - same crap... any ideas boys?
      OMV erasmus 3.0.90 | omv-extras 3.4.29 | Xeon E3 1225 | 8GB ECC RAM | 2x3 TB WD Red | 1.5 TB WD Green | 1TB HGST

      The post was edited 2 times, last by QSx ().

    • Negative,

      I went with completely fresh install - it's been working fine since (same hardware, same install, same plugins) - no idea what went wrong :/
      OMV erasmus 3.0.90 | omv-extras 3.4.29 | Xeon E3 1225 | 8GB ECC RAM | 2x3 TB WD Red | 1.5 TB WD Green | 1TB HGST