[Solved] OMV unexpected reboot - softdog unexpected close

    • [Solved] OMV unexpected reboot - softdog unexpected close

      Hi all,


      On a fresh installation of OMV (Sardaukar 0.5.46 on a Pentium D with 3 GB RAM), using 2x 3 TB WD disks in RAID1, with Plex installed from the command line plus other official plug-ins, I'm experiencing unexpected reboots.
      On the console monitor I've seen the message "softdog unexpected close, not stopping watchdog", followed a few seconds later by a system reboot. :roll:

      The system log doesn't show anything except minor issues (USB errors).

      At the moment I can't tell whether it happens mostly while I'm streaming video.
      Any idea what's happening and why? Has anyone experienced the same or a similar issue?

      thanks,


      regards
      frecurring

      ---------------------------------
      MOS 7501 & TED 7370
      1,76 MHz with 16kB of RAM
      No HDD but I was excited


    • Re: OMV unexpected reboot - softdog unexpected close

      If the watchdog is not triggered within a given time, it forces a reboot. This can happen if your system is under heavy load.
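      To make the mechanism concrete: under the Linux watchdog API, a userspace daemon opens /dev/watchdog and writes to it at regular intervals; if the writes stop for longer than the timeout, the kernel reboots. Drivers that support "magic close", softdog among them, only disarm the timer if the character 'V' is written just before the device is closed. If the holder closes it without that (for example because the process died), the driver logs "Unexpected close, not stopping watchdog!" and the timer keeps running. A minimal bash sketch of the protocol; keepalive_sketch and its arguments are hypothetical, and it should only be tried against an ordinary file, because opening a real /dev/watchdog arms the timer.

```bash
# Sketch of the Linux watchdog keepalive protocol.
# ASSUMPTION: keepalive_sketch is a hypothetical helper for illustration; on a real
# system the watchdog daemon (or wd_keepalive) does this job.
# WARNING: pointing it at a real /dev/watchdog arms the timer.
keepalive_sketch() {
    local dev="$1"        # device file, e.g. /dev/watchdog (use a plain file for testing)
    local pings="$2"      # number of keepalive writes to perform
    local interval="$3"   # seconds between writes; must stay well below the timeout

    exec 3>"$dev" || return 1    # opening the device starts the timer
    local i
    for ((i = 0; i < pings; i++)); do
        printf '.' >&3           # any write counts as a keepalive ("ping")
        sleep "$interval"
    done
    printf 'V' >&3               # magic close: announce that the close is intentional
    exec 3>&-                    # without the 'V', the driver logs
                                 # "Unexpected close, not stopping watchdog!" and keeps counting
}
```

      For example, keepalive_sketch /tmp/fake-watchdog 3 1 leaves "...V" in the file: three pings followed by the magic close.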
      Absolutely no support through PM!

      I must not fear.
      Fear is the mind-killer.
      Fear is the little-death that brings total obliteration.
      I will face my fear.
      I will permit it to pass over me and through me.
      And when it has gone past I will turn the inner eye to see its path.
      Where the fear has gone there will be nothing.
      Only I will remain.

      Litany against fear by Bene Gesserit
    • Re: OMV unexpected reboot - softdog unexpected close

      Hi votdev,

      thanks for your reply.

      So every time the CPU goes to 100% I should expect a reboot?
      With my old installation, using the same software but an old HD (for test purposes), I never experienced the problem.

      How can I debug in detail what's causing the softdog not to be touched (fed)?

      regards
      frecurring

    • Re: OMV unexpected reboot - softdog unexpected close

      No, the watchdog is not triggered just because the CPU reaches 100%; otherwise it would be a really bad feature.
      You have to find out what stops the watchdog daemon from writing to the device file, which the kernel checks every N seconds. If this write does not happen within that time, the kernel triggers the reboot. So you have to find out whether the watchdog daemon could have been killed unexpectedly or irregularly.
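      One way to look for evidence of the daemon dying is to search the logs that survived the reboot for watchdog and OOM-killer lines. A sketch, assuming Debian's default rsyslog files (/var/log/syslog, /var/log/kern.log); watchdog_log_hits is a hypothetical helper, the patterns are examples, and anything logged in the last seconds before the reboot may never have reached the disk.

```bash
# Search log files for lines hinting at why /dev/watchdog stopped being fed:
# softdog/watchdog messages and the kernel OOM killer terminating a process.
# ASSUMPTION: watchdog_log_hits is a hypothetical helper; the patterns are examples.
watchdog_log_hits() {
    grep -iE 'softdog|watchdog|wd_keepalive|out of memory|killed process' "$@"
}
```

      Usage: watchdog_log_hits /var/log/syslog /var/log/kern.log (add the rotated .1 files if the reboot happened before a rotation). grep prefixes each hit with the file name when several files are given.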
    • Re: OMV unexpected reboot - softdog unexpected close

      Hi votdev,

      I'm trying to understand the problem better.

      During the OMV startup process I see:

      ...
      starting watchdog keepalive daemon: wd_keepalive
      stopping watchdog keepalive daemon
      stopping watchdog daemon
      ...

      Checking with ps -ef, wd_keepalive is not running.

      I can confirm that the softdog message and the reboot happen during Plex media streaming to my TV.

      Is it normal that wd_keepalive starts and stops during the startup process and is then not running?
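      A quick way to see which userspace process, if any, is feeding the watchdog is to list process names and match them exactly, so kernel threads such as watchdog/0 are not mistaken for the daemon. My understanding of the Debian watchdog package is that wd_keepalive only covers the periods when the full watchdog daemon is not running and is stopped when that daemon starts, so the absence of wd_keepalive alone would be expected; the thing to check is whether watchdog itself is running. Treat that as an assumption to verify against the scripts in /etc/init.d. watchdog_feeders is a hypothetical helper.

```bash
# List userspace watchdog feeders from `ps -eo pid=,comm=` output on stdin.
# Matching the command name exactly skips kernel threads like "watchdog/0".
# Returns 1 if neither watchdog nor wd_keepalive is found.
# ASSUMPTION: watchdog_feeders is a hypothetical helper.
watchdog_feeders() {
    awk '$2 == "watchdog" || $2 == "wd_keepalive" { print $2 " (pid " $1 ")"; found = 1 }
         END { exit (found ? 0 : 1) }'
}
```

      Usage: ps -eo pid=,comm= | watchdog_feeders || echo "no watchdog daemon found". fuser -v /dev/watchdog (from psmisc) shows which process actually holds the device open.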

      thanks
      regards
      frecurring

    • Re: OMV unexpected reboot - softdog unexpected close


      I can confirm that the softdog message and the reboot happen during Plex media streaming to my TV.


      Volker said the watchdog could be triggered by high load. What are the specs of your NAS? How high is the load when streaming content?

      Greetings
      David
      "Well... lately this forum has become support for everything except omv" [...] "And is like someone is banning Google from their browsers"

      Only two things are infinite, the universe and human stupidity, and I'm not sure about the former.


      Upload Logfile via WebGUI/CLI
      #openmediavault on freenode IRC | German & English | GMT+1
      Absolutely no Support via PM!

      I host parts of the omv-extras.org Repository, the OpenMediaVault Live Demo and the pre-built PXE Images. If you want, you can take part and help cover the costs by having a look at my profile page.
    • Re: OMV unexpected reboot - softdog unexpected close

      Hi,

      Sardaukar 0.5.46 on a Pentium D 2.8 GHz with 3 GB RAM, WD Red disks in RAID1.
      The CPU goes to 100% during streaming, if that is what you mean by "how high ...".

      I'm also receiving alert emails like this one:
      Date: Thu, 08 May 2014 05:37:32 +0100
      Action: alert
      Host: OMVHome
      Description: cpu user usage of 100.0% matches resource limit [cpu user usage>95.0%]


      BTW, yesterday I had a reboot while Plex was not streaming anything :cry:
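      A CPU at 100% and a high load average are not the same thing, and neither should fire the kernel watchdog on its own. The one case I know of where load does matter is the watchdog daemon's own test: /etc/watchdog.conf accepts max-load-1, max-load-5 and max-load-15, and if one of them is set, the daemon itself reboots the machine when the load exceeds it. Whether OMV sets these is an assumption to check in that file. The sketch compares the 1-minute load average from /proc/loadavg with such a threshold; load_exceeds is a hypothetical helper.

```bash
# Return success (0) if the 1-minute load average exceeds a threshold.
# /proc/loadavg looks like "0.42 0.30 0.25 1/123 4567"; field 1 is the 1-minute value.
# ASSUMPTION: load_exceeds is a hypothetical helper; pass the max-load-1 value
# from /etc/watchdog.conf (if one is set) as the threshold.
load_exceeds() {
    local loadavg_line="$1" threshold="$2"
    awk -v t="$threshold" '{ exit !($1 + 0 > t + 0) }' <<<"$loadavg_line"
}
```

      Usage while streaming: load_exceeds "$(cat /proc/loadavg)" 24 && echo "above max-load-1", with 24 replaced by whatever value the config file actually contains.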
      regards
      frecurring

    • Re: OMV unexpected reboot - softdog unexpected close

      Hi all,

      Coming back home, I discovered that the RAID1 (set up just a few days ago on a fresh OMV installation) is resyncing :cry:

      The output of /proc/mdstat is the following:

      root@OMVHome:~# cat /proc/mdstat
      Personalities : [raid1]
      md127 : active raid1 sdb[0] sda[1]
      2930265424 blocks super 1.2 [2/2] [UU]
      [>....................] resync = 1.1% (34032128/2930265424) finish=2864.6min speed=16850K/sec

      unused devices: <none>
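      The finish estimate in mdstat is essentially the remaining blocks divided by the current speed: (2930265424 - 34032128) KiB at 16850 KiB/s is about 171,883 s, roughly 2,865 minutes or two days, matching the finish=2864.6min shown. A small sketch that pulls the same numbers out of /proc/mdstat and redoes that arithmetic; mdstat_progress is a hypothetical helper and assumes the mdstat layout shown above.

```bash
# Print array, sync action, percentage and a recomputed ETA from /proc/mdstat on stdin.
# ETA = (total - done) blocks / speed, where mdstat blocks are 1 KiB and speed is KiB/s.
# ASSUMPTION: mdstat_progress is a hypothetical helper for the layout shown above.
mdstat_progress() {
    awk '
        /^md[0-9]+ :/ { dev = $1 }
        /(resync|recovery|check|reshape) =/ {
            act = ""; pct = ""; done_blk = 0; total = 0; speed = 0
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^(resync|recovery|check|reshape)$/) act = $i
                else if ($i ~ /%$/) pct = $i
                else if ($i ~ /^[(][0-9]+\/[0-9]+[)]$/) {
                    s = $i; gsub(/[()]/, "", s); split(s, part, "/")
                    done_blk = part[1]; total = part[2]
                }
                else if ($i ~ /^speed=/) {
                    s = $i; sub(/^speed=/, "", s); sub(/K\/sec$/, "", s); speed = s + 0
                }
            }
            if (speed > 0)
                printf "%s %s %s eta=%.1fmin\n", dev, act, pct, (total - done_blk) / speed / 60
            else
                printf "%s %s %s eta=unknown\n", dev, act, pct
        }
    '
}
```

      Usage: mdstat_progress < /proc/mdstat. For the output above it prints md127 resync 1.1% eta=2864.7min.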

      Then I tried to speed up the resync:

      root@OMVHome:~# mdadm --grow --bitmap=internal /dev/md127
      mdadm: failed to set internal bitmap.
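      Independently of the bitmap, md throttles a resync between two kernel limits, /proc/sys/dev/raid/speed_limit_min and speed_limit_max (KiB/s per device), and falls back towards the minimum while other I/O, such as a Plex stream, is hitting the disks. Raising speed_limit_min as root makes the resync more aggressive at the expense of foreground I/O. As far as I know, a write-intent bitmap mainly shortens resyncs after an unclean shutdown rather than speeding up one already in progress. The sketch only prints the current limits; raid_speed_limits is a hypothetical helper whose optional directory argument exists only so it can be tried on a copy.

```bash
# Print the md resync speed limits (KiB/s per device).
# ASSUMPTION: raid_speed_limits is a hypothetical helper; the directory argument
# defaults to the real sysctl location and exists only so it can be tried on a copy.
raid_speed_limits() {
    local dir="${1:-/proc/sys/dev/raid}"
    local name
    for name in speed_limit_min speed_limit_max; do
        printf '%s=%s\n' "$name" "$(cat "$dir/$name")"
    done
}
```

      Usage: raid_speed_limits with no argument reads the live values.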

      I'm wondering whether there is something wrong with the HDDs or with the hardware in general.

      In your opinion, is the resync caused by the repeated reboots?
      Any ideas or suggestions?

      thanks
      regards
      frecurring

    • Re: OMV unexpected reboot - softdog unexpected close

      Hi all,


      Since the weekend is coming, I'm going to use my free time to try to solve the issues I've encountered.

      I've found an error on one disk, as follows:

      ======================================================================
      root@OMVHome:~# smartctl --all /dev/sda
      smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build)
      Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

      === START OF INFORMATION SECTION ===
      Device Model:     WDC WD30EFRX-68EUZN0
      Serial Number:    WD-..
      Firmware Version: 80.00A80
      User Capacity:    3,000,592,982,016 bytes
      Device is:        Not in smartctl database [for details use: -P showall]
      ATA Version is:   9
      ATA Standard is:  Exact ATA specification draft version not indicated
      Local Time is:    Fri May 9 17:37:33 2014 BST
      SMART support is: Available - device has SMART capability.
      SMART support is: Enabled

      === START OF READ SMART DATA SECTION ===
      SMART overall-health self-assessment test result: PASSED

      General SMART Values:
      Offline data collection status:  (0x80) Offline data collection activity
                                              was never started.
                                              Auto Offline Data Collection: Enabled.
      Self-test execution status:      (   0) The previous self-test routine completed
                                              without error or no self-test has ever
                                              been run.
      Total time to complete Offline
      data collection:                 (37800) seconds.
      Offline data collection
      capabilities:                    (0x7b) SMART execute Offline immediate.
                                              Auto Offline data collection on/off support.
                                              Suspend Offline collection upon new
                                              command.
                                              Offline surface scan supported.
                                              Self-test supported.
                                              Conveyance Self-test supported.
                                              Selective Self-test supported.
      SMART capabilities:            (0x0003) Saves SMART data before entering
                                              power-saving mode.
                                              Supports SMART auto save timer.
      Error logging capability:        (0x01) Error logging supported.
                                              General Purpose Logging supported.
      Short self-test routine
      recommended polling time:        (   2) minutes.
      Extended self-test routine
      recommended polling time:        ( 255) minutes.
      Conveyance self-test routine
      recommended polling time:        (   5) minutes.
      SCT capabilities:              (0x703d) SCT Status supported.
                                              SCT Error Recovery Control supported.
                                              SCT Feature Control supported.
                                              SCT Data Table supported.

      SMART Attributes Data Structure revision number: 16
      Vendor Specific SMART Attributes with Thresholds:
      ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
        1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
        3 Spin_Up_Time            0x0027   181   180   021    Pre-fail  Always       -       5950
        4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       65
        5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
        7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
        9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       108
       10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
       11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
       12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       19
      192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       11
      193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       133
      194 Temperature_Celsius     0x0022   119   107   000    Old_age   Always       -       31
      196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
      197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
      198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
      199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
      200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

      SMART Error Log Version: 1
      ATA Error Count: 1
              CR = Command Register [HEX]
              FR = Features Register [HEX]
              SC = Sector Count Register [HEX]
              SN = Sector Number Register [HEX]
              CL = Cylinder Low Register [HEX]
              CH = Cylinder High Register [HEX]
              DH = Device/Head Register [HEX]
              DC = Device Command Register [HEX]
              ER = Error register [HEX]
              ST = Status register [HEX]
      Powered_Up_Time is measured from power on, and printed as
      DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
      SS=sec, and sss=millisec. It "wraps" after 49.710 days.

      Error 1 occurred at disk power-on lifetime: 1 hours (0 days + 1 hours)
        When the command that caused the error occurred, the device was active or idle.

        After command completion occurred, registers were:
        ER ST SC SN CL CH DH
        -- -- -- -- -- -- --
        04 51 01 00 00 00 00  Error: ABRT

        Commands leading to the command that caused the error were:
        CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
        -- -- -- -- -- -- -- --  ----------------  --------------------
        b0 d5 01 e1 4f c2 00 08      00:55:35.490  SMART READ LOG
        b0 d5 01 e1 4f c2 00 08      00:55:35.490  SMART READ LOG
        b0 d6 01 e0 4f c2 00 08      00:55:35.489  SMART WRITE LOG
        b0 d6 01 e0 4f c2 00 08      00:55:35.488  SMART WRITE LOG
        b0 d5 01 e0 4f c2 00 08      00:55:35.488  SMART READ LOG

      SMART Self-test log structure revision number 1
      No self-tests have been logged.  [To run self-tests, use: smartctl -t]

      SMART Selective self-test log data structure revision number 1
       SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
          1        0        0  Not_testing
          2        0        0  Not_testing
          3        0        0  Not_testing
          4        0        0  Not_testing
          5        0        0  Not_testing
      Selective self-test flags (0x0):
        After scanning selected spans, do NOT read-scan remainder of disk.
      If Selective self-test is pending on power-up, resume after 0 minute delay.
      ======================================================================
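      When judging whether a disk is failing, the attributes usually watched are the reallocated, pending and offline-uncorrectable sector counts, plus the UDMA CRC error count for cable problems. In the output above all of them have a raw value of 0, and the single logged error is an aborted (ABRT) SMART log command during the disk's first power-on hour. A sketch that flags those attributes when non-zero in smartctl -A output (column 10 is RAW_VALUE); smart_red_flags is a hypothetical helper and the attribute list is a common rule of thumb, not an official threshold.

```bash
# Print SMART attributes that commonly indicate media or cable trouble when their
# RAW_VALUE (column 10 of `smartctl -A` output) is non-zero. Returns 1 if any are found.
# ASSUMPTION: smart_red_flags is a hypothetical helper; the attribute list is a rule of thumb.
smart_red_flags() {
    awk '
        $2 ~ /^(Reallocated_Sector_Ct|Reallocated_Event_Count|Current_Pending_Sector|Offline_Uncorrectable|UDMA_CRC_Error_Count)$/ && $10 != "0" {
            print $2 "=" $10
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

      Usage: smartctl -A /dev/sda | smart_red_flags && echo "no red flags"; repeat for /dev/sdb.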


      My idea is to start from scratch and rebuild the RAID1 array.

      So my question: what is the right procedure to dismantle the RAID1 array properly?
      Do I need to erase the superblock information before reusing the two disks to build the array again?
      Since the HDD is still within the return period, do you think I should return it?
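      For the teardown question, the usual mdadm sequence is to stop the array and then clear the md superblock on each former member, so the old array is not auto-assembled again when the disks are reused. A sketch with a dry-run default, assuming the filesystem on the array is already unmounted and nothing else uses it; dismantle_md and DRY_RUN are hypothetical names, and the device names in the usage line come from the mdstat output above, so check your own before running anything with DRY_RUN=0, because --zero-superblock is destructive.

```bash
# Sketch: stop an md array and clear the md superblock from its former members.
# ASSUMPTIONS: the filesystem on the array is unmounted and unused; dismantle_md and
# DRY_RUN are hypothetical names. By default (DRY_RUN unset or 1) it only prints the
# commands; it executes them only when DRY_RUN=0. DESTRUCTIVE when executed.
dismantle_md() {
    local array="$1"; shift                      # e.g. /dev/md127
    local run=()
    [[ "${DRY_RUN:-1}" == 0 ]] || run=(echo)     # prefix commands with echo in dry-run mode
    "${run[@]}" mdadm --stop "$array" || return 1
    local member
    for member in "$@"; do                       # e.g. /dev/sda /dev/sdb
        "${run[@]}" mdadm --zero-superblock "$member" || return 1
    done
}
```

      Usage: dismantle_md /dev/md127 /dev/sda /dev/sdb prints the three commands; review them, then rerun with DRY_RUN=0 as root.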


      thanks for your help,
      regards
      frecurring

    • [Solved] OMV unexpected reboot - softdog unexpected close

      I finally found that some RAM banks were faulty.

      I removed them and the problem disappeared, but in the meantime the filesystem had been corrupted.

      I reinstalled everything.
      regards
      frecurring

    • Re: [Solved] OMV unexpected reboot - softdog unexpected close

      I recently read on an IT blog that faulty RAM banks can cause the strangest problems... Need to keep that in mind.

      Glad you found your problem!
      OMV stoneburner | HP Microserver | 256GB Samsung 830 SSD for system | 4x 2TB in a RAID5
      OMV erasmus| Odroid XU4 | 5TB Data drive | 500GB Backup drive
    • Re: [Solved] OMV unexpected reboot - softdog unexpected close

      It was a real PAIN and very time-consuming :o

      The real mistake I made was introducing two different improvements at the same time.

      I introduced:
      1) the RAM banks, and
      2) a reinstall of everything on a new disk.

      regards
      frecurring
