Clicking on "SMART > Devices > Information" increases ATA error count!

    • OMV 4.x
    • Clicking on "SMART > Devices > Information" increases ATA error count!

      Hello everyone,

      I've spent hours investigating, but now I need help.
      I bought 4 new 4TB HDDs (WD Red WD40EFRX) for my server. They seem to be fine, but every time I click on "SMART > Devices > Information" the ATA error count for the selected drive increases by 1.

      Source Code

      smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.17.0-0.bpo.1-amd64] (local build)
      Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

      === START OF INFORMATION SECTION ===
      Model Family: Western Digital Red
      Device Model: WDC WD40EFRX-68N32N0
      Serial Number: WD-WCC7K5KD7DL8
      LU WWN Device Id: 5 0014ee 2ba714616
      Firmware Version: 82.00A82
      User Capacity: 4,000,787,030,016 bytes [4.00 TB]
      Sector Sizes: 512 bytes logical, 4096 bytes physical
      Rotation Rate: 5400 rpm
      Form Factor: 3.5 inches
      Device is: In smartctl database [for details use: -P show]
      ATA Version is: ACS-3 T13/2161-D revision 5
      SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
      Local Time is: Mon Aug 27 03:39:34 2018 CEST
      SMART support is: Available - device has SMART capability.
      SMART support is: Enabled
      AAM feature is: Unavailable
      APM feature is: Unavailable
      Rd look-ahead is: Enabled
      Write cache is: Enabled
      ATA Security is: Disabled, frozen [SEC2]
      Wt Cache Reorder: Enabled

      === START OF READ SMART DATA SECTION ===
      SMART overall-health self-assessment test result: PASSED
      General SMART Values:
      Offline data collection status: (0x80) Offline data collection activity
      was never started.
      Auto Offline Data Collection: Enabled.
      Self-test execution status: ( 0) The previous self-test routine completed
      without error or no self-test has ever
      been run.
      Total time to complete Offline
      data collection: (43920) seconds.
      Offline data collection
      capabilities: (0x7b) SMART execute Offline immediate.
      Auto Offline data collection on/off support.
      Suspend Offline collection upon new
      command.
      Offline surface scan supported.
      Self-test supported.
      Conveyance Self-test supported.
      Selective Self-test supported.
      SMART capabilities: (0x0003) Saves SMART data before entering
      power-saving mode.
      Supports SMART auto save timer.
      Error logging capability: (0x01) Error logging supported.
      General Purpose Logging supported.
      Short self-test routine
      recommended polling time: ( 2) minutes.
      Extended self-test routine
      recommended polling time: ( 466) minutes.
      Conveyance self-test routine
      recommended polling time: ( 5) minutes.
      SCT capabilities: (0x303d) SCT Status supported.
      SCT Error Recovery Control supported.
      SCT Feature Control supported.
      SCT Data Table supported.
      SMART Attributes Data Structure revision number: 16
      Vendor Specific SMART Attributes with Thresholds:
      ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
      1 Raw_Read_Error_Rate POSR-K 200 200 051 - 0
      3 Spin_Up_Time POS--K 100 253 021 - 0
      4 Start_Stop_Count -O--CK 100 100 000 - 3
      5 Reallocated_Sector_Ct PO--CK 200 200 140 - 0
      7 Seek_Error_Rate -OSR-K 200 200 000 - 0
      9 Power_On_Hours -O--CK 100 100 000 - 21
      10 Spin_Retry_Count -O--CK 100 253 000 - 0
      11 Calibration_Retry_Count -O--CK 100 253 000 - 0
      12 Power_Cycle_Count -O--CK 100 100 000 - 3
      192 Power-Off_Retract_Count -O--CK 200 200 000 - 0
      193 Load_Cycle_Count -O--CK 200 200 000 - 10
      194 Temperature_Celsius -O---K 113 108 000 - 37
      196 Reallocated_Event_Count -O--CK 200 200 000 - 0
      197 Current_Pending_Sector -O--CK 200 200 000 - 0
      198 Offline_Uncorrectable ----CK 100 253 000 - 0
      199 UDMA_CRC_Error_Count -O--CK 200 200 000 - 0
      200 Multi_Zone_Error_Rate ---R-- 200 200 000 - 0
      ||||||_ K auto-keep
      |||||__ C event count
      ||||___ R error rate
      |||____ S speed/performance
      ||_____ O updated online
      |______ P prefailure warning
      General Purpose Log Directory Version 1
      SMART Log Directory Version 1 [multi-sector log support]
      Address Access R/W Size Description
      0x00 GPL,SL R/O 1 Log Directory
      0x01 SL R/O 1 Summary SMART error log
      0x02 SL R/O 5 Comprehensive SMART error log
      0x03 GPL R/O 6 Ext. Comprehensive SMART error log
      0x04 GPL,SL R/O 8 Device Statistics log
      0x06 SL R/O 1 SMART self-test log
      0x07 GPL R/O 1 Extended self-test log
      0x09 SL R/W 1 Selective self-test log
      0x10 GPL R/O 1 SATA NCQ Queued Error log
      0x11 GPL R/O 1 SATA Phy Event Counters log
      0x30 GPL,SL R/O 9 IDENTIFY DEVICE data log
      0x80-0x9f GPL,SL R/W 16 Host vendor specific log
      0xa0-0xa7 GPL,SL VS 16 Device vendor specific log
      0xa8-0xb6 GPL,SL VS 1 Device vendor specific log
      0xb7 GPL,SL VS 56 Device vendor specific log
      0xbd GPL,SL VS 1 Device vendor specific log
      0xc0 GPL,SL VS 1 Device vendor specific log
      0xc1 GPL VS 93 Device vendor specific log
      0xe0 GPL,SL R/W 1 SCT Command/Status
      0xe1 GPL,SL R/W 1 SCT Data Transfer
      SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
      Device Error Count: 12
      CR = Command Register
      FEATR = Features Register
      COUNT = Count (was: Sector Count) Register
      LBA_48 = Upper bytes of LBA High/Mid/Low Registers ] ATA-8
      LH = LBA High (was: Cylinder High) Register ] LBA
      LM = LBA Mid (was: Cylinder Low) Register ] Register
      LL = LBA Low (was: Sector Number) Register ]
      DV = Device (was: Device/Head) Register
      DC = Device Control Register
      ER = Error register
      ST = Status register
      Powered_Up_Time is measured from power on, and printed as
      DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
      SS=sec, and sss=millisec. It "wraps" after 49.710 days.

      Error 12 [11] occurred at disk power-on lifetime: 21 hours (0 days + 21 hours)
      When the command that caused the error occurred, the device was active or idle.
      After command completion occurred, registers were:
      ER -- ST COUNT LBA_48 LH LM LL DV DC
      -- -- -- == -- == == == -- -- -- -- --
      04 -- 51 00 0b 00 00 00 00 00 00 00 00 Error: ABRT
      Commands leading to the command that caused the error were:
      CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
      -- == -- == -- == == == -- -- -- -- -- --------------- --------------------
      b0 00 d5 00 01 00 00 00 c2 4f e1 00 08 21:46:11.746 SMART READ LOG
      2f 00 00 00 01 00 00 00 00 00 04 40 08 21:46:11.697 READ LOG EXT
      b0 00 d5 00 01 00 00 00 c2 4f e1 00 08 21:46:11.697 SMART READ LOG
      b0 00 d6 00 01 00 00 00 c2 4f e0 00 08 21:46:11.696 SMART WRITE LOG
      b0 00 d5 00 01 00 00 00 c2 4f e0 00 08 21:46:11.696 SMART READ LOG
      [...]

      The important lines are:

      Source Code

      Error 12 [11] occurred at disk power-on lifetime: 21 hours (0 days + 21 hours)
      When the command that caused the error occurred, the device was active or idle.
      After command completion occurred, registers were:
      ER -- ST COUNT LBA_48 LH LM LL DV DC
      -- -- -- == -- == == == -- -- -- -- --
      04 -- 51 00 0b 00 00 00 00 00 00 00 00 Error: ABRT
      Commands leading to the command that caused the error were:
      CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
      -- == -- == -- == == == -- -- -- -- -- --------------- --------------------
      b0 00 d5 00 01 00 00 00 c2 4f e1 00 08 21:46:11.746 SMART READ LOG
      2f 00 00 00 01 00 00 00 00 00 04 40 08 21:46:11.697 READ LOG EXT
      b0 00 d5 00 01 00 00 00 c2 4f e1 00 08 21:46:11.697 SMART READ LOG
      b0 00 d6 00 01 00 00 00 c2 4f e0 00 08 21:46:11.696 SMART WRITE LOG
      b0 00 d5 00 01 00 00 00 c2 4f e0 00 08 21:46:11.696 SMART READ LOG

      "ABRT" is an ATA error, and it occurs every time OMV gathers information after I click the "Information" button in the SMART section. The system log is clean when this error occurs, but from time to time I get messages like this one:

      Aug 27 03:24:40 server smartd[40602]: Device: /dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K4LA7Z52 [SAT], ATA error count increased from 9 to 11

      The thing is: executing "smartctl -x /dev/sdd" in a terminal does NOT increase the ATA error count.
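
      A minimal sketch of how such a check can be repeated from a shell, assuming the count is taken from the "Device Error Count" line of the extended error log shown above. The function names ata_error_count and check_increment are made up for illustration, /dev/sdd is simply the drive from this post, and it is an assumption that reading the log with "smartctl -l xerror" does not itself bump the counter on this firmware:

```shell
# Print the number after "Device Error Count:" from smartctl output on stdin.
# Prints 0 when the line is missing (for example, when no errors are logged).
ata_error_count() {
    awk -F': *' '
        /^Device Error Count:/ { print $2 + 0; found = 1; exit }
        END { if (!found) print 0 }
    '
}

# Read the count, run the given command, then read the count again.
# Needs root and smartmontools; nothing is run until the function is called.
check_increment() {
    disk=$1
    shift
    before=$(smartctl -l xerror "$disk" | ata_error_count)
    "$@" >/dev/null 2>&1
    after=$(smartctl -l xerror "$disk" | ata_error_count)
    echo "before=$before after=$after"
}
```

      Called as "check_increment /dev/sdd smartctl -x /dev/sdd", this prints both readings, so a manual call and whatever the web interface runs can be compared with the same yardstick.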

      I searched the web and found a lot of other people (here and on other forums) having this exact problem, often with OMV and WD HDDs. But none of them saw the connection between this OMV button and the increasing counts.

      Something seems to be wrong with the implementation of the SMART Information window, but I don't know what (probably an invalid command or an invalid argument). The corresponding lines of code must be somewhere in github.com/openmediavault/open…mediavault/system/storage but I'm not a programmer and I'm tired, so I am a bit lost here.

      Please tell me what commands (and arguments) are executed when I click on "Information"!

      Thanks for your help!

      The post was edited 10 times, last by Mr Smile.

    • Here are some samples:

      Source Code

      Error 9 [8] occurred at disk power-on lifetime: 19 hours (0 days + 19 hours)
      When the command that caused the error occurred, the device was active or idle.
      After command completion occurred, registers were:
      ER -- ST COUNT LBA_48 LH LM LL DV DC
      -- -- -- == -- == == == -- -- -- -- --
      04 -- 51 00 0b 00 00 00 00 00 00 00 00 Error: ABRT
      Commands leading to the command that caused the error were:
      CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
      -- == -- == -- == == == -- -- -- -- -- --------------- --------------------
      b0 00 d5 00 01 00 00 00 c2 4f e1 00 08 19:13:34.933 SMART READ LOG
      2f 00 00 00 01 00 00 00 00 01 04 40 08 19:13:34.893 READ LOG EXT
      b0 00 d5 00 01 00 00 00 c2 4f e1 00 08 19:13:34.879 SMART READ LOG
      b0 00 d6 00 01 00 00 00 c2 4f e0 00 08 19:13:34.878 SMART WRITE LOG
      2f 00 00 00 01 00 00 00 00 00 04 40 08 19:13:34.833 READ LOG EXT

      Error 8 [7] occurred at disk power-on lifetime: 19 hours (0 days + 19 hours)
      When the command that caused the error occurred, the device was active or idle.
      After command completion occurred, registers were:
      ER -- ST COUNT LBA_48 LH LM LL DV DC
      -- -- -- == -- == == == -- -- -- -- --
      04 -- 51 00 0b 00 00 00 00 00 00 00 00 Error: ABRT
      Commands leading to the command that caused the error were:
      CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
      -- == -- == -- == == == -- -- -- -- -- --------------- --------------------
      b0 00 d5 00 01 00 00 00 c2 4f e1 00 08 19:04:58.456 SMART READ LOG
      b0 00 d5 00 01 00 00 00 c2 4f e1 00 08 19:04:58.456 SMART READ LOG
      b0 00 d5 00 01 00 00 00 c2 4f e0 00 08 19:04:58.456 SMART READ LOG
      b0 00 d6 00 01 00 00 00 c2 4f e0 00 08 19:04:58.455 SMART WRITE LOG
      b0 00 d6 00 01 00 00 00 c2 4f e0 00 08 19:04:58.454 SMART WRITE LOG

      Error 7 [6] occurred at disk power-on lifetime: 19 hours (0 days + 19 hours)
      When the command that caused the error occurred, the device was active or idle.
      After command completion occurred, registers were:
      ER -- ST COUNT LBA_48 LH LM LL DV DC
      -- -- -- == -- == == == -- -- -- -- --
      04 -- 51 00 0b 00 00 00 00 00 00 00 00 Error: ABRT
      Commands leading to the command that caused the error were:
      CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
      -- == -- == -- == == == -- -- -- -- -- --------------- --------------------
      b0 00 d5 00 01 00 00 00 c2 4f e1 00 08 18:58:50.800 SMART READ LOG
      b0 00 d5 00 01 00 00 00 c2 4f e1 00 08 18:58:50.800 SMART READ LOG
      2f 00 00 00 01 00 00 00 00 01 04 40 08 18:58:50.758 READ LOG EXT
      b0 00 d6 00 01 00 00 00 c2 4f e0 00 08 18:58:50.745 SMART WRITE LOG
      b0 00 d6 00 01 00 00 00 c2 4f e0 00 08 18:58:50.744 SMART WRITE LOG

      Error 6 [5] occurred at disk power-on lifetime: 13 hours (0 days + 13 hours)
      When the command that caused the error occurred, the device was doing SMART Offline or Self-test.
      After command completion occurred, registers were:
      ER -- ST COUNT LBA_48 LH LM LL DV DC
      -- -- -- == -- == == == -- -- -- -- --
      04 -- 51 00 0b 00 00 00 00 00 00 00 00 Error: ABRT
      Commands leading to the command that caused the error were:
      CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
      -- == -- == -- == == == -- -- -- -- -- --------------- --------------------
      b0 00 d5 00 01 00 00 00 c2 4f e1 00 08 13:24:57.652 SMART READ LOG
      b0 00 d5 00 01 00 00 00 c2 4f e1 00 08 13:24:57.652 SMART READ LOG
      b0 00 d6 00 01 00 00 00 c2 4f e0 00 08 13:24:57.651 SMART WRITE LOG
      b0 00 d6 00 01 00 00 00 c2 4f e0 00 08 13:24:57.650 SMART WRITE LOG
      b0 00 d5 00 01 00 00 00 c2 4f e0 00 08 13:24:57.650 SMART READ LOG

      Error 5 [4] occurred at disk power-on lifetime: 11 hours (0 days + 11 hours)
      When the command that caused the error occurred, the device was doing SMART Offline or Self-test.
      After command completion occurred, registers were:
      ER -- ST COUNT LBA_48 LH LM LL DV DC
      -- -- -- == -- == == == -- -- -- -- --
      04 -- 51 00 0b 00 00 00 00 00 00 00 00 Error: ABRT
      Commands leading to the command that caused the error were:
      CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
      -- == -- == -- == == == -- -- -- -- -- --------------- --------------------
      b0 00 d5 00 01 00 00 00 c2 4f e1 00 08 11:01:12.375 SMART READ LOG
      b0 00 d6 00 01 00 00 00 c2 4f e0 00 08 11:01:12.375 SMART WRITE LOG
      b0 00 d5 00 01 00 00 00 c2 4f e0 00 08 11:01:12.375 SMART READ LOG
      b0 00 d6 00 01 00 00 00 c2 4f e0 00 08 11:01:12.374 SMART WRITE LOG
      b0 00 d5 00 01 00 00 00 c2 4f e0 00 08 11:01:12.374 SMART READ LOG

      Error 4 [3] occurred at disk power-on lifetime: 11 hours (0 days + 11 hours)
      When the command that caused the error occurred, the device was doing SMART Offline or Self-test.
      After command completion occurred, registers were:
      ER -- ST COUNT LBA_48 LH LM LL DV DC
      -- -- -- == -- == == == -- -- -- -- --
      04 -- 51 00 0b 00 00 00 00 00 00 00 00 Error: ABRT
      Commands leading to the command that caused the error were:
      CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
      -- == -- == -- == == == -- -- -- -- -- --------------- --------------------
      b0 00 d5 00 01 00 00 00 c2 4f e1 00 08 10:59:21.773 SMART READ LOG
      b0 00 d5 00 01 00 00 00 c2 4f e1 00 08 10:59:21.773 SMART READ LOG
      b0 00 d6 00 01 00 00 00 c2 4f e0 00 08 10:59:21.772 SMART WRITE LOG
      b0 00 d6 00 01 00 00 00 c2 4f e0 00 08 10:59:21.771 SMART WRITE LOG
      b0 00 d5 00 01 00 00 00 c2 4f e0 00 08 10:59:21.771 SMART READ LOG

      Error 3 [2] occurred at disk power-on lifetime: 11 hours (0 days + 11 hours)
      When the command that caused the error occurred, the device was doing SMART Offline or Self-test.
      After command completion occurred, registers were:
      ER -- ST COUNT LBA_48 LH LM LL DV DC
      -- -- -- == -- == == == -- -- -- -- --
      04 -- 51 00 0b 00 00 00 00 00 00 00 00 Error: ABRT
      Commands leading to the command that caused the error were:
      CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
      -- == -- == -- == == == -- -- -- -- -- --------------- --------------------
      b0 00 d5 00 01 00 00 00 c2 4f e1 00 08 10:59:09.515 SMART READ LOG
      b0 00 d6 00 01 00 00 00 c2 4f e0 00 08 10:59:09.515 SMART WRITE LOG
      2f 00 00 00 01 00 00 00 00 00 00 40 08 10:59:09.515 READ LOG EXT
      b0 00 d6 00 01 00 00 00 c2 4f e0 00 08 10:59:09.514 SMART WRITE LOG
      b0 00 d5 00 01 00 00 00 c2 4f e0 00 08 10:59:09.514 SMART READ LOG

      Error 2 [1] occurred at disk power-on lifetime: 10 hours (0 days + 10 hours)
      When the command that caused the error occurred, the device was doing SMART Offline or Self-test.
      After command completion occurred, registers were:
      ER -- ST COUNT LBA_48 LH LM LL DV DC
      -- -- -- == -- == == == -- -- -- -- --
      04 -- 51 00 0b 00 00 00 00 00 00 00 00 Error: ABRT
      Commands leading to the command that caused the error were:
      CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
      -- == -- == -- == == == -- -- -- -- -- --------------- --------------------
      b0 00 d5 00 01 00 00 00 c2 4f e1 00 08 10:50:43.610 SMART READ LOG
      b0 00 d5 00 01 00 00 00 c2 4f e1 00 08 10:50:43.610 SMART READ LOG
      b0 00 d6 00 01 00 00 00 c2 4f e0 00 08 10:50:43.609 SMART WRITE LOG
      b0 00 d6 00 01 00 00 00 c2 4f e0 00 08 10:50:43.608 SMART WRITE LOG
      b0 00 d5 00 01 00 00 00 c2 4f e0 00 08 10:50:43.608 SMART READ LOG

      Edit: I did not notice any "silent" increments over time, but I cannot rule that out. The ATA error count definitely increases when I click on "Information".
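
      To make the command rows above easier to read, here is a small sketch that splits one row into a few named fields, following the column header smartctl prints (CR, FEATR, COUNT, LBA_48, LH, LM, LL, DV, DC). The function name decode_cmd_row is made up for illustration. The sketch assumes that for SMART READ LOG / SMART WRITE LOG (CR b0) and READ LOG EXT (CR 2f) the LL byte carries the log address:

```shell
# Split one command row from the extended error log into a few named fields.
# Byte positions follow the header:
#   CR | FEATR (2 bytes) | COUNT (2) | LBA_48 (3) | LH LM LL | DV DC
# so LL is the 11th byte of the row.
decode_cmd_row() {
    printf '%s\n' "$1" | awk '{
        printf "cmd=0x%s feature=0x%s%s log_addr=0x%s\n", $1, $2, $3, $11
    }'
}

decode_cmd_row "b0 00 d5 00 01 00 00 00 c2 4f e1 00 08 19:13:34.933 SMART READ LOG"
# prints: cmd=0xb0 feature=0x00d5 log_addr=0xe1
decode_cmd_row "2f 00 00 00 01 00 00 00 00 01 04 40 08 19:13:34.893 READ LOG EXT"
# prints: cmd=0x2f feature=0x0000 log_addr=0x04
```

      In the log directory from the first post, 0xe1 is listed as "SCT Data Transfer", 0xe0 as "SCT Command/Status" and 0x04 as "Device Statistics log". In every sample the first row (the most recent by timestamp) is a SMART READ LOG of 0xe1. That narrows down where to look, but it does not by itself explain why the drive answers with ABRT.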

      The post was edited 1 time, last by Mr Smile.

    • The code is here: github.com/openmediavault/open…smartinformation.inc#L149 and nothing other than

      Shell-Script

      # smartctl -x [-d N]

      is executed in a shell, so there is absolutely no difference from a manual call.
      Absolutely no support through PM!

      I must not fear.
      Fear is the mind-killer.
      Fear is the little-death that brings total obliteration.
      I will face my fear.
      I will permit it to pass over me and through me.
      And when it has gone past I will turn the inner eye to see its path.
      Where the fear has gone there will be nothing.
      Only I will remain.

      Litany against fear by Bene Gesserit
    • Mr Smile wrote:

      "ABRT" is an ATA error
      Yes. And the firmware of your WD drives, for whatever reason, decides to log SMART queries as ABRT errors when it shouldn't. Further reading:

      @ tkaiser: Thanks for those documentation links. 414 pages ... 8| They will probably be helpful at some point.

      @ votdev: As I wrote, I double and triple checked that smartctl -x /dev/sdd without -d does NOT increase the counter, whereas # smartctl -x [-d N] does.

      Where does OMV get the device type N from, and where can I see the command that is actually executed on my system? There must be something wrong with the type (or at least WD drives don't like it). And why is -d needed here, when smartctl -x /dev/sdd also works?

      See smartctl manpage: smartmontools.org/browser/trunk/smartmontools/smartctl.8.in

      edit: Before I realized the above, I found so many people and long threads all over the internet describing this exact problem but without solutions. Some even swapped their HDDs, cables and so on for new ones because others said something like "increasing SMART counts are a severe problem ... blablabla". And none of them have any idea how this is triggered. So I think it would be a good thing to sort this "bug" out now.

      edit2: The oldest threads were from ~2013. So I also think that WD drives have to be handled in another way (in general) than OMV / smartctl does.

      The post was edited 5 times, last by Mr Smile.

    • Mr Smile wrote:

      Thanks. https://www.smartmontools.org/browser/trunk/smartmontools/smartctl.8.in says that the default is "auto" and this seems to work for me. So I ask: Why is -d needed here?

      Thanks for your time!
      'auto' seems to work for YOU, but remember that there is other hardware out there that behaves differently. The OMV code tries to guess the optimal setting for the detected device. For SATA devices (which should be the case for your WD device if it is connected via SATA), the implementation does not return a type unless the device is connected via USB.
    • But how can I see what command is actually executed?

      smartctl -x /dev/sdd works and doesn't increment the ATA error count.

      smartctl -x -d sat /dev/sdd also works and doesn't increment the ATA error count.

      smartctl -x -d ata /dev/sdd also works and doesn't increment the ATA error count, but prints this at the end of the other information:

      ATA_READ_LOG_EXT (addr=0x11:0x00, page=0, n=1) failed: 48-bit ATA commands not implemented
      Read SATA Phy Event Counters failed

      This link from your list
      github.com/openmediavault/open…toragedevicecciss.inc#L45
      tells me that probably something like smartctl -x -d cciss ... ??? is executed, but I don't have any idea what exactly.

      This is an HP ProLiant MicroServer Gen8, so it does indeed have an HP Smart Array controller in it. But that is deactivated in the BIOS. The HDDs run in AHCI mode, not as a Smart Array! What's going wrong here?

      The HP MicroServers, WD HDDs and OMV are a very popular combination, so I could easily link here dozens of threads where people are facing the exact same behavior and have no idea where the increasing errors come from.

      The post was edited 2 times, last by Mr Smile.

    • Mr Smile wrote:

      I could easily link here dozens of threads where people are facing the exact same behavior

      Google search for "00 00 00 00 00 00 00 00 Error: ABRT" site:forum.openmediavault.org reveals that this occurs only with WD disks. I doubt it's directly related to a smartctl call, maybe something with stuff that happens directly before.

      I wish execsnoop were available on Linux (DTrace is sooo convenient for debugging stuff on Solaris, FreeBSD and macOS). The last time I needed to examine an Ubuntu system's behaviour I followed github.com/iovisor/bcc/blob/master/INSTALL.md#source

      You might try this: bugs.launchpad.net/ubuntu/+sou…/+bug/1470014/comments/10 (checking the contents of /dev/disk/by-id/wwn* and then adding appropriate sections to /etc/hdparm.conf). A reboot might be necessary afterwards.
    • tkaiser wrote:

      Google search for "00 00 00 00 00 00 00 00 Error: ABRT" site:forum.openmediavault.org reveals that this occurs only with WD disks. I doubt it's directly related to a smartctl call, maybe something with stuff that happens directly before.
      Hmm... Maybe. But it would help to know the exact command that is executed by OMV so that I could file a bug at the smartmontools Trac.

      I'm sure that it is smartctl -x -d cciss plus some device parameter, but as I'm not a programmer I don't understand this code well enough:
      github.com/openmediavault/open…toragedevicecciss.inc#L45

      I think the problem is somewhere between smartctl and the firmware of the WD drives, but I need the exact command that is executed to confirm or refute my theory.

      The post was edited 2 times, last by Mr Smile.

    • I found some interesting log entries in Syslog:

      Source Code

      Aug 26 15:17:11 server smartd[738]: Device: /dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K4LA7Z52, type changed from 'scsi' to 'sat'
      Aug 26 15:17:11 server smartd[738]: Device: /dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K4LA7Z52 [SAT], opened
      Aug 26 15:17:11 server smartd[738]: Device: /dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K4LA7Z52 [SAT], WDC WD40EFRX-68N32N0, S/N:WD-WCC7K4LA7Z52, WWN:5-0014ee-20fd3c0fb, FW:82.00A82, 4.00 TB
      Aug 26 15:17:11 server smartd[738]: Device: /dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K4LA7Z52 [SAT], found in smartd database: Western Digital Red
      Aug 26 15:17:11 server smartd[738]: Device: /dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K4LA7Z52 [SAT], enabled SMART Attribute Autosave.
      Aug 26 15:17:11 server smartd[738]: Device: /dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K4LA7Z52 [SAT], enabled SMART Automatic Offline Testing.
      Aug 26 15:17:11 server smartd[738]: Device: /dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K4LA7Z52 [SAT], is SMART capable. Adding to "monitor" list.
      Aug 26 15:17:11 server smartd[738]: Device: /dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K4LA7Z52 [SAT], state read from /var/lib/smartmontools/smartd.WDC_WD40EFRX_68N32N0-WD_WCC7K4LA7Z52.ata.state

      Those entries appear for all drives (including the system SSD). ?!?
    • tkaiser wrote:

      Mr Smile wrote:

      But it would help to know the exact command that is executed by OMV
      That's why I mentioned execsnoop. OMV relies on some stuff provided by systemd (or maybe udev) which does its own thing.
      I'm sorry, but this is over my head.

      But it would be very nice to get help understanding what this code does and how I can get the info I need to reconstruct the parameters that are actually inserted after cciss.

      github.com/openmediavault/open…toragedevicecciss.inc#L45

      edit: I'm sorry but I have to leave now for the cinema. Thanks for your help. Will read your answers tomorrow. :)
    • Open the file /etc/default/openmediavault and set the environment variable OMV_DEBUG_PHP="1". After that, restart omv-engined with

      Shell-Script

      # monit restart omv-engined

      After that, all commands executed by the omv-engined daemon are logged to syslog.
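
      A small sketch of the file edit described above, assuming /etc/default/openmediavault consists of plain KEY="value" lines. The helper name set_default_var is made up for illustration; it replaces an existing assignment or appends one if the key is missing:

```shell
# Set KEY="VALUE" in a shell-style defaults file.
# Replaces an existing KEY=... line, otherwise appends a new one.
# Assumes key and value contain no "|" or "&" characters.
set_default_var() {
    file=$1
    key=$2
    value=$3
    if grep -q "^${key}=" "$file"; then
        tmp=$(mktemp) || return 1
        # Write to a temp file first, then copy back to keep the file's permissions.
        sed "s|^${key}=.*|${key}=\"${value}\"|" "$file" > "$tmp" && cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        printf '%s="%s"\n' "$key" "$value" >> "$file"
    fi
}
```

      Run as root, "set_default_var /etc/default/openmediavault OMV_DEBUG_PHP 1" would make the change; editing the file by hand is just as good.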
    • Changed the environment variable to "1" but monit restart omv-engined throws an error:
      Cannot create socket to [localhost]:2812 -- Connection refused :S

      I'll do a reboot now.

      ---

      Nope. No messages in "Syslog", when I click on SMART Information. ;(

      But my server is sending me emails every time I reboot:

      Source Code

      This message was generated by the smartd daemon running on:
      host name: server
      DNS domain: [Empty]
      The following warning/error was logged by the smartd daemon:
      Device: /dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K2UEYHHR [SAT], ATA error count increased from 29 to 30
      Device info:
      WDC WD40EFRX-68N32N0, S/N:WD-WCC7K2UEYHHR, WWN:5-0014ee-2ba7148a8, FW:82.00A82, 4.00 TB
      For details see host's SYSLOG.
      You can also use the smartctl utility for further investigation.
      The original message about this issue was sent at Sun Aug 26 16:47:11 2018 CEST
      Another message will be sent in 24 hours if the problem persists.

      The post was edited 2 times, last by Mr Smile.

    • votdev wrote:

      OMV_DEBUG_PHP="1"
      Shouldn't it read OMV_DEBUG_PHP="YES" now?

      Mr Smile wrote:

      /dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K2UEYHHR [SAT]
      The '[SAT]' doesn't look right to me since SAT should only occur with USB attached drives. Maybe this is the culprit (and adding the devices with '-d ata -a' to /etc/smartd.conf will give some insights)
    • tkaiser wrote:

      Shouldn't it read OMV_DEBUG_PHP="YES" now?
      No, you can use 1, true, yes or y here.
    • Mr Smile wrote:

      Nope. No messages in "Syslog", when I click on SMART Information.
      I'm sorry, my mistake. You have to start omv-engined in foreground.

      Shell-Script

      # monit stop omv-engined
      # omv-engined -d -f
    • votdev wrote:

      You have to start omv-engined in foreground
      I followed this. In my case with a SATA attached Apple branded Hitachi HDD the following commands are executed:

      Source Code

      udevadm info --query=property --name='/dev/sda' 2>&1
      smartctl -x '/dev/sda' 2>&1
      No increase of ATA errors of course (I don't use WD disks):

      Source Code

      root@espressobin:/home/tk# (udevadm info --query=property --name='/dev/sda' 2>&1 ; smartctl -x /dev/sda 2>&1) | curl -F 'f:1=<-' ix.io
      http://ix.io/1lpD