ATA Error Count Increasing

    • ATA Error Count Increasing

      Hey Guys,

      I've been running OMV for a couple of weeks now with a mix of 4 older and 4 new 3TB WD Red drives, pooled with SnapRAID and MergerFS.

      But every week I get hard drive reports stating that the ATA error count has increased:

      Device: WDC_WD30EFRX-68EUZN0_WD-WMC4N1021445 [SAT], ATA error count increased from 5 to 6
      Device: WDC_WD30EFRX-68EUZN0_WD-WCC4N4RR3U8L [SAT], ATA error count increased from 8 to 9
      Device: WDC_WD30EFRX-68EUZN0_WD-WCC4N7TACJ0Z [SAT], ATA error count increased from 5 to 6
      Device: WDC_WD30EFRX-68EUZN0_WD-WCC4N7TACULH [SAT], ATA error count increased from 3 to 4
      Device: WDC_WD30EFRX-68EUZN0_WD-WCC4N0EP7C5R [SAT], ATA error count increased from 5 to 6
      Device: WDC_WD30EFRX-68AX9N0_WD-WCC1T1153741 [SAT], ATA error count increased from 5 to 6
      Device: WDC_WD30EFRX-68EUZN0_WD-WMC4N0942489 [SAT], ATA error count increased from 5 to 6
      Device: WDC_WD30EFRX-68EUZN0_WD-WCC4N76L03FY [SAT], ATA error count increased from 9 to 10
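A report like the one above can be tabulated to see at a glance how far each drive's count moved. The following is only a minimal sketch: the regular expression is an assumption derived from the eight lines shown here, not a documented smartd message format, and `parse_report` is a hypothetical helper name.

```python
import re

# Two lines copied from the report above. The pattern below is an
# assumption based on this format only; other drives may be named differently.
REPORT = """\
Device: WDC_WD30EFRX-68EUZN0_WD-WMC4N1021445 [SAT], ATA error count increased from 5 to 6
Device: WDC_WD30EFRX-68EUZN0_WD-WCC4N76L03FY [SAT], ATA error count increased from 9 to 10
"""

LINE_RE = re.compile(
    r"Device: (?P<model>\S+)_WD-(?P<serial>\S+) \[SAT\], "
    r"ATA error count increased from (?P<old>\d+) to (?P<new>\d+)"
)


def parse_report(text):
    """Return a list of (serial, old_count, new_count) tuples."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            rows.append((match["serial"], int(match["old"]), int(match["new"])))
    return rows


rows = parse_report(REPORT)
for serial, old, new in rows:
    print(f"{serial}: {old} -> {new} (+{new - old})")
```

Every drive in the report moves by exactly one per notification, which the `+1` column makes easy to confirm.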

      I used to run a QNAP before OMV with 4 of the drives in RAID 5, so these errors are definitely new.


      I do understand that a couple of my drives have a high Load_Cycle_Count. Two of my 4 original WD Reds were from the batch that shipped with the head-park timer set to about 7 seconds, so they racked up quite a count.

      As for my system, I'm running:
      Intel Xeon E3-1220V3
      SUPERMICRO MBD-X10SL7 (onboard controller flashed to IT mode for running the 8 x 3TB drives)
      16GB Crucial DDR3 SDRAM ECC Unbuffered RAM (2 x 8GB sticks)

      Can anyone provide some insight or suggestions as to what might be happening, and whether this is a concern? I don't see any Offline Uncorrectable errors or bad sectors, so that has to be a good sign, right? All my drives are within warranty, including my 4 original ones (until the end of this year).

      Any help or input is very much appreciated!

      I was going to post one of the SMART results in a reply, as I'm running into "Message Too Long" when posting.
      Scratch that, my replies don't seem to be showing up. Please find each drive's full SMART extended attributes attached here, with each filename being the drive's serial number.
      CPU: Intel Xeon E3-1220 V3
      Mobo: SUPERMICRO X10SL7-F
      RAM: Crucial 16GB (2 x 8GB) DDR3 ECC Unbuffered PC3 12800
      Drives: 8 x 3TB WD Red (Snapraid Pool with 2 Parity), 2 x 120GB Sandisk SSD's (1: OS + Storage, 2: Storage)

      The post was edited 1 time, last by zach86 ().

    • WMC4N1021445:

      === START OF READ SMART DATA SECTION ===
      SMART overall-health self-assessment test result: PASSED
      SMART Attributes Data Structure revision number: 16
      Vendor Specific SMART Attributes with Thresholds:
      ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
      1 Raw_Read_Error_Rate POSR-K 200 200 051 - 0
      3 Spin_Up_Time POS--K 174 174 021 - 6266
      4 Start_Stop_Count -O--CK 100 100 000 - 39
      5 Reallocated_Sector_Ct PO--CK 200 200 140 - 0
      7 Seek_Error_Rate -OSR-K 200 200 000 - 0
      9 Power_On_Hours -O--CK 074 074 000 - 19034
      10 Spin_Retry_Count -O--CK 100 253 000 - 0
      11 Calibration_Retry_Count -O--CK 100 253 000 - 0
      12 Power_Cycle_Count -O--CK 100 100 000 - 38
      192 Power-Off_Retract_Count -O--CK 200 200 000 - 24
      193 Load_Cycle_Count -O--CK 065 065 000 - 405723
      194 Temperature_Celsius -O---K 130 104 000 - 20
      196 Reallocated_Event_Count -O--CK 200 200 000 - 0
      197 Current_Pending_Sector -O--CK 200 200 000 - 0
      198 Offline_Uncorrectable ----CK 200 200 000 - 0
      199 UDMA_CRC_Error_Count -O--CK 200 200 000 - 0
      200 Multi_Zone_Error_Rate ---R-- 200 200 000 - 0
      ||||||_ K auto-keep
      |||||__ C event count
      ||||___ R error rate
      |||____ S speed/performance
      ||_____ O updated online
      |______ P prefailure warning
      SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
      Device Error Count: 7
      Error 7 [6] occurred at disk power-on lifetime: 19034 hours (793 days + 2 hours)
      When the command that caused the error occurred, the device was active or idle.
      After command completion occurred, registers were:
      ER -- ST COUNT LBA_48 LH LM LL DV DC
      -- -- -- == -- == == == -- -- -- -- --
      04 -- 51 00 01 00 00 00 00 00 00 00 00 Error: ABRT
      Commands leading to the command that caused the error were:
      CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
      -- == -- == -- == == == -- -- -- -- -- --------------- --------------------
      b0 00 d5 00 01 00 00 00 c2 4f e1 00 00 19d+06:28:02.285 SMART READ LOG
      b0 00 d5 00 01 00 00 00 c2 4f e1 00 00 19d+06:28:02.285 SMART READ LOG
      b0 00 d6 00 01 00 00 00 c2 4f e0 00 00 19d+06:28:02.284 SMART WRITE LOG
      b0 00 d6 00 01 00 00 00 c2 4f e0 00 00 19d+06:28:02.283 SMART WRITE LOG
      b0 00 d5 00 01 00 00 00 c2 4f e0 00 00 19d+06:28:02.283 SMART READ LOG
      Error 6 [5] occurred at disk power-on lifetime: 19033 hours (793 days + 1 hours)
      When the command that caused the error occurred, the device was active or idle.
      After command completion occurred, registers were:
      ER -- ST COUNT LBA_48 LH LM LL DV DC
      -- -- -- == -- == == == -- -- -- -- --
      04 -- 51 00 01 00 00 00 00 00 00 00 00 Error: ABRT
      Commands leading to the command that caused the error were:
      CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
      -- == -- == -- == == == -- -- -- -- -- --------------- --------------------
      b0 00 d5 00 01 00 00 00 c2 4f e1 00 00 19d+05:37:28.472 SMART READ LOG
      b0 00 d5 00 01 00 00 00 c2 4f e1 00 00 19d+05:37:28.472 SMART READ LOG
      b0 00 d6 00 01 00 00 00 c2 4f e0 00 00 19d+05:37:28.471 SMART WRITE LOG
      b0 00 d6 00 01 00 00 00 c2 4f e0 00 00 19d+05:37:28.470 SMART WRITE LOG
      b0 00 d5 00 01 00 00 00 c2 4f e0 00 00 19d+05:37:28.470 SMART READ LOG
      Error 5 [4] occurred at disk power-on lifetime: 18573 hours (773 days + 21 hours)
      When the command that caused the error occurred, the device was active or idle.
      After command completion occurred, registers were:
      ER -- ST COUNT LBA_48 LH LM LL DV DC
      -- -- -- == -- == == == -- -- -- -- --
      04 -- 51 00 01 00 00 00 00 00 00 00 00 Error: ABRT
      Commands leading to the command that caused the error were:
      CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
      -- == -- == -- == == == -- -- -- -- -- --------------- --------------------
      b0 00 d5 00 01 00 00 00 c2 4f e1 00 00 01:00:30.746 SMART READ LOG
      b0 00 d5 00 01 00 00 00 c2 4f e1 00 00 01:00:30.746 SMART READ LOG
      b0 00 d6 00 01 00 00 00 c2 4f e0 00 00 01:00:30.745 SMART WRITE LOG
      b0 00 d6 00 01 00 00 00 c2 4f e0 00 00 01:00:30.744 SMART WRITE LOG
      b0 00 d5 00 01 00 00 00 c2 4f e0 00 00 01:00:30.744 SMART READ LOG
      Error 4 [3] occurred at disk power-on lifetime: 18559 hours (773 days + 7 hours)
      When the command that caused the error occurred, the device was active or idle.
      After command completion occurred, registers were:
      ER -- ST COUNT LBA_48 LH LM LL DV DC
      -- -- -- == -- == == == -- -- -- -- --
      04 -- 51 00 01 00 00 00 00 00 00 00 00 Error: ABRT
      Commands leading to the command that caused the error were:
      CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
      -- == -- == -- == == == -- -- -- -- -- --------------- --------------------
      b0 00 d5 00 01 00 00 00 c2 4f e1 00 00 01:38:43.651 SMART READ LOG
      b0 00 d5 00 01 00 00 00 c2 4f e1 00 00 01:38:43.651 SMART READ LOG
      e5 00 00 00 00 00 00 00 00 00 00 00 00 01:38:43.651 CHECK POWER MODE
      b0 00 d6 00 01 00 00 00 c2 4f e0 00 00 01:38:43.650 SMART WRITE LOG
      b0 00 d6 00 01 00 00 00 c2 4f e0 00 00 01:38:43.649 SMART WRITE LOG
      Error 3 [2] occurred at disk power-on lifetime: 18559 hours (773 days + 7 hours)
      When the command that caused the error occurred, the device was active or idle.
      After command completion occurred, registers were:
      ER -- ST COUNT LBA_48 LH LM LL DV DC
      -- -- -- == -- == == == -- -- -- -- --
      04 -- 51 00 01 00 00 00 00 00 00 00 00 Error: ABRT
      Commands leading to the command that caused the error were:
      CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
      -- == -- == -- == == == -- -- -- -- -- --------------- --------------------
      b0 00 d5 00 01 00 00 00 c2 4f e1 00 00 01:35:11.163 SMART READ LOG
      b0 00 d5 00 01 00 00 00 c2 4f e1 00 00 01:35:11.163 SMART READ LOG
      b0 00 d6 00 01 00 00 00 c2 4f e0 00 00 01:35:11.162 SMART WRITE LOG
      b0 00 d6 00 01 00 00 00 c2 4f e0 00 00 01:35:11.161 SMART WRITE LOG
      b0 00 d5 00 01 00 00 00 c2 4f e0 00 00 01:35:11.161 SMART READ LOG
      Error 2 [1] occurred at disk power-on lifetime: 18559 hours (773 days + 7 hours)
      When the command that caused the error occurred, the device was active or idle.
      After command completion occurred, registers were:
      ER -- ST COUNT LBA_48 LH LM LL DV DC
      -- -- -- == -- == == == -- -- -- -- --
      04 -- 51 00 01 00 00 00 00 00 00 00 00 Error: ABRT
      Commands leading to the command that caused the error were:
      CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
      -- == -- == -- == == == -- -- -- -- -- --------------- --------------------
      b0 00 d5 00 01 00 00 00 c2 4f e1 00 00 01:29:43.782 SMART READ LOG
      b0 00 d5 00 01 00 00 00 c2 4f e1 00 00 01:29:43.782 SMART READ LOG
      b0 00 d6 00 01 00 00 00 c2 4f e0 00 00 01:29:43.781 SMART WRITE LOG
      b0 00 d6 00 01 00 00 00 c2 4f e0 00 00 01:29:43.780 SMART WRITE LOG
      b0 00 d5 00 01 00 00 00 c2 4f e0 00 00 01:29:43.780 SMART READ LOG
      Error 1 [0] occurred at disk power-on lifetime: 18559 hours (773 days + 7 hours)
      When the command that caused the error occurred, the device was active or idle.
      After command completion occurred, registers were:
      ER -- ST COUNT LBA_48 LH LM LL DV DC
      -- -- -- == -- == == == -- -- -- -- --
      04 -- 51 00 01 00 00 00 00 00 00 00 00 Error: ABRT
      Commands leading to the command that caused the error were:
      CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
      -- == -- == -- == == == -- -- -- -- -- --------------- --------------------
      b0 00 d5 00 01 00 00 00 c2 4f e1 00 00 01:28:08.211 SMART READ LOG
      b0 00 d5 00 01 00 00 00 c2 4f e1 00 00 01:28:08.211 SMART READ LOG
      b0 00 d6 00 01 00 00 00 c2 4f e0 00 00 01:28:08.210 SMART WRITE LOG
      b0 00 d6 00 01 00 00 00 c2 4f e0 00 00 01:28:08.209 SMART WRITE LOG
      b0 00 d5 00 01 00 00 00 c2 4f e0 00 00 01:28:08.209 SMART READ LOG
      SCT Error Recovery Control:
      Read: 70 (7.0 seconds)
      Write: 70 (7.0 seconds)
      SATA Phy Event Counters (GP Log 0x11)
      ID Size Value Description
      0x0001 2 0 Command failed due to ICRC error
      0x0002 2 0 R_ERR response for data FIS
      0x0003 2 0 R_ERR response for device-to-host data FIS
      0x0004 2 0 R_ERR response for host-to-device data FIS
      0x0005 2 0 R_ERR response for non-data FIS
      0x0006 2 0 R_ERR response for device-to-host non-data FIS
      0x0007 2 0 R_ERR response for host-to-device non-data FIS
      0x0008 2 0 Device-to-host non-data FIS retries
      0x0009 2 3 Transition from drive PhyRdy to drive PhyNRdy
      0x000a 2 4 Device-to-host register FISes sent due to a COMRESET
      0x000b 2 0 CRC errors within host-to-device FIS
      0x000f 2 0 R_ERR response for host-to-device data FIS, CRC
      0x0012 2 0 R_ERR response for host-to-device non-data FIS, CRC
      0x8000 4 1664883 Vendor specific
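To see which command sits closest to each logged error, the error log above can be tallied with a short script. This is a sketch under assumptions, not a smartctl feature: the timestamps in the posted log run newest-first, so the first command row under each "Error N" header is treated as the one nearest the failure, and the patterns are fitted to this paste only. `first_command_per_error` is a hypothetical helper name.

```python
import re
from collections import Counter

# Shortened excerpt of the error log above (errors 7 and 4 only).
LOG = """\
Error 7 [6] occurred at disk power-on lifetime: 19034 hours (793 days + 2 hours)
04 -- 51 00 01 00 00 00 00 00 00 00 00 Error: ABRT
b0 00 d5 00 01 00 00 00 c2 4f e1 00 00 19d+06:28:02.285 SMART READ LOG
b0 00 d6 00 01 00 00 00 c2 4f e0 00 00 19d+06:28:02.284 SMART WRITE LOG
Error 4 [3] occurred at disk power-on lifetime: 18559 hours (773 days + 7 hours)
04 -- 51 00 01 00 00 00 00 00 00 00 00 Error: ABRT
b0 00 d5 00 01 00 00 00 c2 4f e1 00 00 01:38:43.651 SMART READ LOG
e5 00 00 00 00 00 00 00 00 00 00 00 00 01:38:43.651 CHECK POWER MODE
"""

ERROR_RE = re.compile(r"Error (\d+) \[\d+\] occurred at disk power-on lifetime")
# A command row: 13 hex byte columns, a timestamp, then the command name.
# The optional leading "NN. " tolerates pastes that carry line numbers.
CMD_RE = re.compile(r"^(?:\d+\.\s+)?(?:[0-9a-f]{2} ){13}(\S+) (.+)$")


def first_command_per_error(text):
    """Map each error number to the first command row listed under it."""
    result = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = ERROR_RE.search(line)
        if header:
            current = int(header.group(1))
            continue
        row = CMD_RE.match(line)
        if row and current is not None and current not in result:
            result[current] = row.group(2)
    return result


commands = first_command_per_error(LOG)
tally = Counter(commands.values())
print(commands)
print(tally)
```

In the full log above, every one of the seven entries reports `Error: ABRT` with SMART READ LOG as the first listed command, and none of them follows a data read or write.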
    • I'm using 3TB WD Reds too, and they tend to have this issue.
      Swap out your SATA cables/ports and see if the errors go away.
      If not, do an RMA while the drives are still under warranty.
      Avoid buying 3TB WD Reds in the future.
      To prolong HDD life, avoid enabling spindown in the power settings.
      OMV v4.0
      Asus Z97-A/3.1; i3-4370
      32GB RAM Corsair Vengeance Pro
      4x3TB RAID10
    • I can't just RMA all 8 of my drives (4 of them 2 years old, and 4 of them under 2 months old) just because of an ATA error. The other drive statistics are fine.

      I could understand it if one or two of them were having issues, but all 8 are showing this error, so I'm trying to find a valid reason why all 8 of them are doing this with OMV.

      I never spin down my drives. The head parking was part of the hard drive firmware (similar to the WD Green drives) and came from a bad batch. I disabled head parking altogether on the 4 oldest drives as soon as I noticed it last month. All 4 of the new ones have a timer of 300 seconds before parking the heads, which is fine for me.
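For a sense of scale on the head parking, the lifetime average can be worked out from the two raw values in the WMC4N1021445 report earlier in the thread (Power_On_Hours 19034, Load_Cycle_Count 405723). This is a rough sketch only: it averages over the drive's whole life, so it blends the period with aggressive parking and the period after parking was disabled.

```python
# Raw values taken from the WMC4N1021445 SMART report posted above.
power_on_hours = 19034   # attribute 9, Power_On_Hours
load_cycles = 405723     # attribute 193, Load_Cycle_Count

# Lifetime averages; these say nothing about the current parking rate.
cycles_per_hour = load_cycles / power_on_hours
seconds_per_cycle = 3600 / cycles_per_hour

print(f"{cycles_per_hour:.1f} load cycles per power-on hour")
print(f"one load cycle every {seconds_per_cycle:.0f} seconds on average")
```

Comparing Load_Cycle_Count between two reports taken a few days apart would show the current rate rather than the lifetime average.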
    • If the ATA error count is going up on all of your drives, either your SATA controller went sideways or something else changed dramatically. When did those errors start to occur? Right from the start after switching from QNAP?

      Greetings
      David
      "Well... lately this forum has become support for everything except omv" [...] "And is like someone is banning Google from their browsers"

      Only two things are infinite, the universe and human stupidity, and I'm not sure about the former.


      Upload Logfile via WebGUI/CLI
      #openmediavault on freenode IRC | German & English | GMT+1
      Absolutely no Support via PM!

      I host parts of the omv-extras.org Repository, the OpenMediaVault Live Demo and the pre-built PXE Images. If you want you can take part and help covering the costs by having a look at my profile page.
    • Hey David,

      Yeah, these errors started after I built my NAS and switched over from my QNAP. I bought 4 additional drives and played with FreeNAS for a while, but ended up moving over to OMV completely due to its easier interface and usability.

      I built the system brand new a month ago though, and noticed the first errors started around Jan 29th, shortly after I started using OMV.

      I know it's unlikely, but could I have 8 faulty SATA cables causing issues? Maybe I need to move the power and SATA cables further apart from one another, although I didn't think that would cause issues.
      It could also be the firmware of the onboard LSI 2308 controller on the Supermicro X10SL7-F motherboard.
      I flashed it to IT mode with the latest version 20 of the firmware. I know a few places suggested version 16, and a few suggested 15, I really wasn't sure which would ultimately be the best. I have versions 15, 16.0.1, 19, and 20 downloaded.
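On the cable question: signal problems on a SATA link usually show up as CRC errors, so the CRC-related counters in a drive's report are a quick first check. The sketch below pulls them from three lines of the WMC4N1021445 report posted earlier. It assumes the whitespace-separated layout seen in that paste, and `crc_counters` is a hypothetical helper name.

```python
# Three CRC-related lines copied from the WMC4N1021445 report above.
EXCERPT = """\
199 UDMA_CRC_Error_Count -O--CK 200 200 000 - 0
0x0001 2 0 Command failed due to ICRC error
0x000b 2 0 CRC errors within host-to-device FIS
"""


def crc_counters(text):
    """Collect CRC-related counters, assuming the column layout pasted above."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "UDMA_CRC_Error_Count":
            # Attribute table row: the raw value is the last column.
            counters["UDMA_CRC_Error_Count"] = int(fields[-1])
        elif fields and fields[0].startswith("0x") and "CRC" in line:
            # Phy event counter row: ID, size, value, description.
            counters[fields[0]] = int(fields[2])
    return counters


counters = crc_counters(EXCERPT)
print(counters)
print("CRC errors recorded:", any(value > 0 for value in counters.values()))
```

For that drive every CRC counter reads 0, so this particular report does not point at the cabling. The other seven reports would need the same check before drawing a conclusion.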

      Attached is a picture with the first 4 drives installed initially after setup, I haven't had a chance to go down and pull the side off to get a picture with all 8, but they are all in a neat line.
      Image: 2016-01-19 20.48.57.jpg
    • I really can't tell, but I think it may be worth trying the other firmware versions if those are the ones more commonly suggested.

      Greetings
      David