Clicking on "SMART > Devices > Information" increases ATA error count!

    • OMV 4.x
    • Resolved
    • This console output is produced while clicking the "Information" button of /dev/sda (Error count increases from 17 to 18):

      Source Code

      1. root@server:~# monit stop omv-engined
      2. root@server:~# smartctl -x /dev/sda | grep "Error "
      3. Error logging capability: (0x01) Error logging supported.
      4. SCT Error Recovery Control supported.
      5. 0x10 GPL R/O 1 SATA NCQ Queued Error log
      6. SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
      7. Device Error Count: 17
      8. ER = Error register
      9. Error 17 [16] occurred at disk power-on lifetime: 59 hours (2 days + 11 hours)
      10. [...]
      11. root@server:~# omv-engined -d -f
      12. DEBUG [Tue, 28 Aug 18 17:42:42 +0200 File=process.inc Line=172 Method=OMV\System\Process::execute()]: Executing command 'export PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin; export LANG=C; findmnt -f -n -o SOURCE / 2>&1'
      13. Registered data models:
      14. [...]
      15. Registered filesystem backends:
      16. [...]
      17. Registered RPC services:
      18. [...]
      19. Registered modules:
      20. [...]
      21. SIGCHLD received ...
      22. Child process forked (pid=3857)
      23. Executing RPC (service=Config, method=isDirty, params=null, context={"username":"admin","role":1}) ...
      24. RPC response (service=Config, method=isDirty): {"response":false,"error":null}
      25. SIGCHLD received ...
      26. Child (pid=3857) terminated with exit code 0
      27. Child process forked (pid=3858)
      28. Executing RPC (service=Smart, method=getInformation, params={"devicefile":"\/dev\/sda"}, context={"username":"admin","role":1}) ...
      29. DEBUG [Tue, 28 Aug 18 17:42:46 +0200 File=process.inc Line=172 Method=OMV\System\Process::execute()]: Executing command 'export PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin; export LANG=C; udevadm info --query=property --name='/dev/sda' 2>&1'
      30. DEBUG [Tue, 28 Aug 18 17:42:46 +0200 File=process.inc Line=172 Method=OMV\System\Process::execute()]: Executing command 'export PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin; export LANG=C; smartctl -x '/dev/sda' 2>&1'
      31. Child process forked (pid=3865)
      32. Child process forked (pid=3866)
      33. Executing RPC (service=Smart, method=getAttributes, params={"devicefile":"\/dev\/sda"}, context={"username":"admin","role":1}) ...
      34. Executing RPC (service=Smart, method=getSelfTestLogs, params={"devicefile":"\/dev\/sda"}, context={"username":"admin","role":1}) ...
      35. DEBUG [Tue, 28 Aug 18 17:42:46 +0200 File=process.inc Line=172 Method=OMV\System\Process::execute()]: Executing command 'export PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin; export LANG=C; udevadm info --query=property --name='/dev/sda' 2>&1'
      36. DEBUG [Tue, 28 Aug 18 17:42:46 +0200 File=process.inc Line=172 Method=OMV\System\Process::execute()]: Executing command 'export PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin; export LANG=C; udevadm info --query=property --name='/dev/sda' 2>&1'
      37. DEBUG [Tue, 28 Aug 18 17:42:46 +0200 File=process.inc Line=172 Method=OMV\System\Process::execute()]: Executing command 'export PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin; export LANG=C; smartctl -x '/dev/sda' 2>&1'
      38. DEBUG [Tue, 28 Aug 18 17:42:46 +0200 File=process.inc Line=172 Method=OMV\System\Process::execute()]: Executing command 'export PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin; export LANG=C; smartctl -x '/dev/sda' 2>&1'
      39. RPC response (service=Smart, method=getInformation): {"response":{"devicemodel":"WDC WD40EFRX-68N32N0","serialnumber":"WD-WCC7K1EKCYT3","firmwareversion":"82.00A82","usercapacity":"4,000,787,030,016 bytes [4.00 TB]","modelfamily":"Western Digital Red","luwwndeviceid":"5 0014ee 20fda8dd0","sectorsizes":"512 bytes logical, 4096 bytes physical","rotationrate":"5400 rpm","formfactor":"3.5 inches","deviceis":"In smartctl database [for details use: -P show]","ataversionis":"ACS-3 T13\/2161-D revision 5","sataversionis":"SATA 3.1, 6.0 Gb\/s (current: 6.0 Gb\/s)","localtimeis":"Tue Aug 28 17:42:46 2018 CEST","smartsupportis":"Enabled","aamfeatureis":"Unavailable","apmfeatureis":"Unavailable","rdlook-aheadis":"Enabled","writecacheis":"Enabled","atasecurityis":"Disabled, frozen [SEC2]","wtcachereorder":"Enabled"},"error":null}
      40. SIGCHLD received ...
      41. Child (pid=3858) terminated with exit code 0
      42. RPC response (service=Smart, method=getSelfTestLogs): {"response":[{"num":"1","description":"Extended offline","status":"Completed without error ","remaining":"00","lifetime":"27","lbaoffirsterror":"-"}],"error":null}
      43. SIGCHLD received ...
      44. Child (pid=3866) terminated with exit code 0
      45. RPC response (service=Smart, method=getAttributes): {"response":[{"id":1,"attrname":"Raw_Read_Error_Rate","flags":"POSR-K","value":200,"worst":200,"treshold":51,"whenfailed":"-","rawvalue":"0","description":"Frequency of errors while reading raw data from the disk","prefailure":true,"assessment":"GOOD"},{"id":3,"attrname":"Spin_Up_Time","flags":"POS--K","value":100,"worst":253,"treshold":21,"whenfailed":"-","rawvalue":"0","description":"Time needed to spin up the disk","prefailure":true,"assessment":"GOOD"},{"id":4,"attrname":"Start_Stop_Count","flags":"-O--CK","value":100,"worst":100,"treshold":0,"whenfailed":"-","rawvalue":"3","description":"Number of spindle start\/stop cycles","prefailure":false,"assessment":"BAD_STATUS"},{"id":5,"attrname":"Reallocated_Sector_Ct","flags":"PO--CK","value":200,"worst":200,"treshold":140,"whenfailed":"-","rawvalue":"0","description":"Number of remapped sectors","prefailure":true,"assessment":"GOOD"},{"id":7,"attrname":"Seek_Error_Rate","flags":"-OSR-K","value":200,"worst":200,"treshold":0,"whenfailed":"-","rawvalue":"0","description":"Frequency of errors while positioning","prefailure":false,"assessment":"BAD_STATUS"},{"id":9,"attrname":"Power_On_Hours","flags":"-O--CK","value":100,"worst":100,"treshold":0,"whenfailed":"-","rawvalue":"59","description":"Number of hours elapsed in the power-on state","prefailure":false,"assessment":"BAD_STATUS"},{"id":10,"attrname":"Spin_Retry_Count","flags":"-O--CK","value":100,"worst":253,"treshold":0,"whenfailed":"-","rawvalue":"0","description":"Number of retry attempts to spin up","prefailure":false,"assessment":"BAD_STATUS"},{"id":11,"attrname":"Calibration_Retry_Count","flags":"-O--CK","value":100,"worst":253,"treshold":0,"whenfailed":"-","rawvalue":"0","description":"Number of attempts to calibrate the 
device","prefailure":false,"assessment":"BAD_STATUS"},{"id":12,"attrname":"Power_Cycle_Count","flags":"-O--CK","value":100,"worst":100,"treshold":0,"whenfailed":"-","rawvalue":"3","description":"Number of power-on events","prefailure":false,"assessment":"BAD_STATUS"},{"id":192,"attrname":"Power-Off_Retract_Count","flags":"-O--CK","value":200,"worst":200,"treshold":0,"whenfailed":"-","rawvalue":"0","description":"Number of power-off or emergency retract cycles","prefailure":false,"assessment":"BAD_STATUS"},{"id":193,"attrname":"Load_Cycle_Count","flags":"-O--CK","value":200,"worst":200,"treshold":0,"whenfailed":"-","rawvalue":"16","description":"Number of cycles into landing zone position","prefailure":false,"assessment":"BAD_STATUS"},{"id":194,"attrname":"Temperature_Celsius","flags":"-O---K","value":113,"worst":109,"treshold":0,"whenfailed":"-","rawvalue":"37","description":"Current internal temperature of the drive","prefailure":false,"assessment":"BAD_STATUS"},{"id":196,"attrname":"Reallocated_Event_Count","flags":"-O--CK","value":200,"worst":200,"treshold":0,"whenfailed":"-","rawvalue":"0","description":"Number of remapping operations","prefailure":false,"assessment":"BAD_STATUS"},{"id":197,"attrname":"Current_Pending_Sector","flags":"-O--CK","value":200,"worst":200,"treshold":0,"whenfailed":"-","rawvalue":"0","description":"Number of sectors waiting to be remapped","prefailure":false,"assessment":"BAD_STATUS"},{"id":198,"attrname":"Offline_Uncorrectable","flags":"----CK","value":100,"worst":253,"treshold":0,"whenfailed":"-","rawvalue":"0","description":"The total number of uncorrectable errors when reading\/writing a sector","prefailure":false,"assessment":"BAD_STATUS"},{"id":199,"attrname":"UDMA_CRC_Error_Count","flags":"-O--CK","value":200,"worst":200,"treshold":0,"whenfailed":"-","rawvalue":"0","description":"Number of CRC errors during UDMA 
mode","prefailure":false,"assessment":"BAD_STATUS"},{"id":200,"attrname":"Multi_Zone_Error_Rate","flags":"---R--","value":200,"worst":200,"treshold":0,"whenfailed":"-","rawvalue":"0","description":"Number of errors found when writing a sector","prefailure":false,"assessment":"BAD_STATUS"}],"error":null}
      46. SIGCHLD received ...
      47. Child (pid=3865) terminated with exit code 0
      48. Child process forked (pid=3875)
      49. Executing RPC (service=Config, method=isDirty, params=null, context={"username":"admin","role":1}) ...
      50. RPC response (service=Config, method=isDirty): {"response":false,"error":null}
      51. SIGCHLD received ...
      52. Child (pid=3875) terminated with exit code 0
      53. ^CSIGINT received ...
      54. root@server:~# monit start omv-engined
      55. root@server:~# smartctl -x /dev/sda | grep "Error "
      56. Error logging capability: (0x01) Error logging supported.
      57. SCT Error Recovery Control supported.
      58. 0x10 GPL R/O 1 SATA NCQ Queued Error log
      59. SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
      60. Device Error Count: 18
      61. ER = Error register
      62. Error 18 [17] occurred at disk power-on lifetime: 59 hours (2 days + 11 hours)
      63. [...]
      64. root@server:~#
      Had to shorten some lines [...] to fit into this post.

      (However, there were no entries in syslog during this time.)
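The before/after comparison in this post can be scripted so the counter is read the same way each time. What follows is a minimal sketch, not part of OMV or smartmontools: the function name `device_error_count` is hypothetical, and it assumes the `Device Error Count:` line has the form shown in the output above.

```shell
# Hypothetical helper: read `smartctl -x` output on stdin and print the
# number that follows "Device Error Count:". Prints nothing if no such
# line exists (for example on a drive that has not logged any errors).
device_error_count() {
  awk '/^[[:space:]]*Device Error Count:/ { print $4; exit }'
}
```

Usage would be along the lines of `before=$(smartctl -x /dev/sda | device_error_count)`, then clicking "Information", then reading the value again into `after` and comparing the two.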
    • Mr Smile wrote:

      no entries in Syslog

      Yes, @votdev confirmed that this time the output goes to stdout since the daemon had to be started in the foreground. Essentially, all that has been called is the following, one command directly after the other:

      Source Code

      export PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin; export LANG=C; udevadm info --query=property --name='/dev/sda' 2>&1 ; smartctl -x '/dev/sda' 2>&1
      Can you reproduce the increased error count when executing this from a shell?
    • tkaiser wrote:

      Mr Smile wrote:

      no entries in Syslog
      Yes, @votdev confirmed that this time the output goes to stdout since the daemon had to be started in the foreground. Essentially, all that has been called is the following, one command directly after the other:

      Source Code

      export PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin; export LANG=C; udevadm info --query=property --name='/dev/sda' 2>&1 ; smartctl -x '/dev/sda' 2>&1
      Can you reproduce the increased error count when executing this from a shell?
      No, error count stays the same when executing this.
      To verify, I clicked on Info after that and tadaa ... the error count jumped up by one ... reproducible! :thumbdown:

      edit: at least I saw that there are no weird parameters of smartctl responsible for this.

      The post was edited 1 time, last by Mr Smile ().

    • Another example of clicking the "Information" button (error count jumps from 26 to 27):

      Source Code

      1. root@server:~# smartctl -x /dev/sda | grep "Device Error"
      2. Device Error Count: 26 (device log contains only the most recent 24 errors)
      3. root@server:~# omv-engined -d -f
      4. DEBUG [Tue, 28 Aug 18 23:04:07 +0200 File=process.inc Line=172 Method=OMV\System\Process::execute()]: Executing command 'export PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin; export LANG=C; findmnt -f -n -o SOURCE / 2>&1'
      5. Registered data models:
      6. [...]
      7. Registered filesystem backends:
      8. btrfs
      9. exfat
      10. ext
      11. ext2
      12. ext3
      13. ext4
      14. fuseblk
      15. hfsplus
      16. iso9660
      17. jfs
      18. msdos
      19. none
      20. ntfs
      21. reiserfs
      22. udf
      23. ufs
      24. umsdos
      25. vfat
      26. xfs
      27. Registered RPC services:
      28. apt
      29. backup
      30. certificatemgmt
      31. clamav
      32. config
      33. cron
      34. diskmgmt
      35. docker
      36. downloader
      37. emailnotification
      38. exec
      39. filesystemmgmt
      40. folderbrowser
      41. fstab
      42. ftp
      43. iptables
      44. locate
      45. logfile
      46. network
      47. nfs
      48. notification
      49. nut
      50. omvextras
      51. perfstats
      52. permissionsinfo
      53. plugin
      54. powermgmt
      55. quota
      56. raidmgmt
      57. resetperms
      58. rrd
      59. rsnapshot
      60. rsync
      61. rsyncd
      62. services
      63. sharemgmt
      64. smart
      65. smb
      66. ssh
      67. symlinks
      68. syncthing
      69. syslog
      70. system
      71. treefolderbrowser
      72. usbbackup
      73. usermgmt
      74. virtualbox
      75. webgui
      76. zeroconf
      77. Registered modules:
      78. acpid
      79. apt
      80. certificatemgmt
      81. clamav
      82. collectd
      83. cpufrequtils
      84. cron
      85. cronapt
      86. docker
      87. email
      88. fstab
      89. ftp
      90. halt
      91. hdparm
      92. hostname
      93. hosts
      94. interfaces
      95. iptables
      96. issue
      97. mdadm
      98. monit
      99. networking
      100. nfs
      101. ntp
      102. nut
      103. pam
      104. phpfpm
      105. profile
      106. quota
      107. rrdcached
      108. rsnapshot
      109. rsync
      110. rsyncd
      111. samba
      112. sharedfolders
      113. smartmontools
      114. ssh
      115. sysctl
      116. syslog
      117. systemd
      118. timezone
      119. usbbackup
      120. virtualbox
      121. webadmin
      122. webserver
      123. zeroconf
      124. SIGCHLD received ...
      125. Child process forked (pid=17443)
      126. Executing RPC (service=Config, method=isDirty, params=null, context={"username":"admin","role":1}) ...
      127. RPC response (service=Config, method=isDirty): {"response":false,"error":null}
      128. SIGCHLD received ...
      129. Child (pid=17443) terminated with exit code 0
      130. Child process forked (pid=17444)
      131. Executing RPC (service=Smart, method=getInformation, params={"devicefile":"\/dev\/sda"}, context={"username":"admin","role":1}) ...
      132. DEBUG [Tue, 28 Aug 18 23:04:10 +0200 File=process.inc Line=172 Method=OMV\System\Process::execute()]: Executing command 'export PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin; export LANG=C; udevadm info --query=property --name='/dev/sda' 2>&1'
      133. DEBUG [Tue, 28 Aug 18 23:04:10 +0200 File=process.inc Line=172 Method=OMV\System\Process::execute()]: Executing command 'export PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin; export LANG=C; smartctl -x '/dev/sda' 2>&1'
      134. Child process forked (pid=17451)
      135. Executing RPC (service=Smart, method=getAttributes, params={"devicefile":"\/dev\/sda"}, context={"username":"admin","role":1}) ...
      136. DEBUG [Tue, 28 Aug 18 23:04:10 +0200 File=process.inc Line=172 Method=OMV\System\Process::execute()]: Executing command 'export PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin; export LANG=C; udevadm info --query=property --name='/dev/sda' 2>&1'
      137. DEBUG [Tue, 28 Aug 18 23:04:10 +0200 File=process.inc Line=172 Method=OMV\System\Process::execute()]: Executing command 'export PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin; export LANG=C; smartctl -x '/dev/sda' 2>&1'
      138. Child process forked (pid=17456)
      139. Executing RPC (service=Smart, method=getSelfTestLogs, params={"devicefile":"\/dev\/sda"}, context={"username":"admin","role":1}) ...
      140. DEBUG [Tue, 28 Aug 18 23:04:10 +0200 File=process.inc Line=172 Method=OMV\System\Process::execute()]: Executing command 'export PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin; export LANG=C; udevadm info --query=property --name='/dev/sda' 2>&1'
      141. DEBUG [Tue, 28 Aug 18 23:04:10 +0200 File=process.inc Line=172 Method=OMV\System\Process::execute()]: Executing command 'export PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin; export LANG=C; smartctl -x '/dev/sda' 2>&1'
      142. RPC response (service=Smart, method=getInformation): {"response":{"devicemodel":"WDC WD40EFRX-68N32N0","serialnumber":"WD-WCC7K1EKCYT3","firmwareversion":"82.00A82","usercapacity":"4,000,787,030,016 bytes [4.00 TB]","modelfamily":"Western Digital Red","luwwndeviceid":"5 0014ee 20fda8dd0","sectorsizes":"512 bytes logical, 4096 bytes physical","rotationrate":"5400 rpm","formfactor":"3.5 inches","deviceis":"In smartctl database [for details use: -P show]","ataversionis":"ACS-3 T13\/2161-D revision 5","sataversionis":"SATA 3.1, 6.0 Gb\/s (current: 6.0 Gb\/s)","localtimeis":"Tue Aug 28 23:04:10 2018 CEST","smartsupportis":"Enabled","aamfeatureis":"Unavailable","apmfeatureis":"Unavailable","rdlook-aheadis":"Enabled","writecacheis":"Enabled","atasecurityis":"Disabled, frozen [SEC2]","wtcachereorder":"Enabled"},"error":null}
      143. SIGCHLD received ...
      144. Child (pid=17444) terminated with exit code 0
      145. RPC response (service=Smart, method=getSelfTestLogs): {"response":[{"num":"1","description":"Extended offline","status":"Completed without error ","remaining":"00","lifetime":"27","lbaoffirsterror":"-"}],"error":null}
      146. SIGCHLD received ...
      147. Child (pid=17456) terminated with exit code 0
      148. RPC response (service=Smart, method=getAttributes): {"response":[{"id":1,"attrname":"Raw_Read_Error_Rate","flags":"POSR-K","value":200,"worst":200,"treshold":51,"whenfailed":"-","rawvalue":"0","description":"Frequency of errors while reading raw data from the disk","prefailure":true,"assessment":"GOOD"},{"id":3,"attrname":"Spin_Up_Time","flags":"POS--K","value":100,"worst":253,"treshold":21,"whenfailed":"-","rawvalue":"0","description":"Time needed to spin up the disk","prefailure":true,"assessment":"GOOD"},{"id":4,"attrname":"Start_Stop_Count","flags":"-O--CK","value":100,"worst":100,"treshold":0,"whenfailed":"-","rawvalue":"3","description":"Number of spindle start\/stop cycles","prefailure":false,"assessment":"BAD_STATUS"},{"id":5,"attrname":"Reallocated_Sector_Ct","flags":"PO--CK","value":200,"worst":200,"treshold":140,"whenfailed":"-","rawvalue":"0","description":"Number of remapped sectors","prefailure":true,"assessment":"GOOD"},{"id":7,"attrname":"Seek_Error_Rate","flags":"-OSR-K","value":200,"worst":200,"treshold":0,"whenfailed":"-","rawvalue":"0","description":"Frequency of errors while positioning","prefailure":false,"assessment":"BAD_STATUS"},{"id":9,"attrname":"Power_On_Hours","flags":"-O--CK","value":100,"worst":100,"treshold":0,"whenfailed":"-","rawvalue":"65","description":"Number of hours elapsed in the power-on state","prefailure":false,"assessment":"BAD_STATUS"},{"id":10,"attrname":"Spin_Retry_Count","flags":"-O--CK","value":100,"worst":253,"treshold":0,"whenfailed":"-","rawvalue":"0","description":"Number of retry attempts to spin up","prefailure":false,"assessment":"BAD_STATUS"},{"id":11,"attrname":"Calibration_Retry_Count","flags":"-O--CK","value":100,"worst":253,"treshold":0,"whenfailed":"-","rawvalue":"0","description":"Number of attempts to calibrate the 
device","prefailure":false,"assessment":"BAD_STATUS"},{"id":12,"attrname":"Power_Cycle_Count","flags":"-O--CK","value":100,"worst":100,"treshold":0,"whenfailed":"-","rawvalue":"3","description":"Number of power-on events","prefailure":false,"assessment":"BAD_STATUS"},{"id":192,"attrname":"Power-Off_Retract_Count","flags":"-O--CK","value":200,"worst":200,"treshold":0,"whenfailed":"-","rawvalue":"0","description":"Number of power-off or emergency retract cycles","prefailure":false,"assessment":"BAD_STATUS"},{"id":193,"attrname":"Load_Cycle_Count","flags":"-O--CK","value":200,"worst":200,"treshold":0,"whenfailed":"-","rawvalue":"17","description":"Number of cycles into landing zone position","prefailure":false,"assessment":"BAD_STATUS"},{"id":194,"attrname":"Temperature_Celsius","flags":"-O---K","value":113,"worst":109,"treshold":0,"whenfailed":"-","rawvalue":"37","description":"Current internal temperature of the drive","prefailure":false,"assessment":"BAD_STATUS"},{"id":196,"attrname":"Reallocated_Event_Count","flags":"-O--CK","value":200,"worst":200,"treshold":0,"whenfailed":"-","rawvalue":"0","description":"Number of remapping operations","prefailure":false,"assessment":"BAD_STATUS"},{"id":197,"attrname":"Current_Pending_Sector","flags":"-O--CK","value":200,"worst":200,"treshold":0,"whenfailed":"-","rawvalue":"0","description":"Number of sectors waiting to be remapped","prefailure":false,"assessment":"BAD_STATUS"},{"id":198,"attrname":"Offline_Uncorrectable","flags":"----CK","value":100,"worst":253,"treshold":0,"whenfailed":"-","rawvalue":"0","description":"The total number of uncorrectable errors when reading\/writing a sector","prefailure":false,"assessment":"BAD_STATUS"},{"id":199,"attrname":"UDMA_CRC_Error_Count","flags":"-O--CK","value":200,"worst":200,"treshold":0,"whenfailed":"-","rawvalue":"0","description":"Number of CRC errors during UDMA 
mode","prefailure":false,"assessment":"BAD_STATUS"},{"id":200,"attrname":"Multi_Zone_Error_Rate","flags":"---R--","value":200,"worst":200,"treshold":0,"whenfailed":"-","rawvalue":"0","description":"Number of errors found when writing a sector","prefailure":false,"assessment":"BAD_STATUS"}],"error":null}
      149. SIGCHLD received ...
      150. Child (pid=17451) terminated with exit code 0
      151. Child process forked (pid=17462)
      152. Executing RPC (service=Config, method=isDirty, params=null, context={"username":"admin","role":1}) ...
      153. RPC response (service=Config, method=isDirty): {"response":false,"error":null}
      154. SIGCHLD received ...
      155. Child (pid=17462) terminated with exit code 0
      156. ^CSIGINT received ...
      157. root@server:~# smartctl -x /dev/sda | grep "Device Error"
      158. Device Error Count: 27 (device log contains only the most recent 24 errors)
      Any ideas?

      btw: Thanks for your support! :)

      The post was edited 4 times, last by Mr Smile ().

    • It seems smartctl behaves differently when it is executed via PHP, and your HDD's firmware does not like this. Because this behavior does not happen on all hardware, it is really hard to track down. I think this ultimately cannot be fixed by the OMV project, because smartmontools and PHP are not maintained by the project, nor can we fix firmware issues.

      In the end you have to live with this behavior, either by not using the SMART UI or by switching to other HDDs.
      Absolutely no support through PM!

      I must not fear.
      Fear is the mind-killer.
      Fear is the little-death that brings total obliteration.
      I will face my fear.
      I will permit it to pass over me and through me.
      And when it has gone past I will turn the inner eye to see its path.
      Where the fear has gone there will be nothing.
      Only I will remain.

      Litany against fear by Bene Gesserit
    • The four new HDDs were quite expensive and I cannot return them anymore, so switching hdds is not an option.

      It bugs me that there should be no way to reproduce the error aside from omv smart gui. There must be something "special" going on right before or after the smartctl call. You say that it could be something between php and smartctl.

      One more idea:
      In the console output above it seems that smartctl and udevadm are executed several times in one second. Is this observation right? If yes, why? If this happens, the ATA error could be a timing problem caused by concurrent attempts to read the SMART values ...

      I'd really like to open issues in the right bug trackers, but sadly I don't have the PHP skills to program a minimal example for triggering that issue. Could someone please copy-paste a "working" minimal example PHP file that just executes the problematic code that runs when clicking the "Information" button? (Or even better, a bash command/script that triggers the error.)
      Or provide me with an improved/patched version of the SMART GUI to test. I'm willing to help.

      I think it is important to follow those errors because there are so many users with WD drives affected.

      Examples:
      forum.openmediavault.org/index…creasing-ATA-error-count/ (OMV, WD Red)
      forum.openmediavault.org/index…a-SI-PEX40064-SATA3-Card/ (OMV, WD Red - mentioned in other thread)
      debianforum.de/forum/viewtopic.php?t=153705#p1025789 (Debian Wheezy but highly likely OMV as the username also posted in omv forum around that time, WD Red)
      hardwareluxx.de/community/f15/…1084178.html#post23773315 (OMV - see hostname, WD Red)
      hardwareluxx.de/community/f211…1043116.html#post22821883 (OMV, HDD unknown)
      technikaffe.de/forum/index.php…t-error-errorcount-hilfe/ (OMV, WD Red)
      technikaffe.de/forum/index.php…-m-a-r-t-meldung-was-nun/ (OMV, WD Red, also other errors but ATA errors increase too)
      forum.ubuntuusers.de/topic/verstehe-s-m-a-r-t-werte-nicht/ (Ubuntu forum but OMV machine affected, WD Red, user asked for other problem but posted smartctl output revealed the ATA error problem by the way)
      ...

      I could list many more. And if you google for this exact error, most forum threads you find reveal sooner or later that the affected person is using OMV or the OMV package on top of Debian. I agree that the core of this problem might be that the WD firmware is a bit picky regarding ATA communication, but OMV is the software that somehow triggers this bug. So the investigation should start here.

      The post was edited 14 times, last by Mr Smile ().

    • Mr Smile wrote:

      In the console output above it seems that smartctl and udevadm are executed several times in one second
      If you suspect that's the problem simply simulate this in a bash shell:

      Source Code

      for in 1 2 3 ; do
        (export PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin; export LANG=C; udevadm info --query=property --name='/dev/sda' 2>&1 ; smartctl -x '/dev/sda' 2>&1) &
      done
      Just curious: Why do you still care about this WD firmware behaviour?
    • tkaiser wrote:

      Mr Smile wrote:

      In the console output above it seems that smartctl and udevadm are executed several times in one second
      If you suspect that's the problem simply simulate this in a bash shell:

      Source Code

      for in 1 2 3 ; do
        (export PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin; export LANG=C; udevadm info --query=property --name='/dev/sda' 2>&1 ; smartctl -x '/dev/sda' 2>&1) &
      done
      Just curious: Why do you still care about this WD firmware behaviour?
      @tkaiser: I had to put an i after for, but now it works! 8o ...
      This code hangs somewhere and has to be skipped by CTRL+C. And after that the ATA error count of sda is increased by one!

      Source Code

      for i in 1 2 3 ; do
        (export PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin; export LANG=C; udevadm info --query=property --name='/dev/sda' 2>&1 ; smartctl -x '/dev/sda' 2>&1) &
      done
      I've tested it several times and every execution increased the count. Even for i in 1 2 ; does the trick. But for i in 1 ; does NOT increase the counter - as expected. I'm convinced, we found the problem. :thumbup:

      @votdev: Would you please have a look at the omv code why this is executed more than one time?

      @tkaiser: I care because when this error occurred I was worried about the health of my new drives. And in all the threads I found on the internet, people were given wrong advice (swapping cables, even dumping the drives or mainboards(!) because there must be some kind of loose connection, and so on), but no one saw the connection between the increasing ATA error count and clicking the OMV SMART Information button. I (and all the other users) lost so much time because of that... :thumbdown:

      I want to use my disks to store irreplaceable family memories. So data integrity is very important to me. (Yes, I have backups, but better safe than sorry!) And I also like OMV for its super easy and well-working SMART functionality. I bought those new drives because last week OMV sent me a SMART warning mail about one of my old drives slowly dying. And so I was able to save all my data before a fatal drive failure. So I want to have SMART mails active. But with my new drives I got SMART warning mails virtually all day and I had no idea why. But I think you agree that simply switching off SMART warnings is not so clever.

      So please help me to sort this thing out (for me and for all the other affected users)! :)

      The post was edited 2 times, last by Mr Smile ().

    • Ok, yesterday evening I looked into this file to understand what's going on: github.com/openmediavault/open…rage/smartinformation.inc (Please excuse my very basic approach. I have absolutely no idea how PHP/JS works.)

      It seems that when you click on "Information" those three functions for the first three window tabs are executed and all three call getData() which itself executes smartctl:


      getInformation()
      getAttributes()
      getSelfTestLogs()


      I didn't find the place where this is actually executed. -> I found this, but it's all Greek to me: github.com/openmediavault/open…/storage/smart/Devices.js

      My assumption is based on the fact that the Information, Attributes and Selftest Logs tabs are instantly there when you click on them, but the Extended Information tab needs a bit of time to come up. This explains the three concurrent executions of smartctl.

      @votdev: It should be easy for you to fix this Bug (or let's call it an incompatibility with WD drives) by adding some small pause between the executions. The user would not even notice, as the SMART Information window starts on the Information tab. Or maybe change the tab behavior from "preloading" to "loading only when the user clicks", same as for the Extended Information tab. Or use one call of getData() instead of three to fill all three tabs at once.
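The "one call instead of three" idea can be illustrated at the shell level. This is only a sketch under assumptions, not OMV's implementation: the function name `smart_section` is hypothetical, and it relies on the `=== START OF ... SECTION ===` headers that smartctl prints between the parts of its report.

```shell
# Hypothetical helper: read smartctl output on stdin and print only the
# lines that belong to the section named in $1, i.e. everything between
# "=== START OF <name> SECTION ===" and the next "=== START OF" header.
smart_section() {
  awk -v name="$1" '
    /^=== START OF / {
      # Switch printing on for the requested section, off for any other.
      keep = ($0 == ("=== START OF " name " SECTION ==="))
      next
    }
    keep { print }
  '
}
```

With this, the drive is queried once, for example `out=$(smartctl -x /dev/sda)`, and each view is then cut from the stored text: `printf '%s\n' "$out" | smart_section INFORMATION` and `printf '%s\n' "$out" | smart_section "READ SMART DATA"`. No second or third smartctl process touches the disk.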

      Thanks for your help!

      Mr Smile

      The post was edited 7 times, last by Mr Smile ().

    • Mr Smile wrote:

      It should be easy for you to fix this Bug
      This is no bug, otherwise EVERY HDD would react like the WD firmware. The firmware is buggy in this case.

      Mr Smile wrote:

      adding some small pause between the executions
      How long should this be? 0.1, 0.5 seconds or more?
    • Mr Smile wrote:

      Hard to say. 0.5 seconds?
      You have those buggy WD drives at hand. So you would need to test:

      Source Code

      for i in 1 2 3 ; do
        (export PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin; export LANG=C; udevadm info --query=property --name='/dev/sda' 2>&1 ; smartctl -x '/dev/sda' 2>&1) &
        sleep 0.5
      done
      Will you open up a bug report at WD too?
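To try several pause lengths without retyping the loop, the test from this post can be wrapped in a small function. This is a sketch only: `run_staggered` is a hypothetical name, and fractional delays such as `0.5` assume a `sleep` that accepts them (GNU coreutils does; plain POSIX only guarantees whole seconds).

```shell
# Hypothetical helper: start COUNT background copies of a command,
# sleeping DELAY seconds after each start, then wait for all of them.
# Usage: run_staggered COUNT DELAY COMMAND [ARGS...]
# The body runs in a subshell so its variables do not leak to the caller.
run_staggered() (
  count=$1
  delay=$2
  shift 2
  i=0
  while [ "$i" -lt "$count" ]; do
    "$@" >/dev/null 2>&1 &   # output is discarded; only the side effect matters
    sleep "$delay"
    i=$((i + 1))
  done
  wait                       # block until every background copy has exited
)
```

For example, `run_staggered 3 0 smartctl -x /dev/sda` corresponds to the fully concurrent case, and `run_staggered 3 0.5 smartctl -x /dev/sda` to the half-second pause proposed here. Checking `Device Error Count` before and after each run shows which delay, if any, avoids the increase.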
    • tkaiser wrote:

      Mr Smile wrote:

      It should be easy for you to fix this Bug
      It's impossible to fix a HDD firmware bug somewhere else. Please reach out to WD and let them fix their bug. Remember first page of this thread where you can read how they violate specs?
      I'll do this later, but how long do you think it will take to get new firmware for all those WD Red drives? I don't think WD ever releases one, as this call behavior of smartctl is so uncommon. Most affected users (in fact all I found on the internet) seem to have OMV running on their machines. I absolutely agree that WD is to blame for their firmware (and I'll report this behavior to smartmontools and WD), but for now let's try to make OMV's behavior "more common".

      And to be honest: I don't understand why smartctl has to be executed three times at the same time ... :)
    • Mr Smile wrote:

      I don't understand why smartctl has to be executed three times at the same time ...
      It is needed because they are executed async in different threads.
    • votdev wrote:

      Mr Smile wrote:

      I don't understand why smartctl has to be executed three times at the same time ...
      It is needed because they are executed async in different threads.
      ah ok, this explains why in one of ten cases or so the ATA error count stays the same.

      Ok, just to understand: Why is it needed to process three tabs (of which only one can be visible at the same time) concurrently with multithreading? Why is that done?

      Using three concurrent threads to ask a hard drive for the same SMART information seems to me like an intentional firmware race condition test. :D
    • Mr Smile wrote:

      I don't think WD ever releases one, as this call behavior of smartctl is so uncommon
      They violate the specs. Specifications exist for a reason.

      My personal 'solution' for the various issues with WD drives is to not buy them and educate people to do the same. Maybe we should emphasize more on this since most probably you're absolutely right and they give a sh*t about being specs compliant and won't fix the issue in their firmware.
    • Your WD refusal is not very helpful. I've spent 500 bucks for the drives and when I realized this problem I just found other affected OMV users but no solution. In other words you say: Dump OMV and use one of the other operating systems that doesn't trigger this WD bug.

      The reality is that a lot of OMV users have WD Red drives because they are cheap, quiet, consume little energy, are recommended for NAS/RAID and so on. But also a lot of them don't give a shit about email notifications. So they don't know that they are affected by this problem.

      You gave me 414 pages of specifications. But I'm not a programmer and this is all Greek to me. Would you please point me to the violated paragraph so that I can file a bug with WD and/or smartmontools?

      edit: Damn, I rebooted the server this morning and now even the triple loop doesn't trigger the firmware bug reliably ... maybe 1 of 7 times. :-/

      edit2: This also goes for the Information button. Will reboot now ... :S

      The post was edited 3 times, last by Mr Smile ().