Why is my HDD waking up?

    • Why is my HDD waking up?

      Hi!

      In a previous post, I was trying to understand why my FS was regularly corrupted. It was said that it could be caused by an unsupported sleep mode, but I didn't look further.

      Now I've noticed another issue: the hard drive correctly goes to sleep after 10 minutes, or when I issue the command 'hdparm -y /dev/sda', but it wakes up by itself for no apparent reason (nothing is accessing the hard drive at that time).
      I couldn't find the reason for that... It seems to occur even if the system is totally disconnected from the network.

      For information, I'm running OMV3 (yeah, I know, it's beta, but I want to run on Debian 8) on an Odroid XU4 installed with DietPi (based on Debian 8). I'm not sure this issue is directly linked to OpenMediaVault, but I think there are a lot of experts here who could help me. The OS runs from an SD card, and the hard drive is a LaCie external drive connected over USB 2.

      I've read the Guide to debugging disks spin up, with no success: hdparm -C /dev/sda always returns the state "standby" (even when the drive is actually running), so the script from that guide cannot detect the drive spinning up.
      I monitored dmesg when the drive restarted, but I didn't see any messages about sda, only READ/WRITE blocks on mmcblk0p2.

      In the hdparm man page, I found the '-Z' parameter, which is supposed to disable automatic power-saving modes on some Seagate hard drives. The reference "ST3xxx models" is mentioned. When I saw that, I thought it would solve the issue, as my hard drive is a Seagate ST3500820AS. I ran hdparm -Z /dev/sda, but the drive still spins up automatically.

      Here is some more information about the hard drive:

      Source Code

      root@odroid:~# smartctl -a /dev/sda
      smartctl 6.4 2014-10-07 r4002 [armv7l-linux-3.10.96+] (local build)
      Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org
      === START OF INFORMATION SECTION ===
      Model Family: Seagate Barracuda 7200.11
      Device Model: ST3500820AS
      Serial Number: 9QM66CHV
      LU WWN Device Id: 5 000c50 00db45e97
      Firmware Version: LC11
      User Capacity: 500,107,862,016 bytes [500 GB]
      Sector Size: 512 bytes logical/physical
      Rotation Rate: 7200 rpm
      Device is: In smartctl database [for details use: -P show]
      ATA Version is: ATA8-ACS T13/1699-D revision 4
      SATA Version is: SATA 2.6, 3.0 Gb/s
      Local Time is: Sat Jun 25 14:23:04 2016 CEST
      ==> WARNING: There are known problems with these drives,
      see the following Seagate web pages:
      http://knowledge.seagate.com/articles/en_US/FAQ/207931en
      http://knowledge.seagate.com/articles/en_US/FAQ/207951en
      http://knowledge.seagate.com/articles/en_US/FAQ/207957en
      SMART support is: Available - device has SMART capability.
      SMART support is: Enabled
      === START OF READ SMART DATA SECTION ===
      SMART overall-health self-assessment test result: PASSED
      General SMART Values:
      Offline data collection status: (0x82) Offline data collection activity
      was completed without error.
      Auto Offline Data Collection: Enabled.
      Self-test execution status: ( 0) The previous self-test routine completed
      without error or no self-test has ever
      been run.
      Total time to complete Offline
      data collection: ( 625) seconds.
      Offline data collection
      capabilities: (0x7b) SMART execute Offline immediate.
      Auto Offline data collection on/off support.
      Suspend Offline collection upon new
      command.
      Offline surface scan supported.
      Self-test supported.
      Conveyance Self-test supported.
      Selective Self-test supported.
      SMART capabilities: (0x0003) Saves SMART data before entering
      power-saving mode.
      Supports SMART auto save timer.
      Error logging capability: (0x01) Error logging supported.
      General Purpose Logging supported.
      Short self-test routine
      recommended polling time: ( 1) minutes.
      Extended self-test routine
      recommended polling time: ( 113) minutes.
      Conveyance self-test routine
      recommended polling time: ( 2) minutes.
      SCT capabilities: (0x103b) SCT Status supported.
      SCT Error Recovery Control supported.
      SCT Feature Control supported.
      SCT Data Table supported.
      SMART Attributes Data Structure revision number: 10
      Vendor Specific SMART Attributes with Thresholds:
      ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate 0x000f 115 099 006 Pre-fail Always - 86728865
      3 Spin_Up_Time 0x0003 090 087 000 Pre-fail Always - 0
      4 Start_Stop_Count 0x0032 099 099 020 Old_age Always - 1789
      5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
      7 Seek_Error_Rate 0x000f 072 060 030 Pre-fail Always - 4309725819
      9 Power_On_Hours 0x0032 086 086 000 Old_age Always - 12348
      10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 23
      12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 418
      184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
      187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
      188 Command_Timeout 0x0032 098 098 000 Old_age Always - 8590065667
      189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
      190 Airflow_Temperature_Cel 0x0022 064 046 045 Old_age Always - 36 (Min/Max 33/42)
      194 Temperature_Celsius 0x0022 036 054 000 Old_age Always - 36 (0 8 0 0 0)
      195 Hardware_ECC_Recovered 0x001a 043 028 000 Old_age Always - 86728865
      197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
      198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
      199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
      SMART Error Log Version: 1
      No Errors Logged
      SMART Self-test log structure revision number 1
      Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
      # 1 Short offline Completed without error 00% 11119 -
      SMART Selective self-test log data structure revision number 1
      SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
      1 0 0 Not_testing
      2 0 0 Not_testing
      3 0 0 Not_testing
      4 0 0 Not_testing
      5 0 0 Not_testing
      Selective self-test flags (0x0):
      After scanning selected spans, do NOT read-scan remainder of disk.
      If Selective self-test is pending on power-up, resume after 0 minute delay.


      I don't know how to diagnose this issue any further. I'm looking for new ideas :)

      Thanks!
    • Hope you have the flashmemory plugin installed.

      It was installed on my previous installation (based on a ready-made image, but running Debian 7). This DietPi installation is rather new, and I haven't installed the plugin yet. But I think DietPi already has some mechanism to reduce wear on the SD card.

      Here is my FSTAB, by the way:

      Source Code

      proc /proc proc defaults 0 0
      /dev/mmcblk0p1 /boot vfat defaults,noatime,discard 0 2
      /dev/mmcblk0p2 / ext4 defaults,noatime,discard 0 1
      tmpfs /tmp tmpfs defaults,noatime,nodev,nosuid,mode=1777 0 0
      tmpfs /var/log tmpfs defaults,size=20m,noatime,nodev,nosuid,mode=1777 0 0
      tmpfs /DietPi tmpfs defaults,size=10m,noatime,nodev,nosuid,mode=1777 0 0
      # >>> [openmediavault]
      UUID=36bc1b34-a3a9-4f53-ab75-c7d449e5f5e4 /media/36bc1b34-a3a9-4f53-ab75-c7d449e5f5e4 ext4 defaults,nofail,user_xattr,noexec,usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv0,acl 0 2
      # <<< [openmediavault]


      Do you have SMART tests enabled? That could be waking the drive up.

      SMART tests are disabled in the administration panel of OMV. Is there any way to check that in the command line to be sure?

      Can I check whether there is any scheduled job running on the system? The drive seems to spin up every 15 or 30 minutes...
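
One way to answer the question about scheduled jobs is to print every active line from the cron directories and look for anything that fires every 15 or 30 minutes. The following is only a sketch: the helper name `list_cron_entries` is made up for illustration, and the paths in the usage comment are the usual Debian locations, assumed rather than confirmed for this DietPi install.

```shell
#!/bin/sh
# list_cron_entries DIR
# Hypothetical helper: print every non-comment, non-blank line of each file
# in DIR, prefixed with the file name. Variable assignments such as PATH=...
# are printed too, since they are not comments.
list_cron_entries() {
    dir=${1:-/etc/cron.d}
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        awk -v name="$(basename "$f")" \
            '!/^[ \t]*(#|$)/ { print name ": " $0 }' "$f"
    done
}

# Typical use on a Debian-based system (run as root; paths are assumptions):
#   list_cron_entries /etc/cron.d
#   list_cron_entries /var/spool/cron/crontabs
```

The system-wide /etc/crontab is a single file rather than a directory, so it has to be read separately. Jobs scheduled inside an application, rather than through cron files, will not appear in this listing.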
    • Here is more information: I was monitoring the file /var/log/daemon.log while the disk restarted. Here are the lines that appeared at that moment:

      Source Code

      Jun 25 15:44:18 Odroid rrdcached[1229]: flushing old values
      Jun 25 15:44:18 Odroid rrdcached[1229]: rotating journals
      Jun 25 15:44:18 Odroid rrdcached[1229]: started new journal /var/lib/rrdcached/journal/rrd.journal.1466862258.461152
      Jun 25 15:44:18 Odroid rrdcached[1229]: removing old journal /var/lib/rrdcached/journal/rrd.journal.1466855058.461991
      Jun 25 15:45:01 Odroid rrdcached[1229]: Received FLUSHALL


      Could this 'rrdcached' need to read/write to the external hard drive?
    • You have /var/log in tmpfs but you really need /var/lib/rrdcached and /var/lib/monit in tmpfs.

      rrdcached is the statistics/graphs in OMV. You can disable them to see. They shouldn't be causing the drives to spin up since their files live in /var/lib/.

      I would look at cron jobs as well.
      omv 4.0.11 arrakis | 64 bit | 4.13 backports kernel | omvextrasorg 4.1.0
      omv-extras.org plugins source code and issue tracker - github.com/OpenMediaVault-Plugin-Developers

      Please don't PM for support... Too many PMs!
    • ryecoaaron wrote:

      You have /var/log in tmpfs but you really need /var/lib/rrdcached and /var/lib/monit in tmpfs.

      rrdcached is the statistics/graphs in OMV. You can disable them to see. They shouldn't be causing the drives to spin up since their files live in /var/lib/.

      I would look at cron jobs as well.


      Thanks for the tip, I'll add these directories in tmpfs.

      I disabled monitoring in the web interface, but it had no effect on my issue...

      I analyzed all files in /etc/cron.d.
      The only entries that are executed regularly are
      • openmediavault-rrdtoolgraph, run every 15 minutes
      • php5, run at minutes 09 and 39 of every hour (every 30 minutes)
      I moved these files out of /etc/cron.d to check whether they cause the issue.
      Then there are all the files in cron.daily, cron.weekly, ... but I assume they are launched only once a day, week, ...

      Is there any way to log the HDD spinup/down in order to debug this easily?

      Thanks!
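
Since hdparm -C is reported above as always returning "standby" on this enclosure, one alternative for logging drive activity is to poll the kernel's I/O counters in /proc/diskstats and write a timestamped line whenever they change. This is a minimal sketch, not a tested solution for this hardware: the function names `io_counters` and `watch_disk` and the log path in the usage comment are made up for illustration. It only sees requests that go through the block layer, so commands sent to the drive by other means may not show up.

```shell
#!/bin/sh
# io_counters DEVICE [STATS_FILE]
# Print "reads:writes" (completed requests) for DEVICE, e.g. "sda".
# In /proc/diskstats, field 3 is the device name, field 4 the number of
# reads completed and field 8 the number of writes completed.
io_counters() {
    awk -v dev="$1" '$3 == dev { print $4 ":" $8 }' "${2:-/proc/diskstats}"
}

# watch_disk DEVICE [INTERVAL_SECONDS]
# Print a timestamped line each time the counters of DEVICE change.
watch_disk() {
    dev=$1
    interval=${2:-10}
    last=$(io_counters "$dev")
    while sleep "$interval"; do
        now=$(io_counters "$dev")
        if [ "$now" != "$last" ]; then
            echo "$(date '+%Y-%m-%d %H:%M:%S') $dev: reads:writes $last -> $now"
            last=$now
        fi
    done
}

# Example (hypothetical log path):
#   watch_disk sda 10 >> /var/log/sda-activity.log
```

Comparing the timestamps in that log with /var/log/daemon.log or the cron schedule can narrow down which job coincides with each wake-up.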
    • ryecoaaron wrote:

      Don't know. I don't even spin my drives down.


      In my case, the HD is idle most of the time... it doesn't need to run all day when I don't need it.

      I did several tests with inotifywait on various locations (/dev/sda, /dev/sda1, /media/mountpoint, ...). It detects an event on /dev/sda just when the HD spins up, but not on the other locations. Here is the output:

      Source Code

      root@odroid:~# inotifywait -m /dev/sda
      Setting up watches.
      Watches established.
      /dev/sda OPEN
      /dev/sda CLOSE_NOWRITE,CLOSE


      So it seems that something reads "something" on /dev/sda. But I assume that it happens at a lower level than the filesystem (which is ext4, btw).
      Any idea what could access the device directly every half hour?

      Thanks!
    • Try watching the mountpoint (/media/REPLACE_WITH_UUID) instead of the drive itself. I think it should show you the file/folder being accessed then.
    • Unfortunately, watching the mountpoint shows nothing:

      Source Code

      root@odroid:~# inotifywait -m /media/36bc1b34-a3a9-4f53-ab75-c7d449e5f5e4
      Setting up watches.
      Watches established.


      But a second inotifywait, launched at the same time as the previous one, shows this just as the drive wakes up:

      Source Code

      root@odroid:~# inotifywait -m /dev/sda
      Setting up watches.
      Watches established.
      /dev/sda OPEN
      /dev/sda CLOSE_NOWRITE,CLOSE


      But it doesn't show any clue about the file or folder that is accessed :/

      I'll see if lsof can be of any help...

      EDIT: no, lsof doesn't see anything...

      The post was edited 1 time, last by JF002.

    • I don't think a file or folder is being accessed. Something seems to be just querying the drive. Not sure what that would be.
    • Well, I dug a little deeper into the system and temporarily disabled the collectd and monit services. But that wasn't the cause.
      Then I read the great how-to, including a test script, in the following thread:
      My Guide to Debugging Disk Spin-ups

      That is how I found that "parted" was checking my devices over and over again. Unfortunately, I was not able to discover which process was triggering parted. I've simply disabled the tool by renaming it inside /sbin.

      I'm still curious about what's behind this process. So if anybody has an idea, I'd be glad to read it here.
      omv 3.0.31 erasmus | 64 bit | 4.6 backport kernel | omvextrasorg 3.3.7
    • You renamed parted? That isn't the direction I would go. parted is not a service. Something is calling it and it isn't being called on my systems. I can't find any calls to parted in the OMV source code.
    • I totally agree that renaming parted is not the solution. But as a test of whether the hard drives stay spun down, it was a good idea. The next step is to find out which process is calling parted over and over. And it is good to know that it isn't part of OMV.
      When I've found out what is going on, I'll write it here.
    • In the meantime I was able to find out which process started parted every 5 minutes. It is an internal cron job of webmin. :( I installed it a year ago. I don't know whether my problem with the hard drives spinning up was caused by a webmin update. Anyway: if you go to the webmin configuration and open the entry "Hintergrund-Status Sammlung" / background status collection, you will find the reason for all of this: "Sammelt Festplattentemperaturen?" / Collecting hard drive temperatures. After changing this to "no", the drives stay in power-save mode.

      How did I find out which process was causing my trouble? I replaced /sbin/parted with a little script:
      #!/bin/sh
      # Log the parent (calling) process, then sleep so the caller stays alive.
      ps -f -p "$PPID" >> /var/log/partedcall.txt
      sleep 300

      The sleep command kept the process that was calling parted alive long enough to inspect it. With the parent process ID I was able to get the name of the calling process:
      ps -p <ppid from partedcall.txt> -o comm=

      And here ends my story - finally. :)
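
The wrapper trick described above can be taken a step further by logging the whole chain of ancestor processes, which helps when the direct parent is only an intermediate shell. The sketch below is a variation on the posted script, not the script the poster used: the function name `log_ancestry` and the path /sbin/parted.real in the comment are hypothetical.

```shell
#!/bin/sh
# log_ancestry LOGFILE
# Append a timestamped header and one line per ancestor process (parent,
# grandparent, ...) of this script to LOGFILE, stopping before PID 1.
log_ancestry() {
    pid=$PPID
    {
        echo "--- $(date '+%Y-%m-%d %H:%M:%S') caller chain of PID $$"
        while [ -n "$pid" ] && [ "$pid" -gt 1 ]; do
            # Print the PID and full command line of this ancestor.
            ps -o pid= -o args= -p "$pid" || break
            # Move one level up the process tree.
            pid=$(ps -o ppid= -p "$pid" | tr -d ' ')
        done
    } >> "$1"
}

# As the last lines of a wrapper installed in place of the real tool
# (assumes the real binary was first moved to /sbin/parted.real):
#   log_ancestry /var/log/partedcall.txt
#   exec /sbin/parted.real "$@"
```

Handing over to the real binary with exec keeps the tool working while the log is collected, at the cost of still waking the drive. The sleep-based version in the post keeps the drive asleep instead.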
    • Good to know. I don't install webmin on OMV boxes because it seems redundant in my opinion. Glad you found the issue. I'm sure it will help some of the other webmin users.