Odroid HC1 - HDD-spinup at shutdown incl. nasty sounds

    • OMV 4.x



      Hi,

      As mentioned, I have moved my OMV from a Raspberry Pi to an Odroid HC1, with a Seagate Barracuda attached to its SATA bridge.

      While setting up the new system I have had to shut it down and reboot it several times.
      Even when the HDD is asleep, it wakes up as soon as I hit the (software) reboot button.
      Furthermore, when the board shuts down completely, the disk screeches very nastily :cursing:
      (I think the disk stops immediately once power is cut!?)

      Another disk behaves the same way.
      This can't be good for the disk's lifetime, can it? ;(

      Can someone test whether this is the default behavior?


      Thank you in advance,
      regards
      Minfred
    • Without power, the read heads retract immediately. Since the heads float on a very thin cushion of air generated by the spin of the disk, if they didn't retract before the rotation slows, they'd land on the disk itself and crash, literally. (The same can happen if a disk is jarred hard enough while it's spinning.)

      If power is suddenly removed, the retracting heads might sound like a "click", and you might hear a "whirring" sound as the platters spin down.

      If you're using a 3.5-inch drive in something similar to a drive dock, you might hear these sounds, and since a drive designed for internal use is sitting outside a case, they may be "pronounced". But a "screech" (without putting a decibel level to it :) ) and truly loud noises are not good.

      If you're worried about imminent failure:
      Go to Storage > SMART, open the Devices tab, select a drive, click the Information button, and select the Extended Information tab.

      In Extended information, compare what you find under SMART Attributes to the following:
      (The following SMART attributes are the most accurate in predicting failure.)

      SMART 5 – Reallocated_Sector_Count.
      SMART 187 – Reported_Uncorrectable_Errors.
      SMART 188 – Command_Timeout.
      SMART 197 – Current_Pending_Sector_Count.
      SMART 198 – Offline_Uncorrectable.

      You want 0 (zero) raw counts for the above. If there's an extended list, ignore the rest of the attributes, and if one or two of the above attributes are not present, don't worry about them.

      If any of the above are increasing, a failure is in the making, and it would be time to back up and replace the drive. If they're at 0, your drive should be fine.
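      The "raw count should be 0" check above can be scripted. A minimal Python sketch, assuming you copy the attribute table as text in smartctl's brief layout (the same columns as the table posted later in this thread, e.g. from `smartctl -A -f brief /dev/sdX`). The function names are hypothetical, and the sample values are the ones from that table. It only automates reading the five counters; it does not predict anything by itself.

```python
from __future__ import annotations

# Attribute IDs listed above: 5, 187, 188, 197, 198.
WATCHED_IDS = {5, 187, 188, 197, 198}


def parse_raw_values(table_text: str) -> dict[int, int]:
    """Map attribute ID -> leading integer of its RAW_VALUE column.

    Expects smartctl's brief layout:
    ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
    """
    raw: dict[int, int] = {}
    for line in table_text.splitlines():
        parts = line.split(maxsplit=7)
        # Data rows have 8 fields and start with a numeric ID; the header does not.
        if len(parts) == 8 and parts[0].isdigit():
            # RAW_VALUE can carry extra text, e.g. "35 (Min/Max 20/41)".
            first = parts[7].split()[0]
            if first.isdigit():
                raw[int(parts[0])] = int(first)
    return raw


def nonzero_watched(table_text: str) -> dict[int, int]:
    """Return only the watched attributes whose raw count is above zero."""
    return {i: v for i, v in parse_raw_values(table_text).items()
            if i in WATCHED_IDS and v > 0}


# Values from Minfred's table in this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  5 Reallocated_Sector_Ct   PO--CK   100   100   036    -    0
187 Reported_Uncorrect      -O--CK   100   100   000    -    0
188 Command_Timeout         -O--CK   100   100   000    -    0
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----C-   100   100   000    -    0
"""

if __name__ == "__main__":
    found = nonzero_watched(SAMPLE)
    print(found if found else "all watched raw counts are 0")
```

      Tracking the output over time (rather than reading it once) is what shows whether a count is "incrementing".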

      Video Guides :!: New User Guide :!: Docker Guides :!: Pi-hole in Docker
      Good backup takes the "drama" out of computing.
      ____________________________________
      Primary: OMV 3.0.99, ThinkServer TS140, 12GB ECC, 32GB USB boot, 4TB+4TB zmirror, 3TB client backup.
      OMV 4.1.13, Intel Server SC5650HCBRP, 32GB ECC, 16GB USB boot, UnionFS+SNAPRAID
      Backup: OMV 4.1.9, Acer RC-111, 4GB, 32GB USB boot, 3TB+3TB zmirror, 4TB Rsync'ed disk
    • flmaxey wrote:

      (The following SMART attributes are the most accurate in predicting failure.)

      SMART 5 – Reallocated_Sector_Count.
      SMART 187 – Reported_Uncorrectable_Errors.
      SMART 188 – Command_Timeout.
      SMART 197 – Current_Pending_Sector_Count.
      SMART 198 – Offline_Uncorrectable.

      Nope, that's just copy&paste from here: computerworld.com/article/2846…t-hard-drive-failure.html

      The basis for these claims is data from a company (Backblaze) that runs desktop drives in an inappropriate environment and publishes 'statistics gone wrong' numbers from time to time as a marketing mechanism.

      If the job is done by people who know statistics (correlation is not causation, and relying on such numbers only makes sense when you use tens of thousands of disks), then it's as simple as this: 'Despite this high correlation, we conclude that models based on SMART parameters alone are unlikely to be useful for predicting individual drive failures'

      So please forget about this SMART BS when it comes to 'predicting individual drive failure'. Every HDD will die eventually. To cope with this we do backups.
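      To illustrate the gap between a strong correlation and a useful individual prediction, here is a small worked sketch in Python. The fleet numbers are entirely hypothetical (not taken from the Google paper or from Backblaze); they only show how a warning sign can make flagged drives far more likely to fail while still missing most failures, and while most flagged drives keep running.

```python
from dataclasses import dataclass


@dataclass
class Fleet:
    """Hypothetical one-year counts for a fleet of drives (illustrative only)."""
    flagged_failed: int    # showed the SMART warning, then failed
    flagged_ok: int        # showed the warning, kept running
    unflagged_failed: int  # failed without any warning
    unflagged_ok: int      # no warning, kept running

    def failure_rate_flagged(self) -> float:
        """Chance that an individual flagged drive fails."""
        return self.flagged_failed / (self.flagged_failed + self.flagged_ok)

    def failure_rate_unflagged(self) -> float:
        """Chance that an individual unflagged drive fails."""
        return self.unflagged_failed / (self.unflagged_failed + self.unflagged_ok)

    def relative_risk(self) -> float:
        """How many times more likely a flagged drive is to fail."""
        return self.failure_rate_flagged() / self.failure_rate_unflagged()

    def share_of_failures_warned(self) -> float:
        """Fraction of all failed drives that showed the warning beforehand."""
        return self.flagged_failed / (self.flagged_failed + self.unflagged_failed)


# Hypothetical: 10,000 drives, 200 flagged, 100 failures in total.
fleet = Fleet(flagged_failed=40, flagged_ok=160, unflagged_failed=60, unflagged_ok=9740)

if __name__ == "__main__":
    print(f"relative risk:            {fleet.relative_risk():.1f}x")
    print(f"flagged drives that fail: {fleet.failure_rate_flagged():.0%}")
    print(f"failures with a warning:  {fleet.share_of_failures_warned():.0%}")
```

      With these made-up numbers a flagged drive is about 33 times more likely to fail, yet 60% of the failures came without any warning and 80% of the flagged drives did not fail.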
    • tkaiser wrote:

      flmaxey wrote:

      (The following SMART attributes are the most accurate in predicting failure.)

      SMART 5 – Reallocated_Sector_Count.
      SMART 187 – Reported_Uncorrectable_Errors.
      SMART 188 – Command_Timeout.
      SMART 197 – Current_Pending_Sector_Count.
      SMART 198 – Offline_Uncorrectable.
      Nope, that's just copy&paste from here: computerworld.com/article/2846…t-hard-drive-failure.html

      Nope, it's copied and pasted directly from backblaze.com. (That might be where computerworld.com got their info.)

      With tens of thousands of consumer-grade commodity drives from several different drive OEMs, set in a fixed, climate-controlled environment, the Backblaze study qualifies as truly "empirical", not just anecdotal opinion. I read the entire study, checked their data and conclusions, and chose to believe their results.

      Google's white paper, from your link, makes some interesting points, but it does not directly refute the results of the Backblaze study. In fact, the paper supports Backblaze's conclusions in many respects, but to make that discovery one would have to read both studies.

      This is "copied and pasted" direct from the Google paper's conclusions:
      "We find, for example, that after their first scan error, drives are 39 times more likely to fail within 60 days than drives with no such errors. First errors in re-allocations, offline reallocations, and probational counts are also strongly correlated to higher failure probabilities."
      (The only really interesting point in the Google paper's conclusions was the lack of evidence that high temperature and/or high drive utilization shortens life.)

      Bottom line: nothing can be predicted with perfect accuracy. On the other hand, the SMART attributes mentioned above are a very good place to start.

      But, yes, I'll concede that in a "one off" situation, it's possible that a drive could fail outright with no counts on the mentioned SMART attributes. On the other hand, there's a chance a meteor might take out my house tonight. In either case, I'll take my chances.

      Thanks for your input. It was real productive. :)


    • Hi,
      I really appreciate that, thank you.

      If this affects all Odroid HC1s (and HC2s?), maybe it's possible to add the script to the standard OMV ISO, preinstalled?
      It cost me some SMART points before I noticed the problem at all and read your answer.


      I had a look at the SMART data and I can't believe what I'm seeing... the disk is brand new from pollin.de, with only about 4-5 emergency stops before the "shutdown script":

      ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
        5 Reallocated_Sector_Ct   PO--CK   100   100   036    -    0
      187 Reported_Uncorrect      -O--CK   100   100   000    -    0
      188 Command_Timeout         -O--CK   100   100   000    -    0
      197 Current_Pending_Sector  -O--C-   100   100   000    -    0
      198 Offline_Uncorrectable   ----C-   100   100   000    -    0

      As @flmaxey mentioned, I want the numbers as low as possible.
      But am I reading that table wrong, or is every value at its worst point and above the threshold?!

      Can this be true????

      Kind regards,
      Minfred


    • Minfred wrote:

      If this affect all Odroid hc1 (and hc2?) maybe its possible to add the script to the standard omv-iso preinstalled?

      For my part, I don't want to waste my time compensating for mistakes hardware vendors make. The real fix is new firmware flashed to the JMS578, and I'm somewhat surprised that nothing has happened yet (the best idea is to ask Hardkernel over in their forum).

      Anyway, this is only a 'problem' for people who constantly shut down or reboot their devices. If you use it as a NAS (kept powered on for years), you're obviously not affected.

      Also it's a bit sad that you got distracted by the Backblaze BS. All your SMART values are fine (0). Only attributes 5, 187, 197 and 198 would be a real indication of problems that have already happened or are happening right now (but there is NO WAY to predict a drive failure from any of those numbers; your drive can fail at any time without any SMART value having increased, and it's pure BS to trust in 'statistics done wrong'. If you use disk storage you need BACKUP.)

      And you're looking at the wrong SMART attribute. You'd better check Power-Off_Retract_Count: if you installed the script, its value should not increase any more.
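      That check is easy to script: save the attribute table (e.g. `smartctl -A -f brief /dev/sdX`) once before a shutdown and once after the next boot, then compare the raw value. A minimal Python sketch; the function names are hypothetical, and it assumes the brief table layout, where RAW_VALUE is the eighth column (the older default layout has more columns and would need different parsing).

```python
from __future__ import annotations


def raw_by_name(table_text: str, name: str) -> int | None:
    """Return the leading integer of RAW_VALUE for attribute `name`, if present.

    Assumes smartctl's brief layout:
    ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
    """
    for line in table_text.splitlines():
        parts = line.split(maxsplit=7)
        if len(parts) == 8 and parts[0].isdigit() and parts[1] == name:
            first = parts[7].split()[0]
            return int(first) if first.isdigit() else None
    return None


def retract_count_grew(before: str, after: str) -> bool:
    """True if Power-Off_Retract_Count rose between two table snapshots."""
    b = raw_by_name(before, "Power-Off_Retract_Count")
    a = raw_by_name(after, "Power-Off_Retract_Count")
    if a is None or b is None:
        raise ValueError("Power-Off_Retract_Count not found in one of the tables")
    return a > b
```

      If the shutdown script does its job, `retract_count_grew(before, after)` should stay `False` across clean shutdowns.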
    • flmaxey wrote:

      (The only really interesting point in the Google paper's conclusions was the lack of evidence that high temperature and/or high drive utilization shortens life.)

      You should really read and try to understand section '3.5.6 Predictive Power of SMART Parameters'.

      Relying on SMART to predict drive failures is BS. If you have only a single disk, statistics are irrelevant anyway (your disk doesn't know about statistics; only if you use tens of thousands of disks can you make use of numbers others generated with similar numbers of disks). Even with tens of thousands of disks, SMART is still of no help in predicting drive failures.

      Things to keep in mind: the Google paper is pretty old (dealing with disks from 15 years ago), those Google disks were 'scrubbed' regularly (something the average user never does), and the effect of vibration isn't considered (since no SMART attribute tracks it; there is also a 'nice' disk failure study by some Microsoft folks who happily ignore this and commit some impressive statistical blunders).

      At work we regularly pull disks out of the ZFS JBODs and put them into a test system where they are benchmarked (sequential and especially random IO, tested once over the whole disk surface and again over 10 partitions). Results are stored in a database together with the disk's WWN/WWID, and if random IO performance drops, especially over the whole disk, the disk is never reattached to hot storage but is used in archive arrays from then on.

      This is just an assumption (if positioning and head movements take more time, then something is wrong), not evidence, since we would need to watch a few thousand disks die before we got any useful numbers.
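      The demotion rule described above could look roughly like this in Python. This is a hypothetical sketch, not the actual tooling from the post: the record fields, comparing each disk against its own first run, and the 20% drop threshold are all assumptions, and only the whole-disk random IO figure is used (not the 10-partition run).

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchResult:
    wwn: str            # disk identifier (WWN/WWID)
    run: int            # benchmark run number, increasing over time
    random_iops: float  # random IO result over the whole disk surface


def disks_to_demote(results: list[BenchResult], max_drop: float = 0.2) -> set[str]:
    """Return WWNs whose latest random IOPS fell more than `max_drop`
    (a fraction, hypothetical default) below their first recorded run."""
    by_disk: dict[str, list[BenchResult]] = defaultdict(list)
    for r in results:
        by_disk[r.wwn].append(r)
    demote: set[str] = set()
    for wwn, runs in by_disk.items():
        if len(runs) < 2:
            continue  # need a baseline and at least one later run
        runs.sort(key=lambda r: r.run)
        baseline, latest = runs[0].random_iops, runs[-1].random_iops
        if latest < baseline * (1 - max_drop):
            demote.add(wwn)  # candidate for archive arrays, not hot storage
    return demote
```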
    • tkaiser wrote:

      At least I don't want to waste my time compensating for mistakes hardware vendors make. The real fix is a new firmware flashed to the JMS578 and for me it's somewhat surprising that still nothing happened (best idea is to ask Hardkernel over in their forum).
      Anyway, this is only a 'problem' for people constantly shutting down or rebooting their devices. If you use it as NAS (keep it powered on for years) you're obviously not affected.

      Also it's a bit sad that you got distracted by the Backblaze BS. All your SMART values are fine (0), only attributes 5, 187, 197 and 198 would be a real indication of problems that already happened or are just happening (but NO WAY to predict a drive failure from any of those numbers, your drive can fail at any time without any SMART value having increased -- it's pure BS to trust in 'statistics done wrong'. If you use disk storage you need BACKUP)

      And you look at the wrong SMART attribute. You should better check Power-Off_Retract_Count. If you installed the script the value should not increase ...

      Ahhh, okay.
      My bad, I'd misinterpreted the columns "Value", "Worst" and "Threshold".
      I've found a good explanation and now I understand.
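      For anyone else tripping over the same columns: VALUE and WORST are the vendor's normalized scores (usually higher is better), THRESH is the vendor's failure threshold, and smartmontools counts an attribute as failed when the normalized value drops to or below a non-zero threshold (a threshold of 0 never trips). A minimal Python sketch of that rule; the function name is hypothetical. By this rule every row in the table above reads as OK.

```python
def attribute_state(value: int, worst: int, thresh: int) -> str:
    """Classify one SMART attribute from its normalized columns.

    value/worst: vendor-normalized scores (usually higher is better).
    thresh: vendor failure threshold; 0 means the attribute never fails.
    """
    if thresh == 0:
        return "ok"
    if value <= thresh:
        return "failing now"
    if worst <= thresh:
        return "failed in the past"
    return "ok"
```

      Reallocated_Sector_Ct from the table (100, 100, 036) is far above its threshold, and the other four have a threshold of 000, so they can only be judged by their raw counts.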

      Power-Off_Retract_Count stands at 8 and no longer increases.
      Thanks for your time.

      Regards,
      Minfred
    • tkaiser wrote:

      flmaxey wrote:

      (The only really interesting point in the Google paper's conclusions was the lack of evidence that high temperature and/or high drive utilization shortens life.)

      You should really read and try to understand section '3.5.6 Predictive Power of SMART Parameters'.

      Relying on SMART to predict drive failures is BS. If you have only a single disk statistics are irrelevant anyway (your disk doesn't know about statistics, only if you use tens of thousands of disks you can make use of numbers others generated with same amounts of disks). Even if you would have tens of thousands of disks still SMART is of no help to predict drive failures.

      Things to keep in mind: The Google paper is pretty old (dealing with disks from 15 years ago), those Google disks got 'scrubbed' regularly (which is something the average user never does) and the effect of vibrations isn't considered (since there are no SMART attributes taking care of this -- there exists also a 'nice' disk failure study from some Microsoft folks who happily ignore this and do some impressive statistics failures).


      At work we regularly pull disks out of the ZFS JBODs, put them into a test system where they are benchmarked (sequential and especially random IO, tested over the whole disk surface and another time over 10 partitions. Results are stored in a database together with the disk's WWN/WWID and if random IO performance especially over the whole disk drops, these disks will never be reattached to hot storage but are used in archive arrays from then on).

      This is just an assumption (if positioning and head movements take more time then there's something wrong) but no evidence since we would need to watch a few thousand disks die until we get any useful numbers.

      Again, as previously noted, nothing can be predicted with anything approaching 100% accuracy. (A predictive algorithm that does better than 50%, a simple coin flip, would be outstanding. A working and consistent one could make you a millionaire; ask any stock or futures trader.) So it stands to reason that any predictive algorithm, using any method, could be slaughtered in a white paper.

      But there is a difference between "possibility" and "probability". Possibilities are infinite; statistics is an approach to reducing an infinite number of possibilities to something usable: "probability". Is it possible for a drive to fail with no signs or warnings at all (SMART or other)? Absolutely. Is it possible for a drive with 20 reallocated sectors and other drive errors to last 20 years or more in continuous use? Yep. But neither of those possibilities is highly probable.
      _________________________________________________________________

      Thanks for noting the age of the Google paper; that was something I missed. :) One would hope that SMART technology has improved in 15 years, but it's not a topic I've followed over time.
      However, your main point regarding backup is taken. Would I worry about a reallocated sector or two? No. I have multiple backups. And with drive-scrubbing technology, I don't rely on SMART at all. In my limited experience, tech that "scrubs" with checksums (BTRFS, ZFS, SnapRAID, etc.) is much more sensitive to drive issues and low-level corruption than SMART will ever be.
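      The idea behind checksum scrubbing can be shown at file level. This is a minimal, hypothetical Python sketch, not how BTRFS, ZFS or SnapRAID work internally (they keep their own block- or file-level checksums and can repair from redundancy): it records a SHA-256 per file once, then later re-reads everything and reports files whose content no longer matches.

```python
from __future__ import annotations

import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a digest for every file under `root` (relative path -> SHA-256)."""
    return {p.relative_to(root).as_posix(): file_digest(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def scrub(root: Path, manifest: dict[str, str]) -> list[str]:
    """Re-read every recorded file; return paths that are missing or changed."""
    return [rel for rel, digest in manifest.items()
            if not (root / rel).is_file() or file_digest(root / rel) != digest]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        (root / "photo.jpg").write_bytes(b"original bytes")
        manifest = build_manifest(root)
        # Simulate silent corruption by changing one byte.
        (root / "photo.jpg").write_bytes(b"original bytez")
        print(scrub(root, manifest))
```

      Unlike the real tools, this sketch cannot tell an intended edit from corruption and cannot repair anything; it only shows why re-reading and re-checking data catches problems SMART never reports.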
      _________________________________________________________________

      The part quoted above about benchmarking pulled disks is fascinating. I've never considered factors that would progressively slow a drive down. Vibration is an obvious factor. I believe passive (liquid?) balancing is being used on platters lately. And there's mechanical wear and tear, which would slowly increase friction. These are factors most people couldn't even consider without a sufficient sample size.
