How often should I schedule self-tests, and which ones?


      Hi everyone,
      I have 4 HDDs in my NAS.
      All of them are already monitored, but I saw that there is also a schedule tab with four different types of self-test.
      I wanted to ask a few things:
      1) What's the difference between normal monitoring and a self-test?
      2) Which one should I run: short, long, conveyance, or offline?
      3) How often should I run it? Does it shorten the HDD's lifespan?

      Thanks in advance for the help!
      Intel G4400 - Asrock H170M Pro4S - 8GB RAM - Be Quiet Pure Power 11 400 CM - Nanoxia Deep Silence 4 - 2X6TB Seagate Ironwolf - 2x4TB WD RED
      OMV 4.1.22 - Kernel 4.19 - omvextrasorg 4.1.2
    • I was thinking about a short test once a week and a long one once a month.
      But I'm not sure whether running them that often can hurt the disk. I never found a complete answer to that, and I think a "suggested schedule" should be in the wiki, since it's a rather complex topic.
    • This is more of a matter of preference.

      I do a short test on all drives once a week, after hours. I believe that's enough to increment the counts on the "maybe it's time to worry" SMART attributes:

      SMART 5 – Reallocated_Sector_Count.
      SMART 187 – Reported_Uncorrectable_Errors.
      SMART 188 – Command_Timeout.
      SMART 197 – Current_Pending_Sector_Count.
      SMART 198 – Offline_Uncorrectable.

      I don't do long tests as a matter of routine, only as a follow-up after something odd appears.
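      To check the attributes listed above, smartctl -A /dev/sda prints the drive's attribute table. The sketch below scans that text and keeps the watched IDs whose raw value is above zero. It keys on the numeric ID because the names smartctl prints can differ from the ones in the list; the sample table is made up and only mimics the usual ATA layout, and the "raw value > 0" rule is an assumption for illustration.

```python
# Attribute IDs from the list above; matching on the numeric ID avoids
# depending on the exact attribute names smartctl prints.
WATCHED_IDS = {5, 187, 188, 197, 198}


def worrying_attributes(smartctl_a_output: str) -> dict[int, int]:
    """Map attribute ID -> raw value for watched attributes whose raw value is > 0."""
    flagged = {}
    for line in smartctl_a_output.splitlines():
        # ID, name, flag, value, worst, thresh, type, updated, when_failed, raw
        parts = line.split(maxsplit=9)
        if len(parts) < 10 or not parts[0].isdigit():
            continue  # header or non-attribute line
        attr_id = int(parts[0])
        if attr_id not in WATCHED_IDS:
            continue
        # Some raw values carry extra fields (e.g. "0 0 0"); use the first one
        first_raw = parts[9].split()[0]
        if first_raw.isdigit() and int(first_raw) > 0:
            flagged[attr_id] = int(first_raw)
    return flagged


# Made-up sample in the usual smartctl -A ATA layout (not from any drive in this thread)
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       1
  9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       8760
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""

print(worrying_attributes(SAMPLE))
```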
    • neurotone wrote:

      I think if we keep searching the web, we'll find it :)

      I don't even know what software OMV is using to check the SMART data :(

      SMART 5 – Reallocated_Sector_Count.
      SMART 187 – Reported_Uncorrectable_Errors.
      SMART 188 – Command_Timeout.
      SMART 197 – Current_Pending_Sector_Count.
      SMART 198 – Offline_Uncorrectable.
      Thanks!
    • crashtest wrote:

      Blabla wrote:

      I don't even know what software OMV is using to check the SMART data
      The command-line program is smartctl. A sample command is: smartctl -x /dev/sda


      Like so, right? Hit the green dot by Monitor and I get an email update?

      My homebrew NAS
      OMV 4
      ASRock760GM-HDV
      AMD Phenom II X6 1065T
      Viper 3 16GB DDR3 1600MHz
      PNY 120GB SSD
      Two: Hitachi 4TB
      In a HP Pavilion case :D
    • @neurotone
      I wouldn't worry just yet. While one reallocated sector is not good, it's not really bad either. (Drives are built with a few spare sectors for exactly this purpose: remapping sectors with media errors.)
      I would keep an eye on it. If the reallocated sector count starts climbing, failure is just a matter of time, and it might come fast.
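      Keeping an eye on it can be as simple as recording the raw value of attribute 5 after each test and checking whether it keeps rising. In this sketch the readings are made up, and the "rose in at least two separate intervals" rule is a hypothetical threshold chosen for illustration, not one from smartmontools or the drive vendors.

```python
def new_reallocations(history: list[int]) -> list[int]:
    """Increase of the attribute-5 raw value between consecutive readings."""
    return [later - earlier for earlier, later in zip(history, history[1:])]


def is_climbing(history: list[int], min_increases: int = 2) -> bool:
    """Hypothetical rule: warn once the count has gone up in at least
    `min_increases` separate intervals, rather than on one isolated remap."""
    return sum(1 for delta in new_reallocations(history) if delta > 0) >= min_increases


# Weekly readings of Reallocated_Sector_Count (made-up numbers)
print(is_climbing([1, 1, 1, 1]))  # a single old reallocation, stable
print(is_climbing([1, 1, 2, 4]))  # rising in two consecutive intervals
```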

      Also note: if the drive is between 4 and 7 years old (or the equivalent in power-on hours), it has reached its rough expected life. They don't last forever.
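      To put "the equivalent in hours" into numbers: assuming a drive that runs 24/7 and a year of 365.25 days (8766 hours), 4 years is 35,064 power-on hours and 7 years is 61,362. The sketch also assumes the Power_On_Hours (attribute 9) raw value really is in hours, which is not the case on every drive model.

```python
HOURS_PER_YEAR = 24 * 365.25  # 8766 hours, assuming the drive runs 24/7


def power_on_years(power_on_hours: int) -> float:
    """Convert a Power_On_Hours (attribute 9) raw value into years of running time."""
    return power_on_hours / HOURS_PER_YEAR


def life_stage(power_on_hours: int, low: float = 4, high: float = 7) -> str:
    """Place a drive relative to the rough 4-7 year life mentioned above."""
    years = power_on_years(power_on_hours)
    if years < low:
        return "before rough expected life"
    if years <= high:
        return "within rough expected life"
    return "beyond rough expected life"


for hours in (30_000, 35_064, 61_362, 70_000):
    print(hours, round(power_on_years(hours), 2), life_stage(hours))
```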

    • I only run SMART tests manually, when I intend to actually look at the results. I never run them automatically. It might be once a month or so, or even less often.

      SMART tests are good for early warning before a drive fails completely. I used to run tests daily but stopped. I don't really care about early warning: if a drive fails, it fails. I run OMV on several single-drive SBC NAS boxes, and I have good backups and redundant servers.

      It might be a different story if you run a big RAID array with many drives and no redundant server. Then early warning might be critical, so you can swap drives before they fail for real.
      OMV 4, 7 x ODROID HC2, 1 x ODROID HC1, 5 x 12TB, 1 x 8TB, 1 x 2TB SSHD, 1 x 500GB SSD, GbE, WiFi mesh
    • crashtest wrote:

      I do a short test on all drives once a week, after hours. I believe that's enough to increment the counts on the "maybe it's time to worry" SMART attributes
      Wrong. If you want to increase «the counts on the "maybe it's time to worry" SMART attributes», you need to run a long test.

      The difference is:
      • short test: checks the electronics (vendor-specific) and servos/motors (vendor-specific), and does a limited surface test; finishes within 2 minutes
      • long test: the 2-minute limit does not apply and the whole surface is tested, so if there is a problem with a sector anywhere, only the long test will reveal it
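      When a long test does hit a bad sector, smartctl -l selftest /dev/sda shows it in the LBA_of_first_error column of the self-test log. The sketch below pulls those entries out. The column layout is assumed from typical ATA output and the sample rows are made up, so treat the regular expression as a starting point rather than a robust parser.

```python
import re

# Assumed row layout of smartctl -l selftest on ATA drives:
# Num  Test_Description  Status  Remaining  LifeTime(hours)  LBA_of_first_error
# Columns are separated by runs of two or more spaces.
ROW = re.compile(
    r"^#\s*(?P<num>\d+)\s+(?P<test>.+?)\s{2,}(?P<status>.+?)\s{2,}"
    r"(?P<remaining>\d+)%\s+(?P<hours>\d+)\s+(?P<lba>\S+)\s*$"
)


def surface_errors(selftest_log: str) -> list[tuple[str, str, int]]:
    """(test description, status, first failing LBA) for each logged test that hit a bad sector."""
    found = []
    for line in selftest_log.splitlines():
        m = ROW.match(line)
        # "-" in the LBA column means no failing sector was recorded
        if m and m["lba"].isdigit():
            found.append((m["test"], m["status"], int(m["lba"])))
    return found


# Made-up log (entry #1 is the most recent): the short test passed,
# the later long test found a bad sector
SAMPLE_LOG = """\
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     14325         1234567
# 2  Short offline       Completed without error       00%     14300         -
"""

print(surface_errors(SAMPLE_LOG))
```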


      A short test might result in the drive later switching the SMART overall-health self-assessment result from PASSED to FAILED (because the drive controller notices that head positioning time is increasing and a servo is therefore about to die), but it will not increase the values of the attributes you seem to be interested in.

      We run monthly long tests so that the drives report uncorrectable errors and potentially remap bad sectors, followed by ZFS resilvers the following day (scheduled on weekends, since these are all business installations). If there is data corruption at the drive level, ZFS compensates for it at the filesystem level directly afterwards.
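      The post only says "monthly" and "on weekends". Assuming, purely for illustration, that the long test runs on the first Saturday of each month and the ZFS follow-up on the Sunday after, the dates can be computed with Python's standard calendar and datetime modules:

```python
import calendar
from datetime import date, timedelta


def first_weekday_of_month(year: int, month: int, weekday: int = calendar.SATURDAY) -> date:
    """First occurrence of `weekday` (Monday=0 ... Sunday=6) in the given month."""
    first = date(year, month, 1)
    offset = (weekday - first.weekday()) % 7
    return first + timedelta(days=offset)


def maintenance_window(year: int, month: int) -> tuple[date, date]:
    """Hypothetical plan: long test on the first Saturday, ZFS follow-up the next day."""
    long_test_day = first_weekday_of_month(year, month)
    return long_test_day, long_test_day + timedelta(days=1)


for month in (9, 10, 11):
    long_test, follow_up = maintenance_window(2024, month)
    print(f"2024-{month:02d}: long test {long_test}, ZFS follow-up {follow_up}")
```

      In smartd.conf terms, assuming its T/MM/DD/d/HH schedule format, "a Saturday that falls on days 01-07" could be written as L/../0[1-7]/6/03 for a 03:00 start.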
      No more contributions to this project until 'alternative facts' (AKA ignorance/stupidity) are gone