4-disk RAID5: S.M.A.R.T. attributes show errors, but S.M.A.R.T. test PASSED!


    • Hi,

      I'm running a 4-disk mdadm RAID5:

      sda (3TB)
      sdb (3TB)
      sdc (3TB)
      sdd (3TB)

      One drive (sdb) shows a red dot in the OMV SMART settings (pop-up message: "Drive has few bad blocks").
      Here is the SMART attribute output:

      ```
      root@Server:~# sudo smartctl -A /dev/sdb
      === START OF READ SMART DATA SECTION ===
      SMART Attributes Data Structure revision number: 16
      Vendor Specific SMART Attributes with Thresholds:
      ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
        1 Raw_Read_Error_Rate     0x002f   199   199   051    Pre-fail  Always       -       8791
        3 Spin_Up_Time            0x0027   175   174   021    Pre-fail  Always       -       6208
        4 Start_Stop_Count        0x0032   097   097   000    Old_age   Always       -       3283
        5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       3
        7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
        9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       7197
       10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
       11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
       12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1263
      192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       66
      193 Load_Cycle_Count        0x0032   199   199   000    Old_age   Always       -       3216
      194 Temperature_Celsius     0x0022   117   105   000    Old_age   Always       -       33
      196 Reallocated_Event_Count 0x0032   197   197   000    Old_age   Always       -       3
      197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       142
      198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
      199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
      200 Multi_Zone_Error_Rate   0x0008   200   187   000    Old_age   Offline      -       0
      ```
      Not so good :/
      - Raw_Read_Error_Rate: 8791 (the other disks show 0)
      - Reallocated_Sector_Ct: 3 (the other disks show 0)
      - Current_Pending_Sector: 142 (the other disks show 0)
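A minimal sketch for keeping an eye on these counters: it pulls the raw values of attributes 5 (Reallocated_Sector_Ct), 197 (Current_Pending_Sector) and 198 (Offline_Uncorrectable) out of `smartctl -A` output, since those are the ones most commonly treated as signs of a failing disk. It assumes the ten-column attribute table shown above, with RAW_VALUE starting in column 10; the function name `smart_raw_watch` is made up for illustration.

```shell
# Print "ID NAME RAW" for attributes 5, 197 and 198.
# Reads `smartctl -A` output on stdin. Assumes the usual ten-column table:
# column 1 is the attribute ID, column 10 is (the start of) RAW_VALUE.
# Header lines such as "ID# ATTRIBUTE_NAME ..." never match a numeric ID.
smart_raw_watch() {
  awk '$1 == 5 || $1 == 197 || $1 == 198 { print $1, $2, $10 }'
}

# Usage (needs root and smartmontools):
#   smartctl -A /dev/sdb | smart_raw_watch
```

Run periodically, a growing Current_Pending_Sector or Reallocated_Sector_Ct value would be the thing to watch for.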

      Should I be concerned? The RAID status is clean and everything is working fine, except for the strange SMART status log.

      sdb has a few bad sectors that are pending reallocation. I tried to get the bad block numbers so I could force these to be reallocated, but the SMART health test PASSED!

      ```
      root@Server:~# smartctl -H /dev/sdb
      === START OF READ SMART DATA SECTION ===
      SMART overall-health self-assessment test result: PASSED
      ```
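On why the health test can still say PASSED: roughly speaking, the overall-health verdict does not compare raw counts at all. An attribute only counts as failing once its normalized VALUE has dropped to or below its THRESH. For sdb, Reallocated_Sector_Ct sits at 200 against a threshold of 140, and Current_Pending_Sector has a threshold of 000, so it can never trip, however high its raw value gets. The hypothetical helper `smart_below_threshold` below applies that VALUE <= THRESH rule to `smartctl -A` output; it is a sketch of the rule, not a reimplementation of what the drive firmware reports.

```shell
# List attributes whose normalized VALUE has fallen to or below THRESH.
# Reads `smartctl -A` output on stdin (ten-column table assumed).
# Attributes with THRESH 000 are skipped: a zero threshold can never be crossed.
smart_below_threshold() {
  awk '$1 ~ /^[0-9]+$/ && NF >= 10 {
         value = $4 + 0; thresh = $6 + 0      # "051" + 0 -> 51
         if (thresh > 0 && value <= thresh)
           print $1, $2, "value=" value, "thresh=" thresh
       }'
}

# Usage: smartctl -A /dev/sdb | smart_below_threshold
# Empty output is consistent with "PASSED", even with 142 pending sectors.
```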
      I ran self-tests; they completed without errors and show no bad blocks under LBA_of_first_error. Why, when there are pending sectors?

      ```
      root@Server:~# smartctl -l selftest /dev/sdb
      SMART Self-test log structure revision number 1
      Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
      # 1  Short offline       Completed without error       00%       7195         -
      # 2  Extended offline    Completed without error       00%       7180         -
      # 3  Short offline       Completed without error       00%       7172         -
      ```
      Tools like tune2fs and fdisk need the exact block numbers to reallocate or overwrite pending blocks, but no faulty blocks are listed under LBA_of_first_error.
      How can I fix the pending sectors without losing the RAID5 or any data?

      Please help
      pappl
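One commonly used way to get pending sectors rewritten on an md array without knowing their LBAs is a scrub. Writing `check` to the array's `md/sync_action` file in sysfs makes md read every block on every member device, and the md(4) man page describes read errors found this way as being corrected by reconstructing the data from the other disks and writing it back, which gives the drive a chance to remap the sector. Presumably only sectors inside the array's data area get covered. The sketch assumes the array is `md0`; `start_md_check` and the `SYSFS_ROOT` override are hypothetical, the latter only there so the function can be tried without a real array.

```shell
# Start an md "check" scrub on one array.
# md reads every block of every member; read errors hit along the way are
# rewritten from the redundant data (see md(4), "SCRUBBING AND MISMATCHES").
# Assumptions: array name md0 by default; SYSFS_ROOT is a test-only override.
start_md_check() {
  md=${1:-md0}
  action="${SYSFS_ROOT:-/sys}/block/$md/md/sync_action"
  [ -w "$action" ] || { echo "cannot write $action (root needed?)" >&2; return 1; }
  current=$(cat "$action")
  # Only start when the array is not already resyncing/recovering/checking.
  if [ "$current" != "idle" ]; then
    echo "$md is busy ($current); not starting a check" >&2
    return 1
  fi
  echo check > "$action"
  echo "check started on $md; follow progress with: cat /proc/mdstat"
}

# Usage (as root): start_md_check md0
# Afterwards: smartctl -A /dev/sdb | grep -i pending
```

Re-reading attribute 197 after the scrub finishes would show whether the pending sectors were rewritten (or reallocated, in which case attribute 5 goes up instead).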


    • I have very little experience with RAID and SMART errors.

      But I would do this:

      1. Order a new replacement HDD at once!
      2. Check that your backups are safe.
      3. Consider not using the NAS until you have replaced the HDD. Creating extra backups may push the disk over the edge, perhaps along with more disks, resulting in total loss of all data. But you have backups, don't you? So that is not a big problem.
      OMV 4, 7 x ODROID HC2, 1 x ODROID HC1, 5 x 12TB, 1 x 8TB, 1 x 2TB SSHD, 1 x 500GB SSD, GbE, WiFi mesh
    • 5 Reallocated_Sector_Ct should be 0. Its normalized value (200) is still well above the threshold (140), so you have time to order a new disk or return your defective disk under warranty.
      OMV 4.1.11 x64 on a HP T510, 16GB CF as Boot Disk & 32GB SSD 2,5" disk for Data, 4 GB RAM, CPU VIA EDEN X2 U4200 is x64 at 1GHz

      Post: HPT510 SlimNAS ; HOWTO Install Pi-Hole ; HOWTO install MLDonkey ; HOWTO Install ZFS-Plugin ; OMV_OldGUI ; ShellinaBOX ; ctop
      Dockers: MLDonkey ; PiHole ; weTTY
      Videos: @TechnoDadLife
    • Thanks for your help!

      raulfg3 wrote:

      5 Reallocated_Sector_Ct should be 0. Its normalized value (200) is still well above the threshold (140), so you have time to order a new disk or return your defective disk under warranty.
      Good news!

      tkaiser wrote:

      PAPPL wrote:

      Raw_Read_Error_Rate: 8791
      Which drive vendor? If it's Seagate, you might want to read up on 'numbers without meaning' (at least when interpreted as decimal numbers, which is wrong): users.on.net/~fzabkar/HDD/Seagate_SER_RRER_HEC.html
      WD Red 3TB, thanks for the information.
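For readers with Seagate drives (the drive here is a WD Red, so this does not apply to it): assuming the layout described on the page linked above, where the 48-bit raw value of Raw_Read_Error_Rate or Seek_Error_Rate packs an error count into the upper 16 bits and an operation count into the lower 32 bits, a decimal raw value can be split like this. The function name `seagate_split_raw` is hypothetical.

```shell
# Split a Seagate-style 48-bit raw value into (errors, operations).
# Assumed layout: bits 47..32 = error count, bits 31..0 = operation count.
# Needs 64-bit shell arithmetic (bash, dash and most modern sh have it).
seagate_split_raw() {
  raw=$1
  errors=$(( raw >> 32 ))
  ops=$(( raw & 0xFFFFFFFF ))
  echo "errors=$errors operations=$ops"
}

# Usage: seagate_split_raw 117875920
```

A large decimal raw value with `errors=0` would then mean no errors at all, just many operations.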

      Adoby wrote:

      I have very little experience with RAID and SMART errors.

      But I would do this:

      1. Order a new replacement HDD at once!
      2. Check that your backups are safe.
      3. Consider not using the NAS until you have replaced the HDD. Creating extra backups may push the disk over the edge, perhaps along with more disks, resulting in total loss of all data. But you have backups, don't you? So that is not a big problem.
      I have backups of all shares I don't want to lose.

      How do I replace one drive in the 4-disk RAID5?

      - Buy a new 3TB drive
      - Power off the server (OMV 4.1.17)
      - Replace the faulty HDD
      - Power on and log into the OMV web administration
      - Will the RAID5 resync automatically (for hours)? Or are there additional settings I need to keep in mind?


      pappl
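Since the old disk still works, one option worth knowing about, assuming mdadm 3.3 or newer and a free SATA port for the new disk, is mdadm's `--replace`: the mdadm man page describes the old device staying in service while its replacement is rebuilt, so the RAID5 keeps its redundancy during the copy, and the old device is marked faulty only when the copy finishes. In this sketch `/dev/md0`, `/dev/sdb` (old) and `/dev/sde` (new, at least as large) are assumptions, as are whole-disk members; check the real names with `cat /proc/mdstat` and `lsblk` first. `md_hot_replace` is a hypothetical helper, and `DRY_RUN=1` only prints the commands.

```shell
# Hot-replace one RAID5 member while the array stays redundant.
# Assumptions: array /dev/md0, old member /dev/sdb, new blank disk /dev/sde,
# whole-disk members, mdadm >= 3.3. Set DRY_RUN=1 to only print commands.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

md_hot_replace() {
  md=$1 old=$2 new=$3
  # 1. Add the new disk; it joins the array as a spare.
  run mdadm "$md" --add "$new" || return 1
  # 2. Rebuild onto that spare; the old disk stays active until the copy ends.
  run mdadm "$md" --replace "$old" --with "$new" || return 1
  # 3. The replaced disk is then marked faulty and can be removed.
  echo "when the copy is done: mdadm $md --remove $old"
}

# Usage: DRY_RUN=1 md_hot_replace /dev/md0 /dev/sdb /dev/sde
```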
    • PAPPL wrote:

      Power off the server (OMV 4.1.17)
      - Replace the faulty HDD
      - Power on and log into the OMV web administration
      - Will the RAID5 resync automatically (for hours)? Or are there additional settings I need to keep in mind?
      No, all of this has to be done from the CLI.

      If you search for how to replace a hard drive in an mdadm RAID 5, you'll find some answers. This is just one. The only thing I'm not sure about is whether the array has to be stopped first.
      Raid is not a backup! Would you go skydiving without a parachute?
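For the cold-swap route described in the question (power off, swap, power on), the usual CLI sequence is fail, remove, swap, add. On the open question: `--fail` and `--remove` are manage-mode operations that act on a running array, so as far as the mdadm man page describes them, the array does not have to be stopped. The array does run without redundancy until the rebuild finishes. Assumed names: `/dev/md0` and outgoing `/dev/sdb`; disk names can change after a reboot, so the new disk is left as a placeholder. `md_pull_member` and `md_add_member` are hypothetical helpers; `DRY_RUN=1` only prints the commands.

```shell
# Cold swap of one RAID5 member (the array is degraded during the rebuild).
# Assumptions: array /dev/md0, outgoing member /dev/sdb.
# Set DRY_RUN=1 to only print the commands instead of running them.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

md_pull_member() {       # run before powering off
  md=$1 old=$2
  run mdadm "$md" --fail "$old" || return 1
  run mdadm "$md" --remove "$old"
}

md_add_member() {        # run after the new disk is installed
  md=$1 new=$2
  run mdadm "$md" --add "$new"
}

# Usage: md_pull_member /dev/md0 /dev/sdb
#        power off, swap the disk, boot, check `lsblk` and `cat /proc/mdstat`
#        md_add_member /dev/md0 /dev/sdX   (sdX = whatever the new disk is)
```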
    • After reading about RAID5 data loss, I don't want to wait until the suspicious drive develops more errors.
      If a second drive fails, all data is lost; I should have set up a more secure RAID6 array. I have a backup of all important data, but not of everything, due to the high cost.

      I'm a little afraid of the drive swap and rebuild process, because some users run into problems after swapping a still-functioning drive in a clean RAID5 array (rebuild not possible, array no longer visible afterwards, array cannot be mounted, data lost, ...) =O
      So the drive to be replaced must first be removed from the array via the CLI.
      Every tutorial seems to use different command options (--force, ...).
      Interestingly, after all these years there is still no noob-proof sticky tutorial thread about swapping a still-working or faulty drive and rebuilding a RAID5.



      pappl
    • PAPPL wrote:

      Interestingly, after all these years there is still no noob-proof sticky tutorial thread about swapping a still-working or faulty drive and rebuilding a RAID5.
      Nothing is noob-proof :) Here is a thread you may find interesting. For the average home user a RAID setup makes no sense; @Adoby has an excellent thread on here describing his own setup, which is a much better option than using RAID. RAID is easy to set up but can be a PITA to recover.
    • PAPPL wrote:

      If a second drive fails, all data is lost

      It's worse than that with most RAID-5 implementations: a single URE (unrecoverable read error) on one of the remaining disks during a RAID-5 rebuild can stop the whole rebuild, and your whole array is lost.
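A back-of-envelope version of that risk: rebuilding a degraded 4-disk RAID5 means reading the three surviving 3 TB disks in full, about 3 × 3×10^12 × 8 = 7.2×10^13 bits. Assuming independent errors at a datasheet rate of one URE per 10^14 bits (the figure typically listed for consumer drives; datasheets give it as an upper bound, so real-world odds are usually better), the chance of at least one URE is 1 − e^(−0.72), roughly 0.51. The hypothetical helper `ure_probability` below does that arithmetic with awk, using the Poisson approximation of 1 − (1 − 1/r)^bits.

```shell
# Chance of at least one URE while a degraded RAID5 rebuilds, i.e. while
# every surviving disk is read in full.
# Model assumptions: independent errors at one per $3 bits (datasheet bound).
# Arguments: surviving disks, TB per disk (10^12 bytes), bits per error.
ure_probability() {
  disks=$1 tb_per_disk=$2 bits_per_error=$3
  awk -v d="$disks" -v tb="$tb_per_disk" -v r="$bits_per_error" 'BEGIN {
    bits = d * tb * 1e12 * 8          # total bits read during the rebuild
    p = 1 - exp(-bits / r)            # Poisson approximation
    printf "%.2f\n", p
  }'
}

# Example: 3 surviving 3 TB disks, one error per 1e14 bits
#   ure_probability 3 3 1e14    # prints 0.51
```

With a 10^15 rate (the figure often quoted for enterprise drives) the same call gives 0.07, which shows how sensitive the estimate is to that one assumption.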

      Traditional RAID is not about data safety but only about data availability, and nobody at home needs that. Unfortunately, almost everyone playing with RAID at home forgets about backups.
      No more contributions to this project until 'alternative facts' (AKA ignorance/stupidity) are gone