Hello to all,
today I noticed that my RAID5 array (6x4TB Western Digital Red) had gone into the state "clean, degraded".
I then checked with mdadm and saw that one of the drives had been removed from the array.
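For reference, this is what I ran to check (assuming the array is /dev/md0; adjust to your setup):

    cat /proc/mdstat            # quick overview, e.g. [6/5] [UUUUU_] for a degraded 6-drive array
    mdadm --detail /dev/md0     # full details, lists the removed drive and the "clean, degraded" state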
I proceeded to run short self-tests using smartctl -t short /dev/sd<x> on all drives, and the results came back as:
smartctl 6.4 2014-10-07 r4002 [x86_64-linux-4.9.0-0.bpo.4-amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error        00%             6770  -
on all drives except the one that had been removed from the RAID5:
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure        70%             5158  9
# 2  Short offline       Completed: read failure        70%             5158  9
I think the drive is either dead already or about to fail, so it does not make sense to try to "fix" the RAID using mdadm right now, does it?
The drive only has 5158 hours (roughly 215 days) on it, so I assume it's a hardware failure of the drive?
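Before sending it in, I would also dump the full SMART data as evidence for the warranty claim (sdf here is just a placeholder for the failing drive's device name):

    smartctl -x /dev/sdf > smart_failing_drive.txt   # all SMART info: attributes, error log, self-test log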
My plan of action would be:
- get a replacement drive (it is still under warranty)
- shut down the NAS and replace the faulty drive
- rebuild the RAID5 onto the replacement (roughly the commands sketched below)
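A minimal sketch of the mdadm side, assuming the array is /dev/md0 and the faulty drive (and later its replacement) shows up as /dev/sdf; both names are just examples, so adjust them to your setup:

    # before shutting down: make sure the faulty drive is marked failed and removed
    mdadm --manage /dev/md0 --fail /dev/sdf
    mdadm --manage /dev/md0 --remove /dev/sdf
    # after swapping the drive and booting back up: add the replacement,
    # which kicks off the resync onto the new drive
    mdadm --manage /dev/md0 --add /dev/sdf
    # watch the rebuild progress
    watch cat /proc/mdstat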
Could you guys please give me your thoughts on the drive and the plan of action above?
Thank you!