Hi all,
So I created a RAID 5 array with openmediavault. A day later, one of the drives shows up as faulty. The funny thing is, SMART doesn't show anything wrong with it. Any ideas?

openmediavault:~# mdadm --detail /dev/md127
/dev/md127:
        Version : 1.2
  Creation Time : Fri Nov 23 18:39:47 2012
     Raid Level : raid5
     Array Size : 5860538880 (5589.05 GiB 6001.19 GB)
  Used Dev Size : 1953512960 (1863.02 GiB 2000.40 GB)
   Raid Devices : 4
  Total Devices : 4
    Persistence : Superblock is persistent

    Update Time : Thu Nov 29 23:02:31 2012
          State : active, degraded
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0

    Number   Major   Minor   RaidDevice State
       0       8       16        0      active sync   /dev/sdb
       1       0        0        1      removed
       2       8       48        2      active sync   /dev/sdd
       3       8       64        3      active sync   /dev/sde

       1       8       32        -      faulty spare   /dev/sdc
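
For reference, the member marked faulty can be picked out of that output with a small filter. This is just a rough sketch: list_faulty is a name I made up (it is not part of mdadm), and it assumes the device path is the last field on each member line, as it is above.

```shell
#!/bin/sh
# list_faulty: hypothetical helper, not part of mdadm.
# Reads `mdadm --detail` output on stdin and prints the device path
# (the last whitespace-separated field) of every line mentioning "faulty".
list_faulty() {
  awk '/faulty/ { print $NF }'
}

# Usage on a live system (needs root):
#   mdadm --detail /dev/md127 | list_faulty
```

On the output above it prints /dev/sdc.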
openmediavault:~# smartctl -a /dev/sdc
smartctl 5.40 2010-07-12 r3124 [x86_64-unknown-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar Green (Adv. Format) family
Device Model: WDC WD20EARS-00MVWB0
Firmware Version: 50.0AB50
User Capacity: 2,000,398,934,016 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Thu Nov 29 23:08:22 2012 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   169   162   021    Pre-fail  Always       -       6541
  4 Start_Stop_Count        0x0032   098   098   000    Old_age   Always       -       2837
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   086   086   000    Old_age   Always       -       10766
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       161
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       132
193 Load_Cycle_Count        0x0032   190   190   000    Old_age   Always       -       31498
194 Temperature_Celsius     0x0022   117   103   000    Old_age   Always       -       33
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
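
To back up the "SMART looks fine" part: the attributes usually associated with bad sectors or a flaky cable are 5 (Reallocated_Sector_Ct), 197 (Current_Pending_Sector), 198 (Offline_Uncorrectable) and 199 (UDMA_CRC_Error_Count), and all four have a raw value of 0 here. A rough filter for pulling just those out; smart_watch is a made-up name and the attribute list is my own pick, not something smartmontools defines.

```shell
#!/bin/sh
# smart_watch: hypothetical helper, not part of smartmontools.
# Reads smartctl attribute-table output on stdin and prints "ID NAME RAW"
# for attributes 5, 197, 198 and 199 (an assumed watch list).
# awk's default field splitting ignores the leading padding on short IDs.
smart_watch() {
  awk '$1 ~ /^(5|197|198|199)$/ { print $1, $2, $NF }'
}

# Usage (needs root):
#   smartctl -A /dev/sdc | smart_watch
```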

I'm also seeing a high load average for no apparent reason:
top - 23:40:37 up 1 day, 21:36, 2 users, load average: 7.03, 7.20, 6.42
Tasks: 137 total, 1 running, 135 sleeping, 1 stopped, 0 zombie
Cpu(s): 4.9%us, 11.9%sy, 0.0%ni, 65.0%id, 17.6%wa, 0.0%hi, 0.6%si, 0.0%st
Mem: 8181524k total, 8129960k used, 51564k free, 1427220k buffers
Swap: 15977464k total, 0k used, 15977464k free, 5756416k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
  995 root      20   0     0    0    0 D    2  0.0  40:01.36 jbd2/md127-8
30580 tvhost    20   0  334m  68m 3780 S    2  0.9   4:36.68 python
32401 root      20   0  203m 1800 1096 S    2  0.0   0:00.57 collectd
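
Possibly related: Linux counts tasks in uninterruptible sleep (state D, typically waiting on disk I/O) in the load average, not just tasks using the CPU. With 65% idle and 17.6% wa, a load of about 7 looks more like processes blocked on I/O than CPU work, and jbd2/md127-8 (the journal thread for the filesystem on md127) is in state D above. A rough way to list such tasks; d_state is a made-up name, and it assumes ps output in the pid,stat,comm layout from the usage line.

```shell
#!/bin/sh
# d_state: hypothetical helper. Reads `ps -eo pid,stat,comm` output on stdin
# and prints the rows whose STAT column starts with D (uninterruptible sleep).
d_state() {
  # NR > 1 skips the ps header line; matching rows are printed unchanged.
  awk 'NR > 1 && $2 ~ /^D/'
}

# Usage; running it a few times shows whether the same tasks stay blocked:
#   ps -eo pid,stat,comm | d_state
```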