Hi
I am having a serious headache with multiple SMART errors.
So here is the deal:
Asrock E3C226D2I
Core i3
16 GB ECC RAM
System disk Samsung EVO 840
Storage 5x WD RED 5TB
I have tried the standard Wheezy kernel 3.2 and the backports kernel 3.16 - no difference.
I get tons of the following errors on all of my ATA ports EXCEPT the one with the Samsung SSD:
Dec 17 07:46:57 StoreME kernel: [129393.578460] ata6.00: exception Emask 0x50 SAct 0x400000 SErr 0x48c0800 action 0xe frozen
Dec 17 07:46:57 StoreME kernel: [129393.580786] ata6.00: irq_stat 0x04000040, connection status changed
Dec 17 07:46:57 StoreME kernel: [129393.583295] ata6: SError: { HostInt CommWake 10B8B LinkSeq DevExch }
Dec 17 07:46:58 StoreME kernel: [129393.585494] ata6.00: failed command: READ FPDMA QUEUED
Dec 17 07:46:58 StoreME kernel: [129393.587529] ata6.00: cmd 60/58:b0:b0:cf:4a/00:00:3d:01:00/40 tag 22 ncq 45056 in
Dec 17 07:46:58 StoreME kernel: [129393.587529] res 40/00:a8:a8:cf:4a/00:00:3d:01:00/40 Emask 0x50 (ATA bus error)
Dec 17 07:46:58 StoreME kernel: [129393.591696] ata6.00: status: { DRDY }
Dec 17 07:46:58 StoreME kernel: [129393.593769] ata6: hard resetting link
Dec 17 07:46:58 StoreME kernel: [129394.313871] ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Dec 17 07:46:58 StoreME kernel: [129394.315194] ata6.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
Dec 17 07:46:58 StoreME kernel: [129394.315204] ata6.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Dec 17 07:46:58 StoreME kernel: [129394.315210] ata6.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Dec 17 07:46:58 StoreME kernel: [129394.317175] ata6.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
Dec 17 07:46:58 StoreME kernel: [129394.317184] ata6.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Dec 17 07:46:58 StoreME kernel: [129394.317190] ata6.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Dec 17 07:46:58 StoreME kernel: [129394.317769] ata6.00: configured for UDMA/133
Dec 17 07:46:58 StoreME kernel: [129394.317945] ata6: EH complete
Sometimes the error is WRITE FPDMA instead of READ - but only sometimes.
It seems that the drives boot up with 6.0 Gbps links and then get downgraded to 1.5 Gbps when the errors appear.
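To check the link speed downgrade across all ports instead of eyeballing the log, something like the following could be used. It is only a minimal sketch: the function names are made up for illustration, and it assumes the kernel keeps printing the "SATA link up ... Gbps" line in the same form as in the excerpt above.

```python
import re
from collections import defaultdict

# Matches kernel lines such as:
#   "... ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 310)"
LINK_UP = re.compile(r"\b(ata\d+): SATA link up (\d+(?:\.\d+)?) Gbps")


def link_speeds(lines):
    """Return {port: [negotiated speeds in Gbps, in order of appearance]}."""
    speeds = defaultdict(list)
    for line in lines:
        match = LINK_UP.search(line)
        if match:
            speeds[match.group(1)].append(float(match.group(2)))
    return dict(speeds)


def downgraded_ports(lines):
    """Return the ports whose latest link speed is below their first one."""
    return sorted(
        port
        for port, seen in link_speeds(lines).items()
        if seen[-1] < seen[0]
    )
```

Feeding it the lines of the kernel log (for example an open file object for the log, whose location depends on the syslog setup) should list every port that came up fast and was later renegotiated to a slower speed.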
My actions so far:
replaced cables
replaced ATX PSU
removed backplanes to directly attach the drive to SATA controller
replaced 2 out of 5 drives due to a different and real drive FAULT - the new drives showed the same message directly after plugging them in
That did NOT solve the issues!
When I disable the SATA PM (which I had enabled in rc.local for each port) the errors go away!? So it seems to be a power / wakeup related issue.
However, I then encountered 3 errors of a different kind, like this one (only 3 in 12h):
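For reference, toggling the PM setting per port could look roughly like the sketch below. It assumes the setting in question is the AHCI link power management policy exposed under /sys/class/scsi_host/host*/link_power_management_policy (the rc.local lines themselves are not shown here, so that is a guess), and that the kernel accepts the three policy names listed. The helper names and the `base` parameter are only for illustration. Writing the policy needs root.

```python
import glob
import os

POLICY_FILE = "link_power_management_policy"
# Policy names accepted by kernels of this generation (assumption for newer ones).
KNOWN_POLICIES = ("min_power", "medium_power", "max_performance")


def _policy_paths(base):
    """Find the policy file of every SCSI host below `base`."""
    return sorted(glob.glob(os.path.join(base, "host*", POLICY_FILE)))


def read_policies(base="/sys/class/scsi_host"):
    """Return {host name: current link power management policy}."""
    policies = {}
    for path in _policy_paths(base):
        host = os.path.basename(os.path.dirname(path))
        with open(path) as handle:
            policies[host] = handle.read().strip()
    return policies


def set_policy(policy, base="/sys/class/scsi_host"):
    """Write `policy` to every host and return the list of hosts touched."""
    if policy not in KNOWN_POLICIES:
        raise ValueError("unknown policy: %r" % (policy,))
    changed = []
    for path in _policy_paths(base):
        with open(path, "w") as handle:
            handle.write(policy)
        changed.append(os.path.basename(os.path.dirname(path)))
    return changed
```

Calling `set_policy("max_performance")` would correspond to "PM disabled", and `set_policy("min_power")` to the most aggressive power saving, which makes it easy to flip between the two while watching the log.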
StoreME kernel: [17199.585854] ata3.00: exception Emask 0x0 SAct 0x6000000 SErr 0x0 action 0x0
Dec 18 22:54:41 StoreME kernel: [17199.585925] ata3.00: irq_stat 0x40000008
Dec 18 22:54:41 StoreME kernel: [17199.585966] ata3.00: failed command: READ FPDMA QUEUED
Dec 18 22:54:41 StoreME kernel: [17199.586020] ata3.00: cmd 60/08:d0:e0:28:05/00:00:a8:00:00/40 tag 26 ncq 4096 in
Dec 18 22:54:41 StoreME kernel: [17199.586020] res 41/40:00:e0:28:05/00:00:a8:00:00/40 Emask 0x409 (media error) <F>
Dec 18 22:54:41 StoreME kernel: [17199.586159] ata3.00: status: { DRDY ERR }
Dec 18 22:54:41 StoreME kernel: [17199.586197] ata3.00: error: { UNC }
Dec 18 22:54:41 StoreME kernel: [17199.598261] ata3.00: configured for UDMA/133
Dec 18 22:54:41 StoreME kernel: [17199.598279] sd 2:0:0:0: [sdc] Unhandled sense code
Dec 18 22:54:41 StoreME kernel: [17199.598281] sd 2:0:0:0: [sdc]
Dec 18 22:54:41 StoreME kernel: [17199.598283] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec 18 22:54:41 StoreME kernel: [17199.598285] sd 2:0:0:0: [sdc]
Dec 18 22:54:41 StoreME kernel: [17199.598287] Sense Key : Medium Error [current] [descriptor]
Dec 18 22:54:41 StoreME kernel: [17199.598290] Descriptor sense data with sense descriptors (in hex):
Dec 18 22:54:41 StoreME kernel: [17199.598291] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Dec 18 22:54:41 StoreME kernel: [17199.598299] a8 05 28 e0
Dec 18 22:54:41 StoreME kernel: [17199.598303] sd 2:0:0:0: [sdc]
Dec 18 22:54:41 StoreME kernel: [17199.598306] Add. Sense: Unrecovered read error - auto reallocate failed
Dec 18 22:54:41 StoreME kernel: [17199.598308] sd 2:0:0:0: [sdc] CDB:
Dec 18 22:54:41 StoreME kernel: [17199.598309] Read(16): 88 00 00 00 00 00 a8 05 28 e0 00 00 00 08 00 00
Dec 18 22:54:41 StoreME kernel: [17199.598319] end_request: I/O error, dev sdc, sector 2818910432
Dec 18 22:54:41 StoreME kernel: [17199.598359] ata3: EH complete
The last errors are different because they are UNC read errors, suggesting a failed read from the drive itself and not a "communication" problem like all the other errors.
But despite the UNC errors the drive does not show any reallocated or failed sectors - SMART info for each drive is in the attached TXT document.
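To pin down exactly which sectors the failed read covered, the Read(16) CDB from the log can be decoded by hand. A minimal sketch, assuming the standard Read(16) layout (opcode 0x88, 8-byte big-endian LBA in bytes 2-9, 4-byte block count in bytes 10-13); the function name is made up:

```python
def decode_read16(cdb_hex):
    """Decode a Read(16) CDB given as a hex string; return (lba, block_count)."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 16:
        raise ValueError("a Read(16) CDB is 16 bytes long")
    if cdb[0] != 0x88:
        raise ValueError("not a Read(16) opcode: 0x%02x" % cdb[0])
    lba = int.from_bytes(cdb[2:10], "big")      # starting logical block address
    blocks = int.from_bytes(cdb[10:14], "big")  # number of blocks requested
    return lba, blocks


# CDB copied from the log excerpt above
lba, blocks = decode_read16("88 00 00 00 00 00 a8 05 28 e0 00 00 00 08 00 00")
print(lba, blocks)  # 2818910432 8
```

The decoded LBA matches the sector in the "end_request: I/O error" line, and 8 blocks of 512 bytes (assuming 512-byte logical sectors) match the "ncq 4096" in the failed command, so the error points at one specific 4 KiB region on sdc.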
Is the "resetting link" a known Linux or WD Red problem when PM for SATA is enabled?
Asrock suggested replacing the MB, but that was before I tried disabling the PM - so I am waiting for a new answer.
I am also waiting for WD support to answer me.
So here is my question for the pros:
Is this:
A) A real hardware bug and I should replace either the drives OR the MB !?
B) A Linux Kernel bug related to SATA PM?
C) A drive issue/incompatibility and I should replace the drives with another brand?
I think I can ignore the UNC read errors as long as I don't have any bad sectors - right?
thx !!!!