Clicking on "SMART > Devices > Information" increases ATA error count!

  • Hello everyone,


    I've spent hours investigating, but now I need help.
    I bought 4 new 4TB HDDs (WD Red WD40EFRX) for my server. They seem to be fine but every time I click on "SMART > Devices > Information" the ATA Error count for the selected drive is increased by 1.

    The important lines are

    "ABRT" is an ATA error, and it occurs every time OMV gathers information when I click the "Information" button in the SMART section. The system log is clean when this error occurs, but from time to time I get messages like this one:


    Aug 27 03:24:40 server smartd[40602]: Device: /dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K4LA7Z52 [SAT], ATA error count increased from 9 to 11


    The thing is: Executing "smartctl -x /dev/sdd" in terminal does NOT increase ATA error count.
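To make this comparison repeatable, a minimal sketch along these lines could read the counter before and after any command. It assumes that `smartctl -l error` prints a line of the form `ATA Error Count: N` once errors have been logged (if the line is absent, the helper reports 0), and that reading the error log does not itself bump the counter. The function names are made up for illustration.

```shell
#!/bin/sh
# Extract N from a line like "ATA Error Count: 9 (device log contains ...)".
# Reads smartctl output on stdin; prints 0 if no such line is present.
ata_error_count() {
    awk '/^ATA Error Count:/ { print $4; found = 1; exit }
         END { if (!found) print 0 }'
}

# Read the counter, run the given command, read the counter again.
# First argument: the device to check; remaining arguments: the command to run.
count_delta() {
    dev=$1; shift
    before=$(smartctl -l error "$dev" | ata_error_count)
    "$@" >/dev/null 2>&1
    after=$(smartctl -l error "$dev" | ata_error_count)
    echo "$before -> $after"
}
```

Run as root, for example `count_delta /dev/sdd smartctl -x -d sat /dev/sdd`, and repeat with other `-d` types to see which invocation (if any) changes the count.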


    I searched the web and found a lot of other people (here and on other boards) having this exact problem, often with OMV and WD HDDs. But none of them saw the connection between this OMV button and the increasing counts.


    Something seems to be wrong with the implementation of the SMART Information window, but I don't know what (probably an invalid command or an invalid argument). The corresponding lines of code must be somewhere here https://github.com/openmediava…mediavault/system/storage , but I'm not a programmer and I'm tired, so I am a bit lost here.


    Please tell me what commands (and arguments) are executed when I click on "Information"!


    Thanks for your help!

  • Here are some samples:

    edit: I did not notice any "silent" increase over time, but I cannot rule this out. The ATA error count definitely increases when I click on "Information".

  • @ tkaiser: Thanks for those documentation links. 414 pages ... 8| They will probably be helpful at some point.


    @ votdev: As I wrote, I double- and triple-checked that smartctl -x /dev/sdd without -d does NOT increase the counter, whereas # smartctl -x [-d N] does.


    Where does OMV get the device type N from, and where can I see the command that is actually executed on my system? There must be something wrong with the type (or at least WD drives don't like it). And why is -d needed here when smartctl -x /dev/sdd also works?


    See smartctl manpage: https://www.smartmontools.org/…artmontools/smartctl.8.in


    edit: Before I realized the above, I found many people and long threads all over the internet describing this exact problem, but without solutions. Some even replaced their HDDs, cables and so on because others said something like "increasing SMART counts are a severe problem ... blablabla". And none of them has any idea how this is triggered. So I think it would be a good thing to sort this "bug" out now.


    edit2: The oldest threads were from ~2013. So I also think that WD drives have to be handled differently (in general) from how OMV / smartctl handles them.

    • Official post

    Thanks. https://www.smartmontools.org/browser/trunk/smartmontools/smartctl.8.in says that the default is "auto" and this seems to work for me. So I ask: Why is -d needed here?


    Thanks for your time!

    'auto' seems to work for YOU, but remember that there is other hardware out there that behaves differently. The OMV code tries to guess the optimal setting for the detected device. For SATA devices (this should be the case for your WD device if it is connected via SATA) the implementation does not return a type unless the device is connected via USB.

  • But how can I see what command is actually executed?


    smartctl -x /dev/sdd works and doesn't increment the ATA error count.


    smartctl -x -d sat /dev/sdd also works and doesn't increment the ATA error count.


    smartctl -x -d ata /dev/sdd also works and doesn't increment the ATA error count, but prints this at the end of the other information:


    ATA_READ_LOG_EXT (addr=0x11:0x00, page=0, n=1) failed: 48-bit ATA commands not implemented
    Read SATA Phy Event Counters failed


    This link from your list
    https://github.com/openmediava…toragedevicecciss.inc#L45
    tells me that probably something like smartctl -x -d cciss ... ??? is executed, but I don't have any idea what exactly.


    Since this is an HP ProLiant MicroServer Gen8, it does indeed have an HP Smart Array controller in it. But it is disabled in the BIOS. The HDDs run in AHCI mode, not as a Smart Array! What's going wrong here?


    HP MicroServers, WD HDDs and OMV are a very popular combination, so I could easily link here dozens of threads where people are facing the exact same behavior and have no idea where the increasing errors come from.

  • I could easily link here dozens of threads where people are facing the exact same behavior


    Google search for "00 00 00 00 00 00 00 00 Error: ABRT" site:forum.openmediavault.org reveals that this occurs only with WD disks. I doubt it's directly related to a smartctl call, maybe something with stuff that happens directly before.


    I wish execsnoop were available on Linux (DTrace is sooo convenient to debug stuff on Solaris, FreeBSD and macOS). Last time I needed to examine an Ubuntu system's behaviour, I followed https://github.com/iovisor/bcc/blob/master/INSTALL.md#source


    You might try this: https://bugs.launchpad.net/ubu…/+bug/1470014/comments/10 (checking the contents of /dev/disk/by-id/wwn* and then adding appropriate sections to /etc/hdparm.conf). A reboot might be necessary afterwards.

  • Google search for "00 00 00 00 00 00 00 00 Error: ABRT" site:forum.openmediavault.org reveals that this occurs only with WD disks. I doubt it's directly related to a smartctl call, maybe something with stuff that happens directly before.

    Hmm... Maybe. But it would help to know the exact command that is executed by OMV so that I could file a bug in the smartmontools Trac.


    I'm sure that it is smartctl -x -d cciss plus some device-specific parameter, but as I'm not a programmer I don't understand this code well enough:
    github.com/openmediavault/open…toragedevicecciss.inc#L45


    I think the problem is somewhere between smartctl and the firmware of WD drives, but I need the exact command that is executed to confirm or falsify my theory.
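One possible way to capture the exact command, sketched here under assumptions: attach `strace` to the running `omv-engined` process and log every `execve` call while clicking "Information". This assumes `strace` is installed and that the SMART query is spawned by `omv-engined` or one of its children. The helper name is hypothetical.

```shell
#!/bin/sh
# Keep only the execve lines from strace output (stdin) that mention smartctl.
# "|| true" keeps the exit status clean when nothing matches.
smartctl_execs() {
    grep -E 'execve\(.*smartctl' || true
}

# Usage, as root, while clicking "Information" in the web UI:
#   strace -f -s 256 -e trace=execve -p "$(pgrep -o omv-engined)" 2>&1 | smartctl_execs
```

The `-f` flag follows child processes and `-s 256` stops long argument strings from being truncated, so any `-d` type passed to smartctl should be visible in the matching lines.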

  • I found some interesting log entries in Syslog:

    Code
    Aug 26 15:17:11 server smartd[738]: Device: /dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K4LA7Z52, type changed from 'scsi' to 'sat'
    Aug 26 15:17:11 server smartd[738]: Device: /dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K4LA7Z52 [SAT], opened
    Aug 26 15:17:11 server smartd[738]: Device: /dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K4LA7Z52 [SAT], WDC WD40EFRX-68N32N0, S/N:WD-WCC7K4LA7Z52, WWN:5-0014ee-20fd3c0fb, FW:82.00A82, 4.00 TB
    Aug 26 15:17:11 server smartd[738]: Device: /dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K4LA7Z52 [SAT], found in smartd database: Western Digital Red
    Aug 26 15:17:11 server smartd[738]: Device: /dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K4LA7Z52 [SAT], enabled SMART Attribute Autosave.
    Aug 26 15:17:11 server smartd[738]: Device: /dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K4LA7Z52 [SAT], enabled SMART Automatic Offline Testing.
    Aug 26 15:17:11 server smartd[738]: Device: /dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K4LA7Z52 [SAT], is SMART capable. Adding to "monitor" list.
    Aug 26 15:17:11 server smartd[738]: Device: /dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K4LA7Z52 [SAT], state read from /var/lib/smartmontools/smartd.WDC_WD40EFRX_68N32N0-WD_WCC7K4LA7Z52.ata.state


    Those entries appear for all drives (including the system SSD). ?!?

  • That's why I mentioned execsnoop. OMV relies on some stuff provided by systemd (or maybe udev) which does its own thing.

    I'm sorry, but this is over my head.


    But it would be very nice to have help understanding what this code does and how I can get the information I need to reconstruct the parameters that are actually inserted after cciss.


    github.com/openmediavault/open…toragedevicecciss.inc#L45


    edit: I'm sorry, but I have to leave for the cinema now. Thanks for your help. I will read your answers tomorrow. :)

  • I changed the environment variable to "1", but monit restart omv-engined throws an error:
    Cannot create socket to [localhost]:2812 -- Connection refused :S


    I'll do a reboot now.


    ---


    Nope. No messages in "Syslog" when I click on SMART Information. ;(


    But my server is sending me emails every time I do a reboot:

  • OMV_DEBUG_PHP="1"

    Shouldn't it read OMV_DEBUG_PHP="YES" now?

    /dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K2UEYHHR [SAT]

    The '[SAT]' doesn't look right to me, since SAT should only occur with USB-attached drives. Maybe this is the culprit (and adding the devices with '-d ata -a' to /etc/smartd.conf will give some insights).
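A minimal sketch of that suggestion: generate one /etc/smartd.conf line per drive that forces the `ata` device type. `-d ata` and `-a` are standard smartd directives; the helper name and the idea of building the lines from /dev/disk/by-id are assumptions for illustration.

```shell
#!/bin/sh
# Print a smartd.conf line forcing device type "ata" for each path given.
smartd_ata_lines() {
    for dev in "$@"; do
        printf '%s -d ata -a\n' "$dev"
    done
}

# Usage (skips partition links, which end in -partN):
#   smartd_ata_lines $(ls /dev/disk/by-id/ata-* | grep -v -- '-part')
```

Review the output before adding it to the file. smartd ignores every line after a DEVICESCAN entry, so the new lines have to come before it or replace it, and OMV may overwrite manual changes to this file.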

  • You have to start omv-engined in foreground

    I followed this. In my case, with a SATA-attached Apple-branded Hitachi HDD, the following commands are executed:

    Code
    udevadm info --query=property --name='/dev/sda' 2>&1
    smartctl -x '/dev/sda' 2>&1

    No increase in ATA errors, of course (I don't use WD disks):

    Code
    root@espressobin:/home/tk# (udevadm info --query=property --name='/dev/sda' 2>&1 ; smartctl -x /dev/sda 2>&1) | curl -F 'f:1=<-' ix.io
    http://ix.io/1lpD
