smartd detected "1 Currently unreadable (pending) sectors" while SMART is OK

  • Hi,


    One of my RAID 1 SSDs (which holds a VM image) triggered an alert from the smartd daemon tonight at 1:27 AM:


    Quote from smartd daemon

    This message was generated by the smartd daemon running on:
      host name: nas
      DNS domain: hidden.tld

    The following warning/error was logged by the smartd daemon:

    Device: /dev/disk/by-id/ata-CT500MX500SSD1_1911E1F0E132 [SAT], 1 Currently unreadable (pending) sectors

    Device info:
    CT500MX500SSD1, S/N:1911E1F0E132, WWN:5-00a075-1e1f0e132, FW:M3CR023, 500 GB

    For details see host's SYSLOG.

    You can also use the smartctl utility for further investigation.
    Another message will be sent in 24 hours if the problem persists.


    Syslog shows:

    Code
    root@nas:~# cat /var/log/syslog | grep smart
    Jun 6 01:27:31 nas smartd[810]: Device: /dev/disk/by-id/ata-CT500MX500SSD1_1911E1F0E132 [SAT], 1 Currently unreadable (pending) sectors
    Jun 6 01:27:31 nas smartd[810]: Sending warning via /usr/share/smartmontools/smartd-runner to admin@hidden.tld ...
    Jun 6 01:27:31 nas smartd[810]: Warning via /usr/share/smartmontools/smartd-runner to admin@terageek.org: successful
    Jun 6 01:57:31 nas smartd[810]: Device: /dev/disk/by-id/ata-CT500MX500SSD1_1911E1F0E132 [SAT], No more Currently unreadable (pending) sectors, warning condition reset after 1 em


    It shows as resolved at 1:57 AM. (I assume the check runs every 30 minutes.)
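The gap between the warning and the reset can be read straight from the log timestamps. Below is a minimal Python sketch, not anything smartd ships, that pairs each pending-sector warning with the reset that follows it. The function name `pending_episodes` and the `year` parameter are made up for illustration; the sample lines are the ones quoted from syslog above, and month parsing assumes an English/C locale.

```python
import re
from datetime import datetime

# Matches smartd syslog lines of the shape quoted in the post above.
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+\S+\s+smartd\[\d+\]:\s+"
    r"Device:\s+(?P<device>\S+)\s+\[[^\]]*\],\s+(?P<msg>.*)$"
)
PENDING_RE = re.compile(r"^\d+ Currently unreadable \(pending\) sectors")
RESET_RE = re.compile(r"^No more Currently unreadable \(pending\) sectors")


def pending_episodes(lines, year=2000):
    """Pair each pending-sector warning with the reset that follows it.

    Classic syslog timestamps carry no year, so `year` is an assumption
    supplied by the caller; it only matters across a New Year boundary.
    Returns (device, warned_at, cleared_at) tuples; cleared_at is None
    while a warning is still open.
    """
    open_warnings = {}
    episodes = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # e.g. the "Sending warning via ..." lines
        when = datetime.strptime(f"{year} {m['stamp']}", "%Y %b %d %H:%M:%S")
        device, msg = m["device"], m["msg"]
        if PENDING_RE.match(msg):
            # Keep the first warning time if the same warning repeats.
            open_warnings.setdefault(device, when)
        elif RESET_RE.match(msg) and device in open_warnings:
            episodes.append((device, open_warnings.pop(device), when))
    episodes.extend((dev, when, None) for dev, when in open_warnings.items())
    return episodes


SAMPLE = [
    "Jun 6 01:27:31 nas smartd[810]: Device: /dev/disk/by-id/ata-CT500MX500SSD1_1911E1F0E132 [SAT], 1 Currently unreadable (pending) sectors",
    "Jun 6 01:27:31 nas smartd[810]: Sending warning via /usr/share/smartmontools/smartd-runner to admin@hidden.tld ...",
    "Jun 6 01:57:31 nas smartd[810]: Device: /dev/disk/by-id/ata-CT500MX500SSD1_1911E1F0E132 [SAT], No more Currently unreadable (pending) sectors, warning condition reset after 1 em",
]

for device, warned_at, cleared_at in pending_episodes(SAMPLE):
    duration = cleared_at - warned_at if cleared_at else "still pending"
    print(device, duration)
```

An episode whose `cleared_at` stays `None` is the case worth watching, since it means the pending count has not gone back to 0 yet.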


    smartctl output shows:



    I'm quite used to reading SMART output and I can't see any error here, which is comforting. But at the same time, there was still an error, and I don't like that.


    So my questions are:
    Do you have any idea what this error would indicate on a SATA SSD?
    Is there anything to worry about?
    How can there be such an error but nothing in SMART?


    Any enlightenment appreciated.
    Best regards

  • Well, it didn't occur to me that it could possibly be an MX500-specific thing. That's very enlightening, thank you!
    We can see some folks having the same issue on multiple systems, so at least we know it's not a Debian-based or OMV-specific issue.


    I'll contact Crucial about that and report back if I get any useful information.

  • I had a pretty great tech support chat with Crucial, with some interesting conclusions that I'll share.


    Since I run Linux and no Crucial tool exists for that OS, I was told that Micron owns Crucial, so most things apply to both Micron and Crucial.
    They asked me to install a Micron GUI diagnostic tool, which was a bit complicated since I only use the CLI on my OMV. Then they pointed out there is also a CLI version.
    That's the tool: https://www.micron.com/product…torage-executive-software
    But then I pointed out, "I'm not alone in having this, so checking my specific SMART data is not relevant." They more or less accepted that my smartctl output and logs were enough to address my worries.


    The tech's conclusion is the following:


    It is perfectly fine, normal and expected to have pending sectors sometimes on an SSD, due to the nature of NAND memory.



    Therefore, your SMART value Current_Pending_Sector might go to non-zero values from time to time. I think what would be worrying is if it didn't go back to 0 afterwards. That would mean there is no room left for moving the data to available blocks.


    What you should check is the value: Unused_Reserve_NAND_Blk
    You don't want this close to 0.
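A minimal Python sketch of that check, assuming the usual ten-column table printed by `smartctl -A`. The helper names and the `reserve_floor` threshold are arbitrary choices for illustration (the advice above only says "not close to 0"), and the sample rows are made up to show the layout, apart from the raw value 45 mentioned in this thread.

```python
def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} from `smartctl -A` table output."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split(None, 9)
        # Attribute rows have 10 columns and start with a numeric ID.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        # Keep only the leading number of RAW_VALUE; some attributes
        # append annotations after it.
        raw = fields[9].split()[0]
        if raw.isdigit():
            attrs[fields[1]] = int(raw)
    return attrs


def assess(attrs, reserve_floor=5):
    """Return a list of human-readable concerns (empty if none).

    reserve_floor is a hypothetical threshold, not a vendor figure.
    """
    notes = []
    if attrs.get("Current_Pending_Sector", 0) > 0:
        notes.append("pending sectors present; recheck that the count returns to 0")
    reserve = attrs.get("Unused_Reserve_NAND_Blk")
    if reserve is not None and reserve <= reserve_floor:
        notes.append(f"only {reserve} reserve blocks left")
    return notes


# Illustrative rows only, not the poster's actual smartctl output.
SAMPLE_OUTPUT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       45
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
"""

attrs = parse_smart_attributes(SAMPLE_OUTPUT)
print(attrs, assess(attrs))
```

On a real system the text would come from running `smartctl -A` against the device as root; the parsing is kept separate so it can be tried on saved output first.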


    For reference, I have a value of 45 on one drive and 46 on the other, for drives that are, I think, a bit less than 1 month old. Upon checking, I have other emails for this warning on both drives: 2 on one drive, and 3 on the other, so, if each warning cost one reserve block, the default value for my 500GB MX500 is Unused_Reserve_NAND_Blk: 48 (45 + 3 and 46 + 2). At that rate (hopefully it slows down), the drives might be dead in no time... And I'm not even writing a lot to them (just a VM with two game servers on it).


    They also provided me a doc with an explanation of how SMART attributes are calculated. The tech was unsure if this was shareable or not, but I found it publicly on Micron's website so... Here's the link. https://www.micron.com/-/media…_ssd_smart_attributes.pdf


    In conclusion: nothing to worry about, at least until the values get worse. And Crucial support rocks :)

  • I have a value of 45 on one drive and 46 on the other ... I have other emails for this warning on both drives: 2 on one drive, and 3 on the other, so the default value for my 500GB MX500 is Unused_Reserve_NAND_Blk: 48

    I really hope (for you) that this is just a coincidence and not correct math :) Please keep us informed whether new occurrences of pending sectors correlate with a decrease of attribute 180 (Unused_Reserve_NAND_Blk).

  • Thank you :)


    Yes, it is likely a coincidence (confirmation bias spotted). I've checked again: in fact there are only 4 mails in total. (The 5th was another mail containing the word "Pending", my bad.)
    That said, it is not impossible that an alert was raised before I had email set up on the NAS. (Syslog entries from before June 2nd were not kept, so we can't know.)


    I will try to report back once I have more data which will tell us more about the subject than my ravings.


    The advantage is we have my previous values for the record.


    Attached: Screenshot of error detection times and dates.
    nas ssd errors.PNG
    Order of disks for errors is: A B A B
    I see no obvious time pattern for now, but from the little data available, errors seem to be getting rarer, which would be great if it continues like that :D

  • So, I've got many pending sectors emails today, which means more data!


    6 new for the disk CT500MX500SSD1_1911E1F0E132 which makes a total of 9
    And 3 new for the disk CT500MX500SSD1_1911E1F0F25B which makes a total of 5.


    First one has 45 Unused_Reserve_NAND_Blk
    Second one has 46 Unused_Reserve_NAND_Blk


    So the values are unchanged and don't seem to depend on current pending sector activity, or at least they are not correlated at a 1:1 ratio, which is good.
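The 1:1 idea can be checked with plain arithmetic. The short Python sketch below uses only the numbers posted in this thread; the starting value of 48 is the earlier guess, not a documented Crucial figure, and the function name is made up for illustration.

```python
ASSUMED_INITIAL_RESERVE = 48  # earlier guess from this thread, not a vendor spec

# Total warning emails so far and current Unused_Reserve_NAND_Blk per disk.
observations = {
    "CT500MX500SSD1_1911E1F0E132": {"warnings": 9, "reserve": 45},
    "CT500MX500SSD1_1911E1F0F25B": {"warnings": 5, "reserve": 46},
}


def check_one_block_per_warning(observations, initial):
    """Compare observed reserve with what a 1:1 loss per warning would predict.

    Returns {disk: (predicted, observed, matches)}.
    """
    result = {}
    for disk, seen in observations.items():
        predicted = initial - seen["warnings"]
        result[disk] = (predicted, seen["reserve"], predicted == seen["reserve"])
    return result


for disk, (predicted, observed, matches) in check_one_block_per_warning(
    observations, ASSUMED_INITIAL_RESERVE
).items():
    print(f"{disk}: predicted {predicted}, observed {observed}, 1:1 holds: {matches}")
```

Under that assumption the drives would be at 39 and 43 by now, while they still read 45 and 46, so one warning does not cost one reserve block.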


    That said, these emails are still freaking me out... I never like seeing this kind of error.
    Last time it happened, I had a 4TB out-of-warranty drive failing and it took me 1 week to download my data back from Hubic... Not that my connection was slow (I've got gigabit at home), but their servers suck (and they have since shut Hubic down...). A solution might be to apply a filter in Thunderbird to mark them as "read". Anyhow, I'll back up the data on it more frequently.
