How to check if the drive is really dead?

  • Hello all,


    I am pretty sure my issues are not related to OMV, but I think chances are high that someone around here is familiar with a case like this.


    First, some system information: my server contains an ASRock J3710 board, a cheap SSD as the system drive, and a WD Red.


    A few weeks ago, I upgraded from OMV 5 to 6 (clean install from the official image) and added a new Crucial MX500 to the system.

    Two days later, we had a thunderstorm with at least one (short) blackout (the power grid is normally pretty stable here in Germany).

    That is when I got the first system monitor mail, “Status failed (1) /srv/dik… is not a mountpoint”, for both the MX500 and the WD Red.

    I repaired the superblocks for both devices, and they were back online.

    The next time I tried to reboot the server, the cheap SSD threw a bunch of ATA errors and booting failed.

    I bought a new cheap SSD and reinstalled OMV.

    Since then, the WD Red has lost its mount point every few days, sometimes being rediscovered and remounted automatically the next day. A replacement for it has already arrived.

    Today the same happened to the MX500, but I was not able to get it back online.


    Sometimes the devices are visible with fdisk -l, sometimes not. If they are visible, repairing the superblock brings them back online for a few hours or a day. When I checked the cheap SSD on my laptop, it also threw superblock errors.
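    Since the symptom is intermittent, it may help to log over time whether each disk is visible to the kernel at all and whether its mount point is actually mounted, so a disappearance can be matched against the monitor mails. A minimal sketch, assuming a Linux host where /proc/partitions lists the block devices the kernel currently sees; the device names and mount point in the usage note are hypothetical placeholders:

```python
import os


def parse_partitions(text):
    """Return the set of block device names listed in /proc/partitions-style text."""
    names = set()
    for line in text.splitlines():
        fields = line.split()
        # Data rows have four columns: major, minor, #blocks, name.
        # The header row ("major minor #blocks name") is skipped because
        # its first field is not a number.
        if len(fields) == 4 and fields[0].isdigit():
            names.add(fields[3])
    return names


def check_disks(expected_devices, mount_points, partitions_text=None):
    """Map each expected device to 'visible?' and each mount point to 'mounted?'."""
    if partitions_text is None:
        with open("/proc/partitions") as f:
            partitions_text = f.read()
    visible = parse_partitions(partitions_text)
    report = {dev: dev in visible for dev in expected_devices}
    for mp in mount_points:
        # os.path.ismount() is False for a missing path or a plain directory.
        report[mp] = os.path.ismount(mp)
    return report
```

    For example, check_disks(["sdb", "sdc"], ["/srv/example-mount"]) could be run from cron every few minutes with the result appended to a log. A device that drops out of /proc/partitions points more toward the drive, cable or controller than toward file system corruption, which would normally leave the device visible but unmountable.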


    So long story short:

    Is it possible for lightning to destroy three drives in one strike?

    What would be the best way to check whether the drives are really dead and not just suffering from file system corruption? Especially for the MX500, as I would like to send it in under warranty. I don’t care about the data (I have backups), but I really do not want it to suddenly become mountable again, like the WD Red did, as there is some sensitive data on it.


    Best regards

    Alex

  • Either the drives or the controller on the motherboard is bad or intermittently defective. I suggest the following:


    1. check the SMART logs of the suspect bad drives for any errors

    2. swap the 3 suspect drives into a known good motherboard and see if the issue follows any of the 3 drives

    3. swap a known good drive into the suspect bad motherboard and see if the issue follows the motherboard
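    The reasoning behind steps 2 and 3 can be written down as a small bookkeeping helper: record every drive/board combination tried and whether the problem appeared, and treat a component as suspect when the failures follow it across at least two different partners. This is only a sketch of that heuristic, and the component names in the test data are hypothetical. Because the faults are intermittent, a passing run does not clear a component, so each combination should run long enough (days, given the symptoms) before it is recorded as passed.

```python
from collections import defaultdict


def locate_fault(trials):
    """Return (suspect_drives, suspect_boards) from swap-test results.

    trials: iterable of (drive, board, failed) tuples.
    A component is suspect when it has failed together with at least two
    different partners, i.e. the problem "follows" it.
    """
    failed_boards = defaultdict(set)  # drive -> boards it failed on
    failed_drives = defaultdict(set)  # board -> drives that failed on it
    for drive, board, failed in trials:
        if failed:
            failed_boards[drive].add(board)
            failed_drives[board].add(drive)
    suspect_drives = {d for d, boards in failed_boards.items() if len(boards) >= 2}
    suspect_boards = {b for b, drives in failed_drives.items() if len(drives) >= 2}
    return suspect_drives, suspect_boards
```

    With a single failing combination the helper flags nothing, which matches the point of the swap tests: one failure on its own cannot tell a bad drive from a bad board.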

  • The corrupted superblock issue was reproducible when I tested the cheap SSD and the WD with my notebook. So I think (hope) it's not an issue with the board, although I still have to test the new HDD.

    • Official Post

    So long story short:

    Is it possible for lightning to destroy three drives in one strike?

    It certainly is possible. Anything above 70 V out of the power supply, even a really short voltage spike (milliseconds), poses a real risk of damage to CMOS components. (CMOS is what PCs use these days.)

    The problem with CMOS is that, when it's damaged, there are 3 potential states it might be left in:
    1. Dead. (This is easy.)
    2. Intermittent issues. (This can be maddening.)
    3. It works fine, but intermittent issues may crop up days or even years down the road. Triggers can be things like changes in temperature and humidity. (This is relatively rare, but within the realm of possibility.)

    What would be the best way to check if the drives are really dead and not just some file system corruption?

    The best way is to use the manufacturer's test software. WD has utilities. The cheap SSD? That depends. You might not find anything from the OEM, but something -> here might be useful.
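    As a rough complement to the vendor tools, a full read-only pass over the raw device separates "the disk cannot return its data" from "the file system on it is damaged": file system corruption does not make raw reads fail, while failing media (and often a flaky link) usually shows up as I/O errors (EIO). This is the same idea as the default read-only mode of badblocks. A minimal sketch, assuming Linux, root access, and a hypothetical device path such as /dev/sdX; it never writes to the device. The page cache can serve data that was already read, so it is best run on a freshly attached drive.

```python
import errno
import os


def read_scan(path, chunk_size=1024 * 1024):
    """Read `path` sequentially, read-only, and return offsets of unreadable chunks."""
    bad_offsets = []
    fd = os.open(path, os.O_RDONLY)  # read-only: nothing is ever written
    try:
        # For a block device (or a regular file), seeking to the end gives its size in bytes.
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            try:
                data = os.pread(fd, min(chunk_size, size - offset), offset)
            except OSError as exc:
                if exc.errno != errno.EIO:
                    raise  # e.g. permission problems are not media errors
                bad_offsets.append(offset)
                offset += chunk_size  # skip the unreadable chunk and keep going
                continue
            if not data:
                break  # unexpected end of device
            offset += len(data)
    finally:
        os.close(fd)
    return bad_offsets
```

    For example, read_scan("/dev/sdX") returning an empty list means every byte could be read back, which would point toward file system damage, cabling or the controller rather than the drive's media.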


    Sometimes the devices are visible with fdisk -l, sometimes not. If they are visible, repairing the superblock brings them back online for a few hours or a day. When I checked the cheap SSD on my laptop, it also threw superblock errors.

    SSDs are as susceptible to voltage transients and spikes as the motherboard.

    You could run some long tests on the motherboard, without the hard drives attached, using -> memtest86. This test won't clear the motherboard of all possible faults, but it is a good indicator.
    __________________________________________________________________________________

    Two days later, we had a thunderstorm with at least one (short) blackout (power grid is normally pretty stable here in Germany).

    I've heard this before from German users. It doesn't matter how "stable" your power is or how rare outages are. Most short-duration power spikes go unnoticed by homeowners, but they can seriously damage electronics. These events can be caused by weather (lightning strikes), accidents (a truck hits a power pole or a tree downs a line), or something on the power company's side, like an overloaded transformer blowing up.

    At an absolute minimum, you should have a good-quality surge-suppressing power strip. I have whole-house surge suppression on my power panel, and in addition I believe in using a UPS.

  • The server's power outlet is surge protected, but obviously, in case of a lightning strike, I would not bet on it anymore.


    Today I checked the MX500 with my laptop, where it was (and still is) perfectly readable.

    I am no expert, but the SMART values do not seem that bad. I'll take a closer look at the mainboard...

    Still, I'm not sure how trustworthy the MX500 is.
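    To make reading the SMART values less of a judgment call, the few attributes that usually matter can be pulled out of smartctl's JSON output (smartmontools 7.0 or later, e.g. smartctl --json -a /dev/sdX > sdx.json, with /dev/sdX a placeholder). A minimal sketch under two assumptions: that the report uses smartctl's ata_smart_attributes/table and smart_status/passed layout, and that the attribute IDs below carry their common meanings. Names and raw-value encodings vary by vendor (Crucial and WD label some of them differently), so the sketch matches on ID only. Nonzero media counters point at the drive itself, while a UDMA CRC error count (ID 199) usually points at the cable, port or controller, which is relevant to the mainboard question.

```python
import json

# Common meanings of these ATA attribute IDs; vendors (e.g. Crucial, WD) may
# name them differently, so this sketch matches on ID only.
MEDIA_ATTRS = {
    5: "reallocated",
    187: "reported uncorrectable",
    197: "current pending",
    198: "offline uncorrectable",
}
LINK_ATTRS = {199: "UDMA CRC errors"}  # usually cable/port/controller, not media


def load_report(path):
    """Load a report saved with `smartctl --json -a /dev/sdX > file`."""
    with open(path) as f:
        return json.load(f)


def summarize_smart(report):
    """Return overall health plus any nonzero media and link error counters."""
    summary = {
        "passed": report.get("smart_status", {}).get("passed"),
        "media": {},
        "link": {},
    }
    for attr in report.get("ata_smart_attributes", {}).get("table", []):
        attr_id = attr.get("id")
        raw = attr.get("raw", {}).get("value", 0)
        if not raw:
            continue  # zero counters are the healthy case
        if attr_id in MEDIA_ATTRS:
            summary["media"][MEDIA_ATTRS[attr_id]] = raw
        elif attr_id in LINK_ATTRS:
            summary["link"][LINK_ATTRS[attr_id]] = raw
    return summary
```

    For example, summarize_smart(load_report("sdx.json")) on a healthy drive would ideally show passed as True with both dictionaries empty. Comparing two reports taken a few days apart is often more telling than a single snapshot, since rising counters matter more than a small constant value.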


