[Solved] RAID5 - Replacing 2 of 3 disks (failing SMART checks)

  • My dying HDDs:


    [Screenshot attachment: RAID5-001.png]


    The drives above normally operate around 40-42 °C (not ideal, from what I've read), but I took the screenshot above with the case open.


    cat /proc/mdstat

    Code
    Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
    md127 : active raid5 sdd[1] sdc[2]
          15627788288 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [_UU]
          bitmap: 0/59 pages [0KB], 65536KB chunk

    unused devices: <none>


    blkid

    (I have no idea why the UUID_SUB for the last one is so long, or why my setup is md127 instead of md0 or something numerically lower...)
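
    (Side note, from what I've read since: md127 is apparently normal when the array has a text name like tomanas1raid5; mdadm exposes it as a /dev/md/<name> symlink and picks the underlying node number counting down from 127. Something like this should confirm it:)

    Code
    ls -l /dev/md/                              # the friendly name is a symlink to /dev/md127
    mdadm --detail /dev/md127 | grep -i name    # shows the array's text name from the superblock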


    fdisk -l | grep "Disk "


    cat /etc/mdadm/mdadm.conf


    mdadm --detail --scan --verbose

    Code
    ARRAY /dev/md/tomanas1raid5 level=raid5 num-devices=3 metadata=1.2 name=tomanas1raid5 UUID=f1d3e5d2:a10e7ea5:82ac9b10:c0fbebcf
       devices=/dev/sdc,/dev/sdd
    Quote

    Post type of drives and quantity being used as well.

    3x Toshiba MN06ACA800/JP 8TB NAS HDD (CMR)

    Quote

    Post what happened for the array to stop working? Reboot? Power loss?

    Array is still working, but is approaching imminent failure (I assume). I received the following email alert last week:


    ==========

    This message was generated by the smartd daemon running on:


    host name: omv

    DNS domain: mylocal


    The following warning/error was logged by the smartd daemon:


    Device: /dev/disk/by-id/ata-TOSHIBA_MN06ACA800_11Q0A0CPF5CG [SAT], FAILED SMART self-check. BACK UP DATA NOW!


    Device info: TOSHIBA MN06ACA800, S/N:11Q0A0CPF5CG, WWN:5-000039-aa8d1b7f8, FW:0603, 8.00 TB

    ==========

    (This one was for /dev/sdd. I received an identical email for the other dying drive, /dev/sdb, at the same time.)
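
    (For anyone else reading: the full SMART attributes and self-test log behind an alert like this can be dumped with smartctl, e.g.:)

    Code
    smartctl -a /dev/disk/by-id/ata-TOSHIBA_MN06ACA800_11Q0A0CPF5CG   # attributes + self-test log
    smartctl -x /dev/sdd                                              # extended, vendor-specific view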


    I already keep a regular backup of the data, and the replacement hard drives arrived a little while ago. Since all three HDDs in this RAID5 were purchased new and installed last April, I'm also quite concerned as to why two of them would start failing this early and simultaneously. But first things first: I've been reading up on how to reconstruct my setup with the replacement drives and came upon this thread/post:


    What you do is fail the drive using mdadm, then remove the drive using mdadm, shut down, install the new drive, and reboot; the raid should appear as clean/degraded, then add the new drive using mdadm.

    At least the above is the procedure anyway.


    I just want to confirm I'm not misunderstanding what it means to "fail the drive using mdadm" before I do something stupid. Is the following what I should be entering via SSH/CLI?

    Code
    mdadm /dev/md127 -f /dev/sdb
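
    And assuming that is right, my reading of the full procedure would be something like this (device names are from my current setup; I realize the replacement may come up under a different letter after the swap):

    Code
    mdadm /dev/md127 --fail /dev/sdb      # 1. mark the dying disk as failed
    mdadm /dev/md127 --remove /dev/sdb    # 2. remove it from the array
    # 3. shut down, swap in the new disk, boot back up
    mdadm /dev/md127 --add /dev/sdb       # 4. add the replacement; the rebuild starts automatically
    cat /proc/mdstat                      # 5. watch the rebuild progress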


    Also, with 2 of the 3 drives nearing failure, I'm wondering if I should just rebuild the RAID5 from zero and copy the data over from the backup, instead of trying to replace the drives one by one?
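
    From what I've read, a from-scratch rebuild would boil down to roughly the following (just a sketch using my current device names, and I understand --create wipes whatever is on the listed disks), but I'd rather hear your opinion first:

    Code
    mdadm --stop /dev/md127                              # dismantle the old array
    mdadm --zero-superblock /dev/sdb /dev/sdc /dev/sdd   # clear the old RAID metadata from the members
    mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdb /dev/sdc /dev/sdd
    mkfs.ext4 /dev/md0                                   # new filesystem, then restore from the backup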


    Thank you!


    (edit: fixed code line breaks)

  • q2m2v

    Changed the title of the thread from "RAID5 (3x 8TB): Need to replace two disks (failing SMART checks)" to "RAID5 - Replacing 2 of 3 disks (failing SMART checks)".
  • KM0201

    Approved the thread.
    • Official Post

    Is the following what I should be entering via SSH/CLI?

    You can do this from the CLI; you would then have to remove the drive from the array. But as the array is still working, you can do it from the GUI.

    I'm wondering if I should just rebuild the RAID5 from zero and copy the data over from the backup, instead of trying to replace the drives one by one

    This would be the most sensible option due to the size of the drives. You would also have to remove the SMB shares, then the shared folders that are linked to the array.

  • This would be the most sensible option due to the size of the drives. You would also have to remove the SMB shares, then the shared folders that are linked to the array.

    Thank you very much for the advice.


    Since I've never replaced a failed drive in a RAID array before, I decided to go ahead and rebuild the array by replacing the faulty drives one at a time, just to get a feel for it in case I need to do this again in the future.


    Also, just before I was going to write this reply, one of the drives (sdb) went and failed on its own. I removed it and the array is now rebuilding with one good drive (sdc) and the other about to die (sdd). Assuming sdd survives long enough to get through the rebuild, I would then need to "fail the drive". As you mentioned, this can be done through the GUI. I wasn't able to find that option in the "RAID Management" menu though. Or is it "Remove" that does it?

    • Official Post

    Assuming sdd survives long enough to get through the rebuild, I would then need to "fail the drive". As you mentioned, this can be done through the GUI. I wasn't able to find that option in the "RAID Management" menu though. Or is it "Remove" that does it?

    =O If it rebuilds, then from the GUI select Remove in RAID Management, select the drive from the dialog, click OK, and the drive will be removed from the array.

  • =O If it rebuilds, then from the GUI select Remove in RAID Management, select the drive from the dialog, click OK, and the drive will be removed from the array.

    Luckily, the dying drive managed to survive the rebuild last week, and the reconstructed array completed a long SMART test without issue. OMV has been running fine since.
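
    (For anyone repeating this, a long self-test can be started and checked per drive with smartctl, along these lines:)

    Code
    smartctl -t long /dev/sdd      # start an extended self-test (runs in the background on the drive)
    smartctl -l selftest /dev/sdd  # check the self-test log once it has finished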


    Had an odd little bug pop up where I was getting "SparesMissing" notifications even though I've never tried to add spares. Perhaps the rebuild routine added spares=1 to my mdadm.conf? Anyway, I changed it to spares=0 according to this post: Remove the "SparesMissing event"
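
    In case it helps anyone else, the change amounted to editing the spares= value on the ARRAY line in /etc/mdadm/mdadm.conf; roughly:

    Code
    # change spares=1 to spares=0 on the ARRAY line
    sed -i 's/spares=1/spares=0/' /etc/mdadm/mdadm.conf
    update-initramfs -u    # optionally rebuild the initramfs so the copy there matches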


    Thank you again for your help.

  • q2m2v

    Changed the title of the thread from "RAID5 - Replacing 2 of 3 disks (failing SMART checks)" to "[Solved] RAID5 - Replacing 2 of 3 disks (failing SMART checks)".
