How to Recover Degraded RAID5 after physical disk failure?

  • Hello everyone,


    I am running OMV 3.0.99 on my NAS. I have a RAID5 (with mdadm) consisting of three 5TB WD RED drives, which I set up via the OMV GUI back in 2016.


    Recently one of the disks failed. When I got home I noticed that it made weird noises on startup and was no longer recognized in Linux. The device seems to be physically broken, and I have started an RMA since the drives are still under warranty. Now I am waiting for the replacement drive and wondering what I should do, once it arrives, to recover my RAID.


    Some notes on my drives:

    • sda is my system drive
    • the RAID consisted of disks sdb, sdc and sdd; the drive sdd failed and has been physically removed from the NAS case
    • sdd is now my backup disk (it used to be sde before the RAID disk failed)



    Here are some important outputs of my system:



    "uname -a" output

    Code
    Linux homenas 4.9.0-0.bpo.6-amd64 #1 SMP Debian 4.9.88-1+deb9u1~bpo8+1 (2018-05-13) x86_64 GNU/Linux

    cat /proc/mdstat

    Code
    root@homenas:~# cat /proc/mdstat
    Personalities : [raid6] [raid5] [raid4] 
    md127 : active raid5 sdb[0] sdc[1]
          9767278592 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]
    unused devices: <none>


    blkid

    Code
    root@homenas:~# blkid
    /dev/sda1: UUID="911053a9-f06c-4479-becb-cb8faa2a5783" TYPE="ext4" PARTUUID="2c92f843-01"
    /dev/sda5: UUID="28ae7474-1d14-48a6-9e8e-2ed31e060803" TYPE="swap" PARTUUID="2c92f843-05"
    /dev/sdb: UUID="bb8b3798-d160-71b4-cc60-bc8fdc8e0761" UUID_SUB="e52bb12c-23e1-7c8f-a7f7-d52d4b2b46a9" LABEL="HomeNAS:NAS" TYPE="linux_raid_member"
    /dev/sdc: UUID="bb8b3798-d160-71b4-cc60-bc8fdc8e0761" UUID_SUB="d9eac207-7167-d19e-c1de-8c7525b77d48" LABEL="HomeNAS:NAS" TYPE="linux_raid_member"
    /dev/sdd1: UUID="523cffe7-115d-49b4-95e0-7549aecdf064" TYPE="ext4" PARTUUID="fba4a7ee-026a-497f-9b3d-bbdec92cb0d6"
    /dev/md127: UUID="bd5ef96f-5587-4211-95c0-10219985ff6d" TYPE="ext4"


    fdisk -l | grep "Disk "

    Code
    root@homenas:~# fdisk -l | grep "Disk "
    Disk /dev/sda: 29,8 GiB, 32017047552 bytes, 62533296 sectors
    Disk identifier: 0x2c92f843
    Disk /dev/sdb: 4,6 TiB, 5000981078016 bytes, 9767541168 sectors
    Disk /dev/sdc: 4,6 TiB, 5000981078016 bytes, 9767541168 sectors
    Disk /dev/sdd: 1,8 TiB, 2000394706432 bytes, 3907020911 sectors
    Disk identifier: C0401C51-A74A-4675-935E-AF9BF6706166
    Disk /dev/md127: 9,1 TiB, 10001693278208 bytes, 19534557184 sectors


    cat /etc/mdadm/mdadm.conf


    mdadm --detail --scan --verbose

    Code
    root@homenas:~# mdadm --detail --scan --verbose
    ARRAY /dev/md127 level=raid5 num-devices=3 metadata=1.2 name=HomeNAS:NAS UUID=bb8b3798:d16071b4:cc60bc8f:dc8e0761
       devices=/dev/sdb,/dev/sdc


    mdadm --detail /dev/md127


    I have searched the internet and found different sets of steps, but I don't know which of them are necessary in my situation:

    • mark the disk as failed
    • remove the disk from the array
    • copy the partition table of one of the remaining array disks to the new replacement drive
    • re-add the drive to the array (the rebuild will then start automatically)

    Since the disk failed completely and is no longer present in Linux, I could not mark it as failed and remove it from the array. I have found the following command to remove a disk from the array that is no longer present:
    mdadm /dev/md127 -r detached
    Is it recommended to use this command before I install the new drive, or is it not necessary to remove the drive from the array in my case?
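
    For reference, this is how the "failed" and "detached" keywords are meant to be used in mdadm's manage mode (a sketch only, based on the mdadm man page; since the dead drive has already vanished from the system, these commands may well do nothing and print no output):

    Code
    # "detached" matches any member whose device node is gone (open() returns ENXIO),
    # "failed" matches members already marked faulty. Both are keywords, not device names.
    mdadm /dev/md127 --fail detached
    mdadm /dev/md127 --remove detached
    # The emptied slot should then show up as "removed" in the member table:
    mdadm --detail /dev/md127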


    I would really appreciate your guidance!
    Thanks in advance

  • I'm having a similar issue here. Every tutorial seems to use different CLI commands; someone even booted a temporary Linux ISO to do the steps. ?(


  • Before writing my post I searched the internet intensively for guides on rebuilding a degraded RAID after a failed disk. Since my hard disk has died completely, I cannot use mdadm --fail and mdadm --remove. Therefore I assume these steps are necessary in my situation:

    • Temporarily remove the backup disk from the NAS to free the name /dev/sdd
    • Install the new (replacement) disk
    • Copy the partition table to the new disk with sfdisk -d /dev/sdb | sfdisk /dev/sd<NEWDISK> or sfdisk -d /dev/sdc | sfdisk /dev/sd<NEWDISK>
    • Reboot
    • Add the new disk to the RAID with mdadm /dev/md127 --manage --add /dev/sd<NEWDISK>
    • Let it rebuild and watch the progress with watch cat /proc/mdstat


    So far so good. But what I am really struggling with is whether or not I have to remove the no longer existing disk from the array with mdadm /dev/md127 -r detached before performing the steps mentioned above.
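
    Put together, the planned sequence would look roughly like the sketch below. This is only a sketch: /dev/sde stands for whatever name the replacement disk actually gets, and the sfdisk step only applies if the RAID members are partitions; the blkid output above shows whole-disk members (/dev/sdb and /dev/sdc have no partition numbers), so it can probably be skipped here.

    Code
    # Optional: clear out the vanished member first (the "detached" keyword only
    # matches devices that are no longer present, so this should be harmless).
    mdadm /dev/md127 --fail detached
    mdadm /dev/md127 --remove detached

    # Only needed if the members are partitions rather than whole disks:
    # sfdisk -d /dev/sdb | sfdisk /dev/sde

    # Add the replacement disk (assumed to appear as /dev/sde) and let the array rebuild.
    mdadm --manage /dev/md127 --add /dev/sde
    watch cat /proc/mdstat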


    Please advise and correct my proposed steps before I ruin the whole RAID :)


    Thanks!

    • Official post

    Do I have to do it? Or will the disk marked as "removed" disappear as soon as I add the new replacement disk?

    It's not going to disappear when you add another disk, because as far as mdadm is concerned it can still 'see' it, just not as a physical disk. As this is RAID5, what you should be able to do is add the disk, which will become /dev/sde within your system, then add it to the array, allow the array to rebuild, and then 'detach' the 'removed' one. But running detached first and then adding the disk should make no difference, as your array is currently 'clean, degraded' and it will remain in that state provided you don't specify /dev/sd[b or c] in any of the mdadm commands you use to remove the failed drive.
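
    A quick way to confirm that the array really is in that 'clean, degraded' state before (and while) swapping disks is to check the state line of mdadm --detail (a sketch; exact wording can vary between mdadm versions):

    Code
    # Expect something like "State : clean, degraded" while one member is missing.
    mdadm --detail /dev/md127 | grep -i 'state'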

  • Thanks for your help. Unfortunately the command doesn't seem to have any effect :(
    After executing the command there is no output.


    • Official post

    Thanks for your help. Unfortunately the command doesn't seem to have an effect

    Two things: are you doing this via SSH or directly on the machine (it shouldn't really make any difference), and did you wait before running mdadm --detail? Another option might be to try --remove instead of -r.

  • I am running the commands via SSH.
    Yes, I have waited before running mdadm --detail.
    I have tried:

    Code
    mdadm /dev/md127 --fail detached
    
    
    mdadm /dev/md127 -r failed
    mdadm /dev/md127 --remove failed
    
    
    mdadm /dev/md127 -r detached
    mdadm /dev/md127 --remove detached

    Should I just try to add the replacement disk?

  • Just a quick note about the device names so you're not confused:

    • I have installed the new disk and it received the name /dev/sdb
    • It seems the disk that actually failed was /dev/sdb, so the array consisted of sdb, sdc and sdd
    • After sdb failed and was removed, the other disks shifted from sdc and sdd to sdb and sdc
    • That's why I added /dev/sdb to the array: it actually is the new (empty) disk (a quick way to verify this is sketched below)
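
    For reference, a quick way to double-check which physical drive hides behind a given /dev/sdX name (a sketch; the by-id entries contain the model and serial number, which do not change when the sdX letters shuffle around):

    Code
    # Map persistent model/serial names to the current sdX device names.
    ls -l /dev/disk/by-id/ | grep -v part
    # Run this on the candidate disk before adding it: the brand-new, empty disk
    # should report something like "No md superblock detected", while existing
    # members show the array's UUID.
    mdadm --examine /dev/sdb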


    Code
    root@homenas:~# mdadm --manage /dev/md127 --add /dev/sdb
    mdadm: added /dev/sdb



    Code
    root@homenas:~# cat /proc/mdstat
    Personalities : [raid6] [raid5] [raid4]
    md127 : active raid5 sdb[3] sdc[0] sdd[1]
          9767278592 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]
          [>....................]  recovery =  0.3% (16989028/4883639296) finish=456.8min speed=177525K/sec
    
    
    unused devices: <none>



    Fortunately it is rebuilding after I added the disk.
    Additionally, the device that was shown as "removed" is gone :)



    Is it correct that it is showing "spare rebuilding"?

    • Official post

    Ok now I've just had a brain fart...confused...I was lost at line two.


    For whatever reason it's seen it as a spare, and by using --add it's rebuilding. According to your first post the two drives in the active array were b and c. Well, at least it worked :thumbup:
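
    For anyone following along: while the rebuild runs, mdadm --detail lists the new member as "spare rebuilding" and shows the progress. A sketch of how to keep an eye on it (labels may vary slightly between mdadm versions):

    Code
    # During the rebuild the new member is listed as "spare rebuilding" and a
    # "Rebuild Status" line shows the percentage completed.
    mdadm --detail /dev/md127 | grep -iE 'state|rebuild'
    # Or simply watch the kernel's own progress counter:
    watch cat /proc/mdstat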

  • Sorry :D Maybe I have not explained it well. The first post is obsolete because the drive names changed after I installed the new disk in the system.


    After installing the new drive in my NAS, but before I added it to the RAID, this was the output:



    Then I added the new and empty disk /dev/sdb to the RAID and the output changed to:




    I assume the "spare rebuilding" will change to "active sync" after the rebuilding is done?

  • Yes, it should... if it doesn't, don't call me, I'll call you :D


    Fortunately it all went well :):thumbup:


    Code
    root@homenas:~# cat /proc/mdstat 
    Personalities : [raid6] [raid5] [raid4] 
    md127 : active raid5 sdb[3] sdc[0] sdd[1]
          9767278592 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
    unused devices: <none>
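
    As a final check (a sketch): [3/3] [UUU] in the mdstat output above already means all three members are up; the detailed view should now agree.

    Code
    # Expect "State : clean" and all three devices listed as "active sync".
    mdadm --detail /dev/md127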




    Once again: thank you very much for your help, especially for answering so fast!
