How to Recover Degraded RAID5 after physical disk failure?

    • OMV 3.x
    • Resolved
    • How to Recover Degraded RAID5 after physical disk failure?

      Hello everyone,

      I am running OMV 3.0.99 on my NAS. I have set up a RAID5 (with mdadm) consisting of three 5TB WD RED drives. I created the RAID using the OMV GUI back in 2016.

      Recently one of the disks failed. When I got home I noticed that it makes weird sounds when spinning up and is no longer recognized in Linux. The device seems to be physically broken, and I have started an RMA since the drives are still under warranty. Now I am waiting for the replacement drive and wondering what I should do once it arrives to recover my RAID.

      Some notes on my drives:
      1. sda is my system drive
      2. the RAID consisted of disks sdb, sdc and sdd. The drive sdd failed and has been physically removed from the NAS case.
      3. Now sdd is my backup disk (it used to be sde before the RAID disk failed)



      Here are some important outputs of my system:


      "uname -a" output

      Source Code

      Linux homenas 4.9.0-0.bpo.6-amd64 #1 SMP Debian 4.9.88-1+deb9u1~bpo8+1 (2018-05-13) x86_64 GNU/Linux
      cat /proc/mdstat

      Source Code

      root@homenas:~# cat /proc/mdstat
      Personalities : [raid6] [raid5] [raid4]
      md127 : active raid5 sdb[0] sdc[1]
            9767278592 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]
      unused devices: <none>


      blkid

      Source Code

      root@homenas:~# blkid
      /dev/sda1: UUID="911053a9-f06c-4479-becb-cb8faa2a5783" TYPE="ext4" PARTUUID="2c92f843-01"
      /dev/sda5: UUID="28ae7474-1d14-48a6-9e8e-2ed31e060803" TYPE="swap" PARTUUID="2c92f843-05"
      /dev/sdb: UUID="bb8b3798-d160-71b4-cc60-bc8fdc8e0761" UUID_SUB="e52bb12c-23e1-7c8f-a7f7-d52d4b2b46a9" LABEL="HomeNAS:NAS" TYPE="linux_raid_member"
      /dev/sdc: UUID="bb8b3798-d160-71b4-cc60-bc8fdc8e0761" UUID_SUB="d9eac207-7167-d19e-c1de-8c7525b77d48" LABEL="HomeNAS:NAS" TYPE="linux_raid_member"
      /dev/sdd1: UUID="523cffe7-115d-49b4-95e0-7549aecdf064" TYPE="ext4" PARTUUID="fba4a7ee-026a-497f-9b3d-bbdec92cb0d6"
      /dev/md127: UUID="bd5ef96f-5587-4211-95c0-10219985ff6d" TYPE="ext4"


      fdisk -l | grep "Disk "

      Source Code

      root@homenas:~# fdisk -l | grep "Disk "
      Disk /dev/sda: 29,8 GiB, 32017047552 bytes, 62533296 sectors
      Disk identifier: 0x2c92f843
      Disk /dev/sdb: 4,6 TiB, 5000981078016 bytes, 9767541168 sectors
      Disk /dev/sdc: 4,6 TiB, 5000981078016 bytes, 9767541168 sectors
      Disk /dev/sdd: 1,8 TiB, 2000394706432 bytes, 3907020911 sectors
      Disk identifier: C0401C51-A74A-4675-935E-AF9BF6706166
      Disk /dev/md127: 9,1 TiB, 10001693278208 bytes, 19534557184 sectors

      cat /etc/mdadm/mdadm.conf

      Source Code

      root@homenas:~# cat /etc/mdadm/mdadm.conf
      # mdadm.conf
      #
      # Please refer to mdadm.conf(5) for information about this file.
      #
      # by default, scan all partitions (/proc/partitions) for MD superblocks.
      # alternatively, specify devices to scan, using wildcards if desired.
      # Note, if no DEVICE line is present, then "DEVICE partitions" is assumed.
      # To avoid the auto-assembly of RAID devices a pattern that CAN'T match is
      # used if no RAID devices are configured.
      DEVICE partitions
      # auto-create devices with Debian standard permissions
      CREATE owner=root group=disk mode=0660 auto=yes
      # automatically tag new arrays as belonging to the local system
      HOMEHOST <system>
      # definitions of existing MD arrays
      ARRAY /dev/md/NAS metadata=1.2 name=HomeNAS:NAS UUID=bb8b3798:d16071b4:cc60bc8f:dc8e0761
      # instruct the monitoring daemon where to send mail alerts
      MAILADDR <<<<REMOVED FOR PRIVACY REASONS>>>>

      mdadm --detail --scan --verbose

      Source Code

      root@homenas:~# mdadm --detail --scan --verbose
      ARRAY /dev/md127 level=raid5 num-devices=3 metadata=1.2 name=HomeNAS:NAS UUID=bb8b3798:d16071b4:cc60bc8f:dc8e0761
         devices=/dev/sdb,/dev/sdc

      mdadm --detail /dev/md127

      Source Code

      root@homenas:~# mdadm --detail /dev/md127
      /dev/md127:
      Version : 1.2
      Creation Time : Sat Mar 12 17:22:49 2016
      Raid Level : raid5
      Array Size : 9767278592 (9314.80 GiB 10001.69 GB)
      Used Dev Size : 4883639296 (4657.40 GiB 5000.85 GB)
      Raid Devices : 3
      Total Devices : 2
      Persistence : Superblock is persistent
      Update Time : Sun Jan 27 13:11:42 2019
      State : clean, degraded
      Active Devices : 2
      Working Devices : 2
      Failed Devices : 0
      Spare Devices : 0
      Layout : left-symmetric
      Chunk Size : 512K
      Name : HomeNAS:NAS
      UUID : bb8b3798:d16071b4:cc60bc8f:dc8e0761
      Events : 305
      Number   Major   Minor   RaidDevice   State
      0        8       16      0            active sync   /dev/sdb
      1        8       32      1            active sync   /dev/sdc
      4        0       0       4            removed

      I have searched the internet and found different steps, but I don't know which ones are necessary in my situation:
      1. mark the disk as failed
      2. remove the disk from the array
      3. copy the partition table of one remaining disk of the array to the new replacement drive
      4. re-add the drive to the array (the rebuild will then be initiated automatically)
      Since the disk failed completely and was no longer present in Linux, I could not mark it as failed and remove it from the array. I have found the following command to remove a disk from the array that is no longer present:
      mdadm /dev/md127 -r detached
      Is it recommended to use this command before I install the new drive? Or is it not necessary to remove the drive from the array in my case?
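
      For reference, a minimal sketch of the generic mdadm replacement flow (the device name /dev/sdX is a placeholder, not one of the actual disks in this system):

      Source Code

      # --fail/--remove only work while the dying disk is still visible to the kernel:
      mdadm /dev/md127 --fail /dev/sdX
      mdadm /dev/md127 --remove /dev/sdX
      # if the disk has already vanished from the system, "detached" asks mdadm to
      # drop any member whose device node no longer exists (as tried later in this thread):
      mdadm /dev/md127 -r detached
      # after installing the replacement drive, add it; the rebuild starts automatically:
      mdadm /dev/md127 --add /dev/sdX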

      I would really appreciate your guidance!
      Thanks in advance


    • Before writing my post I intensively searched the internet for guides on rebuilding a degraded RAID after a failed disk. Since my hard disk has died completely I cannot use mdadm --fail and mdadm --remove. Therefore I assume these steps are necessary in my situation (consolidated in the sketch after this list):
      1. Remove the backup disk temporarily from the NAS to free the name /dev/sdd
      2. Install the new (replacement) disk
      3. Copy the partition table to the new disk with sfdisk -d /dev/sdb | sfdisk /dev/sd<NEWDISK> or sfdisk -d /dev/sdc | sfdisk /dev/sd<NEWDISK>
      4. Reboot
      5. Add the new disk to the RAID with mdadm /dev/md127 --manage --add /dev/sd<NEWDISK>
      6. Let it rebuild and watch it with watch cat /proc/mdstat
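
      A rough sketch of these steps as commands (sd<NEWDISK> is a placeholder; note that blkid above shows the surviving members as whole-disk RAID members without partitions, so step 3 may not even be needed here):

      Source Code

      # step 3: copy the partition table from a surviving member (only meaningful
      # if the array members are partitions rather than whole disks)
      sfdisk -d /dev/sdb | sfdisk /dev/sd<NEWDISK>
      # step 5: add the new disk to the degraded array
      mdadm /dev/md127 --manage --add /dev/sd<NEWDISK>
      # step 6: watch the rebuild progress
      watch cat /proc/mdstat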


      So far so good. But what I am really struggling with is whether or not I have to remove the no-longer-existing disk from the array with mdadm /dev/md127 -r detached prior to performing the steps mentioned above.

      Please advise and correct my proposed steps before I ruin the whole RAID :)

      Thanks!


    • bash0r wrote:

      Do I have to do it? Or will the disk marked as "removed" disappear as soon as I add the new replacement disk?
      It's not going to disappear when you add another disk, because as far as mdadm is concerned it can still 'see' it, just not as a physical disk. As this is RAID 5, what you should be able to do is install the disk, which will become /dev/sde within your system, then add it to the array, allow the array to rebuild, and then 'detach' the 'removed' one. But running detached first and then adding the disk should make no difference, as your array is currently 'clean, degraded' and it will remain in that state provided you don't specify /dev/sd[b or c] in any of the mdadm commands you use to remove the failed drive.
      Raid is not a backup! Would you go skydiving without a parachute?
    • geaves wrote:

      bash0r wrote:

      Do I have to do it? Or will the disk marked as "removed" disappear as soon as I add the new replacement disk?
      It's not going to disappear when you add another disk, because as far as mdadm is concerned it can still 'see' it, just not as a physical disk. As this is RAID 5, what you should be able to do is install the disk, which will become /dev/sde within your system, then add it to the array, allow the array to rebuild, and then 'detach' the 'removed' one. But running detached first and then adding the disk should make no difference, as your array is currently 'clean, degraded' and it will remain in that state provided you don't specify /dev/sd[b or c] in any of the mdadm commands you use to remove the failed drive.
      Thanks for your help. Unfortunately the command doesn't seem to have an effect :(
      After executing the command there's no output.

      Source Code

      root@homenas:~# mdadm /dev/md127 -r detached
      root@homenas:~# mdadm --detail /dev/md127
      /dev/md127:
      Version : 1.2
      Creation Time : Sat Mar 12 17:22:49 2016
      Raid Level : raid5
      Array Size : 9767278592 (9314.80 GiB 10001.69 GB)
      Used Dev Size : 4883639296 (4657.40 GiB 5000.85 GB)
      Raid Devices : 3
      Total Devices : 2
      Persistence : Superblock is persistent
      Update Time : Thu Jan 31 18:44:44 2019
      State : clean, degraded
      Active Devices : 2
      Working Devices : 2
      Failed Devices : 0
      Spare Devices : 0
      Layout : left-symmetric
      Chunk Size : 512K
      Name : HomeNAS:NAS
      UUID : bb8b3798:d16071b4:cc60bc8f:dc8e0761
      Events : 313
      Number   Major   Minor   RaidDevice   State
      0        8       16      0            active sync   /dev/sdb
      1        8       32      1            active sync   /dev/sdc
      4        0       0       4            removed
    • geaves wrote:

      bash0r wrote:

      Thanks for your help. Unfortunately the command doesn't seem to have an effect
      Two things: are you doing this via ssh or directly on the machine (this should not make any difference really), and did you wait before doing mdadm --detail? Another option might be to try --remove instead of -r.
      I am running the commands via SSH.
      Yes, I have waited before running mdadm --detail.
      I have tried:

      Source Code

      mdadm /dev/md127 --fail detached
      mdadm /dev/md127 -r failed
      mdadm /dev/md127 --remove failed
      mdadm /dev/md127 -r detached
      mdadm /dev/md127 --remove detached
      Should I just try to add the replacement disk?


    • Just a quick note about the device names so you're not confused:
      • I have installed the new disk and it received the name /dev/sdb
      • It seems like the disk that actually failed was /dev/sdb. So the array consisted of sdb, sdc, sdd
      • After the disk sdb failed and was removed, the other disks went from sdc, sdd to sdb and sdc.
      • That's why I've added /dev/sdb to the array, because that is actually the new (empty) disk.
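
      Since the kernel reshuffles the /dev/sdX names like this, a quick sanity check before running --add can confirm which device really is the new, empty disk. A small sketch (nothing beyond standard tools assumed):

      Source Code

      # persistent names include the drive model and serial, unlike /dev/sdX:
      ls -l /dev/disk/by-id/ | grep -v part
      # a disk that already belongs to the array carries an md superblock;
      # a brand-new drive should report "No md superblock detected":
      mdadm --examine /dev/sdb
      # blkid tells the same story: existing members show TYPE="linux_raid_member"
      blkid /dev/sdb /dev/sdc /dev/sdd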


      Source Code

      root@homenas:~# mdadm --manage /dev/md127 --add /dev/sdb
      mdadm: added /dev/sdb


      Source Code

      root@homenas:~# mdadm --detail /dev/md127
      /dev/md127:
      Version : 1.2
      Creation Time : Sat Mar 12 17:22:49 2016
      Raid Level : raid5
      Array Size : 9767278592 (9314.80 GiB 10001.69 GB)
      Used Dev Size : 4883639296 (4657.40 GiB 5000.85 GB)
      Raid Devices : 3
      Total Devices : 3
      Persistence : Superblock is persistent
      Update Time : Thu Jan 31 20:10:27 2019
      State : clean, degraded, recovering
      Active Devices : 2
      Working Devices : 3
      Failed Devices : 0
      Spare Devices : 1
      Layout : left-symmetric
      Chunk Size : 512K
      Rebuild Status : 0% complete
      Name : HomeNAS:NAS
      UUID : bb8b3798:d16071b4:cc60bc8f:dc8e0761
      Events : 339
      Number   Major   Minor   RaidDevice   State
      0        8       32      0            active sync   /dev/sdc
      1        8       48      1            active sync   /dev/sdd
      3        8       16      2            spare rebuilding   /dev/sdb


      Source Code

      root@homenas:~# cat /proc/mdstat
      Personalities : [raid6] [raid5] [raid4]
      md127 : active raid5 sdb[3] sdc[0] sdd[1]
            9767278592 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]
            [>....................] recovery = 0.3% (16989028/4883639296) finish=456.8min speed=177525K/sec
      unused devices: <none>



      Fortunately it is rebuilding after I added the disk.
      Additionally, the device that was shown as "removed" is gone :)


      Is it correct that it is showing "spare rebuilding"?
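
      A few ways to keep an eye on the rebuild while it runs (just a sketch; only the array name md127 from above is assumed):

      Source Code

      # live view of the recovery progress bar:
      watch cat /proc/mdstat
      # state and rebuild percentage as reported by mdadm:
      mdadm --detail /dev/md127 | grep -E 'State|Rebuild Status'
      # the kernel also exposes the current sync action via sysfs
      # (it should read "recover" while the replacement is being rebuilt):
      cat /sys/block/md127/md/sync_action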
    • bash0r wrote:

      Just a quick note about the device names so you're not confused:


      I have installed the new disk and it received the name /dev/sdb

      It seems like the disk that actually failed was /dev/sdb. So the array consisted of sdb, sdc, sdd

      After the disk sdb failed and was removed, the other disks went from sdc, sdd to sdb and sdc.

      That's why I've added /dev/sdb to the array, because that is actually the new (empty) disk.
      Ok now I've just had a brain fart...confused...I was lost at line two.

      For whatever reason it's seen it as a spare, and by using --add it's rebuilding. According to your first post the two drives in the active array were b and c. Well, at least it worked :thumbup:
      Raid is not a backup! Would you go skydiving without a parachute?
    • Sorry :D Maybe I did not explain it well. The first post is obsolete because the drive names changed after I installed the new disk in the system.

      After installing the new drive in my NAS, but before I added it to the RAID, this was the output:

      Source Code

      root@homenas:~# mdadm --detail /dev/md127
      /dev/md127:
      Version : 1.2
      Creation Time : Sat Mar 12 17:22:49 2016
      Raid Level : raid5
      Array Size : 9767278592 (9314.80 GiB 10001.69 GB)
      Used Dev Size : 4883639296 (4657.40 GiB 5000.85 GB)
      Raid Devices : 3
      Total Devices : 2
      Persistence : Superblock is persistent
      Update Time : Thu Jan 31 20:06:09 2019
      State : clean, degraded
      Active Devices : 2
      Working Devices : 2
      Failed Devices : 0
      Spare Devices : 0
      Layout : left-symmetric
      Chunk Size : 512K
      Name : HomeNAS:NAS
      UUID : bb8b3798:d16071b4:cc60bc8f:dc8e0761
      Events : 337
      Number   Major   Minor   RaidDevice   State
      0        8       32      0            active sync   /dev/sdc
      1        8       48      1            active sync   /dev/sdd
      4        0       0       4            removed


      Then I added the new and empty disk /dev/sdb to the RAID and the output changed to:


      Source Code

      root@homenas:~# mdadm --detail /dev/md127
      /dev/md127:
      Version : 1.2
      Creation Time : Sat Mar 12 17:22:49 2016
      Raid Level : raid5
      Array Size : 9767278592 (9314.80 GiB 10001.69 GB)
      Used Dev Size : 4883639296 (4657.40 GiB 5000.85 GB)
      Raid Devices : 3
      Total Devices : 3
      Persistence : Superblock is persistent
      Update Time : Thu Jan 31 20:50:30 2019
      State : clean, degraded, recovering
      Active Devices : 2
      Working Devices : 3
      Failed Devices : 0
      Spare Devices : 1
      Layout : left-symmetric
      Chunk Size : 512K
      Rebuild Status : 9% complete
      Name : HomeNAS:NAS
      UUID : bb8b3798:d16071b4:cc60bc8f:dc8e0761
      Events : 347
      Number   Major   Minor   RaidDevice   State
      0        8       32      0            active sync   /dev/sdc
      1        8       48      1            active sync   /dev/sdd
      3        8       16      2            spare rebuilding   /dev/sdb

      I assume the "spare rebuilding" will change to "active sync" after the rebuilding is done?
    • geaves wrote:

      bash0r wrote:

      I assume the "spare rebuilding" will change to "active sync" after the rebuilding is done?
      Yes it should....if it doesn't don't call me, I'll call you :D

      Fortunately it all went well :) :thumbsup:

      Source Code

      root@homenas:~# cat /proc/mdstat
      Personalities : [raid6] [raid5] [raid4]
      md127 : active raid5 sdb[3] sdc[0] sdd[1]
            9767278592 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
      unused devices: <none>



      Source Code

      root@homenas:~# mdadm --detail /dev/md127
      /dev/md127:
      Version : 1.2
      Creation Time : Sat Mar 12 17:22:49 2016
      Raid Level : raid5
      Array Size : 9767278592 (9314.80 GiB 10001.69 GB)
      Used Dev Size : 4883639296 (4657.40 GiB 5000.85 GB)
      Raid Devices : 3
      Total Devices : 3
      Persistence : Superblock is persistent
      Update Time : Fri Feb 1 06:40:52 2019
      State : clean
      Active Devices : 3
      Working Devices : 3
      Failed Devices : 0
      Spare Devices : 0
      Layout : left-symmetric
      Chunk Size : 512K
      Name : HomeNAS:NAS
      UUID : bb8b3798:d16071b4:cc60bc8f:dc8e0761
      Events : 457
      Number   Major   Minor   RaidDevice   State
      0        8       32      0            active sync   /dev/sdc
      1        8       48      1            active sync   /dev/sdd
      3        8       16      2            active sync   /dev/sdb


      Once again: thank you very much for your help, especially for answering so fast!