Need to swap a dying disk in a RAID5

  • I'm pretty overwhelmed here.... It was extremely easy to set up my RAID but it turns out one of the 4 drives is bad. Seems the GUI doesn't do much more than build RAIDs. I've been messing with various commands to remove the defective disk and replace it with a new one. I am totally at a loss on what to do.


    I need to remove /dev/sde and replace it with /dev/sdb after it reboots. When I fail sde and remove it, the array says it isn't mounted. I'm not particularly handy with Linux, but I can google stuff. My googling has turned up a hundred different weird scenarios, but not a hand-holding guide.


    At the very least I managed to undo whatever dumb stuff I did by adding the damaged drive back to the array. It still works but it's getting SMART errors so I need to replace it.

  • The first thing I'd try to do, if you don't have one, is a backup.
    ______________________________________________________________


    What version of OMV are you running? (3 or 4?)


    Do you have space for a spare drive, or do you need to remove the dying disk first? And can you SSH into the command line?

  • It still works but it's getting SMART errors so I need to replace it.

    First, you need to post the output from each of the commands found here. Paste each into a code or spoiler box from the menu bar; that makes it easier to read than one long list.

    Raid is not a backup! Would you go skydiving without a parachute?


    OMV 6x amd64 running on an HP N54L Microserver

  • Code
    root@homeserver:~# cat /proc/mdstat                                                                                                                                     
    Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]                                                                                   
    md0 : active raid5 sde[3] sdd[2] sdb[0] sdc[1]                                                                                                                          
          2929893888 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]                                                                                         
          bitmap: 0/8 pages [0KB], 65536KB chunk                                                                                                                            
    unused devices: <none>                                                                                                                                                  
    root@homeserver:~#
    Code
    root@homeserver:~# blkid                                                                                                                                                
    /dev/sde: UUID="23cc2cfb-b798-3a53-710b-36db89d1cc37" UUID_SUB="9fd1202b-0aee-1c95-9dbb-9cedd2070e91" LABEL="openmediavault:StorageRAID" TYPE="linux_raid_member"       
    /dev/sdc: UUID="23cc2cfb-b798-3a53-710b-36db89d1cc37" UUID_SUB="59677b6b-833a-a19b-bf83-ec9790f93459" LABEL="openmediavault:StorageRAID" TYPE="linux_raid_member"       
    /dev/sdb: UUID="23cc2cfb-b798-3a53-710b-36db89d1cc37" UUID_SUB="0c6dba8b-2076-26a7-54ae-2da1236ab07d" LABEL="openmediavault:StorageRAID" TYPE="linux_raid_member"       
    /dev/sdd: UUID="23cc2cfb-b798-3a53-710b-36db89d1cc37" UUID_SUB="24808ad9-ea26-be8b-9b00-0363c36e1a6c" LABEL="openmediavault:StorageRAID" TYPE="linux_raid_member"       
    /dev/sda1: UUID="fa288192-4443-45d4-814d-e983d45f8bad" TYPE="ext4" PARTUUID="36a51092-01"                                                                               
    /dev/sda5: UUID="ee949a26-9cb5-4898-9e5c-0b99c95c2281" TYPE="swap" PARTUUID="36a51092-05"                                                                               
    /dev/md0: LABEL="RaidyMcraiderson" UUID="2db70634-0479-480c-85f7-885635ba5f4f" TYPE="ext4"                                                                              
    root@homeserver:~#
    Code
    root@homeserver:~# fdisk -l | grep "Disk "                                                                                                                              
    Disk /dev/sde: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors                                                                                                       
    Disk /dev/sdc: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors                                                                                                       
    Disk /dev/sdb: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors                                                                                                       
    Disk /dev/sdd: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors                                                                                                       
    Disk /dev/sda: 465.8 GiB, 500107862016 bytes, 976773168 sectors                                                                                                         
    Disk identifier: 0x36a51092                                                                                                                                             
    Disk /dev/md0: 2.7 TiB, 3000211341312 bytes, 5859787776 sectors                                                                                                         
    root@homeserver:~#
    Code
    root@homeserver:~# mdadm --detail --scan --verbose
    ARRAY /dev/md0 level=raid5 num-devices=4 metadata=1.2 name=openmediavault:StorageRAID UUID=23cc2cfb:b7983a53:710b36db:89d1cc37
       devices=/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde
    root@homeserver:~#


    Ok so I have 4x 1 TB SATA drives and a 500 GB boot drive on an old SATA PCI card as the boot volume. I have no more sockets to add a sixth drive, so I would need to remove the bad drive and then swap it with the new one. I'm using the latest OMV v4.


    Thanks for the help! I think I need to remove the bad drive and replace it with the new one, then mount the RAID volume and add the disk to the array. Only thing is that I'm not sure how to mount the array with 3 drives to allow mdadm to add the drive.

  • Thanks for the help! I think I need to remove the bad drive and replace it with the new one, then mount the RAID volume and add the disk to the array. Only thing is that I'm not sure how to mount the array with 3 drives to allow mdadm to add the drive.

    Ok, which drive do you want to replace? Because the Raid is clean and active; if a drive had failed, it would come up clean/degraded.


  • Ok, which drive do you want to replace, because the Raid is clean and active


    The drive in question is /dev/sde. As previously mentioned, it's throwing SMART errors. I was working on a process to replace the drive, with screenshots.
    Do you have this one?
    _________________________________________________


    @thecheat001
    What are the smart errors for /dev/sde?


    The following are the attributes where, if they increment, future failure is more likely:
    SMART 5 – Reallocated_Sector_Count.
    SMART 187 – Reported_Uncorrectable_Errors.
    SMART 188 – Command_Timeout.
    SMART 197 – Current_Pending_Sector_Count.
    SMART 198 – Offline_Uncorrectable.
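
    If it helps, here's one way to pull just those five attributes out of smartctl's table. A sketch only: the attribute table below is made-up sample text, not from the OP's drive; on the live box you'd pipe `smartctl -A /dev/sde` into the same awk filter instead.

```shell
# Filter a smartctl -A attribute table down to the five failure predictors.
# The table here is a made-up sample; on a real system, replace the here-doc
# with the output of:  smartctl -A /dev/sde
suspects=$(awk '$1 ~ /^(5|187|188|197|198)$/ {print $1, $2, $NF}' <<'EOF'
  5 Reallocated_Sector_Ct   0x0033 100 100 010 Pre-fail Always - 8
187 Reported_Uncorrect      0x0032 100 100 000 Old_age  Always - 3
188 Command_Timeout         0x0032 100 099 000 Old_age  Always - 0
194 Temperature_Celsius     0x0022 064 058 000 Old_age  Always - 36
197 Current_Pending_Sector  0x0012 100 100 000 Old_age  Always - 5
198 Offline_Uncorrectable   0x0010 100 100 000 Old_age  Offline - 5
EOF
)
echo "$suspects"   # nonzero raw values on 5/197/198 here would be a bad sign
```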

  • The drive in question is /dev/sde. As previously mentioned, it's throwing SMART errors. I was working on a process to replace the drive, with screenshots.
    Do you have this one?

    I don't mind, but the OP says he wants to replace /dev/sde with /dev/sdb after a reboot, but /dev/sdb is already part of the array, hence my question which drive!


    You do pictures :P more than I do :)


  • I don't mind, but the OP says he wants to replace /dev/sde with /dev/sdb after a reboot, but /dev/sdb is already part of the array, hence my question which drive!

    (I don't mind either - I have electrical work to do today.)
    I picked up on the drive/device thing, but we'll have to see what @thecheat001's SMART stats are, and for which drive.

  • Ok, sorry, I was out mowing my lawn... summer is already here.


    I've attached a couple pictures, one of the disks themselves and the specific stats of the failing drive.


    When I remove the failing disk, /dev/sde, and insert the new drive, the new drive mounts at /dev/sdb and everything seems to move down a letter. Or maybe it doesn't, I was in total damage control mode and was freaking out that I made a giant mistake.


    If I had a free SATA socket or could hot swap the disk I think I would be good, the guide I was following seemed to be fine up till that point. As it is, when I boot back up with the original three drives and the new one, the array doesn't mount and I'm sort of lost at this point.

  • Ok, sorry, I was out mowing my lawn... summer is already here.

    You can come and do mine :D


    If I had a free SATA socket or could hot swap the disk I think I would be good,

    Mdadm is not hot-swappable, unfortunately :) but we'll let @flmaxey decide about your drive; it looks similar to one of mine. However, for mdadm, because the raid is clean and working, you have to fail the drive, then remove it, then add the new one, all from the CLI.


  • You can come and do mine :D

    Mdadm is not hot-swappable, unfortunately :) but we'll let @flmaxey decide about your drive; it looks similar to one of mine. However, for mdadm, because the raid is clean and working, you have to fail the drive, then remove it, then add the new one, all from the CLI.


    We're trying to escape Florida and move up north, I'm so sick of the heat. 43 years of "year round" summer and I'm done.


    I was following a guide from the shell and it pretty much had me doing that - fail the drive, shut down, remove it and replace with the new drive, reboot, add new drive to array, rebuild. I got stuck on the adding the new drive part because /dev/md0 wasn't mounted.


    If I were to wait until the raid says the drive is bad, would it be a different procedure? More assistance from the GUI? I think I'd like to remove it sooner rather than later though.

  • What you do is fail the drive using mdadm, then remove the drive using mdadm, shut down, install the new drive, and reboot; the raid should appear as clean/degraded. Then add the new drive using mdadm.
    At least, the above is the procedure anyway.
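
    Spelled out as commands, the above might look like this. A sketch only: the commands are printed for review rather than executed, and /dev/sdX is a placeholder for whatever device name the replacement ends up with after the reboot.

```shell
# The swap sequence described above, echoed so it can be reviewed first.
# /dev/sde is the OP's failing member; /dev/sdX is a placeholder for the
# replacement's device name (verify it in Storage, Disks after booting).
old=/dev/sde
plan=$(cat <<EOF
mdadm --manage /dev/md0 --fail $old
mdadm --manage /dev/md0 --remove $old
# shut down, swap the physical drive, boot back up, then:
mdadm --manage /dev/md0 --add /dev/sdX
cat /proc/mdstat
EOF
)
echo "$plan"
```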


  • What you do is fail the drive using mdadm, then remove the drive using mdadm, shut down, install the new drive, and reboot; the raid should appear as clean/degraded. Then add the new drive using mdadm.
    At least, the above is the procedure anyway.


    That's where the issue is: the RAID doesn't mount with 3 drives, so I can't add to the dismounted array. Is it possible that the mount points are getting screwed up because the new drive is at /dev/sdb?

  • /dev/sde is dying. There's little doubt about that - it's just a matter of time.


    Since you're on OMV4, swapping the drive out should be possible in the GUI
    ________________________________________________________


    Go into Storage, Disks, and check/verify /dev/sde against the model, SN#, vendor, and the other details so you'll know which drive to physically pull.



    Then fail and remove /dev/sde from the array in Raid Management.


    The result should be: State clean, degraded


    - Shutdown and, using the info gathered on the drive, find and remove /dev/sde and set it aside. (To be safe, to be able to back out if needed, leave it as it is.)
    - Install the new drive (the same size or larger) in the now empty slot.
    - Boot up
    _____________________________________________________________


    OMV should come up with /dev/md0 in the same state (clean, degraded) as it was when shut down. If the array is in the GUI, skip down to the next screen shot.


    If not try the following on the CLI:


    mdadm --assemble --force /dev/md0


    Check the GUI again to see if the array is up in the clean, degraded state. (If you see the array in the GUI, skip to the next screen shot.)
    If it's not there, you could specify the disks that are currently active in the array (3 of them at this point), but first you'll need to verify the device name for the newly added drive in Storage, Disks. Don't be surprised if the device name for the new drive is the same as the one you removed.


    You don't want to include the device name for the newly installed drive in the following command line. (I'm going to assume that the new drive has assumed the device name of the old drive, /dev/sde.)

    mdadm --assemble --force /dev/md0 /dev/sdb /dev/sdc /dev/sdd



    If this doesn't work and the array won't assemble, you're at a decision point. The following is what I'd do:
    I would shut down, reinstall the old disk, boot up, and see if I could get the array to assemble again. If successful, I would shut down, go out and buy an external drive (at least 4TB) and a UPS, if I didn't have one.
    I'd boot up on the UPS, with the new external drive connected, and remove the bad drive from the array again. (Why? The array can run one drive down and all that the dying drive can do is corrupt data as it dies.)
    Mount the external drive and copy the entire array, complete, to the external disk using Rsync. I wouldn't shut down until this is complete.
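
    A sketch of that copy step, printed for review rather than executed. The mount point /mnt/backup and the external drive's partition /dev/sdX1 are assumptions; OMV typically mounts filesystems under /srv/dev-disk-by-label-<LABEL>, and the label RaidyMcraiderson comes from the blkid output earlier in the thread.

```shell
# Rescue-copy plan: mount the external disk, then mirror the array onto it.
# Paths and device names are assumptions; review and adapt before running.
rescue=$(cat <<'EOF'
mount /dev/sdX1 /mnt/backup
rsync -aHAX --info=progress2 /srv/dev-disk-by-label-RaidyMcraiderson/ /mnt/backup/
EOF
)
echo "$rescue"
```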


    _____________________________________________________________


    If the array is in the GUI with a state of clean, degraded, go back to Storage, Disks, "wipe" the new drive, and note its device name.
    Then, in the GUI, do the following: (Where the device selected in the dialog is the device name of the new wiped drive)



    You should see, in "State", that the array is recovering. This is going to take a while and will go faster if you don't access network shares.
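
    While it rebuilds, `cat /proc/mdstat` on the CLI shows the same thing as the GUI. Below is what the recovery line looks like; the mdstat text is a sample with made-up numbers, not the OP's array.

```shell
# Pull the percent-done figure out of a (sample) recovering mdstat.
# On the live system, run against /proc/mdstat instead of the here-doc.
progress=$(grep -o 'recovery = *[0-9.]*%' <<'EOF'
md0 : active raid5 sde[4] sdd[2] sdb[0] sdc[1]
      2929893888 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UUU_]
      [=====>...............]  recovery = 28.7% (280512512/976631296) finish=81.3min speed=142512K/sec
EOF
)
echo "$progress"   # prints: recovery = 28.7%
```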

  • Is it possible that the mount points are getting screwed up because the new drive is at /dev/sdb?

    It shouldn't, no. At present, don't do this!!


    But what you would do in this order;


    mdadm --manage /dev/md0 --fail /dev/sde (if that errors, you would do mdadm --stop /dev/md0, then run the command again).


    mdadm --manage /dev/md0 --remove /dev/sde


    That would fail and remove the drive /dev/sde from the array. To check that, cat /proc/mdstat should return the array as clean/degraded.


    Shutdown, remove the drive you have failed, insert the new drive and reboot.


    From the GUI, make a note of the drive letter associated with the new drive (/dev/sd?). Prepare the drive by doing a quick wipe.


    Check the Raid Management the Raid should display clean/degraded.


    From the cli: mdadm --manage /dev/md0 --add /dev/sd? (the ? being the drive letter). If that errors, stop the array and run it again; then do cat /proc/mdstat, and mdadm will add it and start to resync the array.
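
    For the cat /proc/mdstat checks above: a clean/degraded 4-disk RAID5 looks like the sample below. [4/3] means 3 of 4 members are active, and the underscore in [UUU_] is the empty slot the new drive will fill. (Sample text, not the OP's box.)

```shell
# Extract the member-status brackets from a (sample) degraded mdstat.
# On the live system, run the same grep against /proc/mdstat.
state=$(grep -o '\[[U_]*\]' <<'EOF'
md0 : active raid5 sdd[2] sdb[0] sdc[1]
      2929893888 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UUU_]
EOF
)
echo "$state"   # prints: [UUU_]
```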


  • @geaves is correct here. If /dev/sdb is currently operating (and the mobo/drive SATA connections are not moved around), /dev/sdb shouldn't give up its working device name. As noted, I wouldn't be surprised if the new drive comes up under the old drive's device name (/dev/sde) or another device name that is not in use.


    (I'm not saying it would be impossible but that would be a weird BIOS interaction with the OS.)
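
    One way to take the guesswork out of device letters: the symlinks under /dev/disk/by-id embed the model and serial number, so they survive any sdb/sde shuffle. A sketch, with a made-up serial and a pasted sample listing; on the live box you'd pipe the real `ls -l /dev/disk/by-id` in instead.

```shell
# Resolve which /dev/sdX currently belongs to a given serial number.
# The serial WD-ABC123 and the listing below are made up for illustration.
newdev=$(awk '/WD-ABC123/ {n=split($NF, p, "/"); print p[n]}' <<'EOF'
lrwxrwxrwx 1 root root 9 May 26 10:00 ata-WDC_WD10EZEX_WD-ABC123 -> ../../sdc
lrwxrwxrwx 1 root root 9 May 26 10:00 ata-WDC_WD10EZEX_WD-XYZ789 -> ../../sde
EOF
)
echo "$newdev"   # prints: sdc
```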

  • Oh thank you very much!!!! I ended up getting the old drive out, installing the new drive, then rebooting. The array was still not displaying in the RAID Management tab. I went to the shell and tried to "mdadm --assemble --force /dev/md0 /dev/sdc /dev/sdd /dev/sde" but it said they were busy. Then I stopped /dev/md0 and tried to add it again, which didn't work. So I rebooted and tried to add it again and it mounted the array with three drives and it showed up in the GUI. Success!


    Now I've added the new disk in the GUI and it's rebuilding! Thanks again!

  • It took two boots before the array came up clean, degraded? (One can never know for sure; there are so many variables.)


    I think the command should have been:
    mdadm --assemble --force /dev/md0 /dev/sdb /dev/sdc /dev/sdd, because the drive removed was /dev/sde. (But we won't quibble. :) )


    Just curious, what was the device name for the new drive? What did the new drive come up as? Was it /dev/sde?
