Need to swap a dying disk in a RAID5

    • OMV 4.x
    • Resolved


    • Need to swap a dying disk in a RAID5

      I'm pretty overwhelmed here.... It was extremely easy to set up my RAID but it turns out one of the 4 drives is bad. Seems the GUI doesn't do much more than build RAIDs. I've been messing with various commands to remove the defective disk and replace it with a new one. I am totally at a loss on what to do.

      I need to remove /dev/sde and replace it with /dev/sdb after it reboots. When I fail sde and remove it, the array says it isn't mounted. I'm not particularly handy with linux but I can google stuff. My googling has turned up a hundred different weird scenarios but not a hand-holding guide.

      At the very least I managed to undo whatever dumb stuff I did by adding the damaged drive back to the array. It still works but it's getting SMART errors so I need to replace it.
      Source Code

      root@homeserver:~# cat /proc/mdstat
      Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
      md0 : active raid5 sde[3] sdd[2] sdb[0] sdc[1]
            2929893888 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
            bitmap: 0/8 pages [0KB], 65536KB chunk

      unused devices: <none>
      root@homeserver:~#

      Source Code

      root@homeserver:~# blkid
      /dev/sde: UUID="23cc2cfb-b798-3a53-710b-36db89d1cc37" UUID_SUB="9fd1202b-0aee-1c95-9dbb-9cedd2070e91" LABEL="openmediavault:StorageRAID" TYPE="linux_raid_member"
      /dev/sdc: UUID="23cc2cfb-b798-3a53-710b-36db89d1cc37" UUID_SUB="59677b6b-833a-a19b-bf83-ec9790f93459" LABEL="openmediavault:StorageRAID" TYPE="linux_raid_member"
      /dev/sdb: UUID="23cc2cfb-b798-3a53-710b-36db89d1cc37" UUID_SUB="0c6dba8b-2076-26a7-54ae-2da1236ab07d" LABEL="openmediavault:StorageRAID" TYPE="linux_raid_member"
      /dev/sdd: UUID="23cc2cfb-b798-3a53-710b-36db89d1cc37" UUID_SUB="24808ad9-ea26-be8b-9b00-0363c36e1a6c" LABEL="openmediavault:StorageRAID" TYPE="linux_raid_member"
      /dev/sda1: UUID="fa288192-4443-45d4-814d-e983d45f8bad" TYPE="ext4" PARTUUID="36a51092-01"
      /dev/sda5: UUID="ee949a26-9cb5-4898-9e5c-0b99c95c2281" TYPE="swap" PARTUUID="36a51092-05"
      /dev/md0: LABEL="RaidyMcraiderson" UUID="2db70634-0479-480c-85f7-885635ba5f4f" TYPE="ext4"
      root@homeserver:~#

      Source Code

      root@homeserver:~# fdisk -l | grep "Disk "
      Disk /dev/sde: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
      Disk /dev/sdc: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
      Disk /dev/sdb: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
      Disk /dev/sdd: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
      Disk /dev/sda: 465.8 GiB, 500107862016 bytes, 976773168 sectors
      Disk identifier: 0x36a51092
      Disk /dev/md0: 2.7 TiB, 3000211341312 bytes, 5859787776 sectors
      root@homeserver:~#

      Source Code

      root@homeserver:~# cat /etc/mdadm/mdadm.conf
      # mdadm.conf
      #
      # Please refer to mdadm.conf(5) for information about this file.
      #
      # by default, scan all partitions (/proc/partitions) for MD superblocks.
      # alternatively, specify devices to scan, using wildcards if desired.
      # Note, if no DEVICE line is present, then "DEVICE partitions" is assumed.
      # To avoid the auto-assembly of RAID devices a pattern that CAN'T match is
      # used if no RAID devices are configured.
      DEVICE partitions
      # auto-create devices with Debian standard permissions
      CREATE owner=root group=disk mode=0660 auto=yes
      # automatically tag new arrays as belonging to the local system
      HOMEHOST <system>
      # definitions of existing MD arrays
      ARRAY /dev/md0 metadata=1.2 name=openmediavault:StorageRAID UUID=23cc2cfb:b7983a53:710b36db:89d1cc37

      Source Code

      root@homeserver:~# mdadm --detail --scan --verbose
      ARRAY /dev/md0 level=raid5 num-devices=4 metadata=1.2 name=openmediavault:StorageRAID UUID=23cc2cfb:b7983a53:710b36db:89d1cc37
         devices=/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde
      root@homeserver:~#

      Ok, so I have 4x 1 TB SATA drives in the array and a 500 GB boot drive on an old SATA PCI card. I have no more sockets to add a 6th drive, so I would need to remove the bad drive and then swap it with the new one. I'm using the latest OMV v4.

      Thanks for the help! I think I need to remove the bad drive and replace it with the new one, then mount the RAID volume and add the disk to the array. Only thing is that I'm not sure how to mount the array with 3 drives to allow mdadm to add the drive.
    • thecheat001 wrote:

      Thanks for the help! I think I need to remove the bad drive and replace it with the new one, then mount the RAID volume and add the disk to the array. Only thing is that I'm not sure how to mount the array with 3 drives to allow mdadm to add the drive.
      Ok, which drive do you want to replace? The Raid is clean and active; if a drive had failed it would come up clean/degraded.
      Raid is not a backup! Would you go skydiving without a parachute?
    • geaves wrote:

      Ok, which drive do you want to replace? The Raid is clean and active

      The drive in question is /dev/sde. As previously mentioned, it's throwing SMART errors. I was working on a process to replace the drive, with screenshots.
      Do you have this one?
      _________________________________________________

      @thecheat001
      What are the smart errors for /dev/sde?

      The following are the attributes where, if they increment, future failure is more likely:
      SMART 5 – Reallocated_Sector_Count.
      SMART 187 – Reported_Uncorrectable_Errors.
      SMART 188 – Command_Timeout.
      SMART 197 – Current_Pending_Sector_Count.
      SMART 198 – Offline_Uncorrectable.
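
      On the CLI, a quick way to pull those attributes for the suspect drive (a sketch; smartmontools must be installed, and the exact attribute names vary by vendor):

      Source Code

      # overall health verdict
      smartctl -H /dev/sde
      # full attribute table; watch 5, 187, 188, 197 and 198
      smartctl -A /dev/sde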
    • flmaxey wrote:

      The drive in question is /dev/sde. As previously mentioned, it's throwing SMART errors. I was working on a process to replace the drive, with screenshots.
      Do you have this one?
      I don't mind, but the OP says he wants to replace /dev/sde with /dev/sdb after a reboot, and /dev/sdb is already part of the array, hence my question about which drive!

      You do pictures :P more than I do :)
      Raid is not a backup! Would you go skydiving without a parachute?
    • Ok, sorry, I was out mowing my lawn... summer is already here.

      I've attached a couple of pictures: one of the disks themselves and one of the specific stats of the failing drive.

      When I remove the failing disk, /dev/sde, and insert the new drive, the new drive shows up as /dev/sdb and everything seems to move down a letter. Or maybe it doesn't; I was in total damage control mode, freaking out that I'd made a giant mistake.

      If I had a free SATA socket or could hot swap the disk I think I would be good; the guide I was following seemed to be fine up to that point. As it is, when I boot back up with the original three drives and the new one, the array doesn't mount and I'm sort of lost at this point.
      Images
      • Capture.PNG
      • Capture1.PNG
    • thecheat001 wrote:

      Ok, sorry, I was out mowing my lawn... summer is already here.
      You can come and do mine :D

      thecheat001 wrote:

      If I had a free SATA socket or could hot swap the disk I think I would be good,
      Mdadm is not hot swappable unfortunately :) but we'll let @flmaxey decide about your drive; that looks similar to one of mine. However, for mdadm, because the raid is clean and working, you have to fail the drive, then remove it, then add the new one, all from the cli.
      Raid is not a backup! Would you go skydiving without a parachute?
    • geaves wrote:

      You can come and do mine :D [...] However, for mdadm, because the raid is clean and working, you have to fail the drive, then remove it, then add the new one, all from the cli.

      We're trying to escape Florida and move up north, I'm so sick of the heat. 43 years of "year round" summer and I'm done.

      I was following a guide from the shell and it pretty much had me doing that - fail the drive, shut down, remove it and replace with the new drive, reboot, add new drive to array, rebuild. I got stuck on the adding the new drive part because /dev/md0 wasn't mounted.

      If I were to wait until the raid says the drive is bad, would it be a different procedure? More assistance from the GUI? I think I'd like to remove it sooner rather than later though.
    • geaves wrote:

      What you do is fail the drive using mdadm, then remove the drive using mdadm, shut down, install the new drive, and reboot; the raid should appear as clean/degraded. Then add the new drive using mdadm.
      At least the above is the procedure anyway.


      That's where the issue is: the RAID doesn't mount with 3 drives, so I can't add to the unmounted array. Is it possible that the mount points are getting screwed up because the new drive is at /dev/sdb?
    • /dev/sde is dying. There's little doubt about that - it's just a matter of time.

      Since you're on OMV4, swapping the drive out should be possible in the GUI.
      ________________________________________________________

      Go into Storage, Disks, and check/verify /dev/sde against the model, serial number, vendor, and the other details so you'll know which physical drive to pull.
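
      (If it's easier to cross-check from the CLI, a quick sketch; the exact output fields vary by drive:)

      Source Code

      # model and serial number of the suspect drive
      smartctl -i /dev/sde
      # same info for every disk at a glance
      lsblk -o NAME,MODEL,SERIAL,SIZE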


      Then, in the GUI, remove the failing drive from the array:

      [screenshot omitted]

      The result should be: State clean, degraded

      - Shut down and, using the info gathered on the drive, find and remove /dev/sde and set it aside. (To be safe, leave it as it is so you can back out if needed.)
      - Install the new drive (the same size or larger) in the now empty slot.
      - Boot up
      _____________________________________________________________

      OMV should come up with /dev/md0 in the same state (clean, degraded) as it was when shut down. If the array is in the GUI, skip down to the next screenshot.

      If not, try the following on the CLI:

      mdadm --assemble --force /dev/md0

      Check the GUI again to see if the array is up in the clean, degraded state. (If you see the array in the GUI, skip to the next screenshot.)
      If it's not there, you could specify the disks that are currently active in the array (3 of them at this point), but first you'll need to verify the device name for the newly added drive in Storage, Disks. Don't be surprised if the device name for the new drive is the same as the one you removed.

      You don't want to include the device name for the newly installed drive in the following command line. (I'm going to assume that the new drive has assumed the device name of the old drive, /dev/sde.)

      mdadm --assemble --force /dev/md0 /dev/sdb /dev/sdc /dev/sdd
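
      (As a sketch of how to confirm which devices actually hold md superblocks before running the line above; the device names here are just the ones from this thread:)

      Source Code

      # list every block device carrying a Linux RAID member signature
      blkid -t TYPE=linux_raid_member
      # inspect one member's superblock in detail
      mdadm --examine /dev/sdb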


      If this doesn't work and the array won't assemble, you're at a decision point. The following is what I'd do:
      I would shut down, reinstall the old disk, boot up and see if I could get the array to assemble again. If successful, I would shut down, go out and buy an external drive (at least 4TB) and a UPS, if I didn't have one.
      I'd boot up on the UPS, with the new external drive connected, and remove the bad drive from the array again. (Why? The array can run one drive down, and all that the dying drive can do is corrupt data as it dies.)
      Mount the external drive and copy the entire array, complete, to the external disk using rsync. I wouldn't shut down until this is complete.
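
      (A minimal sketch of that copy; the mount points are assumptions - OMV normally mounts the array under /srv/dev-disk-by-label-<LABEL>, and /mnt/backup stands in for wherever the external drive is mounted:)

      Source Code

      # assumed paths - adjust to your system
      SRC=/srv/dev-disk-by-label-RaidyMcraiderson
      DST=/mnt/backup
      # archive mode, keep hard links/ACLs/xattrs, show overall progress
      rsync -aHAX --info=progress2 "$SRC"/ "$DST"/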

      _____________________________________________________________

      If the array is in the GUI with a state of clean, degraded, go back to Storage, Disks, "wipe" the new drive, and note its device name.
      Then, in the GUI, do the following (where the device selected in the dialog is the device name of the new, wiped drive):

      [screenshot omitted]

      You should see, in "State", that the array is recovering. This is going to take a while and will go faster if you don't access network shares.


    • thecheat001 wrote:

      Is it possible that the mount points are getting screwed up because the new drive is at /dev/sdb?
      It shouldn't, no. At present, don't do this!!

      But what you would do, in this order:

      mdadm --manage /dev/md0 --fail /dev/sde (if that errors, you would do mdadm --stop /dev/md0 and then run the command again).

      mdadm --manage /dev/md0 --remove /dev/sde

      That will fail and remove the drive /dev/sde from the array; to check, cat /proc/mdstat should return the array as clean/degraded.

      Shutdown, remove the drive you have failed, insert the new drive and reboot.

      From the GUI, make a note of the drive letter associated with the new drive (/dev/sd?), then prepare the drive by doing a quick wipe.

      Check the Raid Management the Raid should display clean/degraded.

      From the cli, run mdadm --manage /dev/md0 --add /dev/sd? (the ? being the drive letter); if that errors, stop the array and run it again. Then do
      cat /proc/mdstat and mdadm will add the drive and start to resync the array.
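
      Pulled together, the sequence described above looks roughly like this (a sketch; /dev/sdX stands for whatever device name the new drive actually gets):

      Source Code

      # 1. mark the dying member as failed, then remove it from the array
      mdadm --manage /dev/md0 --fail /dev/sde
      mdadm --manage /dev/md0 --remove /dev/sde
      cat /proc/mdstat        # array should now show as degraded, e.g. [4/3] [UUU_]

      # 2. shut down, swap the physical drive, boot back up

      # 3. quick-wipe the new drive (GUI or wipefs -a), then add it and watch the resync
      mdadm --manage /dev/md0 --add /dev/sdX
      cat /proc/mdstat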

      Raid is not a backup! Would you go skydiving without a parachute?
      @geaves is correct here. If /dev/sdb is currently operating (and mobo/drive sata connections are not moved around), /dev/sdb shouldn't give up its working device name. As noted, I wouldn't be surprised if the new drive comes up under the old drive's device name (/dev/sde) or another device name that is not in use.

      (I'm not saying it would be impossible but that would be a weird BIOS interaction with the OS.)
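
      (If the lettering ever does shuffle, a quick way to see which physical disk is which regardless of the /dev/sdX name, as a sketch:)

      Source Code

      # persistent names built from model and serial number, stable across reboots
      ls -l /dev/disk/by-id/ | grep -v part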
    • Oh thank you very much!!!! I ended up getting the old drive out, installing the new drive, then rebooting. The array was still not displaying in the RAID Management tab. I went to the shell and tried to "mdadm --assemble --force /dev/md0 /dev/sdc /dev/sdd /dev/sde" but it said they were busy. Then I stopped /dev/md0 and tried to add it again, which didn't work. So I rebooted and tried to add it again and it mounted the array with three drives and it showed up in the GUI. Success!

      Now I've added the new disk in the GUI and it's rebuilding! Thanks again!
      It took two boots before the array came up clean, degraded? (One can never know for sure - there are so many variables.)

      I think the command should have been:
      mdadm --assemble --force /dev/md0 /dev/sdb /dev/sdc /dev/sdd because the drive removed was /dev/sde (But we won't quibble. :) )

      Just curious, what was the device name for the new drive? What did the new drive come up as - was it /dev/sde?