RAID5 gone after swapping defective drive

  • Hey everyone,


I've already seen that I'm not the only one with this kind of problem, but I don't want to hijack other threads, so I'm making my own.
    I recently started getting emails with a growing error log for one of my drives, which ended in an "OfflineUncorrectableSector", so I turned the system off, waited for the new drive to arrive, and plugged it in today.
    After starting up again I got the following email:



    And when I wanted to rebuild the RAID, it wasn't showing in the RAID section, though all drives are visible in the Drives section.


    Below are some outputs that another thread suggested providing, though please treat me as a Linux noob, I'm still learning :)
    I have 3x 2TB WD Red drives.


    cat /proc/mdstat


    Code
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
    md127 : inactive sdc[2](S) sdb[1](S)
    5860271024 blocks super 1.2
    unused devices: <none>
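(For anyone reading along: the `(S)` flags in that output mark drives that mdadm is currently holding as spares of an inactive array, which is why the RAID section shows nothing. A minimal sketch, not from the thread, of pulling that state out of /proc/mdstat programmatically, using the output above as sample input:)

```python
import re

def parse_mdstat(text):
    """Parse /proc/mdstat output into {array: {state, members, spares}}."""
    arrays = {}
    for line in text.splitlines():
        # Array lines look like: "md127 : inactive sdc[2](S) sdb[1](S)"
        m = re.match(r"(md\d+)\s*:\s*(\w+)\s+(.*)", line)
        if m:
            name, state, rest = m.groups()
            members = re.findall(r"(\w+)\[\d+\](\(S\))?", rest)
            arrays[name] = {
                "state": state,
                "members": [dev for dev, _ in members],
                # A trailing (S) marks the device as a spare.
                "spares": [dev for dev, s in members if s],
            }
    return arrays

sample = """\
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md127 : inactive sdc[2](S) sdb[1](S)
      5860271024 blocks super 1.2
unused devices: <none>
"""
print(parse_mdstat(sample))
# {'md127': {'state': 'inactive', 'members': ['sdc', 'sdb'], 'spares': ['sdc', 'sdb']}}
```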


    blkid


    Code
    /dev/sdb: UUID="25f41e2c-c766-3a75-462d-a5746b47a522" UUID_SUB="cb760801-4799-3a1d-5a12-60d9d7e07abf" LABEL="NAS:Raid5x1" TYPE="linux_raid_member"
    /dev/sdc: UUID="25f41e2c-c766-3a75-462d-a5746b47a522" UUID_SUB="c2b22e85-6da0-f2d1-806a-b3b6c54cc381" LABEL="NAS:Raid5x1" TYPE="linux_raid_member"
    /dev/sdd1: UUID="e9cc3846-3bd3-4099-8f55-ff16e09e4c32" TYPE="ext4" PARTUUID="000df838-01"
    /dev/sdd5: UUID="de8db28c-13c5-408f-9ce1-9c3ddc625c4a" TYPE="swap" PARTUUID="000df838-05"


    fdisk -l | grep Disk

    Code
    The primary GPT table is corrupt, but the backup appears OK, so that will be used.
    Disk /dev/sda: 2,7 TiB, 3000592982016 bytes, 5860533168 sectors
    Disk /dev/sdb: 2,7 TiB, 3000592982016 bytes, 5860533168 sectors
    Partition 1 does not start on physical sector boundary.
    Disk /dev/sdc: 2,7 TiB, 3000592982016 bytes, 5860533168 sectors
    Disk identifier: D305B4BA-D562-4DEE-9B34-8EA95FBC8337
    Disk /dev/sdd: 28 GiB, 30016659456 bytes, 58626288 sectors
    Disk identifier: 0x000df838


    cat /etc/mdadm/mdadm.conf




    mdadm --detail --scan --verbose


    Code
    INACTIVE-ARRAY /dev/md127 num-devices=2 metadata=1.2 name=NAS:Raid5x1 UUID=25f41e2c:c7663a75:462da574:6b47a522
    devices=/dev/sdb,/dev/sdc



    I hope someone can help me :)

  • mdadm software RAID is not hot-swap aware; you have to tell it what to do.


    So currently your array is inactive, with /dev/sd[bc] as 2 drives from your 3-drive RAID 5.


    If the new drive shows under Storage -> Disks, select it, then wipe from the menu (a short wipe will be sufficient).


    Then run mdadm --stop /dev/md127, then mdadm --add /dev/md127 /dev/sda (I will assume that sda is the new drive).


    If that returns no errors, run mdadm --assemble --verbose --force /dev/md127 /dev/sd[abc]


    Check cat /proc/mdstat for the progress; when finished, run omv-mkconf mdadm.

  • Hi,
    thanks for your quick reply; sadly I only got to the second step.
    Wiping and stopping the RAID worked, but I couldn't run the --add command:


    Code
    root@NAS:~# mdadm --stop /dev/md127
    mdadm: stopped /dev/md127
    root@NAS:~# mdadm --add /dev/md127 /dev/sda
    mdadm: error opening /dev/md127: No such file or directory
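    (Editor's note: that error is expected given the command order. mdadm --stop tears down the /dev/md127 device node, and --add needs a running array to add a disk to. A sketch of an order that avoids it, assuming as above that sda is the new drive; double-check device names against serial numbers before running anything:)

```shell
# Assemble the two surviving members first; --force brings the array up degraded.
mdadm --assemble --verbose --force /dev/md127 /dev/sdb /dev/sdc

# With /dev/md127 active again, the new disk can be added and the rebuild starts.
mdadm --add /dev/md127 /dev/sda

# Watch the resync progress.
cat /proc/mdstat
```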
  • yup, sda should be the new drive, at least according to the S/N in the web interface. sdd is the boot SSD.


    Screenshot for easier reading, because of the colours:

  • OK, this does not look good! sdb -> line 7, whilst not a problem, should be resolved. sdc shows a reserved Microsoft partition, which would imply it was never prepared correctly for OMV.


    Do you have a backup?

  • None that's up to date, but I could pull a new one: if I plug in the old HDD, the RAID is still recognised and everything can be accessed. So weird that it's doing this now; I had a faulty drive before and rebuilding with the new one worked just as intended. The new drive from back then is the one that's now dying. So after I do that, should I just wipe everything and rebuild from scratch?

  • You could put the old drive back; the array should come up clean/degraded or clean. Either way, get a backup done, at least of everything important.


    Then remove the failing drive from the array using Delete on the menu in the GUI. It's important to make a note of each drive reference, i.e. sda, sdb, sdc, and use that with the information in Storage -> Disks, which gives each drive's Model and Serial Number; that way you don't pull the wrong drive.


    Then come back; it should be possible to sort this out, but it may require the array to re-sync a few times.

  • Thanks, this seemed to work. The RAID is currently resyncing the data to the new drive, though since I have an up-to-date backup I'm contemplating just wiping everything to set it up again properly.

  • The RAID is currently resyncing the data to the new drive,

    :thumbsup::thumbsup: the second option is up to you, but if you start again you'll have to remove the SMB shares, then remove the shared folders you created, then remove the drives from the array, then delete the array -> so the reverse of setting it up.
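    (If you do go the start-from-scratch route, the mdadm side of that teardown is roughly the following sketch. The SMB shares and shared folders come out via the OMV GUI first, as above; these commands are destructive, so triple-check device names against serial numbers before wiping:)

```shell
# Stop the array, then erase the RAID metadata from each member disk.
mdadm --stop /dev/md127
mdadm --zero-superblock /dev/sda /dev/sdb /dev/sdc

# Optionally clear any remaining filesystem/partition signatures too.
wipefs -a /dev/sda /dev/sdb /dev/sdc
```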
