RAID5 gone after swapping defective drive

    • RAID5 gone after swapping defective drive

      Hey everyone,

      I've already seen that I'm not the only one with this kind of problem, but I don't want to hijack other threads, so I'm making my own.
      I recently started getting emails with an increasing error count on one of my drives, which resulted in an "OfflineUncorrectableSector", so I turned the system off, waited for the new drive to arrive, and plugged it in today.
      After starting up again I got the following email:

      monit alert -- Status failed mountpoint_media_... wrote:

      Status failed Service mountpoint_media_0ccf9178-985e-4e03-a859-e717a89a20dd


      Date: Fri, 09 Aug 2019 17:50:43
      Action: alert
      Host: NAS
      Description: status failed (1) -- /media/0ccf9178-985e-4e03-a859-e717a89a20dd is not a mountpoint


      Your faithful employee,
      Monit

      And when I wanted to rebuild the RAID, it wasn't showing in the RAID section, though all drives are visible in the Drives section.

      Below are some outputs that another thread suggested providing; please treat me as a Linux noob, though, I'm still learning :)
      I have 3x 2TB WD Red drives.

      cat /proc/mdstat

      Source Code

      Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
      md127 : inactive sdc[2](S) sdb[1](S)
            5860271024 blocks super 1.2
      unused devices: <none>

      blkid

      Source Code

      /dev/sdb: UUID="25f41e2c-c766-3a75-462d-a5746b47a522" UUID_SUB="cb760801-4799-3a1d-5a12-60d9d7e07abf" LABEL="NAS:Raid5x1" TYPE="linux_raid_member"
      /dev/sdc: UUID="25f41e2c-c766-3a75-462d-a5746b47a522" UUID_SUB="c2b22e85-6da0-f2d1-806a-b3b6c54cc381" LABEL="NAS:Raid5x1" TYPE="linux_raid_member"
      /dev/sdd1: UUID="e9cc3846-3bd3-4099-8f55-ff16e09e4c32" TYPE="ext4" PARTUUID="000df838-01"
      /dev/sdd5: UUID="de8db28c-13c5-408f-9ce1-9c3ddc625c4a" TYPE="swap" PARTUUID="000df838-05"

      fdisk -l | grep Disk

      Source Code

      The primary GPT table is corrupt, but the backup appears OK, so that will be used.
      Disk /dev/sda: 2,7 TiB, 3000592982016 bytes, 5860533168 sectors
      Disk /dev/sdb: 2,7 TiB, 3000592982016 bytes, 5860533168 sectors
      Partition 1 does not start on physical sector boundary.
      Disk /dev/sdc: 2,7 TiB, 3000592982016 bytes, 5860533168 sectors
      Disk identifier: D305B4BA-D562-4DEE-9B34-8EA95FBC8337
      Disk /dev/sdd: 28 GiB, 30016659456 bytes, 58626288 sectors
      Disk identifier: 0x000df838

      cat /etc/mdadm/mdadm.conf


      Source Code

      # mdadm.conf
      #
      # Please refer to mdadm.conf(5) for information about this file.
      #
      # by default, scan all partitions (/proc/partitions) for MD superblocks.
      # alternatively, specify devices to scan, using wildcards if desired.
      # Note, if no DEVICE line is present, then "DEVICE partitions" is assumed.
      # To avoid the auto-assembly of RAID devices a pattern that CAN'T match is
      # used if no RAID devices are configured.
      DEVICE partitions
      # auto-create devices with Debian standard permissions
      CREATE owner=root group=disk mode=0660 auto=yes
      # automatically tag new arrays as belonging to the local system
      HOMEHOST <system>
      # definitions of existing MD arrays
      ARRAY /dev/md/Raid5x1 metadata=1.2 spares=0 name=NAS:Raid5x1 UUID=25f41e2c:c7663a75:462da574:6b47a522
      # instruct the monitoring daemon where to send mail alerts
      MAILADDR my.email@addre.ss


      mdadm --detail --scan --verbose

      Source Code

      INACTIVE-ARRAY /dev/md127 num-devices=2 metadata=1.2 name=NAS:Raid5x1 UUID=25f41e2c:c7663a75:462da574:6b47a522
         devices=/dev/sdb,/dev/sdc



      I hope someone can help me :)
    • mdadm software RAID is not hot-swap aware; you have to tell it what to do.

      So currently your array is inactive, with /dev/sd[bc] being 2 of the drives from your 3-drive RAID 5.

      If the new drive shows under Storage -> Disks, select it, then wipe it from the menu (a short wipe will be sufficient).
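
      If you would rather do that step from the shell, my understanding is that a short wipe roughly corresponds to clearing the old signatures with wipefs. A minimal sketch, assuming sda really is the new drive (double-check the serial number first):

      Source Code

      # Destructive: run this only against the NEW, empty drive
      wipefs -a /dev/sda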

      Then run mdadm --stop /dev/md127, then mdadm --add /dev/md127 /dev/sda (I will assume that sda is the new drive).

      If that returns no errors, run mdadm --assemble --verbose --force /dev/md127 /dev/sd[abc]

      Check cat /proc/mdstat for the progress; when it has finished, run omv-mkconf mdadm.
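
      To put those commands in one place, here is a rough sketch of one way the sequence can go, assuming (as above) that sda is the new drive and sdb/sdc are the surviving members. Note that --add needs the array device to exist, so if it complains that /dev/md127 is missing, do the assemble step first:

      Source Code

      # Stop the inactive array
      mdadm --stop /dev/md127
      # Bring the two surviving members back up as a degraded array
      mdadm --assemble --verbose --force /dev/md127 /dev/sdb /dev/sdc
      # Add the new, wiped drive; the rebuild starts automatically
      mdadm --add /dev/md127 /dev/sda
      # Watch the rebuild progress
      cat /proc/mdstat
      # When the resync has finished, regenerate the OMV mdadm config
      omv-mkconf mdadm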
      Raid is not a backup! Would you go skydiving without a parachute?
    • Hi,
      thanks for your quick reply; sadly I only got to the second step.
      Wiping and stopping the RAID worked, but I couldn't run the --add command:

      Source Code

      root@NAS:~# mdadm --stop /dev/md127
      mdadm: stopped /dev/md127
      root@NAS:~# mdadm --add /dev/md127 /dev/sda
      mdadm: error opening /dev/md127: No such file or directory
    • Yup, sda should be the new drive, at least according to the S/N in the web interface. sdd is the boot SSD.

      Source Code

      Disk /dev/sdb: 2,7 TiB, 3000592982016 bytes, 5860533168 sectors
      Units: sectors of 1 * 512 = 512 bytes
      Sector size (logical/physical): 512 bytes / 4096 bytes
      I/O size (minimum/optimal): 4096 bytes / 4096 bytes

      The primary GPT table is corrupt, but the backup appears OK, so that will be used.

      Disk /dev/sdc: 2,7 TiB, 3000592982016 bytes, 5860533168 sectors
      Units: sectors of 1 * 512 = 512 bytes
      Sector size (logical/physical): 512 bytes / 4096 bytes
      I/O size (minimum/optimal): 4096 bytes / 4096 bytes
      Disklabel type: gpt
      Disk identifier: D305B4BA-D562-4DEE-9B34-8EA95FBC8337

      Device Start End Sectors Size Type
      /dev/sdc1 34 262177 262144 128M Microsoft reserved

      Partition 1 does not start on physical sector boundary.

      Disk /dev/sdd: 28 GiB, 30016659456 bytes, 58626288 sectors
      Units: sectors of 1 * 512 = 512 bytes
      Sector size (logical/physical): 512 bytes / 512 bytes
      I/O size (minimum/optimal): 512 bytes / 512 bytes
      Disklabel type: dos
      Disk identifier: 0x000df838

      Device Boot Start End Sectors Size Id Type
      /dev/sdd1 * 2048 56145919 56143872 26,8G 83 Linux
      /dev/sdd2 56147966 58626047 2478082 1,2G 5 Extended
      /dev/sdd5 56147968 58626047 2478080 1,2G 82 Linux swap / Solaris

      Disk /dev/sda: 2,7 TiB, 3000592982016 bytes, 5860533168 sectors
      Units: sectors of 1 * 512 = 512 bytes
      Sector size (logical/physical): 512 bytes / 4096 bytes
      I/O size (minimum/optimal): 4096 bytes / 4096 bytes
      Screenshot for easier reading (the colours help): https://i.imgur.com/hQSc7sO.png
    • None that's up to date, but I could pull a new one if I plugged the old HDD back in; with it connected, the RAID is still recognised and everything can be accessed. So weird that it's doing this now. I had a faulty drive before and rebuilding with the new one worked just as intended; the new drive from back then is the one that's now dying. So after I do that, I just wipe everything and rebuild from scratch?
    • You could put the old drive back; the array should/may come up clean/degraded or clean. Either way, get a backup done, at least of everything important.

      Then remove the failing drive from the array using Delete on the menu in the GUI. It's important to make a note of each drive reference, i.e. sda, sdb, sdc, and to match that against the information in Storage -> Disks, which gives each drive's Model and Serial Number; that way you don't pull the wrong drive (a command-line way to double-check is sketched below).

      Then come back; it should be possible to sort this out, but it may require the array to re-sync a few times.
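
      For reference, a rough sketch of how to cross-check which physical disk is which before pulling anything, and roughly what the GUI delete does in mdadm terms as far as I understand it (sdX is just a placeholder for whichever letter the failing drive turns out to be):

      Source Code

      # Map the kernel names (sda, sdb, ...) to model and serial number
      lsblk -d -o NAME,MODEL,SERIAL,SIZE
      # Mark the failing member as faulty and remove it from the array
      # (replace sdX with the failing drive's actual letter)
      mdadm /dev/md127 --fail /dev/sdX --remove /dev/sdX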
      Raid is not a backup! Would you go skydiving without a parachute?