Raid 5 array missing after failed rebuild

zachlow · 19 March 2020

My RAID 5 array had a drive failure. I installed a new hard drive and attempted a rebuild. The rebuild failed, and I let the server sit for a day because I had run out of time to work on it. When I came back the next day, the array was missing from the webGUI. I am using three WD Red 3TB drives and three Seagate IronWolf 3TB drives. Thanks in advance!

Here is the info requested:

Code

$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : inactive sdc[1] sdb[7](S) sdg[5] sdf[6] sdd[2]
14650677560 blocks super 1.2


unused devices: <none>


$  blkid
-dash: 2: blkid: not found


$ fdisk -l | grep "Disk "
-dash: 3: fdisk: not found


$ cat /etc/mdadm/mdadm.conf
# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#


# by default, scan all partitions (/proc/partitions) for MD superblocks.
# alternatively, specify devices to scan, using wildcards if desired.
# Note, if no DEVICE line is present, then "DEVICE partitions" is assumed.
# To avoid the auto-assembly of RAID devices a pattern that CAN'T match is
# used if no RAID devices are configured.
DEVICE partitions


# auto-create devices with Debian standard permissions
CREATE owner=root group=disk mode=0660 auto=yes


# automatically tag new arrays as belonging to the local system
HOMEHOST <system>


# definitions of existing MD arrays
ARRAY /dev/md0 metadata=1.2 spares=1 name=NAS:NASvol1 UUID=9a74f8dd:30a95450:999e44c3:e36af552


# instruct the monitoring daemon where to send mail alerts
MAILADDR zachlow77@gmail.com
MAILFROM root


$ mdadm --detail --scan --verbose
-dash: 7: mdadm: not found


geaves · 19 March 2020

Quote from zachlow

Thanks in advance!

Which user are you using to SSH into your server? It should be root. The "not found" errors for blkid, fdisk and mdadm suggest you're logged in as a regular user.

I started using OMV from v3 but the principles should be the same.

zachlow · 19 March 2020

When I try to log in as root, it says "Permission denied (publickey,password)". So I used a user that had root access to login.

geaves · 19 March 2020

Quote from zachlow

So I used a user that had root access to login

You mean a user that belongs to the ssh group; that's a big difference. Anyway, once you're logged in, type su and press Enter. It should ask for a password; type in the password for your root user.
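For anyone hitting the same "not found" errors: on Debian-based systems such as OMV, blkid, fdisk and mdadm are installed in /sbin (or /usr/sbin), which is normally on root's PATH but not on a regular user's, so switching to root as described above is the fix. The function below is a hypothetical preflight helper (not part of OMV or mdadm), a minimal sketch assuming a POSIX sh, that checks you are root and that the three tools resolve before running any diagnostics. On some newer Debian releases plain su keeps the calling user's PATH, so `su -` is the safer habit.

```shell
#!/bin/sh
# Hypothetical preflight helper (not part of OMV or mdadm): checks that we are
# root and that the diagnostic tools used in this thread are on PATH.
preflight() {
    if [ "$(id -u)" -ne 0 ]; then
        echo "not root: log in, run 'su -' and enter the root password" >&2
        return 1
    fi
    rc=0
    for tool in blkid fdisk mdadm; do
        # command -v is the POSIX way to ask whether a command name resolves
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not on PATH: $tool (try /sbin/$tool)" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

Usage: `preflight && cat /proc/mdstat`.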

zachlow · 19 March 2020

That worked! Here's the output now.

Code

root@NAS:~# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] 
md0 : inactive sdc[1] sdb[7](S) sdg[5] sdf[6] sdd[2]
      14650677560 blocks super 1.2


unused devices: <none>
root@NAS:~# blkid
/dev/sda1: UUID="f7394bdd-cf50-47f0-99c1-58780a3d5c86" TYPE="ext4" 
/dev/sda5: UUID="82b05653-a32e-44e3-a814-917fbcae3d43" TYPE="swap" 
/dev/sdc: UUID="9a74f8dd-30a9-5450-999e-44c3e36af552" UUID_SUB="22007f01-db25-482f-048a-9ecb92b0b230" LABEL="NAS:NASvol1" TYPE="linux_raid_member" 
/dev/sdd: UUID="9a74f8dd-30a9-5450-999e-44c3e36af552" UUID_SUB="aba0a944-bfd2-23ab-43cb-4863269e6b20" LABEL="NAS:NASvol1" TYPE="linux_raid_member" 
/dev/sde: UUID="9a74f8dd-30a9-5450-999e-44c3e36af552" UUID_SUB="954f5e15-55db-8cea-c7fe-bd99df8e2957" LABEL="NAS:NASvol1" TYPE="linux_raid_member" 
/dev/sdf: UUID="9a74f8dd-30a9-5450-999e-44c3e36af552" UUID_SUB="b0c703e2-312e-2851-1c94-2bd4b04fd988" LABEL="NAS:NASvol1" TYPE="linux_raid_member" 
/dev/sdg: UUID="9a74f8dd-30a9-5450-999e-44c3e36af552" UUID_SUB="6966fbb8-8745-f6ad-f456-218fb598202a" LABEL="NAS:NASvol1" TYPE="linux_raid_member" 
/dev/sdb: UUID="9a74f8dd-30a9-5450-999e-44c3e36af552" UUID_SUB="3780c559-3151-073e-86ea-bd4f0df7d245" LABEL="NAS:NASvol1" TYPE="linux_raid_member" 
root@NAS:~# fdisk -l | grep "Disk "
Disk /dev/sdc doesn't contain a valid partition table
Disk /dev/sdb doesn't contain a valid partition table
Disk /dev/sdd doesn't contain a valid partition table
Disk /dev/sde doesn't contain a valid partition table
Disk /dev/sdf doesn't contain a valid partition table
Disk /dev/sdg doesn't contain a valid partition table
Disk /dev/sda: 500.1 GB, 500107862016 bytes
Disk identifier: 0x0007c7b6
Disk /dev/sdc: 3000.6 GB, 3000592982016 bytes
Disk identifier: 0x00000000
Disk /dev/sdb: 3000.6 GB, 3000592982016 bytes
Disk identifier: 0x00000000
Disk /dev/sdd: 3000.6 GB, 3000592982016 bytes
Disk identifier: 0x00000000
Disk /dev/sde: 3000.6 GB, 3000592982016 bytes
Disk identifier: 0x00000000
Disk /dev/sdf: 3000.6 GB, 3000592982016 bytes
Disk identifier: 0x00000000
Disk /dev/sdg: 3000.6 GB, 3000592982016 bytes
Disk identifier: 0x00000000
root@NAS:~# cat /etc/mdadm/mdadm.conf
# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#


# by default, scan all partitions (/proc/partitions) for MD superblocks.
# alternatively, specify devices to scan, using wildcards if desired.
# Note, if no DEVICE line is present, then "DEVICE partitions" is assumed.
# To avoid the auto-assembly of RAID devices a pattern that CAN'T match is
# used if no RAID devices are configured.
DEVICE partitions


# auto-create devices with Debian standard permissions
CREATE owner=root group=disk mode=0660 auto=yes


# automatically tag new arrays as belonging to the local system
HOMEHOST <system>


# definitions of existing MD arrays
ARRAY /dev/md0 metadata=1.2 spares=1 name=NAS:NASvol1 UUID=9a74f8dd:30a95450:999e44c3:e36af552


# instruct the monitoring daemon where to send mail alerts
MAILADDR zachlow77@gmail.com
MAILFROM root
root@NAS:~# mdadm --detail --scan --verbose
ARRAY /dev/md0 level=raid5 num-devices=6 metadata=1.2 spares=1 name=NAS:NASvol1 UUID=9a74f8dd:30a95450:999e44c3:e36af552
   devices=/dev/sdc,/dev/sdd,/dev/sdf,/dev/sdg,/dev/sdb


geaves · 19 March 2020

I don't like the output from that fdisk. Also, the mdadm definitions give no information about the number of drives, just spares=1, and blkid shows /dev/sde as a Linux RAID member, but there's no reference to it in mdstat.

mdadm --stop /dev/md0

mdadm --assemble --force --verbose /dev/md0 /dev/sd[bcdfg]

If those work, it should come back up as clean, degraded. Do you have a backup?
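The /dev/sde observation can be checked mechanically from the two outputs already posted: every device that blkid tags TYPE="linux_raid_member" would normally appear on the md0 line of /proc/mdstat. Below is a minimal sketch, assuming both outputs have been saved to text files; the function name is hypothetical, and it only reads text, so it changes nothing on the disks.

```shell
#!/bin/sh
# Hypothetical helper: list devices that blkid tags as linux_raid_member but
# that no md array line in /proc/mdstat references. Args: blkid file, mdstat file.
raid_members_missing_from_mdstat() {
    awk '
        NR == FNR {                          # first file: blkid output
            if ($0 ~ /TYPE="linux_raid_member"/) {
                dev = $1
                sub(/^\/dev\//, "", dev)     # "/dev/sde:" -> "sde:"
                sub(/:$/, "", dev)           # "sde:" -> "sde"
                raid[dev] = 1
            }
            next
        }
        $1 ~ /^md[0-9]+$/ {                  # second file: mdstat array lines
            for (i = 2; i <= NF; i++)
                if ($i ~ /^[a-z0-9]+\[[0-9]+\]/) {
                    name = $i
                    sub(/\[.*/, "", name)    # "sdb[7](S)" -> "sdb"
                    seen[name] = 1
                }
        }
        END { for (d in raid) if (!(d in seen)) print d }
    ' "$1" "$2" | sort
}
```

Usage: `blkid > /tmp/blkid.txt`, `cat /proc/mdstat > /tmp/mdstat.txt`, then `raid_members_missing_from_mdstat /tmp/blkid.txt /tmp/mdstat.txt`. Given the blkid and mdstat output zachlow posted above, it would print sde.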

zachlow · 19 March 2020

I had a backup, but the backup drive died about a week ago and I haven't gotten around to replacing it. It looks like one drive isn't cooperating. I have another drive I can swap in if that would help.

Code

root@NAS:~# mdadm --stop /dev/md0
mdadm: stopped /dev/md0
root@NAS:~# mdadm --assemble --force --verbose /dev/md0 /dev/sd[bcdfg]
mdadm: looking for devices for /dev/md0
mdadm: /dev/sdb is identified as a member of /dev/md0, slot -1.
mdadm: /dev/sdc is identified as a member of /dev/md0, slot 1.
mdadm: /dev/sdd is identified as a member of /dev/md0, slot 2.
mdadm: /dev/sdf is identified as a member of /dev/md0, slot 3.
mdadm: /dev/sdg is identified as a member of /dev/md0, slot 4.
mdadm: no uptodate device for slot 0 of /dev/md0
mdadm: added /dev/sdd to /dev/md0 as 2
mdadm: added /dev/sdf to /dev/md0 as 3
mdadm: added /dev/sdg to /dev/md0 as 4
mdadm: no uptodate device for slot 5 of /dev/md0
mdadm: added /dev/sdb to /dev/md0 as -1
mdadm: added /dev/sdc to /dev/md0 as 1
mdadm: /dev/md0 assembled from 4 drives and 1 spare - not enough to start the array.


geaves · 19 March 2020

Quote from zachlow

mdadm: /dev/md0 assembled from 4 drives and 1 spare - not enough to start the array.

That's why it won't start the array. Your RAID 5 has 6 drives; you replaced one and something went wrong during the rebuild, so you only have 5 drives supposedly working within that array.

As RAID 5 only tolerates one drive failure, you've already used that up: the new drive you added failed to be added to the array, although it's marked as a RAID member.

However, I've never come across 'slot -1' in the output before. For whatever reason it's unable to read slot 0 or slot 5. That could be an issue with each slot, which would point to the motherboard or the SATA cable(s).
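The arithmetic behind "not enough to start the array" is simple: RAID 5 stores one device's worth of parity, so an array built from N devices needs at least N-1 up-to-date members to run, and a spare holds no data until a rebuild completes, so it does not count. A hypothetical helper that applies that rule to the numbers mdadm reports, as a minimal sketch:

```shell
#!/bin/sh
# Hypothetical helper: RAID 5 keeps one device's worth of parity, so an array
# of N devices needs at least N-1 up-to-date members to run. Spares hold no
# data yet, so pass only the members mdadm added to a real slot (0..N-1).
raid5_can_start() {
    raid_devices=$1   # e.g. num-devices=6 from `mdadm --detail --scan`
    active=$2         # up-to-date members, not counting spares
    needed=$((raid_devices - 1))
    if [ "$active" -ge "$needed" ]; then
        echo "can start ($active of $raid_devices, $((raid_devices - active)) missing)"
    else
        echo "cannot start ($active of $raid_devices; need at least $needed)"
        return 1
    fi
}
```

`raid5_can_start 6 4` prints "cannot start (4 of 6; need at least 5)", matching the "4 drives and 1 spare" result above, while `raid5_can_start 6 5` prints "can start (5 of 6, 1 missing)". This is only a lower bound: mdadm also compares each member's event count, and a stale member is not treated as up to date unless --force is used.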

zachlow · 19 March 2020

Sounds like I'm SOL then. Kinda had a feeling when I checked the SMART status of the drives after the rebuild failed and saw a drive had recently developed bad sectors. Also I can see all 6 drives in the webGUI under physical disks.

geaves · 19 March 2020

Quote from zachlow

Also I can see all 6 drives in the webGUI under physical disks.

You will, and that's the only good part: you know that 4 of the drives can be found. What's the output of mdadm --detail /dev/md0?

zachlow · 19 March 2020

Code

root@NAS:~# mdadm --detail /dev/md0
mdadm: md device /dev/md0 does not appear to be active.

geaves · 19 March 2020

Doh! I forgot that needs the array to be running. This is what I was looking for:

[Spoiler: content not preserved]

This is from an OMV4 VM. It shows me the three drives in the array and which ones they are; if there were a spare, it would show that as well.

That's no help to you, though. What I'm trying to ascertain is which drive is the spare and why it's being registered as a spare. Will mdadm --examine /dev/md0 output anything?
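`mdadm --examine` reads the md superblock stored on each member disk, so it is normally pointed at the components (here /dev/sdb to /dev/sdg) rather than at the stopped /dev/md0. For 1.x metadata its output includes an "Events" counter and a "Device Role" line (an active slot number, or spare), which answers the "which drive is the spare" question directly. A minimal sketch, assuming the output has been captured to a file; the helper name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: summarise saved `mdadm --examine` output (1.x metadata)
# as one line per member disk: <device> <events> <role>
examine_summary() {
    awk '
        /^\/dev\/[^ ]+:$/  { dev = substr($0, 1, length($0) - 1); next }  # "/dev/sdb:" header
        /^ *Events :/      { split($0, f, " : "); events[dev] = f[2] }
        /^ *Device Role :/ { split($0, f, " : "); role[dev] = f[2] }
        END { for (d in events) print d, events[d], role[d] }
    ' | sort
}
```

Usage: `mdadm --examine /dev/sd[bcdefg] > /tmp/examine.txt`, then `examine_summary < /tmp/examine.txt`. A member whose Events value lags the others is the stale one that --force has to bump, as happened later in this thread with /dev/sdf (forced from 22765 up to 23272).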

zachlow · 19 March 2020

No output.

geaves · 19 March 2020

Quote from zachlow

No output.

OK, the array is still stopped. Try this: mdadm --assemble --force --verbose /dev/md0 /dev/sd[bcdefg]. That includes /dev/sde, which I assume is the new drive.

zachlow · 19 March 2020

root@NAS:~# mdadm --assemble --force --verbose /dev/md0 /dev/sd[bcdefg]
mdadm: looking for devices for /dev/md0
mdadm: /dev/sdb is busy - skipping
mdadm: /dev/sdc is busy - skipping
mdadm: /dev/sdd is busy - skipping
mdadm: /dev/sde is busy - skipping
mdadm: /dev/sdg is busy - skipping
mdadm: /dev/md0 is already in use.

geaves · 19 March 2020

That suggests it's running. What does cat /proc/mdstat show?

zachlow · 19 March 2020

Here's that

root@NAS:~# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : inactive sdc[1] sdb[7](S) sde[5] sdg[6] sdd[2]
      14650677560 blocks super 1.2

I tried mdadm --assemble --force --verbose /dev/md0 /dev/sd[bcdefg] again and got this. Also, sdb is the new drive I put in as the replacement.

root@NAS:~# mdadm --assemble --force --verbose /dev/md0 /dev/sd[bcdefg]
mdadm: looking for devices for /dev/md0
mdadm: /dev/sdb is identified as a member of /dev/md0, slot -1.
mdadm: /dev/sdc is identified as a member of /dev/md0, slot 1.
mdadm: /dev/sdd is identified as a member of /dev/md0, slot 2.
mdadm: /dev/sde is identified as a member of /dev/md0, slot 4.
mdadm: /dev/sdf is identified as a member of /dev/md0, slot 5.
mdadm: /dev/sdg is identified as a member of /dev/md0, slot 3.
mdadm: forcing event count in /dev/sdf(5) from 22765 upto 23272
mdadm: clearing FAULTY flag for device 4 in /dev/md0 for /dev/sdf
mdadm: Marking array /dev/md0 as 'clean'
mdadm: no uptodate device for slot 0 of /dev/md0
mdadm: added /dev/sdd to /dev/md0 as 2
mdadm: added /dev/sdg to /dev/md0 as 3
mdadm: added /dev/sde to /dev/md0 as 4
mdadm: added /dev/sdf to /dev/md0 as 5
mdadm: added /dev/sdb to /dev/md0 as -1
mdadm: added /dev/sdc to /dev/md0 as 1
mdadm: /dev/md0 assembled from 5 drives and 1 spare - not enough to start the array.

root@NAS:~# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : inactive sdc[1](S) sdb[7](S) sdf[4](S) sde[5](S) sdg[6](S) sdd[2](S)
      17580813072 blocks super 1.2

geaves · 19 March 2020

This is making no sense. "mdadm: Marking array /dev/md0 as 'clean'" would suggest the array has started, but then you get "mdadm: /dev/md0 assembled from 5 drives and 1 spare - not enough to start the array", followed by inactive in mdstat.

It's the "no uptodate device for slot 0". I wonder if -1 is slot 0, which is /dev/sdb.

OK, run mdadm --stop /dev/md0, then mdadm --assemble --force --verbose /dev/md0 /dev/sd[cdefg]. It also looks as if there could be a problem with /dev/sdf, since its event count had to be forced from 22765 up to 23272.
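The slot numbers mdadm prints are easier to reason about as a table. The sketch below is a hypothetical helper that reads saved `mdadm --assemble --verbose` output, lists each slot from 0 to N-1 with the device that claims it, and flags empty slots; a device reported at slot -1 is listed separately, and in this thread that is /dev/sdb, the same disk mdstat marks (S).

```shell
#!/bin/sh
# Hypothetical helper: turn saved `mdadm --assemble --verbose` output into a
# slot table for an array of N devices (slots 0..N-1), flagging empty slots.
# Devices reported at slot -1 are listed separately.
slot_report() {
    awk -v n="$1" '
        /identified as a member of/ {
            dev = $2                          # "/dev/sdb"
            slot = $NF
            sub(/\.$/, "", slot)              # "5." -> "5"
            if (slot == "-1") { print "slot -1:", dev; next }
            have[slot] = dev
        }
        END {
            for (s = 0; s < n; s++) {
                d = (s in have) ? have[s] : "MISSING"
                print "slot " s ": " d
            }
        }
    '
}
```

mdadm normally writes these messages to stderr, so capture them with `2>&1` if you save a run to a file, then use `slot_report 6 < assemble.log`. Fed the /dev/sd[cdefg] output below, it would flag slot 0 as MISSING and map slots 1 to 5 to /dev/sdc, /dev/sdd, /dev/sdg, /dev/sde and /dev/sdf.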

zachlow · 19 March 2020

root@NAS:~# mdadm --assemble --force --verbose /dev/md0 /dev/sd[cdefg]
mdadm: looking for devices for /dev/md0
mdadm: /dev/sdc is identified as a member of /dev/md0, slot 1.
mdadm: /dev/sdd is identified as a member of /dev/md0, slot 2.
mdadm: /dev/sde is identified as a member of /dev/md0, slot 4.
mdadm: /dev/sdf is identified as a member of /dev/md0, slot 5.
mdadm: /dev/sdg is identified as a member of /dev/md0, slot 3.
mdadm: no uptodate device for slot 0 of /dev/md0
mdadm: added /dev/sdd to /dev/md0 as 2
mdadm: added /dev/sdg to /dev/md0 as 3
mdadm: added /dev/sde to /dev/md0 as 4
mdadm: added /dev/sdf to /dev/md0 as 5
mdadm: added /dev/sdc to /dev/md0 as 1
mdadm: /dev/md0 has been started with 5 drives (out of 6).

You, sir, are a genius! It's now showing in the webGUI as clean, degraded.
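"Clean, degraded" means the array is running without one of its six members. The same state can be read from /proc/mdstat, where an active array's status line carries a [total/working] counter followed by a U/_ map; the kernel also exposes the number of missing devices in /sys/block/md0/md/degraded. A minimal sketch that reports the mdstat counter (the helper name is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: read /proc/mdstat text on stdin and report, per array,
# the "[total/working]" counter that appears on an active array's status line.
mdstat_degraded() {
    awk '
        $1 ~ /^md[0-9]+$/ { array = $1 }
        match($0, /\[[0-9]+\/[0-9]+\]/) {
            split(substr($0, RSTART + 1, RLENGTH - 2), c, "/")
            if (c[2] + 0 < c[1] + 0)
                print array ": degraded, " c[2] " of " c[1] " devices working"
            else
                print array ": all " c[1] " devices working"
        }
    '
}
```

Usage: `mdstat_degraded < /proc/mdstat`. For this array it should keep reporting 5 of 6 devices working until a sixth member has been rebuilt in.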

geaves · 19 March 2020

OK, at this moment the drive /dev/sdb is the issue, so slot -1 should be slot 0. This is either a failing SATA port on the motherboard or a bad cable.

What concerns me is this: we can attempt to use the WebUI to wipe that drive and then add it to the array, but that could also put you back to square one. On top of that, I think /dev/sdf may be failing, or at least beginning to.
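Before deciding which disk to wipe or re-add, it may be worth confirming the suspicion about /dev/sdf and zachlow's earlier note about newly developed bad sectors. Assuming smartmontools is installed, the ATA attributes most closely tied to bad sectors can be pulled from `smartctl -A`; the helper name below is hypothetical and it only parses text.

```shell
#!/bin/sh
# Hypothetical helper: from `smartctl -A` output on stdin, print the raw values
# of the ATA attributes most closely tied to bad sectors. RAW_VALUE is the
# 10th column of smartctl's attribute table.
smart_sector_counts() {
    awk '$2 == "Reallocated_Sector_Ct" || $2 == "Current_Pending_Sector" ||
         $2 == "Offline_Uncorrectable" { print $2, $10 }'
}
```

Usage: `for d in /dev/sd[bcdefg]; do echo "== $d"; smartctl -A "$d" | smart_sector_counts; done`. Non-zero, and especially rising, raw values are a common warning sign of a failing disk; treat them as a hint rather than proof.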
