RAID 5 not available, 2 of 3 HDDs missing

  • Hi,


    maybe you could give me some advice to bring my storage back to life?


    I use OMV 2.2.14, which has been running since 2016 without problems.


    The RAID5 consists of 3x 3TB WD RED drives.


    Drive configuration:
    /dev/sda: system and swap
    /dev/sdb - /dev/sdd: RAID5 array


    I didn't do anything special when the array suddenly stopped working.


    As I don't need the storage permanently, I start the OMV via WOL on demand and use Autoshutdown.


    Checking the OMV GUI, SMART reports for /dev/sdc:
    'Device has a few bad sectors'
    All other devices are 'Good'
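
    For reference, the detailed SMART attributes can also be checked from the CLI (assuming smartmontools is installed, as it usually is on OMV); the reallocated and pending sector counts are the interesting values here:
    smartctl -a /dev/sdc | grep -i -E 'reallocated|pending'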


    Here is some information that may help to understand the problem:


    root@OpenMediaVault:~# cat /proc/mdstat
    Personalities : [raid6] [raid5] [raid4]
    md0 : inactive sdd[2]
      2930135512 blocks super 1.2



    unused devices: <none>
    root@OpenMediaVault:~# blkid
    /dev/sda1: UUID="439440a1-b67d-44a9-9569-5d05be5eb1d4" TYPE="ext4"
    /dev/sda5: UUID="59834cdd-708d-4ba9-9a59-1f1d6a5ee2cf" TYPE="swap"
    /dev/sdd: UUID="7c016f86-f8df-c2fe-7be1-d809b79b6f1e" UUID_SUB="8d28819a-df0b-1da1-f7c1-0143ff15a021" LABEL="OpenMediaVault:RAID5" TYPE="linux_raid_member"
    /dev/sdb: UUID="7c016f86-f8df-c2fe-7be1-d809b79b6f1e" UUID_SUB="d92491d5-94b1-0b92-a7a3-c28e53dcec4b" LABEL="OpenMediaVault:RAID5" TYPE="linux_raid_member"
    /dev/sdc: UUID="7c016f86-f8df-c2fe-7be1-d809b79b6f1e" UUID_SUB="750ce038-2cce-d625-4ae3-4bffa812f4d4" LABEL="OpenMediaVault:RAID5" TYPE="linux_raid_member"
    root@OpenMediaVault:~# fdisk -l | grep "Disk "
    Disk /dev/sdc doesn't contain a valid partition table
    Disk /dev/sdb doesn't contain a valid partition table
    Disk /dev/sdd doesn't contain a valid partition table
    Disk /dev/sda: 250.1 GB, 250059350016 bytes
    Disk identifier: 0x000307a0
    Disk /dev/sdc: 3000.6 GB, 3000592982016 bytes
    Disk identifier: 0x00000000
    Disk /dev/sdb: 3000.6 GB, 3000592982016 bytes
    Disk identifier: 0x00000000
    Disk /dev/sdd: 3000.6 GB, 3000592982016 bytes
    Disk identifier: 0x00000000
    root@OpenMediaVault:~# cat /etc/mdadm/mdadm.conf
    # mdadm.conf
    #
    # Please refer to mdadm.conf(5) for information about this file.
    #



    # by default, scan all partitions (/proc/partitions) for MD superblocks.
    # alternatively, specify devices to scan, using wildcards if desired.
    # Note, if no DEVICE line is present, then "DEVICE partitions" is assumed.
    # To avoid the auto-assembly of RAID devices a pattern that CAN'T match is
    # used if no RAID devices are configured.
    DEVICE partitions



    # auto-create devices with Debian standard permissions
    CREATE owner=root group=disk mode=0660 auto=yes



    # automatically tag new arrays as belonging to the local system
    HOMEHOST <system>



    # definitions of existing MD arrays
    ARRAY /dev/md0 metadata=1.2 name=OpenMediaVault:RAID5 UUID=7c016f86:f8dfc2fe:7be1d809:b79b6f1e


    My plan to solve the problem is to replace /dev/sdc (the drive with bad sectors) with a new disk (WD RED 4TB) and start a reassembly with:
    mdadm --stop /dev/md0


    mdadm --assemble --force --verbose /dev/md0 /dev/sd[dcb]



    Do you agree that this is the best plan?



    Thanks in advance,


    Markus

    • Official Post

    Do you agree that this is the best plan?

    No, you can't just replace a drive without first removing the defective one from the RAID; not physically, but from mdadm.
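
    For reference, on an array that is still running, that sequence would normally look something like this (the device names here are just placeholders for your setup):
    mdadm --manage /dev/md0 --fail /dev/sdc
    mdadm --manage /dev/md0 --remove /dev/sdc
    # after physically swapping the disk:
    mdadm --manage /dev/md0 --add /dev/sdX
    That only applies to an active array, though; yours is currently inactive.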


    But the RAID could be completely dead and all data lost: cat /proc/mdstat shows the array as inactive but displays only one drive, which in itself would suggest the RAID has gone; blkid shows no assembled RAID device, and fdisk can't find a partition table on any of the drives within the RAID.


    What you could try, without removing any drives, is to run your two suggested commands and see what happens; if there are any errors, the RAID is dead and your data gone.
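
    Before forcing anything, you could also look at what the superblocks on each member report, e.g. whether the event counters still roughly agree; a rough check (the exact output varies) would be:
    mdadm --examine /dev/sdb /dev/sdc /dev/sdd | grep -E 'Events|State'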

  • OK, this is something that I don't want to hear.


    I assume that as long as I don't try to reassemble the RAID, the data is still there, so maybe I could try something different before I play my last card?


    What I haven't understood up to now is what happens if one of the drives gets damaged. Does the RAID stop working, and do I get a message?

    • Official Post

    OK, this is something that I don't want to hear.

    Sorry, a RAID5 can lose one drive without losing data, but as I said, your cat /proc/mdstat is seeing only one :( out of the three.


    As I said, you could try mdadm --stop, then mdadm --assemble as you suggested, but on the three drives currently installed. If there is an error then post it here, but prepare for the worst; when it comes to RAID I'm a pessimist :)

  • Hi geaves,


    after fighting with myself for some time, I decided to give it a try:


    root@OpenMediaVault:~# mdadm --stop /dev/md0
    mdadm: stopped /dev/md0


    root@OpenMediaVault:~# mdadm --assemble --force --verbose /dev/md0 /dev/sd[dcb]
    mdadm: looking for devices for /dev/md0
    mdadm: /dev/sdb is identified as a member of /dev/md0, slot 0.
    mdadm: /dev/sdc is identified as a member of /dev/md0, slot 1.
    mdadm: /dev/sdd is identified as a member of /dev/md0, slot 2.
    mdadm: forcing event count in /dev/sdb(0) from 406 upto 408
    mdadm: forcing event count in /dev/sdc(1) from 406 upto 408
    mdadm: added /dev/sdc to /dev/md0 as 1
    mdadm: added /dev/sdd to /dev/md0 as 2
    mdadm: added /dev/sdb to /dev/md0 as 0
    mdadm: /dev/md0 has been started with 3 drives.


    root@OpenMediaVault:~# cat /proc/mdstat
    Personalities : [raid6] [raid5] [raid4]
    md0 : active (auto-read-only) raid5 sdb[0] sdd[2] sdc[1]
    5860270080 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
    resync=PENDING
    unused devices: <none>


    Does 'resync=PENDING' mean that the resync is now running in the background, or do I have to do something else to start the resync?


    Regards,
    Markus

    • Official Post

    Does 'resync=PENDING' mean that the resync is now running in the background, or do I have to do something else to start the resync?

    :D I could say it's going into nuclear meltdown, but that's me being sarcastic :D


    Do me one favour: when you execute a command and want to post the output, use the spoiler option on the menu bar (second from right) and paste each command's output in a new spoiler window; it makes it easier to read :)


    But to answer your question: no, it's doing nothing because it's in an auto-read-only state. Try the following from the CLI:


    mdadm --readwrite /dev/md0


    If that errors, you may have to stop the array first, then run it again.
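
    Once the resync actually starts you can follow its progress and the member states from the CLI, for example with:
    watch -n 60 cat /proc/mdstat
    mdadm --detail /dev/md0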

  • Hi geaves,


    the nuclear meltdown didn't happen; instead, after about 10 hours of resyncing, all data seems to be back again!


    Thank you very much for your help :thumbup:


    Now, to avoid that stress in the future, I think I should improve my backup strategy.


    Kind Regards,
    Markus
