OMV Raid Missing after reboot

  • Hi,


    on OMV Arrakis I have two RAIDs, one consisting of 2x WD Red 4TB drives and another consisting of 4x WD Red 12TB drives. Since I was not using the 2x4TB RAID, I shut down OMV and disconnected the power to those two disks, then restarted the system. After the restart, the 4x12TB RAID was missing. So I shut down again, reconnected the two disks of the 2x4TB RAID and restarted, but the 4x12TB RAID is still missing.




    How can I solve this?


    Thank you very much.


    zerozenit

    OMV 6.9.15-2 (Shaitan) - Debian 11 (Bullseye) - Linux 6.1.0-0.deb11.17-amd64

    OMV Plugins: backup 6.1.1 | compose 6.11.3 | cputemp 6.1.3 | flashmemory 6.2 | ftp 6.0.7-1 | kernel 6.4.10 | nut 6.0.7-1 | omvextrasorg 6.3.6 | resetperms 6.0.3 | sharerootfs 6.0.3-1

    ASRock J5005-ITX - 16GB DDR4 - 4x WD RED 12TB (Raid10), 2x WD RED 4TB (Raid1) [OFF], Boot from SanDisk Ultra Fit Flash Drive 32GB - Fractal Design Node 304

    • Official post

    So I shut down again, reconnected the two disks of the 2x4TB RAID and restarted, but the 4x12TB RAID is still missing.

    Interesting, odd behaviour;


    Your mirror is active but in auto-read-only; this -> mdadm --readwrite /dev/md1 should correct that.
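
    A quick way to confirm the state before and after, assuming md1 is the mirror in question:

    Code
    cat /proc/mdstat                       # md1 should currently show (auto-read-only)
    mdadm --readwrite /dev/md1             # switch the mirror back to read-write
    mdadm --detail /dev/md1 | grep State   # the state should now read clean or active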


    /dev/md0, the 4x12TB array, is inactive, so you'll have to stop it before reassembling:


    mdadm --stop /dev/md0 (wait for confirmation before continuing)


    mdadm --assemble --force --verbose /dev/md0 /dev/sd[abfg]
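
    Put together, and assuming the four members are still sda, sdb, sdf and sdg as above, the sequence looks something like this:

    Code
    mdadm --stop /dev/md0                                       # stop the inactive array first
    mdadm --assemble --force --verbose /dev/md0 /dev/sd[abfg]   # reassemble from its members
    cat /proc/mdstat                                            # md0 should now show as active
    mdadm --detail /dev/md0                                     # members should be in active sync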


    An array usually only becomes inactive if there has been a power failure, the system has not been shut down gracefully or a drive has been removed for replacement.


    Before you ask I have no idea what has caused this odd behaviour, but it could be hardware related.

  • Hi geaves,


    Thanks, you were very helpful. I reassembled the array and this is the output:

    What do you advise me to do? The removed sdg disk appears to be in good condition.

    Also, I would like to turn off md1 and physically remove the power to the two 4TB disks; what is the correct procedure for this?

    Thank you


    • Official post

    Also, I would like to turn off md1 and physically remove the power to the two 4TB disks; what is the correct procedure for this?


    What do you advise me to do? The removed sdg disk appears to be in good condition.

    To remove md1:


    Storage Management -> File Systems: select the array md1, click Unmount on the menu, then Apply when it appears.

    Raid Management: from the menu select the array md1, click Delete, then Apply when it appears.


    That should remove md1 from your system and you can then remove the power from the drives; or go to Storage -> Disks, select each drive from the array in turn and wipe it, then remove the power.
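
    If you are more comfortable on the command line, a rough equivalent of those steps would be the following (sdX and sdY are placeholders for the two 4TB members, check cat /proc/mdstat for the real names); the GUI route is still preferable, as it should also clean up OMV's own configuration:

    Code
    umount /dev/md1                             # unmount the filesystem first
    mdadm --stop /dev/md1                       # stop the mirror
    mdadm --zero-superblock /dev/sdX /dev/sdY   # sdX/sdY = the two 4TB members (placeholder names)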


    Please ensure you select the correct array to be removed.


    This line from the output, mdadm: added /dev/sdg to /dev/md0 as 3 (possibly out of date), usually suggests a missing superblock. Run mdadm --detail /dev/md0; at the end of that output it should display that drive as removed, but show the other 3 as active sync.

    If that's the case, go to Storage -> Disks, select the drive sdg, click Wipe on the menu and select Short. When that completes, go to Raid Management, select the array md0, click Recover on the menu, select the drive you've just wiped from the dialog box and click OK; the RAID should rebuild. Due to the size of the drives this will take some time.
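
    For reference, the command-line equivalent of that recover step, assuming sdg is still the drive you wiped:

    Code
    mdadm --detail /dev/md0         # sdg should show as removed, the other three as active sync
    mdadm /dev/md0 --add /dev/sdg   # re-add the wiped drive; the rebuild starts automatically
    cat /proc/mdstat                # shows the resync progress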


    Going back to your first post and re-reading it: removing the power from md1 and then restarting the server threw a curveball at mdadm; it had no idea where that array had gone, and that started the errors you now see.


    BTW, in case any of this goes pear-shaped, do you have a backup? (pear-shaped = goes wrong)

  • Hi geaves,


    I rebuilt the RAID, the process ran smoothly and md0 was finally clean again. After following your instructions for md1, I turned off the server and removed the power to the two md1 disks.


    I thought it was all fixed, but when I rebooted the server I noticed problems during boot; it seems there are problems mounting the file system on md0. I attach a screenshot.



    I am really sad and afraid for my data; I hope you can help me solve this situation too.


    Thank you.


  • zerozenit

    Added the OMV 4.x label.
    • Official post

    I am really sad and afraid for my data

    You should be, it will either recover or you will lose the lot :(


    The error states that it failed to complete a file system check on that array; I'm assuming, although not completely sure, that this led to the other dependency errors.

    How to check or look for the cause is in the actual error; the same goes for the emergency mode error. Control-D will get you to a login where you would log in as root, then run journalctl -xb and look for errors. You could also just try rebooting from that login and see if the system corrects itself.
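
    From that root login, these are the sort of commands to look at (the exact unit names will depend on what actually failed):

    Code
    journalctl -xb | grep -iE 'error|fail'   # scan the current boot log for errors
    systemctl --failed                       # list the units that did not start
    dmesg | grep -i md0                      # any kernel messages about the array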

    Other options are:

    booting from a SystemRescueCD

    disconnecting the array and installing OMV5 on another USB flash drive, then connecting the array once the system is set up


    My concern here is that there could be one, possibly two, failing drives preventing fsck from running; you have no way of knowing until you can locate the specific error.
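
    A quick way to check for a failing drive, assuming smartmontools is installed (the device name below is just an example, repeat for each member of the array):

    Code
    smartctl -H /dev/sda                                              # overall health verdict
    smartctl -a /dev/sda | grep -iE 'reallocated|pending|uncorrect'   # the attributes that matter most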


    Your new problem is somewhat outside my comfort zone as I have never had it myself, but if this gets sorted you may want to consider an option other than Raid5 with 12TB drives!

  • You should be, it will either recover or you will lose the lot :(

    Hi geaves,


    I fixed the file system problems by running the manual command a couple of times:

    Code
    fsck /dev/md0
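
    For anyone hitting the same boot failure, a rough outline of that kind of manual check (the filesystem on /dev/md0 must not be mounted while it runs):

    Code
    umount /dev/md0   # skip if it was never mounted, e.g. when running from emergency mode
    fsck /dev/md0     # answer the repair prompts; repeat until it reports a clean filesystem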

    All the important data is still there, I only had problems with some Docker containers, but I quickly rebuilt them.

    Now everything runs very well and the system seems very stable.

    Many thanks for your fruitful help, I really appreciate it! :)


  • zerozenit

    Added the solved label.
