Newb: Power cut borked my RAID

  • Hi,

    Always grateful for your help here... I was doing so well... :)

    We had a power cut the other night and my UPS didn't last long enough to catch things...

    Now my RAID seems to be in trouble. I'm getting the following by email (Zulu is the name of my RAID...):

    Code
    The system monitoring needs your attention.
    
    Host:        \openmediavault.local
    Date:        Wed, 07 Apr 2021 08:52:04
    Service:     filesystem_srv_dev-disk-by-label-zulu
    Event:       Does not exist
    Description: unable to read filesystem '/srv/dev-disk-by-label-zulu' state

    and


    Code
    The system monitoring needs your attention.
    
    Host:        \openmediavault.local
    Date:        Wed, 07 Apr 2021 08:52:35
    Service:     mountpoint_srv_dev-disk-by-label-zulu
    Event:       Status failed
    Description: status failed (1) -- /srv/dev-disk-by-label-zulu is not a mountpoint
    
    This triggered the monitoring system to: alert

    Other info:


    Code
    root@openmediavault:~# cat /proc/mdstat
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
    md0 : inactive sda[4](S) sdc[2](S) sdb[1](S)
          5860147464 blocks super 1.2
           
    unused devices: <none>

    Code
    root@openmediavault:~# blkid
    /dev/sdb: UUID="e722afd9-5803-5460-ce00-e63017883000" UUID_SUB="5667c0d5-4cec-a644-36a3-e641ec176a46" LABEL="openmediavault.local:zulu" TYPE="linux_raid_member"
    /dev/sda: UUID="e722afd9-5803-5460-ce00-e63017883000" UUID_SUB="acaec5ef-d304-a129-3754-9605267dcbdf" LABEL="openmediavault.local:zulu" TYPE="linux_raid_member"
    /dev/sde1: UUID="173d1141-65e9-4ee1-ae31-b73d34f7b2cf" TYPE="ext4" PARTUUID="9d8e1096-01"
    /dev/sde5: UUID="6947d5ca-f259-4fe7-be54-5b945620213c" TYPE="swap" PARTUUID="9d8e1096-05"
    /dev/sdc: UUID="e722afd9-5803-5460-ce00-e63017883000" UUID_SUB="d49ed9a5-6400-f405-ea4d-0601f2e60642" LABEL="openmediavault.local:zulu" TYPE="linux_raid_member"
    /dev/sdf1: LABEL="Backup" UUID="83ed8d9d-e2f7-4e64-bfc8-8fe26f404112" TYPE="ext4" PARTUUID="4f80a638-fd4a-44e9-851a-3c9575507f12"

    Code
    root@openmediavault:~# mdadm --detail --scan --verbose
    INACTIVE-ARRAY /dev/md0 num-devices=3 metadata=1.2 name=openmediavault.local:zulu UUID=e722afd9:58035460:ce00e630:17883000
       devices=/dev/sda,/dev/sdb,/dev/sdc

    Code
    root@openmediavault:~# mdadm --stop /dev/md0
    mdadm: stopped /dev/md0
    root@openmediavault:~# mdadm --assemble --force --verbose /dev/md0 /dev/sd[bcd]
    mdadm: looking for devices for /dev/md0
    mdadm: Cannot read superblock on /dev/sdd
    mdadm: no RAID superblock on /dev/sdd
    mdadm: /dev/sdd has no superblock - assembly aborted

    The commands above are how I rebuilt things last time, but they're not working in this case. One disk seems to be struggling, as it has a red marker next to it in the SMART section of OMV, but surely the RAID should still function, albeit in a degraded state? Do I have several issues at once, not just the failing HD?
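
    Would something like this be a sensible way to double-check which of those drives actually still carry a RAID superblock before retrying the assemble? (A sketch, using the device names from the output above; adjust them to whatever the system currently reports.)

    Code
    # Sketch: check each candidate drive for an md superblock
    for d in /dev/sda /dev/sdb /dev/sdc /dev/sdd; do
        echo "== $d =="
        mdadm --examine "$d" 2>&1 | grep -E 'Array UUID|Raid Level|Device Role|No md superblock'
    done
    # Genuine members report an Array UUID and Raid Level;
    # a drive reporting "No md superblock detected" should not be passed to --assemble.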


    How can I get things back on a solid footing? :)


    S

  • Just trying to make sense of all the Terminal stuff...

    The disk which is failing the SMART test is /dev/sde. The disk that mdadm complains about is /dev/sdd...


    Code
     mdadm: /dev/sdd has no superblock - assembly aborted
  • Look at cat /proc/mdstat, mdadm --detail --scan and blkid


    mdadm --detail /dev/md0 will also confirm the above :)

    I'm way out of my depth here... I get:

    Should my command be...?

    Code
    root@openmediavault:~# mdadm --assemble --force --verbose /dev/md0 /dev/sd[abd]

    I don't understand where the 4th drive is... Ok... so the 4th drive has failed and only sda/d/b remain...?


    S

    • Official Post

    I don't understand where the 4th drive is... Ok... so the 4th drive has failed and only sda/d/b remain

    This is beginning to make very little sense. Going back to your first post:


    cat /proc/mdstat = /dev/sd[abc]

    blkid = /dev/sd[abc]

    mdadm --detail --scan = /dev/sd[abc]


    Nowhere in the above does /dev/sdd show up, other than in fdisk -l; that tells you the system 'sees' the drive, but it doesn't know where it is or what it's doing, hence the 'no superblock' error when trying to run mdadm --assemble.
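
    As a quick sketch, something like this lists every drive the system detects together with what it thinks is on each one, which makes a 'missing' member easy to spot:

    Code
    # Sketch: list all block devices the kernel sees, with size and any recognised signature
    lsblk -o NAME,SIZE,TYPE,FSTYPE,LABEL
    # Array members show FSTYPE "linux_raid_member"; a drive with an empty FSTYPE
    # is visible to the system but carries no recognisable superblock.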


    Post #6 throws a totally new curveball: it lists the RAID as raid0, and on top of that it's listing the drives as /dev/sd[abd].


    Have you rebooted at all after your first post?

  • Yes, I rebooted twice in the hope things might untangle themselves...

  • Thought I'd run the commands again as a triple check:

    • Official Post

    Yes, I rebooted twice in the hope things might untangle themselves

    If I could have £1 every time someone said that ^^^^ That explains why the drive references have changed.


    So confirm the array has stopped, then run the mdadm --assemble but with /dev/sd[abd]; hopefully that will work and rebuild the array in a degraded state.


    DO NOT REBOOT, DO NOT PASS GO :D


    Then run mdadm --examine /dev/sdc, but only after the raid has been rebuilt.
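
    Pulling that together, the rough sequence would be something like this (a sketch; double-check the device letters against blkid immediately beforehand, as they've already shifted once after a reboot):

    Code
    # 1. Make sure the inactive array is stopped
    mdadm --stop /dev/md0
    cat /proc/mdstat                                    # md0 should no longer be listed

    # 2. Force-assemble from the three drives that currently hold superblocks
    mdadm --assemble --force --verbose /dev/md0 /dev/sd[abd]

    # 3. Keep an eye on the array while it comes back / resyncs (Ctrl+C to exit)
    watch cat /proc/mdstat

    # 4. Only once the array is active again, look at the remaining drive
    mdadm --examine /dev/sdc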

  • <If I could have £1 every time someone said that ^^^^ That explains why the drive references have changed.>


    Sorry! :) Desperation! :)


    <So confirm the array has stopped>


    With

    Code
    root@openmediavault:~# mdadm --stop /dev/md0

    ?

    <run the mdadm --assemble but with /dev/sd[abd]; hopefully that will work and rebuild the array in a degraded state.>


    It's ok to do all this from outside the OMV GUI? I read somewhere to do as much as possible from the GUI so as not to add extra entries to the DB...?


    Will the Terminal give progress/tell me when the array is rebuilt?


    Is there any mileage in replacing the dodgy disk and *then* rebuilding things? I have a new one...


    Thanks for your time as always! :)


    s

    • Official Post

    It's ok to do all this from outside the OMV GUI? I read somewhere to do as much as possible from the GUI so as not to add extra entries to the DB

    You have to run --assemble from the CLI; you've already blown your brownie points by rebooting :)

    Will the Terminal give progress/tell me when the array is rebuilt

    It can do if you run cat /proc/mdstat once the rebuild has started, but it should display in the GUI anyway

    Is there any mileage in replacing the dodgy disk and *then* rebuilding things? I have a new one

    Where's the 'dodgy' disk? You don't know yet if the array will rebuild; let's do one step at a time. But to answer your question, you can't slap in a new disk because the array is currently inactive.
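
    On the progress point, a couple of ways to keep an eye on a rebuild from the terminal (a sketch):

    Code
    # Refresh the kernel's RAID status every few seconds (Ctrl+C to exit)
    watch -n 5 cat /proc/mdstat
    # Or ask mdadm directly; during a resync it reports a rebuild/resync percentage
    mdadm --detail /dev/md0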

  • You have to run --assemble from the CLI; you've already blown your brownie points by rebooting :)

    It can do if you run cat /proc/mdstat once the rebuild has started, but it should display in the GUI anyway

    Where's the 'dodgy' disk? You don't know yet if the array will rebuild; let's do one step at a time. But to answer your question, you can't slap in a new disk because the array is currently inactive.

    Thank you Sir!


    I'm going to have a go at this over the weekend. I'm having to drag myself away for work overnight.


    Really appreciate the assistance! Keep fingers crossed for me! :)


    S

  • You have to run --assemble from the CLI; you've already blown your brownie points by rebooting :)

    It can do if you run cat /proc/mdstat once the rebuild has started, but it should display in the GUI anyway

    Where's the 'dodgy' disk? You don't know yet if the array will rebuild; let's do one step at a time. But to answer your question, you can't slap in a new disk because the array is currently inactive.

    My hands are a bit shaky and clammy but things seem to be rebuilding happily now.
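
    Once the resync finishes I'll run a few checks along these lines (a sketch, using the label from the original alerts) to make sure everything is back on a solid footing:

    Code
    cat /proc/mdstat                  # the status line (e.g. [UUU]) shows any missing members
    mdadm --detail /dev/md0           # State should read clean (or clean, degraded)
    blkid /dev/md0                    # the zulu filesystem should be visible on the array again
    mount | grep zulu                 # confirm /srv/dev-disk-by-label-zulu is mounted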


    Thanks again :)


    S
