RAID 5 failure. All drives missing

massimi72 · 15 February 2020

Hi guys, I'm on OMV 4 and suddenly my RAID set disappeared.
I'm not a Linux expert, just a happy user of OMV for a few years.

There is a disk with a red flag in SMART. In the Disks section I can see all the drives, but the RAID is not recognised. In the File Systems section I can see the name of my RAID set, but it is not mounted (and I cannot mount it).
Can you please advise where to start to try and recover the data?

Thanks, M

cabrio_leo · 15 February 2020

Please start by answering these questions: Degraded or missing raid array questions

massimi72 · 15 February 2020

here we go:

1. root@omv2:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md127 : inactive sdb[3](S) sdc[0](S) sdd[1](S) sde[2](S)
7813534048 blocks super 1.2

unused devices: <none>

2. root@omv2:~# blkid
/dev/sda1: UUID="E179-5A61" TYPE="vfat" PARTUUID="6ffe10e0-0162-47d6-aebb-675329a11e86"
/dev/sda2: UUID="6b814003-02eb-4ff4-8908-144e2ce1cf0e" TYPE="ext4" PARTUUID="54c1a535-e448-412c-8d36-79fc9b9decb0"
/dev/sda3: UUID="30a77b0b-9afb-45ce-9499-0240f1fd7b0d" TYPE="swap" PARTUUID="9cca0d43-23f9-4a77-9868-3f1ed55590b7"
/dev/sdc: UUID="e9d64950-32ce-bf3e-b3bd-b9a116e1feeb" UUID_SUB="692cc133-bbf1-fdf5-6f64-b94b90d57589" LABEL="omv2:raid5" TYPE="linux_raid_member"
/dev/sdb: UUID="e9d64950-32ce-bf3e-b3bd-b9a116e1feeb" UUID_SUB="2824a33f-68bd-50d5-9551-b1136ad0b57b" LABEL="omv2:raid5" TYPE="linux_raid_member"
/dev/sdd: UUID="e9d64950-32ce-bf3e-b3bd-b9a116e1feeb" UUID_SUB="8e294b42-ceca-0e72-b916-4b0178f5badf" LABEL="omv2:raid5" TYPE="linux_raid_member"
/dev/sde: UUID="e9d64950-32ce-bf3e-b3bd-b9a116e1feeb" UUID_SUB="476e9ffa-8561-14c1-3709-b708bf9c1129" LABEL="omv2:raid5" TYPE="linux_raid_member"

3. root@omv2:~# fdisk -l | grep "Disk "
Disk /dev/sda: 111.8 GiB, 120040980480 bytes, 234455040 sectors
Disk identifier: 8F69B466-3E4D-4DEA-A5A5-00FB29BEA2DD
Disk /dev/sdc: 1.8 TiB, 2000398934016 bytes, 3907029168 sectors
Disk /dev/sdb: 1.8 TiB, 2000398934016 bytes, 3907029168 sectors
Disk /dev/sdd: 1.8 TiB, 2000398934016 bytes, 3907029168 sectors
Disk /dev/sde: 1.8 TiB, 2000398934016 bytes, 3907029168 sectors

4. root@omv2:~# cat /etc/mdadm/mdadm.conf
# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#

# by default, scan all partitions (/proc/partitions) for MD superblocks.
# alternatively, specify devices to scan, using wildcards if desired.
# Note, if no DEVICE line is present, then "DEVICE partitions" is assumed.
# To avoid the auto-assembly of RAID devices a pattern that CAN'T match is
# used if no RAID devices are configured.
DEVICE partitions

# auto-create devices with Debian standard permissions
CREATE owner=root group=disk mode=0660 auto=yes

# automatically tag new arrays as belonging to the local system
HOMEHOST <system>

# definitions of existing MD arrays
ARRAY /dev/md127 metadata=1.2 name=omv2:raid5 UUID=e9d64950:32cebf3e:b3bdb9a1:16e1feeb

5. root@omv2:~# mdadm --detail --scan --verbose
INACTIVE-ARRAY /dev/md127 num-devices=4 metadata=1.2 name=omv2:raid5 UUID=e9d64950:32cebf3e:b3bdb9a1:16e1feeb
devices=/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde

OMV is on a 120 GB SSD.
The RAID 5 set is built on four 2 TB Seagate drives.

The system got stuck and I didn't know how to fix it, so I reinstalled the whole system.
After that the RAID was still working.
Today I attached a USB drive to back up the data via rsync, and during the copy the system stopped responding (no files were copied, and the NAS was beeping every second).
I switched it off with the power button, and when I turned it on again the RAID was no longer recognised.

cabrio_leo · 15 February 2020

Quote from massimi72

the NAS was beeping every second).

Do you know what could cause the NAS to beep? Is there maybe a hardware problem besides the RAID, e.g. the power supply? That may have been why the NAS got stuck in the first place.

A RAID does not like losing power unexpectedly. But your RAID is INACTIVE, not DEGRADED. That's good.
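The INACTIVE vs DEGRADED distinction can be read directly from the /proc/mdstat output posted earlier in this thread: the array line says `inactive`, and every member carries the `(S)` (spare) flag. A minimal, read-only Python sketch that pulls the state and each member's slot and flags out of that text; it assumes the one-line `mdNNN : state member[slot](flags) ...` layout shown in this thread and is not a complete mdstat parser.

```python
import re

# Array line, e.g. "md127 : inactive sdb[3](S) sdc[0](S) sdd[1](S) sde[2](S)"
ARRAY_LINE = re.compile(r"^(md\d+)\s*:\s*(active|inactive)\b(.*)$")
# Member token, e.g. "sdb[3](S)"; flags such as (S) spare or (F) faulty are optional
MEMBER = re.compile(r"(\w+)\[(\d+)\]((?:\([A-Z]\))*)")


def parse_mdstat(text):
    """Return {array: {"state": str, "members": {device: (slot, flags)}}}."""
    arrays = {}
    for line in text.splitlines():
        match = ARRAY_LINE.match(line.strip())
        if not match:
            continue  # skip "Personalities", block-count and progress lines
        name, state, rest = match.groups()
        members = {}
        for dev, slot, flags in MEMBER.findall(rest):
            # "(S)(F)" -> "SF"
            members[dev] = (int(slot), flags.replace("(", "").replace(")", ""))
        arrays[name] = {"state": state, "members": members}
    return arrays


# The /proc/mdstat text posted in this thread
SAMPLE = """Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md127 : inactive sdb[3](S) sdc[0](S) sdd[1](S) sde[2](S)
      7813534048 blocks super 1.2

unused devices: <none>
"""

md127 = parse_mdstat(SAMPLE)["md127"]
print(md127["state"])           # inactive
print(md127["members"]["sde"])  # (2, 'S')
```

On a running array the same function would report `active`, and a degraded member set would show up as a missing slot rather than as every member being a spare.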

Maybe @geaves can chime in?

In the meantime, you could have a look at this article: How to fix linux mdadm inactive array

geaves · 15 February 2020

From the command line:

mdadm --stop /dev/md127

mdadm --assemble --force --verbose /dev/md127 /dev/sd[bcde]

massimi72 · 15 February 2020

I tried:
root@omv2:~# mdadm --assemble --force --verbose /dev/md127 /dev/sd[bcde]
mdadm: looking for devices for /dev/md127
mdadm: No super block found on /dev/sde (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sde
mdadm: /dev/sde has no superblock - assembly aborted
root@omv2:~#

I'm starting to get worried about my data...
What should I do?
Unfortunately I'm not an expert in Linux and the shell...
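The `Expected magic a92b4efc, got 00000000` message means mdadm read zeros where the md superblock should begin. For metadata 1.2 (the `super 1.2` in /proc/mdstat), mdadm documents the superblock as stored 4K from the start of the device, and it begins with the magic number 0xa92b4efc stored little-endian. A read-only Python sketch of that same check, assuming a 1.2 superblock at byte offset 4096 of a whole-disk member; it opens the device for reading only and never writes to it.

```python
import struct

MD_SB_MAGIC = 0xA92B4EFC  # magic number at the start of every md superblock
SB_OFFSET_V1_2 = 4096     # metadata 1.2 keeps its superblock 4 KiB from the device start


def read_md_v12_magic(path):
    """Read the 4-byte magic where a 1.2 superblock should be.

    Returns (found, magic); magic is None if the device/file is too short.
    The device is opened read-only, so the disk is never modified.
    """
    with open(path, "rb") as dev:
        dev.seek(SB_OFFSET_V1_2)
        raw = dev.read(4)
    if len(raw) < 4:
        return False, None
    (magic,) = struct.unpack("<I", raw)  # the magic is stored little-endian
    return magic == MD_SB_MAGIC, magic
```

Run as root, e.g. `read_md_v12_magic("/dev/sde")`; a result of `(False, 0)` corresponds to mdadm's `got 00000000`, meaning that region of the disk is zeroed.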

geaves · 15 February 2020

Assemble it with just the 3 drives; it should come up clean/degraded:

mdadm --assemble --force --verbose /dev/md127 /dev/sd[bcd]

massimi72 · 16 February 2020

It does, but when I start copying the data off, the system gets stuck and the RAID disappears.

massimi72 · 16 February 2020

I then switched it off and on again. I had to disconnect the faulty disk, otherwise the system would not start. On the second attempt, this is the result:

mdadm: stopped /dev/md127
root@omv2:~# mdadm --assemble --force --verbose /dev/md127 /dev/sd[bcd]
mdadm: looking for devices for /dev/md127
mdadm: /dev/sdb is identified as a member of /dev/md127, slot 3.
mdadm: /dev/sdc is identified as a member of /dev/md127, slot 0.
mdadm: /dev/sdd is identified as a member of /dev/md127, slot 1.
mdadm: forcing event count in /dev/sdc(0) from 5494 upto 5851
mdadm: added /dev/sdd to /dev/md127 as 1
mdadm: no uptodate device for slot 2 of /dev/md127
mdadm: added /dev/sdb to /dev/md127 as 3
mdadm: added /dev/sdc to /dev/md127 as 0
mdadm: /dev/md127 assembled from 3 drives - not enough to start the array.
root@omv2:~#

root@omv2:~# mdadm --examine /dev/sd*3
mdadm: No md superblock detected on /dev/sda3.
root@omv2:~#
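The `forcing event count in /dev/sdc(0) from 5494 upto 5851` line reflects how mdadm decides which members are current: each member's superblock records an event count, and a member whose count lags the newest looks out of date, which is what `--force` overrides. A Python sketch that compares the `Events : N` line from `mdadm --examine` for each device; how you capture that text per device is up to you, and in the example only 5494 (sdc) and 5851 come from this thread, the other inputs are illustrative.

```python
import re

# "Events : 5851" line printed by `mdadm --examine <device>`
EVENTS_LINE = re.compile(r"^\s*Events\s*:\s*(\d+)\s*$", re.MULTILINE)


def lagging_members(examine_by_device):
    """Compare event counts across members.

    examine_by_device maps a device name to the text of `mdadm --examine`
    for it. Returns (newest_count, {device: events_behind}) for members whose
    count is below the newest; devices without an Events line are skipped.
    """
    counts = {}
    for device, text in examine_by_device.items():
        match = EVENTS_LINE.search(text)
        if match:
            counts[device] = int(match.group(1))
    if not counts:
        return None, {}
    newest = max(counts.values())
    behind = {dev: newest - n for dev, n in counts.items() if n < newest}
    return newest, behind


# Trimmed, illustrative inputs (only 5494 and 5851 appear in the thread)
newest, behind = lagging_members({
    "/dev/sdb": "         Events : 5851\n",
    "/dev/sdc": "         Events : 5494\n",
    "/dev/sdd": "         Events : 5851\n",
    "/dev/sde": "mdadm: No md superblock detected on /dev/sde.\n",
})
print(newest, behind)  # 5851 {'/dev/sdc': 357}
```

A large gap means the lagging member missed many writes, so forcing it back in can expose stale data; that is why `--force` is a last resort.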

geaves · 16 February 2020

Quote from massimi72

It does, but when I start copying the data off, the system gets stuck and the RAID disappears.

Did I say anything about copying the data off? One of the four drives is missing its superblock; if that's the failing drive, it needs replacing, and there is a procedure for doing that.

Quote from massimi72

I then switched it off and on again. I had to disconnect the faulty disk, otherwise the system would not start. On the second attempt, this is the

Then you have probably lost your data!!!!

Post the output of mdadm --detail /dev/md127

massimi72 · 16 February 2020

Thank you very much for your help.

The disk has been reconnected, and after a while the system did start.
I have now assembled the array from the 3 disks as you suggested.

The output of the last command is:
root@omv2:~# mdadm --detail /dev/md127
/dev/md127:
Version : 1.2
Creation Time : Tue Feb 4 09:31:04 2020
Raid Level : raid5
Array Size : 5860150272 (5588.67 GiB 6000.79 GB)
Used Dev Size : 1953383424 (1862.89 GiB 2000.26 GB)
Raid Devices : 4
Total Devices : 3
Persistence : Superblock is persistent

Intent Bitmap : Internal

Update Time : Sun Feb 16 06:47:34 2020
State : clean, degraded
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0

Layout : left-symmetric
Chunk Size : 512K

Name : omv2:raid5 (local to host omv2)
UUID : e9d64950:32cebf3e:b3bdb9a1:16e1feeb
Events : 5853

Number Major Minor RaidDevice State
0 8 32 0 active sync /dev/sdc
1 8 48 1 active sync /dev/sdd
- 0 0 2 removed
3 8 16 3 active sync /dev/sdb
root@omv2:~#
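The sizes in this output fit a four-member RAID 5, where one member's worth of space holds parity: Array Size = (Raid Devices - 1) × Used Dev Size, with both figures in 1 KiB blocks. A quick Python check of that arithmetic using the numbers above:

```python
def raid5_array_kib(raid_devices, used_dev_kib):
    """Usable RAID 5 capacity: one member's worth of space holds parity."""
    return (raid_devices - 1) * used_dev_kib


array_kib = raid5_array_kib(4, 1953383424)  # "Raid Devices : 4", "Used Dev Size"
gib = array_kib / 1024**2                   # KiB -> GiB
gb = array_kib * 1024 / 1000**3             # KiB -> bytes -> GB
print(array_kib, f"{gib:.2f} GiB", f"{gb:.2f} GB")  # 5860150272 5588.67 GiB 6000.79 GB
```

The result matches the `Array Size : 5860150272 (5588.67 GiB 6000.79 GB)` line, which is a useful sanity check that the metadata still describes the original 4-drive layout even with one slot removed.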

geaves · 16 February 2020

So the drive (that's the physical drive) you removed from the system is /dev/sde; run lsblk to confirm. If so, do you have a drive to replace it?
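The deduction here, that the member missing from md127 is the drive that was pulled, boils down to a set difference: the devices blkid reports as `linux_raid_member` with this array's UUID, minus the devices `mdadm --detail` lists as `active sync`. A Python sketch over the outputs quoted in this thread (blkid lines trimmed to the fields used); it assumes whole-disk members such as /dev/sdb and the dashed UUID format blkid prints.

```python
import re

# blkid line for an md member, capturing the device and the array UUID
BLKID_MEMBER = re.compile(
    r'^(/dev/\S+): UUID="([^"]+)".*TYPE="linux_raid_member"', re.MULTILINE
)
# Device-table row from `mdadm --detail`, e.g. "0 8 32 0 active sync /dev/sdc"
ACTIVE_ROW = re.compile(r"\bactive sync\s+(/dev/\S+)")


def missing_members(blkid_text, detail_text, array_uuid):
    """Members blkid attributes to array_uuid that --detail does not show as active."""
    members = {dev for dev, uuid in BLKID_MEMBER.findall(blkid_text) if uuid == array_uuid}
    active = set(ACTIVE_ROW.findall(detail_text))
    return sorted(members - active)


UUID = "e9d64950-32ce-bf3e-b3bd-b9a116e1feeb"  # blkid form of the md127 UUID
BLKID = f'''/dev/sda1: UUID="E179-5A61" TYPE="vfat"
/dev/sdc: UUID="{UUID}" LABEL="omv2:raid5" TYPE="linux_raid_member"
/dev/sdb: UUID="{UUID}" LABEL="omv2:raid5" TYPE="linux_raid_member"
/dev/sdd: UUID="{UUID}" LABEL="omv2:raid5" TYPE="linux_raid_member"
/dev/sde: UUID="{UUID}" LABEL="omv2:raid5" TYPE="linux_raid_member"
'''
DETAIL = """Number Major Minor RaidDevice State
0 8 32 0 active sync /dev/sdc
1 8 48 1 active sync /dev/sdd
- 0 0 2 removed
3 8 16 3 active sync /dev/sdb
"""

print(missing_members(BLKID, DETAIL, UUID))  # ['/dev/sde']
```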

massimi72 · 16 February 2020

Hi, now the 4 drives are all physically connected (only 3 are assembled in the RAID), but one of them has the red flag in SMART. I have a similar drive that I can use to replace it.
I'm not sure if it's relevant, but the faulty one is connected to a PCI SATA controller, while the other drives are connected directly to the motherboard. If I need to add one more drive, though, it has to go on the PCI controller.

geaves · 16 February 2020

Quote from massimi72

Hi, now the 4 drives are all physically connected (only 3 are assembled in the RAID), but one of them has the red flag in SMART

If you're going to give me information, it has to be detailed and contain references to the specific drives. In post 6 the RAID could not be assembled with 4 drives due to a missing superblock on /dev/sde; this is confirmed by the output from post 11.

So what's the output of lsblk, and the output of mdadm --detail /dev/sde?

massimi72 · 16 February 2020

here we go:

1. root@omv2:~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 111.8G 0 disk
|-sda1 8:1 0 512M 0 part /boot/efi
|-sda2 8:2 0 109.6G 0 part /
`-sda3 8:3 0 1.7G 0 part [SWAP]
sdb 8:16 0 1.8T 0 disk
`-md127 9:127 0 5.5T 0 raid5
sdc 8:32 0 1.8T 0 disk
`-md127 9:127 0 5.5T 0 raid5
sdd 8:48 0 1.8T 0 disk
`-md127 9:127 0 5.5T 0 raid5
sde 8:64 0 1.8T 0 disk

2. root@omv2:~# mdadm --detail /dev/sde
mdadm: /dev/sde does not appear to be an md device

geaves · 16 February 2020

Quote from massimi72

mdadm: /dev/sde does not appear to be an md device

That's interesting; it seems that because you removed the drive and rebooted, it has removed itself from the array.

OK, do the following:
1. Storage -> Disks: select /dev/sde, click Wipe and choose Short in the dialog. This will wipe the drive.
2. Once that has completed, go to RAID Management, click on the RAID and select Recover from the menu.
3. A dialog will come up which should show /dev/sde; select it and click OK.
The array should now sync/rebuild with the added disk.

If the above works, come back when it's finished. Do not use or access the array whilst the sync is running.
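While the rebuild runs, the kernel reports progress in /proc/mdstat on a line such as `[==>....]  recovery = 12.6% (246208512/1953383424) finish=180.5min speed=157632K/sec` (those figures are illustrative, not from this thread). Reading /proc/mdstat does not touch the array itself, so it is a safe way to watch the sync. A minimal Python sketch; the polling helper and its 60-second interval are assumptions, not something OMV provides.

```python
import re
import time

# e.g. "recovery = 12.6% (246208512/1953383424) finish=180.5min ..."
PROGRESS = re.compile(r"\b(recovery|resync|reshape|check)\s*=\s*([\d.]+)%")


def rebuild_progress(mdstat_text):
    """Return (operation, percent) for the first sync in progress, or None when idle."""
    match = PROGRESS.search(mdstat_text)
    return (match.group(1), float(match.group(2))) if match else None


def wait_for_rebuild(path="/proc/mdstat", interval=60):
    """Poll /proc/mdstat (read-only) until no recovery/resync line remains."""
    while True:
        with open(path) as f:
            progress = rebuild_progress(f.read())
        if progress is None:
            return
        operation, percent = progress
        print(f"{operation}: {percent:.1f}%")
        time.sleep(interval)
```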

massimi72 · 16 February 2020

Great, it is rebuilding

geaves · 16 February 2020

Quote from massimi72

Great, it is rebuilding

Good. When it's finished, I'll need the reference of the drive that has the SMART error (red dot), i.e. /dev/sd? with the ? being the drive letter.

massimi72 · 16 February 2020

Finished.

SMART is now green for all disks.

RAID Management shows the RAID as Clean and I can't see any warnings, but when I try to mount the RAID5 file system there is an error: "Failed to execute command 'export PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin; export LANG=C.UTF-8; mount -v --source '/dev/disk/by-label/RAID5' 2>&1' with exit code '32': mount: mount /dev/md127 on /srv/dev-disk-by-label-RAID5 failed: Structure needs cleaning".

Thanks, M
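That error carries two separate pieces of information: `exit code '32'` is mount's own status, while `Structure needs cleaning` is the kernel's text for the EUCLEAN error, which filesystems such as ext4 and XFS return when they detect on-disk inconsistencies. util-linux mount(8) documents its exit status as a bitwise OR of flags; a small Python sketch decoding it:

```python
# Exit status bits from util-linux mount(8); the status is their bitwise OR.
MOUNT_STATUS_BITS = {
    1: "incorrect invocation or permissions",
    2: "system error",
    4: "internal mount bug",
    8: "user interrupt",
    16: "problems writing or locking /etc/mtab",
    32: "mount failure",
    64: "some mount succeeded",
}


def decode_mount_status(status):
    """Translate a mount exit status into the documented reasons."""
    if status == 0:
        return ["success"]
    return [reason for bit, reason in MOUNT_STATUS_BITS.items() if status & bit]


print(decode_mount_status(32))  # ['mount failure']
```

So 32 only says "the mount failed"; the actual cause is the filesystem-level corruption reported in the message text, which points at a filesystem check rather than at the RAID layer.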

geaves · 16 February 2020

You're full of surprises. Try fsck /dev/md127
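fsck must be run on an unmounted filesystem, which is the case here since the mount failed. Its exit status is also a bitwise OR of flags documented in fsck(8), so checking it afterwards (e.g. with `echo $?` in the shell) tells you whether errors were fixed or left behind. A Python sketch decoding that status:

```python
# Exit status bits from fsck(8); the status is their bitwise OR.
FSCK_STATUS_BITS = {
    1: "filesystem errors corrected",
    2: "system should be rebooted",
    4: "filesystem errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "checking canceled by user request",
    128: "shared-library error",
}


def decode_fsck_status(status):
    """Translate an fsck exit status into the documented reasons."""
    if status == 0:
        return ["no errors"]
    return [reason for bit, reason in FSCK_STATUS_BITS.items() if status & bit]


print(decode_fsck_status(1))  # ['filesystem errors corrected']
```

A status of 0 or 1 means the filesystem should mount again; 4 means errors remain and another run, or a closer look at the output, is needed.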
