raid5 - after changing disk clean/FAILED

  • Oops, I degraded a clean and working raid5 to replace a disk (sdb). Now, after installing the new disk, I get the error clean/FAILED. If I look at the description I see this:


    So what can I do? I reinstalled the old sdb, but cannot get md127 back to work.

    I managed to change two disks today, but the third tries to make me unhappy.


    Thanks a lot for your help.


    cat /proc/mdstat

    Code
    root@omv:~# cat /proc/mdstat
    Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
    md127 : active raid5 sdc[4](F) sdd[5](F) sda[1](F)
          5860148736 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/0] [____]
    
    unused devices: <none>
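    In that mdstat line, [4/0] means four devices are configured but zero are active, and [____] shows all four member slots down. A minimal sketch of reading such a line, using the quoted output as sample input (on the live system you would read /proc/mdstat directly):

```shell
# Count members marked (F) in an mdstat device line.
# Sample input is the md127 line quoted above; on a live system
# substitute: grep '^md127' /proc/mdstat
mdstat_line='md127 : active raid5 sdc[4](F) sdd[5](F) sda[1](F)'
failed=$(printf '%s\n' "$mdstat_line" | grep -o '(F)' | wc -l)
echo "failed members: $failed"
```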


    blkid

    Code
    root@omv:~# blkid
    /dev/sde1: UUID="e10f8611-e537-471c-a7a2-93b2774b2e2d" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="b84e6387-01"
    /dev/sde5: UUID="0fc5a379-2025-4b9d-bb58-c18e876d19f5" TYPE="swap" PARTUUID="b84e6387-05"
    /dev/sda: UUID="813b7562-8521-b311-f237-8373d31cba65" UUID_SUB="b9681117-d365-a4c2-17ca-57d0196c0cdc" LABEL="omv:Datengrab" TYPE="linux_raid_member"
    /dev/md127: LABEL="Datengrab" UUID="428b3002-db0b-4935-a396-5dc9043e595d" BLOCK_SIZE="4096" TYPE="ext4"
    /dev/sdd: UUID="813b7562-8521-b311-f237-8373d31cba65" UUID_SUB="70b96ef1-a538-1392-d319-b0c142dec417" LABEL="omv:Datengrab" TYPE="linux_raid_member"
    /dev/sdc: UUID="813b7562-8521-b311-f237-8373d31cba65" UUID_SUB="757004c4-c94a-de1f-5a8b-59ce64206abb" LABEL="omv:Datengrab" TYPE="linux_raid_member"
    /dev/sdb: UUID="813b7562-8521-b311-f237-8373d31cba65" UUID_SUB="cdc95198-a4f0-d1c9-6e2a-18127e7b5510" LABEL="omv:Datengrab" TYPE="linux_raid_member"
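    One thing worth confirming in that blkid output is that all four disks report the same array UUID, i.e. they all still claim membership of the same array. A sketch of that check, with the quoted lines as sample input (on the real system, pipe blkid itself through the same filter):

```shell
# Extract the array UUID (not UUID_SUB) from each linux_raid_member
# line and count distinct values. Sample input is the blkid output
# quoted above; on the live system use: blkid | grep linux_raid_member
blkid_out='/dev/sda: UUID="813b7562-8521-b311-f237-8373d31cba65" UUID_SUB="b9681117-d365-a4c2-17ca-57d0196c0cdc" LABEL="omv:Datengrab" TYPE="linux_raid_member"
/dev/sdd: UUID="813b7562-8521-b311-f237-8373d31cba65" UUID_SUB="70b96ef1-a538-1392-d319-b0c142dec417" LABEL="omv:Datengrab" TYPE="linux_raid_member"
/dev/sdc: UUID="813b7562-8521-b311-f237-8373d31cba65" UUID_SUB="757004c4-c94a-de1f-5a8b-59ce64206abb" LABEL="omv:Datengrab" TYPE="linux_raid_member"
/dev/sdb: UUID="813b7562-8521-b311-f237-8373d31cba65" UUID_SUB="cdc95198-a4f0-d1c9-6e2a-18127e7b5510" LABEL="omv:Datengrab" TYPE="linux_raid_member"'
distinct=$(printf '%s\n' "$blkid_out" | grep linux_raid_member \
           | grep -o 'UUID="[^"]*"' | sort -u | wc -l)
echo "distinct array UUIDs: $distinct"  # a count of 1 means all four disks agree
```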

    tbc

    This too shall pass ...


  • fdisk -l | grep "Disk "


    cat /etc/mdadm.conf


    mdadm --detail --scan --verbose

    Code
    root@omv:~# mdadm --detail --scan --verbose
    ARRAY /dev/md/Datengrab level=raid5 num-devices=4 metadata=1.2
    devices=/dev/sda,/dev/sdc,/dev/sdd


    mdadm --assemble --force --verbose /dev/md127 /dev/sd[a1b1c1d1]

    Code
    root@omv:~# mdadm --assemble --force --verbose /dev/md127 /dev/sd[a1b1c1d1]
    mdadm: looking for devices for /dev/md127
    mdadm: /dev/sda is busy - skipping
    mdadm: /dev/sdc is busy - skipping
    mdadm: /dev/sdd is busy - skipping
    mdadm: Found some drive for an array that is already active: /dev/md/Datengrab
    mdadm: giving up.


    I hope you can help me...

    I found out, data is still there but only read-only



  • The whole of the above makes little or no sense; changing a failing drive using the GUI is very simple and straightforward.


    This -> I managed to change two disks today, but the third tries to make me unhappy. makes no sense. Raid5 can only survive one drive failure; remove/replace two at the same time and the array is toast.


    The output from cat /proc/mdstat shows three drives, /dev/sd[acd], all with (F), so mdadm believes those drives have failed; but blkid shows /dev/sd[abcd] as being linux raid members. Confusing, isn't it?


    This -> I found out, data is still there but only read-only how?? If you have a working crystal ball, then some of us would like to know where you got it from.


    Apologies for being sarcastic, but the whole thing is very hard to understand, so:


    mdadm --stop /dev/md127


    mdadm --assemble --force --verbose /dev/md127 /dev/sd[abcd]


    Post any errors, and do not reboot!!

    Raid is not a backup! Would you go skydiving without a parachute?


    OMV 6x amd64 running on an HP N54L Microserver

  • I know I'm wearing the red nose today - even though I really did wait until the raid was clean again before taking the next step, i.e. the next disk: one good, two good, three - and here we are...


    1. Raid md127 remove sdd-old, save and wait until omv6 wants to save the change as well

    2. remove sdd-old

    3. plugin sdd-new, and wait a little bit -> drive is shown at disk-section

    4. enable SMART-monitoring

    5. swraid recover md127; add sdd-new; save and wait until omv6 wants to save the change as well

    6. wait until recovery is finished (took about 5h)


    Go back to 1. and take sdc-old/sdc-new...


    Next should be sdb-old/sdb-new, and last - surprise - sda-old/sda-new.
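    For reference, those six GUI steps map onto a standard mdadm cycle per disk (fail, remove, physically swap, add, wait for resync). As a precaution the sketch below only prints the commands for review rather than running them; the device order follows the plan above, and whether to execute each line is the operator's call:

```shell
# Dry run: print one replacement cycle per disk instead of executing it.
# Order (sdd, sdc, sdb, sda) follows the plan described above.
for disk in sdd sdc sdb sda; do
    echo "mdadm /dev/md127 --fail /dev/$disk --remove /dev/$disk"
    echo "# power down / swap the physical drive, then:"
    echo "mdadm /dev/md127 --add /dev/$disk"
    echo "# wait until 'cat /proc/mdstat' shows recovery finished before the next disk"
done
```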


    mdadm --stop /dev/md127

    Code
    root@omv:~# mdadm --stop /dev/md127
    mdadm: Cannot get exclusive access to /dev/md127:Perhaps a running process, mounted filesystem or active volume group?


    mdadm --assemble --force --verbose /dev/md127 /dev/sd[abcd]

    Code
    root@omv:~# mdadm --assemble --force --verbose /dev/md127 /dev/sd[abcd]
    mdadm: looking for devices for /dev/md127
    mdadm: /dev/sda is busy - skipping
    mdadm: /dev/sdc is busy - skipping
    mdadm: /dev/sdd is busy - skipping
    mdadm: Found some drive for an array that is already active: /dev/md/Datengrab
    mdadm: giving up.


    The last output I had already seen in post #2 - ok, there with a1b1c1d1 instead of abcd.



  • 1. Raid md127 remove sdd-old,

    How? I am using OMV6 but I'm not using Raid; I could spin up a VM, but that would have to wait until tomorrow. From memory, in V5 there was a delete/remove option on the Raid menu, and that is how a drive must be removed from an array. When done that way the array is left in a clean/degraded state, so the shares and files remain accessible.

    a1b1c1d1 instead of abcd

    No, a1, b1, etc. are drive partitions; OMV does not use partitions when creating an array, it uses the full drive.


    -------------------------------------------------------------------------------------------------------------------------------------------


    This output -> mdadm: Cannot get exclusive access to /dev/md127:Perhaps a running process, mounted filesystem or active volume group

    suggests that something/someone still has access to the array; that's why it cannot be stopped.
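    A mounted filesystem is the usual holder. As a sketch, this checks mount output for the array device before attempting mdadm --stop; the sample line and its mount path are made up for illustration (run the real mount command on the affected box):

```shell
# If the array device appears in 'mount' output, unmount it (and stop
# services using it, e.g. nfs/docker) before 'mdadm --stop /dev/md127'.
# Sample line below is hypothetical; on the live system use: mount
mount_out='/dev/md127 on /srv/datengrab type ext4 (rw,relatime)'
if printf '%s\n' "$mount_out" | grep -q '^/dev/md127 '; then
    echo "md127 is still mounted - unmount it before mdadm --stop"
fi
```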


    If you have the kernel plugin installed you could try the SystemRescue option; this installs systemrescuecd and boots into it once. You can then run CLI commands to assemble the array without OMV running; exit and reboot will get you back to OMV. But you will need to check things like blkid again.


  • OK, I stopped nfs & docker and afterwards ran umount -f /dev/mdstat.



    Now md127 is online again - but not mounted yet. In the morning I'll try to swap in and add the new disk to md127; hopefully the rebuild will work.
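    Once the new disk is added, /proc/mdstat reports rebuild progress. A small sketch of extracting the percentage; the sample line below is a typical recovery line for illustration, not output from this system:

```shell
# Pull the completion percentage out of an mdstat recovery line.
# Sample line is hypothetical; on the live system substitute:
# grep recovery /proc/mdstat
line='[=>...................]  recovery =  7.4% (145345536/1953382912) finish=302.5min speed=99582K/sec'
pct=$(printf '%s\n' "$line" | sed -n 's/.*recovery *= *\([0-9.]*\)%.*/\1/p')
echo "rebuild at ${pct}%"  # prints: rebuild at 7.4%
```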


    Thanks for your efforts today; I'll be up early tomorrow.


