RAID 1 array disappears after recovery onto a new HDD when trying to also replace the second (old) HDD

  • I had an HDD failure in my RAID 1 consisting of /dev/sdb and /dev/sdc. The respective RAID array, named "Datenspiegel1" in OMV, went into the state "clean, degraded", but the RAID volume filesystem /dev/md127 stayed online and working with only the /dev/sdb HDD.

    I replaced the defective /dev/sdc with a new HDD of the same type and size and recovered the RAID array in the OMV RAID management.

    After the full recovery the RAID 1 array went back into the state "clean", so everything seemed to be good again; the filesystem /dev/md127 was online and in sync.

    As a precaution, I now also wanted to replace the older /dev/sdb HDD. But as soon as I shut down the system and also replaced the /dev/sdb HDD with an empty HDD of the same type and size, after rebooting the RAID 1 entry in the OMV RAID management completely disappears and the filesystem /dev/md127 states "missing", although the disks /dev/sdc and /dev/sdb are recognized and online.

    How can I fix this so that I can recover the RAID 1 array again onto the second new HDD?


    Find the RAID array details below:


    Code
              Version : 1.2
        Creation Time : Fri Oct 14 16:01:40 2016
           Raid Level : raid1
           Array Size : 3906887488 (3725.90 GiB 4000.65 GB)
        Used Dev Size : 3906887488 (3725.90 GiB 4000.65 GB)
         Raid Devices : 2
        Total Devices : 2
          Persistence : Superblock is persistent

        Intent Bitmap : Internal

          Update Time : Fri Feb 3 15:57:43 2023
                State : clean
       Active Devices : 2
      Working Devices : 2
       Failed Devices : 0
        Spare Devices : 0

    Consistency Policy : bitmap

                 Name : nas1:Datenspiegel1 (local to host nas1)
                 UUID : fb9463db:f41487d2:1cc69691:4f4b3deb
               Events : 15362

        Number   Major   Minor   RaidDevice State
           0       8       16        0      active sync   /dev/sdb
           2       8       32        1      active sync   /dev/sdc


    I do not know if this is just a coincidence, but after the first recovery following the initial HDD failure, I got a strange "/dev/md/Datenspiegel1 SparesMissing event" message via mail notification once per day (I assume via a cron job).

    I checked /etc/mdadm/mdadm.conf and found a "spares=1" entry for the respective array, but the RAID array in the OMV frontend itself stated spares=0.

    Following a forum thread I found here, I corrected /etc/mdadm/mdadm.conf, and the message has not appeared again, even after several reboots.
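
    For anyone hitting the same SparesMissing notification: the fix is to drop the stale spares=1 parameter from the ARRAY line. The exact line is not quoted here, but using the name and UUID from the details above, the corrected entry would look roughly like this:

    Code
    # /etc/mdadm/mdadm.conf -- corrected ARRAY line, stale "spares=1" removed
    ARRAY /dev/md/Datenspiegel1 metadata=1.2 name=nas1:Datenspiegel1 UUID=fb9463db:f41487d2:1cc69691:4f4b3deb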

    • Official Post

    the RAID 1 entry in the OMV RAID management completely disappears and the filesystem /dev/md127 states "missing"

    That is because you physically removed the drive. mdadm is not 'hot swap'; you have to fail and remove a drive for mdadm to know about it.


    Replacing the first drive was fine because mdadm knew there was a failed drive, as the array was in a clean/degraded state.
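
    For reference, the command-line equivalent of failing and removing a member looks something like this (using the array and device names from this thread; run as root, and only on a healthy mirror):

    Code
    # mark the member as failed, then remove it from the array
    mdadm /dev/md127 --fail /dev/sdb
    mdadm /dev/md127 --remove /dev/sdb
    # the array is now clean/degraded and the drive can be swapped after shutdown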


    Post the output of cat /proc/mdstat and blkid. Please use a code box for the results -> the </> symbol on the thread bar; it makes it easier to read.

  • Thanks geaves for the quick reply.


    Of course I was not removing/exchanging any physical HDD while the system was up and running.

    My hope was that if I simply removed/exchanged an HDD after shutdown from a clean and synced array, mdadm would recognize this during boot-up and degrade the array so I could recover it (again) onto the new/unused HDD. If it does not work that way, I am more than excited to learn how I can "artificially fail a drive" so that mdadm recognizes it and degrades the array. Then I could exchange the second, seven-year-old HDD of that RAID 1 array as a precaution and would not have to wait until it actually fails. ;)


    Here are the requested outputs.
    /proc/mdstat:

    Code
    Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10] 
    md127 : active raid1 sdb[0] sdc[2]
          3906887488 blocks super 1.2 [2/2] [UU]
          bitmap: 0/30 pages [0KB], 65536KB chunk
    
    unused devices: <none>

    blkid:

    Code
    /dev/sdb: UUID="fb9463db-f414-87d2-1cc6-96914f4b3deb" UUID_SUB="f99fd516-bc52-651c-885e-a76f4303993a" LABEL="nas1:Datenspiegel1" TYPE="linux_raid_member"
    /dev/sdd1: LABEL="single1" UUID="34d60af0-2fd0-438f-a650-87656ba54101" TYPE="ext4" PARTUUID="eedb97e2-e533-43b7-8913-95a257985537"
    /dev/sdc: UUID="fb9463db-f414-87d2-1cc6-96914f4b3deb" UUID_SUB="2e073141-ae92-d0bc-f390-f762b91fd5cc" LABEL="nas1:Datenspiegel1" TYPE="linux_raid_member"
    /dev/sda1: UUID="f692f97f-b000-4a33-8689-08d8355d13c5" TYPE="ext4" PARTUUID="05df71b4-01"
    /dev/sda5: UUID="cc2db23f-2663-429c-8f0f-02d3d864bf4f" TYPE="swap" PARTUUID="05df71b4-05"
    /dev/md127: LABEL="raid1" UUID="6fa802ae-c767-40e5-a176-b2b7f310887b" TYPE="ext4"

    /dev/sdb is the drive I actually want to replace as a precaution.

    /dev/sdc is the drive I had replaced after the failure and recovered the array onto.
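
    Note: device names like /dev/sdb can change between boots, so before pulling a disk it is worth confirming the physical drive by its serial number, e.g. like this (the smartctl check assumes smartmontools is installed):

    Code
    # map the kernel name to a stable ID / serial number
    ls -l /dev/disk/by-id/ | grep sdb
    smartctl -i /dev/sdb | grep -i serial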

    • Official Post

    My hope was that if I simply removed/exchanged an HDD after shutdown from a clean and synced array, mdadm would recognize this during boot-up and degrade the array so I could recover it (again) onto the new/unused HDD

    mdadm is not that intelligent. If you shut down, remove a drive and then install a new one, mdadm will mark the array as inactive: the array will not show in RAID management and the file system will display as missing.
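
    For reference (beyond what was needed in this thread): an array that has gone inactive this way can usually be brought back in a degraded state by force-assembling it from the surviving member, roughly like this:

    Code
    # stop the inactive array, then assemble it degraded from the surviving disk
    mdadm --stop /dev/md127
    mdadm --assemble --force --run /dev/md127 /dev/sdc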


    Although you're on V5, which is EOL, you can replace a drive via the GUI; it's a lot less painful :) and my instructions are from memory


    Raid Management -> select the array -> on the menu there should be a delete/remove option; click that and a box should show both drives. Select the drive to remove and click OK; you may have to apply the changes. The array should then display in a clean/degraded state


    Shut down the server, pull the drive you just removed from the array and install the new one, then restart the server


    Storage -> Disks -> select the newly installed drive, then click wipe on the menu; choose short for a new drive, secure for a repurposed one, and wait for it to complete


    Raid Management -> select the array and click recover on the menu; a box will appear showing the new drive. Select it and click OK; the array will now rebuild
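
    For completeness, a rough command-line equivalent of the wipe-and-recover steps would be the following (assuming the new disk enumerates as /dev/sdb again; double-check before wiping):

    Code
    wipefs --all /dev/sdb              # clear any old signatures (GUI "wipe")
    mdadm /dev/md127 --add /dev/sdb    # add the new disk; the rebuild starts automatically
    cat /proc/mdstat                   # watch the rebuild progress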

  • Although you're on V5, which is EOL, you can replace a drive via the GUI; it's a lot less painful :)

    geaves, yep, that was the missing piece.
    Removing the disk from the array in the GUI did the job: it degraded the array, and after replacing the HDD it is recovering now.

    Shame on me for not thinking of this... :rolleyes:


    Classic #rtfm issue: if you read the manuals completely (and understand them), it would have been clear... 8o


    Regarding V5 being EOL: I am aware of it, but I first wanted to have this "issue" sorted out before upgrading to 6.x.


    Thanks for helping me out anyway.
