UPDATE: RESOLVED. Another case of missing RAID5 and file system. Please help!

LukeR1886 · 29. Januar 2024

UPDATE: THIS CONCERN WAS RESOLVED.

THE LAST FEW POSTS OF THIS THREAD HAVE THE REPAIR DOCUMENTED IN DETAIL.

There was some sort of cabling issue and I lost all of my drives connectivity, and assume the array went offline when it lost sync. The RAID has disappeared, and I've tried a few things to get it back online. I may have started more trouble by attempting a mdadm --assemble --force --verbose /dev/md127 /dev/sdb /dev/sdc /dev/sdd /dev/sde command. Someone help me !

Krisbee · 29. Januar 2024

How are your drives connected, SATA or USB? If a drive dropped out due to connectivity issue, a simple reboot maybe all you that you needed.

What output messages did you see when using that mdadm assemble command?

Please post the output of these commands: cat /proc/mdstat and blkid

LukeR1886 · 29. Januar 2024

Thank you for the response. SAS connected, through a PERC310 flashed to IT mode. Please be patient, as I have just replaced the SAS controller, cables, and am currently booting into a new OMV installation to make sure there is nothing connected to whatever issues have started.

LukeR1886 · 29. Januar 2024

So I'm not sure if it was the cabling or the SAS controller, but now the array appears in a degraded state, but with only the drives it deemed OK with the failing cable/ controller.

Here is the data:

root@openmediavault:~# cat /proc/mdstat

Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]

md127 : active (auto-read-only) raid5 sde[3] sdb[1] sdd[2]

17581171200 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [_UUU]

bitmap: 0/44 pages [0KB], 65536KB chunk

unused devices: <none>

root@openmediavault:~# blkid

/dev/sda1: UUID="43b1e19a-3a73-477f-b758-db5b833bf8af" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="71f4a275-01"

/dev/sda5: UUID="02abfba3-4cd8-40a1-b479-75598af04eb7" TYPE="swap" PARTUUID="71f4a275-05"

/dev/sdc: UUID="0d9284e2-af7d-455e-721e-c1482f768fe4" UUID_SUB="7df74adc-8d0d-5800-3a4f-2f243b67fe1a" LABEL="Bern:Storage" TYPE="linux_raid_member"

/dev/sdb: UUID="0d9284e2-af7d-455e-721e-c1482f768fe4" UUID_SUB="697882ba-3d2c-24a1-242a-33f8c80d2f48" LABEL="Bern:Storage" TYPE="linux_raid_member"

/dev/sdd: UUID="0d9284e2-af7d-455e-721e-c1482f768fe4" UUID_SUB="f4f3838b-f381-01d0-ef9c-0eb96ee49e1d" LABEL="Bern:Storage" TYPE="linux_raid_member"

/dev/sde: UUID="0d9284e2-af7d-455e-721e-c1482f768fe4" UUID_SUB="392758f0-7e5b-d0b6-fe65-ab743d6268b8" LABEL="Bern:Storage" TYPE="linux_raid_member"

/dev/md127: LABEL="Storage" UUID="fc4c074a-e2d7-46ee-82bd-2983d057c941" BLOCK_SIZE="4096" TYPE="ext4"

/dev/sr0: BLOCK_SIZE="2048" UUID="2022-04-26-18-09-57-00" LABEL="openmediavault 20220426-20:09" TYPE="iso9660" PTUUID="a3602feb" PTTYPE="dos"

LukeR1886 · 29. Januar 2024

Now when I try a

mdadm --stop /dev/md127

mdadm --assemble --verbose /dev/md127 /dev/sdb /dev/sdc /dev/sdd /dev/sde

I get a :

root@openmediavault:~# mdadm --assemble --force --verbose /dev/md127 /dev/sdb /dev/sdc /dev/sdd /dev/sde

mdadm: looking for devices for /dev/md127

mdadm: /dev/sdb is identified as a member of /dev/md127, slot 1.

mdadm: /dev/sdc is identified as a member of /dev/md127, slot 0.

mdadm: /dev/sdd is identified as a member of /dev/md127, slot 2.

mdadm: /dev/sde is identified as a member of /dev/md127, slot 3.

mdadm: added /dev/sdc to /dev/md127 as 0 (possibly out of date)

mdadm: added /dev/sdd to /dev/md127 as 2

mdadm: added /dev/sde to /dev/md127 as 3

mdadm: added /dev/sdb to /dev/md127 as 1

mdadm: /dev/md127 has been started with 3 drives (out of 4).

Krisbee · 29. Januar 2024

As /dev/md127 is currently active with 3 out 4 drives, try failing /dev/sdc, then remove it from the array and add it back and see if it re-syncs OK.

mdadm /dev/md127 --fail /dev/sdc

mdadm /dev/md127 --remove /dev/sdc

mdadm /dev/md127 --add /dev/sdc

If it still complains it's /dev/sdc out of date, then perhaps you're going to have to wipe that drive before it can be added back to the array. But in that case you could wait to see if geaves comes along with an answer.

LukeR1886 · 29. Januar 2024

Zitat von Krisbee

As /dev/md127 is currently active with 3 out 4 drives, try failing /dev/sdc, then remove it from the array and add it back and see if it re-syncs OK.

mdadm /dev/md127 --fail /dev/sdc
mdadm /dev/md127 --remove /dev/sdc
mdadm /dev/md127 --add /dev/sdc

If it still complains it's /dev/sdc out of date, then perhaps you're going to have to wipe that drive before it can be added back to the array.

Alles anzeigen

OK, and is this to be run with the array stopped?
e.g. run this first? mdadm --stop /dev/md127

Krisbee · 29. Januar 2024

AFAIK no, you would not stop the array to remove a drive using the Webui. But let me re-state those commands.

mdadm /dev/md127 --fail /dev/sdc --remove /dev/sdc

mdadm --add /dev/md127 /dev/sdc

follow with a

mdam --detail /dev/md127

LukeR1886 · 29. Januar 2024

Oh, shoot....

Here's the errors on the terminal screen of the VM container when adding the /dev/sdc.

And also the SMART status in the WebUI of /dev/sdc directly after (was operating correctly before adding)

LukeR1886 · 29. Januar 2024

And now when I --fail /dev/sdc and --remove /dev/sdc it all goes back where we were before...

geaves · 29. Januar 2024

Post the output of cat /proc/mdstat post in a code box please, this symbol </> on the forum bar makes it easier to read

The output from #4 of the above shows the array as (auto-read-only), also to re add /dev/sdc with the 'Possibly out of date' error the drive will have to be securely wiped, this can usually be run to 25% then stopped, then try re adding the drive to the array. Do not add the drive until it has finished rebuilding and the (auto-read-only) is corrected.

LukeR1886 · 29. Januar 2024

Zitat von geaves

Post the output of cat /proc/mdstat post in a code box please, this symbol </> on the forum bar makes it easier to read

The output from #4 of the above shows the array as (auto-read-only), also to re add /dev/sdc with the 'Possibly out of date' error the drive will have to be securely wiped, this can usually be run to 25% then stopped, then try re adding the drive to the array. Do not add the drive until it has finished rebuilding and the (auto-read-only) is corrected.

geaves So, I've already attempted to add the drive back, got errors, and performed a --fail /dev/sdc and --remove /dev/sdc to remove it. now it's back to the previous, non functional state. I hope nothing was damaged.

How do I get the machine to rebuild the array with only 3 drives... to get the array out of auto-read-only?

Here's the cat /proc/mdstat

Code

root@openmediavault:~# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
md127 : active (auto-read-only) raid5 sdb[1] sde[3] sdd[2]
      17581171200 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [_UUU]
      bitmap: 0/44 pages [0KB], 65536KB chunk

unused devices: <none>

LukeR1886 · 30. Januar 2024

UPDATE:

I haven't heard anything back yet, but I've made some progress and figured out some of this headache on my own. I could still REALLY use some help.

I learned that when the array says active (auto-read-only) that the host OS will eventually work itself out of read-only mode without any further inputs.
So now:

I've allowed the array to finish syncing and the active (auto-read-only) property has disappeared.
I've run the "wipe drive" tool in secure mode on the /dev/sdc (the drive that was out of sync) until it reached 25%
The commands mdadm /dev/md127 --fail /dev/sdc and mdadm /dev/md127 --remove /dev/sdc were already reviously run, but I ran them again to ensure the drive was removed.
When I run the command mdadm --add /dev/md127 /dev/sdc I cannot add the drive with the following error:

Code

root@openmediavault:~# mdadm --add /dev/md127 /dev/sdc
mdadm: add new device failed for /dev/sdc as 4: Invalid argument

The mdadm --detail /dev/md127 command is as follows, and displays that the Superblock data knows there are 4 devices, but only 3 are active. I cannot figure out the command to remove the "removed" device from the Superblock data. So thus I'm stuck unable to add a replacement device with the error message mdadm: add new device failed for /dev/sdc as 4: Invalid argument

Code

root@openmediavault:~# mdadm --detail /dev/md127
/dev/md127:
           Version : 1.2
     Creation Time : Wed Mar 16 19:00:56 2022
        Raid Level : raid5
        Array Size : 17581171200 (16766.71 GiB 18003.12 GB)
     Used Dev Size : 5860390400 (5588.90 GiB 6001.04 GB)
      Raid Devices : 4
     Total Devices : 3
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Mon Jan 29 21:00:27 2024
             State : clean, degraded
    Active Devices : 3
   Working Devices : 3
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-symmetric
        Chunk Size : 512K

Consistency Policy : bitmap

              Name : Bern:Storage
              UUID : 0d9284e2:af7d455e:721ec148:2f768fe4
            Events : 51099

    Number   Major   Minor   RaidDevice State
       -       0        0        0      removed
       1       8       16        1      active sync   /dev/sdb
       2       8       48        2      active sync   /dev/sdd
       3       8       64        3      active sync   /dev/sde

Alles anzeigen

If anyone can chime in on the process of removing the "removed" drive and adding a new one, I would greatly appreciate any help! Thanks in advance to anyone who can assist!

Krisbee geaves

geaves · 30. Januar 2024

Zitat von LukeR1886

I've allowed the array to finish syncing and the active (auto-read-only) property has disappeared.

Well that's a first

Zitat von LukeR1886

If anyone can chime in on the process of removing the "removed" drive and adding a new one

You can't, all mdadm is telling you is that it has removed a drive from the array, the --detail switch is giving you information

Zitat von LukeR1886

mdadm: add new device failed for /dev/sdc as 4: Invalid argument

if 25% secure wipe did not work I would suggest running a secure wipe on the whole drive then try again

Adding a drive can be done from the GUI, Raid Management, select the array and click recover on the menu and follow the on screen instructions

Krisbee · 30. Januar 2024

It's wouldn't hurt to do a long SMART test on that disk to check for bad sectors.

LukeR1886 · 30. Januar 2024

Krisbee and geaves thank you both, kindly, for the input you've provided thus far.

I'm preparing to rebuild the array, but have a few questions as I am not new to tech, but this is my first time repairing a completely missing drive.

Can I mount the filesystem before trying a RAID rebuild, to recover some of the stuff that I know isn't backed up anywhere else? This is just a precaution in case the array completely dies on recovery. It's probably or 10% or 15% of the entire array's volume that I'd be focused on retrieving. When I tried to mount it in the WebUI, I got an error that I'll post at the bottom...

Here is my preparation to start the recovery, anything else I should know/ do ?

I have the original /dev/sdc (a Seagate ST6000NM0014) reaching the end of its secure wipe routine.
I installed a "spare" drive as /dev/sdf (a Seagate ST6000NM0034) currently being secure-wiped too, just in case the original drive won't work.
Both are pulled from the actual chassis and are arranged in a way to to have extra fans placed on them, dropped the temperature from 38c to 30c, as I'm sure the rebuilding of the array will warm things up a little and... well, the drives have been healthy for a long time, but they're high-mileage units.

"clean, degraded" array mount error: Looks like the superblock is unreadable. Is that normal or repairable?

Code

Failed to execute command 'export PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin; export LANG=C.UTF-8; export LANGUAGE=; omv-salt deploy run --no-color fstab 2>&1' with exit code '1': debian:
----------
          ID: create_filesystem_mountpoint_8ae29dcc-3d6a-4a21-a45a-a0d65fc7e1a9
    Function: file.accumulated
      Result: True
     Comment: Accumulator create_filesystem_mountpoint_8ae29dcc-3d6a-4a21-a45a-a0d65fc7e1a9 for file /etc/fstab was charged by text
     Started: 04:54:06.075138
    Duration: 0.624 ms
     Changes:
----------
          ID: mount_filesystem_mountpoint_8ae29dcc-3d6a-4a21-a45a-a0d65fc7e1a9
    Function: mount.mounted
        Name: /srv/dev-disk-by-uuid-fc4c074a-e2d7-46ee-82bd-2983d057c941
      Result: False
     Comment: mount: /srv/dev-disk-by-uuid-fc4c074a-e2d7-46ee-82bd-2983d057c941: can't read superblock on /dev/md127.
     Started: 04:54:06.076207
    Duration: 132.162 ms
     Changes:
----------
          ID: append_fstab_entries
    Function: file.blockreplace
        Name: /etc/fstab
      Result: True
     Comment: No changes needed to be made
     Started: 04:54:06.208937
    Duration: 3.191 ms
     Changes:

Summary for debian
------------
Succeeded: 2
Failed:    1
------------
Total states run:     3
Total run time: 135.977 ms

OMV\ExecException: Failed to execute command 'export PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin; export LANG=C.UTF-8; export LANGUAGE=; omv-salt deploy run --no-color fstab 2>&1' with exit code '1': debian:
----------
          ID: create_filesystem_mountpoint_8ae29dcc-3d6a-4a21-a45a-a0d65fc7e1a9
    Function: file.accumulated
      Result: True
     Comment: Accumulator create_filesystem_mountpoint_8ae29dcc-3d6a-4a21-a45a-a0d65fc7e1a9 for file /etc/fstab was charged by text
     Started: 04:54:06.075138
    Duration: 0.624 ms
     Changes:
----------
          ID: mount_filesystem_mountpoint_8ae29dcc-3d6a-4a21-a45a-a0d65fc7e1a9
    Function: mount.mounted
        Name: /srv/dev-disk-by-uuid-fc4c074a-e2d7-46ee-82bd-2983d057c941
      Result: False
     Comment: mount: /srv/dev-disk-by-uuid-fc4c074a-e2d7-46ee-82bd-2983d057c941: can't read superblock on /dev/md127.
     Started: 04:54:06.076207
    Duration: 132.162 ms
     Changes:
----------
          ID: append_fstab_entries
    Function: file.blockreplace
        Name: /etc/fstab
      Result: True
     Comment: No changes needed to be made
     Started: 04:54:06.208937
    Duration: 3.191 ms
     Changes:

Summary for debian
------------
Succeeded: 2
Failed:    1
------------
Total states run:     3
Total run time: 135.977 ms in /usr/share/php/openmediavault/system/process.inc:197
Stack trace:
#0 /usr/share/php/openmediavault/engine/module/serviceabstract.inc(62): OMV\System\Process->execute()
#1 /usr/share/openmediavault/engined/rpc/config.inc(170): OMV\Engine\Module\ServiceAbstract->deploy()
#2 [internal function]: Engined\Rpc\Config->applyChanges(Array, Array)
#3 /usr/share/php/openmediavault/rpc/serviceabstract.inc(123): call_user_func_array(Array, Array)
#4 /usr/share/php/openmediavault/rpc/serviceabstract.inc(149): OMV\Rpc\ServiceAbstract->callMethod('applyChanges', Array, Array)
#5 /usr/share/php/openmediavault/rpc/serviceabstract.inc(588): OMV\Rpc\ServiceAbstract->OMV\Rpc\{closure}('/tmp/bgstatusqV...', '/tmp/bgoutputlO...')
#6 /usr/share/php/openmediavault/rpc/serviceabstract.inc(159): OMV\Rpc\ServiceAbstract->execBgProc(Object(Closure))
#7 /usr/share/openmediavault/engined/rpc/config.inc(192): OMV\Rpc\ServiceAbstract->callMethodBg('applyChanges', Array, Array)
#8 [internal function]: Engined\Rpc\Config->applyChangesBg(Array, Array)
#9 /usr/share/php/openmediavault/rpc/serviceabstract.inc(123): call_user_func_array(Array, Array)
#10 /usr/share/php/openmediavault/rpc/rpc.inc(86): OMV\Rpc\ServiceAbstract->callMethod('applyChangesBg', Array, Array)
#11 /usr/sbin/omv-engined(537): OMV\Rpc\Rpc::call('Config', 'applyChangesBg', Array, Array, 1)
#12 {main}

Alles anzeigen

LukeR1886 · 31. Januar 2024

UPDATE 1/30
The /dev/sdc (original) and the /dev/sdf (spare I have from different machine) both finished "Secure Wipe" routines. When I try to add /dev/sdc to the array as a replacement using the WebUI, it returns this error message:

Anyone have any further ideas? I found a thread from another forum where someone battled a very similar issue, but using only mdadm, as whatever system he was using was NOT OMV. I'll add a screenshot of his solution and the URL to the forum below the code box. Is his solution safe for me at all?

Code

OMV\ExecException: Failed to execute command 'export PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin; export LANG=C.UTF-8; export LANGUAGE=; mdadm --manage '/dev/md127' --add /dev/sdc 2>&1' with exit code '1': mdadm: add new device failed for /dev/sdc as 4: Invalid argument in /usr/share/php/openmediavault/system/process.inc:197
Stack trace:
#0 /usr/share/openmediavault/engined/rpc/raidmgmt.inc(419): OMV\System\Process->execute()
#1 [internal function]: Engined\Rpc\RaidMgmt->add(Array, Array)
#2 /usr/share/php/openmediavault/rpc/serviceabstract.inc(123): call_user_func_array(Array, Array)
#3 /usr/share/php/openmediavault/rpc/rpc.inc(86): OMV\Rpc\ServiceAbstract->callMethod('add', Array, Array)
#4 /usr/sbin/omv-engined(537): OMV\Rpc\Rpc::call('RaidMgmt', 'add', Array, Array, 1)
#5 {main}

The solution to another RAID5 disaster (like this one), and the URL

Basically it looks like he carefully and manually rebuilt the array and superblock data with a few (precise) commands.

https://www.linuxquestions.org/questions/linux-server-73/mdadm-error-replacing-a-failed-disk-909577/

Krisbee · 31. Januar 2024

LukeR1886 It's not a disaster, yet. But please DO NOT use any mdadm create command as per the link you posted at this stage.

Wait untill geaves comes along and confirms what I think. The failure to mount the array device means you need to do a filesystem check on the array device which with luck will reconstruct the necessary superblock info. Your degraded array should then be mountable and you will see what data can be retrieved.

votdev · 31. Januar 2024

Zitat

Code

mdadm: add new device failed for /dev/sdc as 4: Invalid argument

I think the RAID is in degraded mode and therefor it is not possible to add another (fourth) disk. You should post the output of

Code

# cat /proc/mdstat
# mdadm --detail /dev/md127

LukeR1886 · 31. Januar 2024

Krisbee Thank you for the reassurance. We'll see how this goes. Do you think it would be wise to run fsck at this time, so it can run while I'm at work? Or shall we just wait for further approval before doing anything of the sort?

votdev I don't think the cat /proc/mdstat or the mdadm --detail /dev/md127 have changed much (if any) using a few reassemble commands after the initial post, but here goes a new run of each command this morning:

Code

root@openmediavault:~# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
md127 : active raid5 sdb[1] sde[3] sdd[2]
      17581171200 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [_UUU]
      bitmap: 1/44 pages [4KB], 65536KB chunk

unused devices: <none>

Code

root@openmediavault:~# mdadm --detail /dev/md127
/dev/md127:
           Version : 1.2
     Creation Time : Wed Mar 16 19:00:56 2022
        Raid Level : raid5
        Array Size : 17581171200 (16766.71 GiB 18003.12 GB)
     Used Dev Size : 5860390400 (5588.90 GiB 6001.04 GB)
      Raid Devices : 4
     Total Devices : 3
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Tue Jan 30 17:10:11 2024
             State : clean, degraded
    Active Devices : 3
   Working Devices : 3
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-symmetric
        Chunk Size : 512K

Consistency Policy : bitmap

              Name : Bern:Storage
              UUID : 0d9284e2:af7d455e:721ec148:2f768fe4
            Events : 51116

    Number   Major   Minor   RaidDevice State
       -       0        0        0      removed
       1       8       16        1      active sync   /dev/sdb
       2       8       48        2      active sync   /dev/sdd
       3       8       64        3      active sync   /dev/sde

Alles anzeigen

UPDATE: RESOLVED. Another case of missing RAID5 and file system. Please help!

macom 29. Januar 2024

LukeR1886 29. Januar 2024

LukeR1886 29. Januar 2024

Jetzt mitmachen!