UPDATE: RESOLVED. Another case of missing RAID5 and file system. Please help!

  • UPDATE: THIS CONCERN WAS RESOLVED.

    THE LAST FEW POSTS OF THIS THREAD HAVE THE REPAIR DOCUMENTED IN DETAIL.

    There was some sort of cabling issue and I lost connectivity to all of my drives; I assume the array went offline when it lost sync. The RAID has disappeared, and I've tried a few things to get it back online. I may have caused more trouble by attempting a mdadm --assemble --force --verbose /dev/md127 /dev/sdb /dev/sdc /dev/sdd /dev/sde command. Someone help me!
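
    (For anyone who finds this thread later: before forcing an assembly, it would have been safer to capture each member's superblock state first. A minimal sketch of what I should have run, assuming the same device names as above:

    Code
    # Dump the md superblock of each member: event count, array UUID, device role
    mdadm --examine /dev/sdb /dev/sdc /dev/sdd /dev/sde

    That output shows whether the members still agree about the array before any --force is attempted.)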

  • How are your drives connected, SATA or USB? If a drive dropped out due to a connectivity issue, a simple reboot may be all that you need.


    What output messages did you see when using that mdadm assemble command?


    Please post the output of these commands: cat /proc/mdstat  and blkid

  • Thank you for the response. SAS-connected, through a PERC H310 flashed to IT mode. Please be patient, as I have just replaced the SAS controller and cables, and am currently booting into a fresh OMV installation to make sure the OS isn't part of whatever issues have started.

  • So I'm not sure if it was the cabling or the SAS controller, but the array now appears, in a degraded state, with only the drives it deemed OK despite the failing cable/controller.

    Here is the data:

    Code
    root@openmediavault:~# cat /proc/mdstat
    Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
    md127 : active (auto-read-only) raid5 sde[3] sdb[1] sdd[2]
          17581171200 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [_UUU]
          bitmap: 0/44 pages [0KB], 65536KB chunk
    
    unused devices: <none>



    Code
    root@openmediavault:~# blkid
    /dev/sda1: UUID="43b1e19a-3a73-477f-b758-db5b833bf8af" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="71f4a275-01"
    /dev/sda5: UUID="02abfba3-4cd8-40a1-b479-75598af04eb7" TYPE="swap" PARTUUID="71f4a275-05"
    /dev/sdc: UUID="0d9284e2-af7d-455e-721e-c1482f768fe4" UUID_SUB="7df74adc-8d0d-5800-3a4f-2f243b67fe1a" LABEL="Bern:Storage" TYPE="linux_raid_member"
    /dev/sdb: UUID="0d9284e2-af7d-455e-721e-c1482f768fe4" UUID_SUB="697882ba-3d2c-24a1-242a-33f8c80d2f48" LABEL="Bern:Storage" TYPE="linux_raid_member"
    /dev/sdd: UUID="0d9284e2-af7d-455e-721e-c1482f768fe4" UUID_SUB="f4f3838b-f381-01d0-ef9c-0eb96ee49e1d" LABEL="Bern:Storage" TYPE="linux_raid_member"
    /dev/sde: UUID="0d9284e2-af7d-455e-721e-c1482f768fe4" UUID_SUB="392758f0-7e5b-d0b6-fe65-ab743d6268b8" LABEL="Bern:Storage" TYPE="linux_raid_member"
    /dev/md127: LABEL="Storage" UUID="fc4c074a-e2d7-46ee-82bd-2983d057c941" BLOCK_SIZE="4096" TYPE="ext4"
    /dev/sr0: BLOCK_SIZE="2048" UUID="2022-04-26-18-09-57-00" LABEL="openmediavault 20220426-20:09" TYPE="iso9660" PTUUID="a3602feb" PTTYPE="dos"
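
    Side note: all four member drives report the same linux_raid_member UUID in the blkid output above, so the array can also be assembled by UUID instead of naming each device. A sketch, assuming the colon-separated "Array UUID" form that mdadm itself prints:

    Code
    # mdadm reports the array UUID in its own colon-separated format
    mdadm --examine /dev/sdb | grep 'Array UUID'
    # Assemble every member that carries that UUID
    mdadm --assemble --scan --uuid=<array-uuid-from-above>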

  • Now when I try:

    mdadm --stop /dev/md127

    mdadm --assemble --verbose /dev/md127 /dev/sdb /dev/sdc /dev/sdd /dev/sde


    I get:


    Code
    root@openmediavault:~# mdadm --assemble --force --verbose /dev/md127 /dev/sdb /dev/sdc /dev/sdd /dev/sde
    mdadm: looking for devices for /dev/md127
    mdadm: /dev/sdb is identified as a member of /dev/md127, slot 1.
    mdadm: /dev/sdc is identified as a member of /dev/md127, slot 0.
    mdadm: /dev/sdd is identified as a member of /dev/md127, slot 2.
    mdadm: /dev/sde is identified as a member of /dev/md127, slot 3.
    mdadm: added /dev/sdc to /dev/md127 as 0 (possibly out of date)
    mdadm: added /dev/sdd to /dev/md127 as 2
    mdadm: added /dev/sde to /dev/md127 as 3
    mdadm: added /dev/sdb to /dev/md127 as 1
    mdadm: /dev/md127 has been started with 3 drives (out of 4).
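
    The "(possibly out of date)" message usually means the event counter in /dev/sdc's superblock lags behind the other members. A quick way to compare them, as a sketch:

    Code
    # Compare event counters; the stale member shows a lower count
    mdadm --examine /dev/sdb /dev/sdc /dev/sdd /dev/sde | grep -E '^/dev/|Events'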

  • As /dev/md127 is currently active with 3 out of 4 drives, try failing /dev/sdc, then remove it from the array and add it back, and see if it re-syncs OK.


    mdadm /dev/md127 --fail /dev/sdc

    mdadm /dev/md127 --remove /dev/sdc

    mdadm /dev/md127 --add /dev/sdc


    If it still complains that /dev/sdc is out of date, then perhaps you're going to have to wipe that drive before it can be added back to the array. But in that case you could wait and see if geaves comes along with an answer.
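
    If the re-add is accepted, the re-sync can be watched from the CLI; a sketch:

    Code
    # Live view of the rebuild progress
    watch -n 5 cat /proc/mdstat
    # Or check the array state and rebuild status once
    mdadm --detail /dev/md127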

  • OK, and is this to be run with the array stopped?
    e.g. should I run mdadm --stop /dev/md127 first?

  • AFAIK no, you would not stop the array to remove a drive using the Webui. But let me re-state those commands.


    mdadm /dev/md127 --fail /dev/sdc --remove /dev/sdc

    mdadm --add /dev/md127 /dev/sdc


    follow with a


    mdadm --detail /dev/md127

    • Official Post

    Post the output of cat /proc/mdstat in a code box please; this symbol </> on the forum bar makes it easier to read.


    The output in #4 above shows the array as (auto-read-only). Also, to re-add /dev/sdc after the 'possibly out of date' error, the drive will have to be securely wiped; this can usually be run to 25% and then stopped, after which you can try re-adding the drive to the array. Do not add the drive until the array has finished rebuilding and the (auto-read-only) state is corrected.
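
    As an aside, what the wipe has to achieve is clearing the stale md metadata. A quicker alternative to a full secure wipe, as a sketch and only once you are certain /dev/sdc is the right disk:

    Code
    # Remove all known filesystem/raid signatures from the drive
    wipefs -a /dev/sdc
    # Or clear just the md superblock (drive must not be in an active array)
    mdadm --zero-superblock /dev/sdc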


  • geaves So, I've already attempted to add the drive back, got errors, and performed a --fail /dev/sdc and --remove /dev/sdc to remove it. Now it's back to the previous, non-functional state. I hope nothing was damaged. :/

    How do I get the machine to rebuild the array with only 3 drives... to get the array out of auto-read-only? (See the sketch after the code box below for one idea.)

    Here's the cat /proc/mdstat


    Code
    root@openmediavault:~# cat /proc/mdstat
    Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
    md127 : active (auto-read-only) raid5 sdb[1] sde[3] sdd[2]
          17581171200 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [_UUU]
          bitmap: 0/44 pages [0KB], 65536KB chunk
    
    unused devices: <none>
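
    One thing I've seen suggested, though I haven't tried it yet so treat it as a sketch, is switching the array back to read-write explicitly:

    Code
    # Clear the (auto-read-only) state and allow writes/resync
    mdadm --readwrite /dev/md127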
  • UPDATE:

    I haven't heard anything back yet, but I've made some progress and figured out some of this headache on my own. I could still REALLY use some help.

    I learned that when the array says active (auto-read-only), the host OS will eventually work itself out of read-only mode without any further input.
    So now:

    1. I've allowed the array to finish syncing and the  active (auto-read-only)  property has disappeared.
    2. I've run the "wipe drive" tool in secure mode on /dev/sdc (the drive that was out of sync) until it reached 25%.
    3. The commands mdadm /dev/md127 --fail /dev/sdc and mdadm /dev/md127 --remove /dev/sdc were already run previously, but I ran them again to ensure the drive was removed.
    4. When I run the command mdadm --add /dev/md127 /dev/sdc, the drive cannot be added and I get the following error:


    Code
    root@openmediavault:~# mdadm --add /dev/md127 /dev/sdc
    mdadm: add new device failed for /dev/sdc as 4: Invalid argument
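
    When --add fails with "Invalid argument", the kernel log usually records the underlying reason; worth checking, as a sketch:

    Code
    # The md driver logs why a hot-add was rejected
    dmesg | tail -n 20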


    The mdadm --detail /dev/md127 output shows that the superblock data knows there are 4 devices, but only 3 are active. I cannot figure out the command to remove the "removed" device from the superblock data, so I'm stuck, unable to add a replacement device, with the error message mdadm: add new device failed for /dev/sdc as 4: Invalid argument


    If anyone can chime in on the process of removing the "removed" drive and adding a new one, I would greatly appreciate any help! Thanks in advance to anyone who can assist!

    Krisbee  geaves

    • Official Post

    I've allowed the array to finish syncing and the active (auto-read-only) property has disappeared.

    Well that's a first

    If anyone can chime in on the process of removing the "removed" drive and adding a new one

    You can't; all mdadm is telling you is that it has removed a drive from the array. The --detail switch is just giving you information.


    mdadm: add new device failed for /dev/sdc as 4: Invalid argument

    :/ if the 25% secure wipe did not work, I would suggest running a secure wipe on the whole drive, then trying again.


    Adding a drive can be done from the GUI: Raid Management, select the array, click Recover on the menu, and follow the on-screen instructions.
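
    For reference, the whole-drive wipe does not need the GUI; a plain zero-fill from the CLI does the same job. A sketch, and double-check the device name first because this destroys everything on the target:

    Code
    # Zero the entire drive, showing progress; be absolutely sure of the device name
    dd if=/dev/zero of=/dev/sdc bs=1M status=progress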

  • Krisbee and geaves thank you both, kindly, for the input you've provided thus far.

    I'm preparing to rebuild the array, but I have a few questions: I am not new to tech, but this is my first time repairing an array with a completely missing drive.

    Can I mount the filesystem before trying a RAID rebuild, to recover some of the stuff that I know isn't backed up anywhere else? This is just a precaution in case the array completely dies on recovery. It's probably 10% or 15% of the entire array's volume that I'd be focused on retrieving. When I tried to mount it in the WebUI, I got an error that I'll post at the bottom...
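
    For reference, the read-only mount I have in mind would look something like this (a sketch; /mnt/recovery is just a placeholder path):

    Code
    # Mount the degraded array read-only so nothing gets written during the copy
    mkdir -p /mnt/recovery
    mount -o ro /dev/md127 /mnt/recovery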

    Here is my preparation to start the recovery; anything else I should know or do?

    • I have the original /dev/sdc (a Seagate ST6000NM0014) reaching the end of its secure wipe routine.
    • I installed a "spare" drive as /dev/sdf (a Seagate ST6000NM0034) currently being secure-wiped too, just in case the original drive won't work.
    • Both are pulled out of the actual chassis and arranged so that extra fans can be placed on them, which dropped the temperature from 38C to 30C; I'm sure rebuilding the array will warm things up a little and... well, the drives have been healthy for a long time, but they're high-mileage units.


    "clean, degraded" array mount error: Looks like the superblock is unreadable. Is that normal or repairable?

  • UPDATE 1/30
    The /dev/sdc (original) and the /dev/sdf (a spare I have from a different machine) have both finished their "Secure Wipe" routines. When I try to add /dev/sdc to the array as a replacement using the WebUI, it returns this error message:


    Anyone have any further ideas? I found a thread on another forum where someone battled a very similar issue, but using only mdadm, as whatever system he was using was NOT OMV. I'll add a screenshot of his solution and the URL to the forum below the code box. Is his solution safe for me at all?


    Code
    OMV\ExecException: Failed to execute command 'export PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin; export LANG=C.UTF-8; export LANGUAGE=; mdadm --manage '/dev/md127' --add /dev/sdc 2>&1' with exit code '1': mdadm: add new device failed for /dev/sdc as 4: Invalid argument in /usr/share/php/openmediavault/system/process.inc:197
    Stack trace:
    #0 /usr/share/openmediavault/engined/rpc/raidmgmt.inc(419): OMV\System\Process->execute()
    #1 [internal function]: Engined\Rpc\RaidMgmt->add(Array, Array)
    #2 /usr/share/php/openmediavault/rpc/serviceabstract.inc(123): call_user_func_array(Array, Array)
    #3 /usr/share/php/openmediavault/rpc/rpc.inc(86): OMV\Rpc\ServiceAbstract->callMethod('add', Array, Array)
    #4 /usr/sbin/omv-engined(537): OMV\Rpc\Rpc::call('RaidMgmt', 'add', Array, Array, 1)
    #5 {main}

    The solution to another RAID5 disaster (like this one), and the URL

    Basically, it looks like he carefully and manually rebuilt the array and superblock data with a few (precise) commands.


    https://www.linuxquestions.org/questions/linux-server-73/mdadm-error-replacing-a-failed-disk-909577/


  • LukeR1886 It's not a disaster, yet. But please DO NOT use any mdadm --create command like the one in the link you posted at this stage.


    Wait until geaves comes along and confirms what I think. The failure to mount the array device means you need to run a filesystem check on it, which, with luck, will reconstruct the necessary superblock info. Your degraded array should then be mountable, and you will see what data can be retrieved.
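
    For what it's worth, a cautious way to run that check, as a sketch assuming the ext4 filesystem from the earlier blkid output and an unmounted array:

    Code
    # Dry run first: report problems without changing anything
    e2fsck -n /dev/md127
    # If the report looks sane, run the actual check/repair
    e2fsck -f /dev/md127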

  • Krisbee Thank you for the reassurance. We'll see how this goes. Do you think it would be wise to run fsck at this time, so it can run while I'm at work? Or shall we just wait for further approval before doing anything of the sort?

    votdev I don't think the cat /proc/mdstat or the mdadm --detail /dev/md127 output has changed much (if at all) after the few reassemble commands since the initial post, but here is a fresh run of each command this morning:

    Code
    root@openmediavault:~# cat /proc/mdstat
    Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
    md127 : active raid5 sdb[1] sde[3] sdd[2]
          17581171200 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [_UUU]
          bitmap: 1/44 pages [4KB], 65536KB chunk
    
    unused devices: <none>
