RAID 5 Issues

  • Hi all,


    It's been a while since I've been on here, which I think is a good thing: my server had been running really well until about 4-5 days ago.

    I started receiving SMART errors on one drive. The drive in question is a 3TB drive in a RAID 5 with 4 other drives (5 in total). I bought a new drive to replace it, plus a larger 10TB drive to back up everything on the RAID before I replace the bad drive. At this point the drive still shows as good, and the RAID shows as clean with all 5 drives.


    Today I started to set up an rsync job to copy all the data from the RAID to the 10TB drive in an external USB 3 enclosure. I hadn't even started transferring files (I was still setting up the rsync) when I received an email saying:

    Code
    Status failed Service mountpoint_srv_dev-disk-by-label-Main
    Date: Wed, 20 Jan 2021 21:54:45
    Action: alert
    Host: Morpheus-SAWHOME
    Description: status failed (1) -- /srv/dev-disk-by-label-Main is not a mountpoint
    Your faithful employee,
    Monit

    I took a look at the RAID and it's now showing as clean, degraded. One of the drives is offline. Okay, so the drive failed. But usually when a drive fails, the RAID keeps working in degraded mode and stays mounted. In OMV the filesystem is still listed, but it's not mounted. When I try to mount it, I get this error:

    Code
    Error #0:
    OMV\ExecException: Failed to execute command 'export PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin; export LANG=C.UTF-8; mount -v --source '/dev/disk/by-label/Main' 2>&1' with exit code '32': mount: /dev/md127: can't read superblock in /usr/share/php/openmediavault/system/process.inc:182
    Stack trace:
    #0 /usr/share/php/openmediavault/system/filesystem/filesystem.inc(733): OMV\System\Process->execute()
    #1 /usr/share/openmediavault/engined/rpc/filesystemmgmt.inc(920): OMV\System\Filesystem\Filesystem->mount()
    #2 [internal function]: OMVRpcServiceFileSystemMgmt->mount(Array, Array)
    #3 /usr/share/php/openmediavault/rpc/serviceabstract.inc(123): call_user_func_array(Array, Array)
    #4 /usr/share/php/openmediavault/rpc/rpc.inc(86): OMV\Rpc\ServiceAbstract->callMethod('mount', Array, Array)
    #5 /usr/sbin/omv-engined(537): OMV\Rpc\Rpc::call('FileSystemMgmt', 'mount', Array, Array, 1)
    #6 {main}

    Here's the output of mdadm --detail /dev/md127

    One drive shows as removed (the failed one).

    Any idea how I can mount it and get it back online so I can back up the files before I rebuild the RAID with the new 3TB drive?

    Would mdadm --assemble --force --verbose /dev/md127 /dev/sd[cadg] work, or would it destroy the current RAID because the 5th drive is missing?


    Just trying to save the data if possible.


    Thanks in advance!

  • Sorry, I just saw a post from ryecoaaron on what to post when reporting RAID issues.

    Here's the info from SSH:

    cat /proc/mdstat

    Code
    Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
    md127 : active raid5 sdg[6] sda[5] sdd[4] sdc[1]
    11720540160 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/4] [_UUUU]
    unused devices: <none>
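    For reference, the status line above reads as follows: [5/4] means the array is defined with 5 members and 4 are currently active, and in [_UUUU] each character is one member slot in role order, with _ marking a missing slot (here the first one, slot 0). A minimal awk-based sketch that pulls this out of /proc/mdstat-style text; the function name mdstat_summary is hypothetical, and it assumes the status line directly follows the mdN line, as it does here:

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarise each md array in /proc/mdstat-style text read from stdin.
# Assumes the "[total/active] [U_...]" status line follows the "mdN : ..." line.
mdstat_summary() {
  awk '
    /^md[0-9]+ :/ { dev = $1; next }        # remember which array we are in
    dev != "" && match($0, /\[[0-9]+\/[0-9]+\]/) {
      split(substr($0, RSTART + 1, RLENGTH - 2), c, "/")   # c[1] = total, c[2] = active
      missing = ""
      if (match($0, /\[[U_]+\]/)) {
        map = substr($0, RSTART + 1, RLENGTH - 2)          # e.g. "_UUUU"
        for (i = 1; i <= length(map); i++) {
          if (substr(map, i, 1) == "_") {
            sep = (missing == "") ? "" : ","
            missing = missing sep (i - 1)                  # 0-based slot number
          }
        }
      }
      if (missing == "") missing = "none"
      printf "%s: %d of %d members active; missing slot(s): %s\n", dev, c[2], c[1], missing
      dev = ""
    }
  '
}
```

    On the server it would be run as mdstat_summary < /proc/mdstat; for the output posted above it reports md127 with 4 of 5 members active and slot 0 missing.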

    blkid

    Code
    /dev/sda: UUID="bea4cede-6989-d555-c003-7b765d3428cb" UUID_SUB="924f1fae-8897-f653-bbd0-7374d8b216df" LABEL="SAWHOME-Vault:RAID5" TYPE="linux_raid_member"
    /dev/sdb1: UUID="0ded74f0-7fe8-4e6f-ae0a-bb98de0284e3" TYPE="ext4" PARTUUID="c71662bd-01"
    /dev/sdb5: UUID="233be14a-5a18-40ef-b871-b8f2a38c3088" TYPE="swap" PARTUUID="c71662bd-05"
    /dev/md127: LABEL="Main" UUID="dd3de295-9705-4573-b299-53e77a01fada" TYPE="xfs"
    /dev/sdc: UUID="bea4cede-6989-d555-c003-7b765d3428cb" UUID_SUB="bdcc2091-4d65-2715-0001-350c78f8931c" LABEL="SAWHOME-Vault:RAID5" TYPE="linux_raid_member"
    /dev/sdd: UUID="bea4cede-6989-d555-c003-7b765d3428cb" UUID_SUB="8b1fd939-c527-78b6-7e40-2467f5ddab6b" LABEL="SAWHOME-Vault:RAID5" TYPE="linux_raid_member"
    /dev/sde1: UUID="c4ef0497-8294-4412-baf7-dc821a75cadf" TYPE="ext4" PARTUUID="27cdaae8-1a18-4005-aa8f-d11bdd2eabbc"
    /dev/sdf1: UUID="673e2ffb-0aec-4f53-93fe-4b3ef987e8bc" TYPE="ext4" PARTUUID="32946d5b-5a96-4f14-b989-86d6626b1a7d"
    /dev/sdg: UUID="bea4cede-6989-d555-c003-7b765d3428cb" UUID_SUB="bd722057-fcb3-dcfd-114f-341bf166667f" LABEL="SAWHOME-Vault:RAID5" TYPE="linux_raid_member"

    fdisk -l | grep "Disk "

    The RAID members are the 2.7TB drives (3TB drives show up as 2.7 TiB in fdisk). There should be 5 of them, but one failed and only 4 are showing.


    cat /etc/mdadm/mdadm.conf

    mdadm --detail --scan --verbose

    Code
    ARRAY /dev/md127 level=raid5 num-devices=5 metadata=1.2 name=SAWHOME-Vault:RAID5 UUID=bea4cede:6989d555:c0037b76:5d3428cb
    devices=/dev/sda,/dev/sdc,/dev/sdd,/dev/sdg


    Something else I noticed: the RAID mounts when the system reboots, then drops again a few seconds later. During the 20-30 seconds that it is mounted, I can see my data while the RAID is in degraded mode.


    Thanks in advance!

  • Post the output of mdadm --examine /dev/sdX, where X is each drive letter in the array (a, c, d, g). What you're looking for in the output is the Events count.
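    Why the Events count matters: each member's superblock carries an event counter that is updated whenever the array's metadata changes, so members that report the same count are in sync, while a member whose count lags behind is out of date. That out-of-date metadata is exactly what mdadm --assemble --force is meant to override. A small sketch for comparing the counters, assuming the mdadm --examine output for all members is piped in one device after another (the function name events_check is hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the Events counter per member from `mdadm --examine` output
# on stdin, and return non-zero unless every member reports the same count.
events_check() {
  awk '
    /^\/dev\// { dev = $1; sub(/:$/, "", dev) }        # "/dev/sda:" header line
    /^[ \t]*Events[ \t]*:/ { print dev, $NF; seen[$NF] = 1 }
    END {
      n = 0
      for (e in seen) n++
      exit (n != 1)                                    # 0 only if exactly one distinct value
    }
  '
}
```

    Usage on the server might look like mdadm --examine /dev/sda /dev/sdc /dev/sdd /dev/sdg | events_check && echo "counters match".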


    BTW, before you can assemble with mdadm from the CLI, you have to stop the array with mdadm --stop /dev/md? (here /dev/md127).
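    The stop-then-assemble order described above, as a dry-run sketch: by default it only prints the two mdadm commands, and it executes them only when the hypothetical APPLY=1 switch is set (and it is run as root). The members are busy while the array is running, which is why the stop comes first; --force tells mdadm to assemble even if some members' metadata looks out of date.

```shell
#!/usr/bin/env bash
# Dry-run wrapper: print the command, or run it when APPLY=1 (hypothetical switch).
run() {
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  else
    echo "+ $*"
  fi
}

# Usage: reassemble_forced /dev/mdX member1 member2 ...
# Stops the array first (its members stay busy while it runs),
# then force-assembles it from the listed members only.
reassemble_forced() {
  local md="$1"
  shift
  run mdadm --stop "$md" || return
  run mdadm --assemble --force --verbose "$md" "$@"
}
```

    With the members from this thread, reassemble_forced /dev/md127 /dev/sda /dev/sdc /dev/sdd /dev/sdg prints the same two commands used later on, so they can be reviewed before anything is run.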

    Raid is not a backup! Would you go skydiving without a parachute?

  • sda

    sdc

    sdd

    sdg


    Here you go. Should I stop the array now, or wait until it's time to assemble?

  • Here you go. Should I stop the array now, or wait until it's time to assemble

    Well, according to the output there's nothing wrong that should prevent the RAID from mounting. To answer your question: you have to stop the array first, then assemble.


    But you're stuck between a rock and a hard place. I can understand why you want to back up before replacing the drive, but either way you run the risk of another drive failing. Plus, access to the RAID will be slower while it's degraded.


  • So I stopped the array and ran this command

    mdadm --assemble --force --verbose /dev/md127 /dev/sda /dev/sdc /dev/sdd /dev/sdg

    This is what happened

    Nothing out of the ordinary, and all looks good. So I tried to mount it again from within OMV, and I got this error:

    Code
    Failed to execute command 'export PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin; export LANG=C.UTF-8; mount -v --source '/dev/disk/by-label/Main' 2>&1' with exit code '32': mount: /dev/md127: can't read superblock

    Details of the error

    Code
    Error #0:
    OMV\ExecException: Failed to execute command 'export PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin; export LANG=C.UTF-8; mount -v --source '/dev/disk/by-label/Main' 2>&1' with exit code '32': mount: /dev/md127: can't read superblock in /usr/share/php/openmediavault/system/process.inc:182
    Stack trace:
    #0 /usr/share/php/openmediavault/system/filesystem/filesystem.inc(733): OMV\System\Process->execute()
    #1 /usr/share/openmediavault/engined/rpc/filesystemmgmt.inc(920): OMV\System\Filesystem\Filesystem->mount()
    #2 [internal function]: OMVRpcServiceFileSystemMgmt->mount(Array, Array)
    #3 /usr/share/php/openmediavault/rpc/serviceabstract.inc(123): call_user_func_array(Array, Array)
    #4 /usr/share/php/openmediavault/rpc/rpc.inc(86): OMV\Rpc\ServiceAbstract->callMethod('mount', Array, Array)
    #5 /usr/sbin/omv-engined(537): OMV\Rpc\Rpc::call('FileSystemMgmt', 'mount', Array, Array, 1)
    #6 {main}

    I am at a loss... Any further assistance would be appreciated.

  • I am at a loss... Any further assistance would be appreciated

    There is one option, but I have reservations because the RAID is in a degraded state. I found it by searching for the error: exit code '32': mount: /dev/md127: can't read superblock


    On top of that, you formatted the RAID with XFS, which requires a different approach from the standard ext4 tools. From what I have found, you will need to run xfs_check against the RAID device, i.e. the array that OMV mounts under /srv (/dev/md127), not the mount point itself. That will tell you whether there are any issues with the filesystem before you proceed with xfs_repair.


    A word of warning: whatever happens, anything you do from now on could result in data loss!


  • I'm fine with the fact that I may lose the data. Nothing on there is really that important and if it's lost I have backups of the important stuff.


    So does this look correct?

    xfs_check dev-disk-by-label-Main/dev/md127 

    or should it be

    xfs_check /dev/md127/dev-disk-by-label-Main

    or just

    xfs_check /dev-disk-by-label-Main


    I just want to make sure I use the correct command before I try it. Once that's sorted out, what would be the full command for xfs_repair?
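    On the question above: xfs_check and xfs_repair take the block device, not the mount point, so the target here is /dev/md127, and the filesystem must be unmounted. On current xfsprogs, xfs_check is deprecated in favor of xfs_repair -n (no-modify mode), which only reports problems. A plain xfs_repair refuses to run while the log holds unreplayed changes and asks you to mount the filesystem to replay it first; when mounting fails, as it does here, -L zeroes the log before repairing, at the cost of losing whatever metadata changes were in it. A print-only sketch of that sequence (the function name xfs_repair_plan and its zero-log argument are hypothetical) that leaves running each step to you:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (never run) the XFS check/repair steps for an unmounted device.
# Usage: xfs_repair_plan /dev/mdX [zero-log]
xfs_repair_plan() {
  local dev="$1" mode="${2:-}"
  echo "+ xfs_repair -n $dev"          # no-modify mode: report problems only
  if [ "$mode" = "zero-log" ]; then
    echo "+ xfs_repair -L $dev"        # last resort: zero the log, then repair
  else
    echo "+ xfs_repair $dev"           # normal repair; refuses to run on a dirty log
  fi
}
```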


    I think once this is sorted out I might try SnapRAID. It seems like I would have better success with that setup for a media server.

    I appreciate your help.

  • Here's the update

    I took the plunge: since I wasn't too worried about losing the data, I went ahead and tried some options I found online to get this RAID mounted.


    Here's what I did:

    I ran sudo xfs_repair -L /dev/md127. I had to use -L to zero the log; I figured it must be corrupt since I couldn't mount the RAID.

    I did get an I/O error.

    So I ran the command again without -L, since the log had already been zeroed by the previous run:

    sudo xfs_repair /dev/md127

    It took a while and came back without any errors.

    I went into OMV and found the filesystem and mounted it. It is now mounted and I have access to my files.


    Thank you for your help with this issue. I will back up the files and then try SnapRAID going forward. It seems like a better option for me.


    Thanks again.
