How to cancel / revert a reshape?

  • I'm replacing my old disks, and at the last step I made a huge mistake. That's what happens when you're still too tired and doing stuff.


    I triggered a grow via the web interface instead of a replace.


    How can I stop this reshape and revert it?



    cat /proc/mdstat

    Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]

    md127 : active raid5 sdg[7] sdh[5] sdd[2] sdf[6] sde[4]

    23441685504 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]

    [>....................] reshape = 0.6% (51202484/7813895168) finish=3417.2min speed=37859K/sec

    bitmap: 5/59 pages [20KB], 65536KB chunk


    unused devices: <none>



    blkid

    /dev/sdf: UUID="25a95af2-bc96-c52c-12e7-33a7eda5b7db" UUID_SUB="7c8428b5-d1d3-2645-41be-01aece96be80" LABEL="openmediavault:raiddata" TYPE="linux_raid_member"

    /dev/sdd: UUID="25a95af2-bc96-c52c-12e7-33a7eda5b7db" UUID_SUB="0bb3a138-a123-dba9-46d3-e918f8727c49" LABEL="openmediavault:raiddata" TYPE="linux_raid_member"

    /dev/md127: LABEL="bigdata" UUID="e7908853-9d37-4707-934a-0300df111826" BLOCK_SIZE="4096" TYPE="ext4"

    /dev/sdb1: UUID="67a1bf27-bc53-48b2-af6b-e51dd0fcac61" BLOCK_SIZE="4096" TYPE="ext4" PARTLABEL="primary" PARTUUID="cbcfecee-0c1f-4c85-92c0-aa13861a232f"

    /dev/sde: UUID="25a95af2-bc96-c52c-12e7-33a7eda5b7db" UUID_SUB="b5b2ce80-d427-6053-54ed-df981392f185" LABEL="openmediavault:raiddata" TYPE="linux_raid_member"

    /dev/sdc5: UUID="8fee8f36-648b-4fb6-8eb8-24a650546bc1" TYPE="swap" PARTUUID="185dd29e-05"

    /dev/sdc1: UUID="c27d8b78-4d36-411a-8a60-ad125ccc162d" BLOCK_SIZE="4096" TYPE="ext4" PTTYPE="dos" PARTUUID="185dd29e-01"

    /dev/sda1: UUID="89d35dd0-b414-4027-bd96-e72e5d90c2f1" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="b3f3432d-4419-487d-a86f-445c1640a513"

    /dev/sdh: UUID="25a95af2-bc96-c52c-12e7-33a7eda5b7db" UUID_SUB="e9eb6a12-697d-d5a2-3032-fbb4de9f3840" LABEL="openmediavault:raiddata" TYPE="linux_raid_member"

    /dev/sdg: UUID="25a95af2-bc96-c52c-12e7-33a7eda5b7db" UUID_SUB="ca0a8f28-bf4d-fd73-bd8b-caf8e6e21874" LABEL="openmediavault:raiddata" TYPE="linux_raid_member"


    Many thanks again!

  • Yes. When the current reshape finishes you should have one device marked as a spare in your RAID5 array. Since RAID6 needs a minimum of 4 devices, a RAID5 with 5 disks can be changed to a RAID6 with 5 disks.
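
    For reference, the level change itself would be a single grow command. This is only a sketch: the backup-file path is an assumption, the backup file must live outside the array, and mdadm may first ask for an --array-size truncation, since RAID6 with the same number of disks has less usable space:

    # convert the 5-disk RAID5 into a 5-disk RAID6 (the "extra" disk becomes the second parity)
    sudo mdadm --grow /dev/md127 --level=6 --raid-devices=5 --backup-file=/root/md127-to-raid6.backup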

  • Hmmm, don't think so at the moment, but let's see. If there's no other way, then I'll just have one more disk in the raid. It wasn't planned that way, but that's how it is.


    Thanks again

  • The reshape is complete.

    cat /proc/mdstat

    Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]

    md127 : active raid5 sdg[7] sdh[5] sdd[2] sdf[6] sde[4]

    31255580672 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]

    bitmap: 0/59 pages [0KB], 65536KB chunk


    sudo mdadm --detail /dev/md127

    /dev/md127:

    Version : 1.2

    Creation Time : Wed Nov 28 21:03:55 2018

    Raid Level : raid5

    Array Size : 31255580672 (29.11 TiB 32.01 TB)

    Used Dev Size : 7813895168 (7.28 TiB 8.00 TB)

    Raid Devices : 5

    Total Devices : 5

    Persistence : Superblock is persistent


    Intent Bitmap : Internal


    Update Time : Mon Aug 26 18:09:24 2024

    State : clean

    Active Devices : 5

    Working Devices : 5

    Failed Devices : 0

    Spare Devices : 0


    Layout : left-symmetric

    Chunk Size : 512K


    Consistency Policy : bitmap


    Name : openmediavault:raiddata

    UUID : 25a95af2:bc96c52c:12e733a7:eda5b7db

    Events : 969265


    Number Major Minor RaidDevice State

    5 8 112 0 active sync /dev/sdh

    6 8 80 1 active sync /dev/sdf

    2 8 48 2 active sync /dev/sdd

    4 8 64 3 active sync /dev/sde

    7 8 96 4 active sync /dev/sdg




    The system is still showing the old capacity.


    So I'm not sure whether I can now switch to RAID6 or whether I should use the steps from your link.
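
    That is most likely because the ext4 filesystem on md127 was never resized to use the new space — which in this situation is good news, since it means the data still fits inside the original array size. A quick way to confirm (a sketch, using the device names from the output above):

    df -h /dev/md127                                       # size as ext4 sees it (filesystem must be mounted)
    sudo mdadm --detail /dev/md127 | grep 'Array Size'     # size of the grown array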

  • 1) I set a disk to failed and removed it from the array
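
    (For reference, done from the command line this step would look roughly like this — a sketch, assuming /dev/sdd, the disk mentioned in the next step:)

    sudo mdadm /dev/md127 --fail /dev/sdd      # mark the member as failed
    sudo mdadm /dev/md127 --remove /dev/sdd    # remove it from the array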


    2) and executed

    mdadm --zero-superblock /dev/sdd

    because sdd was removed


    3) mdadm told me that the array is too big:

    sudo mdadm --grow --raid-devices=4 /dev/md127

    mdadm: this change will reduce the size of the array.

    use --grow --array-size first to truncate array.

    e.g. mdadm --grow /dev/md127 --array-size 23441685504


    4)

    so I ran

    mdadm --grow /dev/md127 --array-size 23441685504
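
    One thing worth spelling out: truncating with --grow --array-size is only safe when the filesystem on the array is no larger than the new size. Here that was the case, because the ext4 filesystem had never been grown; if it had been, it would have to be shrunk first, roughly like this (a sketch only, not needed in this case):

    sudo umount /dev/md127
    sudo e2fsck -f /dev/md127                  # mandatory check before shrinking
    sudo resize2fs /dev/md127 23441685504K     # shrink the filesystem below the target array size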


    5)

    After that:

    cat /proc/mdstat

    Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]

    md127 : active raid5 sde[6] sdg[5] sdf[7] sdc[4]

    23441685504 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/4] [UU_UU]

    bitmap: 0/59 pages [0KB], 65536KB chunk


    unused devices: <none>


    6) Now running:

    mdadm --grow -n4 /dev/md127 --backup=/mdadm_temp


    Hint:
    sudo mdadm --grow --raid-devices=4 /dev/md127

    did not work on its own (presumably because reducing the device count needs a backup file).



    7) reshape is running

    cat /proc/mdstat

    Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]

    md127 : active raid5 sde[6] sdg[5] sdf[7] sdc[4]

    23441685504 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UU_U]

    [>....................] reshape = 0.0% (6856256/7813895168) finish=8694.3min speed=14965K/sec

    bitmap: 0/59 pages [0KB], 65536KB chunk



    But it's slow as hell!


    Wish me luck.

    Will update
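
    If a reshape crawls like this, it can sometimes be sped up by raising the md sync speed limits and the stripe cache. A sketch with example values (larger cache costs RAM, roughly stripe_cache_size x 4 KiB x number of disks):

    echo 50000  | sudo tee /proc/sys/dev/raid/speed_limit_min
    echo 500000 | sudo tee /proc/sys/dev/raid/speed_limit_max
    echo 8192   | sudo tee /sys/block/md127/md/stripe_cache_size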

  • Reshaping completed today at 18:51.

    So it took almost 5.5 days.


    But now:


    cat /proc/mdstat

    Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]

    md127 : active raid5 sde[6] sdg[5] sdf[7] sdc[4]

    23441685504 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UU_U]

    [=>...................] recovery = 7.0% (548281024/7813895168) finish=1318.6min speed=91831K/sec

    bitmap: 18/59 pages [72KB], 65536KB chunk


    unused devices: <none>

  • So either one of the drives is genuinely faulty, or it has been kicked out of the array for being out of step with the other drives, i.e. it has a different update time and/or events count. Once the array is fully recovered you can check the mdadm details and use something like mdadm -E /dev/sd[egfc] | egrep "Update|Events" to examine the individual drives in the array. Then it's back to add and replace, or removing the out-of-step drive from the array, wiping it, and adding it back to the array via the WebUI recovery.
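
    If it turns out to be an out-of-step drive rather than a genuinely bad one, wiping and re-adding it from the command line would look roughly like this (a sketch; /dev/sdX stands for the kicked-out member and is an assumption):

    sudo mdadm --examine /dev/sdX            # compare Update Time / Events against the other members
    sudo mdadm --zero-superblock /dev/sdX    # wipe the stale metadata
    sudo mdadm --add /dev/md127 /dev/sdX     # re-add it; a rebuild onto it will start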

  • Recovery just finished.

    I was able to see files in the shares. I also started comparing some GB of files against local copies.

    It was strange that I could see the contents of a folder from my Windows laptop but not via WinSCP.


    So I updated OMV and did a restart.

    But now everything is gone.



    Here it's empty..


    root@vuke-nas542:~# blkid

    /dev/sdf: UUID="25a95af2-bc96-c52c-12e7-33a7eda5b7db" UUID_SUB="ca0a8f28-bf4d-fd73-bd8b-caf8e6e21874" LABEL="openmediavault:raiddata" TYPE="linux_raid_member"

    /dev/sdd: UUID="25a95af2-bc96-c52c-12e7-33a7eda5b7db" UUID_SUB="b5b2ce80-d427-6053-54ed-df981392f185" LABEL="openmediavault:raiddata" TYPE="linux_raid_member"

    /dev/md127: LABEL="bigdata" UUID="e7908853-9d37-4707-934a-0300df111826" BLOCK_SIZE="4096" TYPE="ext4"

    /dev/sdb1: UUID="89d35dd0-b414-4027-bd96-e72e5d90c2f1" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="b3f3432d-4419-487d-a86f-445c1640a513"

    /dev/sdg: UUID="25a95af2-bc96-c52c-12e7-33a7eda5b7db" UUID_SUB="e9eb6a12-697d-d5a2-3032-fbb4de9f3840" LABEL="openmediavault:raiddata" TYPE="linux_raid_member"

    /dev/sde: UUID="25a95af2-bc96-c52c-12e7-33a7eda5b7db" UUID_SUB="7c8428b5-d1d3-2645-41be-01aece96be80" LABEL="openmediavault:raiddata" TYPE="linux_raid_member"

    /dev/sdc5: UUID="8fee8f36-648b-4fb6-8eb8-24a650546bc1" TYPE="swap" PARTUUID="185dd29e-05"

    /dev/sdc1: UUID="c27d8b78-4d36-411a-8a60-ad125ccc162d" BLOCK_SIZE="4096" TYPE="ext4" PTTYPE="dos" PARTUUID="185dd29e-01"

    /dev/sda1: UUID="67a1bf27-bc53-48b2-af6b-e51dd0fcac61" BLOCK_SIZE="4096" TYPE="ext4" PARTLABEL="primary" PARTUUID="cbcfecee-0c1f-4c85-92c0-aa13861a232f"



    root@vuke-nas542:~# cat /proc/mdstat

    Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]

    md127 : active raid5 sdd[4] sdg[5] sde[6] sdf[7]

    23441685504 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]

    bitmap: 0/59 pages [0KB], 65536KB chunk


    unused devices: <none>



    root@vuke-nas542:~# fdisk -l | grep "Disk "

    Disk /dev/sdb: 9,1 TiB, 10000831348736 bytes, 19532873728 sectors

    Disk model: ST10000NM0086-2A

    Disk identifier: 902B1A50-53D9-475C-A8EA-D7688C3B252A

    Disk /dev/sde: 7,28 TiB, 8001563222016 bytes, 15628053168 sectors

    Disk model: WDC WD80EFPX-68C

    Disk /dev/sda: 5,46 TiB, 6001175126016 bytes, 11721045168 sectors

    Disk model: ST6000VN0041-2EL

    Disk identifier: A5EE3626-AB57-40AC-A30F-3DDA7A2B424D

    Disk /dev/sdc: 223,57 GiB, 240057409536 bytes, 468862128 sectors

    Disk model: OCZ-TRION150

    Disk identifier: 0x185dd29e

    Disk /dev/sdf: 7,28 TiB, 8001563222016 bytes, 15628053168 sectors

    Disk model: WDC WD80EFPX-68C

    Disk /dev/sdg: 7,28 TiB, 8001563222016 bytes, 15628053168 sectors

    Disk model: WDC WD80EFPX-68C

    Disk /dev/sdd: 7,28 TiB, 8001563222016 bytes, 15628053168 sectors

    Disk model: WDC WD80EFPX-68C

    Disk /dev/md127: 21,83 TiB, 24004285956096 bytes, 46883371008 sectors



    root@vuke-nas542:~# cat /etc/mdadm/mdadm.conf

    # This file is auto-generated by openmediavault (https://www.openmediavault.org)

    # WARNING: Do not edit this file, your changes will get lost.


    # mdadm.conf

    #

    # Please refer to mdadm.conf(5) for information about this file.

    #


    # by default, scan all partitions (/proc/partitions) for MD superblocks.

    # alternatively, specify devices to scan, using wildcards if desired.

    # Note, if no DEVICE line is present, then "DEVICE partitions" is assumed.

    # To avoid the auto-assembly of RAID devices a pattern that CAN'T match is

    # used if no RAID devices are configured.

    DEVICE partitions


    # auto-create devices with Debian standard permissions

    CREATE owner=root group=disk mode=0660 auto=yes


    # automatically tag new arrays as belonging to the local system

    HOMEHOST <system>

    # instruct the monitoring daemon where to send mail alerts

    MAILADDR vuke@gmx.de

    MAILFROM root


    # definitions of existing MD arrays

    ARRAY /dev/md127 metadata=1.2 spares=1 name=openmediavault:raiddata UUID=25a95af2:bc96c52c:12e733a7:eda5b7db



    root@vuke-nas542:~# mdadm --detail --scan --verbose

    ARRAY /dev/md127 level=raid5 num-devices=4 metadata=1.2 name=openmediavault:raiddata UUID=25a95af2:bc96c52c:12e733a7:eda5b7db

    devices=/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg



    root@vuke-nas542:~# mdadm -E /dev/sd[egfc] | egrep "Update|Events"

    Update Time : Mon Sep 2 22:10:53 2024

    Events : 1032224

    Update Time : Mon Sep 2 22:10:53 2024

    Events : 1032224

    Update Time : Mon Sep 2 22:10:53 2024

    Events : 1032224

    root@vuke-nas542:~#




    root@vuke-nas542:~# blkid /dev/md127

    /dev/md127: LABEL="bigdata" UUID="e7908853-9d37-4707-934a-0300df111826" BLOCK_SIZE="4096" TYPE="ext4"

    root@vuke-nas542:~# sudo mkdir -p /mnt/raid

    root@vuke-nas542:~# sudo mount /dev/md127 /mnt/raid

    mount: /mnt/raid: mount(2) system call failed: Die Struktur muss bereinigt werden. [Structure needs cleaning.]

    dmesg(1) may have more information after failed mount system call.

    root@vuke-nas542:~# ^C

    root@vuke-nas542:~# sudo fsck /dev/md127

    fsck from util-linux 2.38.1

    e2fsck 1.47.0 (5-Feb-2023)

    ext2fs_check_desc: Corrupt group descriptor: bad block for block bitmap

    fsck.ext4: Group descriptors look bad... trying backup blocks...

    Block bitmap for group 16336 is not in group. (block 140738196992000)

    Relocate<y>?

  • I have since tested the guide at https://www.stevewilson.co.uk/sysadmin/reducing-raid-5-disks-with-mdadm/ with success. I booted an OMV VM via a systemrescue CD in order to unmount a 4-disk RAID5 test array without problems and then followed the instructions. On first reboot into OMV the now 3-disk RAID5 test array mounted OK, but there is an inconsistency between cat /etc/mdadm/mdadm.conf and mdadm --detail --scan --verbose. You can fix that by running: omv-salt deploy run mdadm initramfs. But that's not going to fix your filesystem problem.


    It's hard to know how/when your problem occurred. I would have been wary of updating OMV until I was satisfied that the mdadm array and the associated filesystem were OK after the recovery had completed.


    A filesystem check is part of OMV's normal boot process, triggered by the systemd-fsck service. I'd check the logs to see if there are any indications of errors/failures; e.g., here is a successful check:


    Code
    root@omvt:~# journalctl -b -g md0
    Sep 03 11:43:30 omvt kernel: md/raid:md0: device sdc operational as raid disk 2
    Sep 03 11:43:30 omvt kernel: md/raid:md0: device sda operational as raid disk 0
    Sep 03 11:43:30 omvt kernel: md/raid:md0: device sdb operational as raid disk 1
    Sep 03 11:43:30 omvt kernel: md/raid:md0: raid level 5 active with 3 out of 3 devices, algorithm 2
    Sep 03 11:43:30 omvt kernel: md0: detected capacity change from 0 to 41906176
    Sep 03 11:43:30 omvt systemd-fsck[444]: /dev/md0: clean, 19/1310720 files, 2648520/5238272 blocks
    Sep 03 11:43:31 omvt kernel: EXT4-fs (md0): mounted filesystem with ordered data mode. Quota mode: journalled.
    root@omvt:~# 


    In your case you might see a message telling you to run fsck manually. Running a manual fsck on the array filesystem may or may not fix it. If you need it, here's a command reference: https://phoenixnap.com/kb/fsck-command-linux.
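
    For reference, a manual check would be run against the unmounted array device, roughly like this (a sketch; use -y only if you accept whatever repairs e2fsck decides on):

    sudo umount /dev/md127          # the filesystem must not be mounted
    sudo fsck.ext4 -f /dev/md127    # interactive check and repair
    # non-interactive variant: sudo fsck.ext4 -f -y /dev/md127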




  • I decided to start with a fresh RAID, so I'm trying to remove all SMB shares and shared folders, and then recreate the RAID, the filesystem and the shares again.

    But at the moment I'm not able to delete everything old :(

  • Yes, the current situation is driving me crazy. I also no longer have any trust in the data integrity. I have checked all the disks with a long SMART test; everything looks fine so far. I'd rather start all over again now than somehow try to save the data and then find out something doesn't fit after all.


    I know that setting up the raid and the shares and the SMB shares and especially the copying will take a long time, but I should have done it this way from the beginning.

  • Your case illustrates some of the "cons" of using mdadm: extremely long reshape/resync times in some instances; risk of filesystem corruption that fsck may or may not fix; and complex commands, where not every administrative error can be reverted.


    Judging by the number of threads on the forum, when disks get kicked out of mdadm arrays for various reasons their recovery is not always guaranteed and can leave the filesystem in an inconsistent state, possibly compounded by incorrect use of mdadm commands in attempting to fix things.


    You raised the question of "data integrity". The so-called write hole, associated with power loss in the middle of a write to parity-based RAID, is one problem. You should scrub mdadm arrays at regular intervals; in the case of parity RAID the question is what type of error can be detected and whether it can be fixed.
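
    For completeness, a scrub on an md array is started via sysfs; on parity RAID a "check" only counts mismatches, while "repair" additionally rewrites whatever is inconsistent, without being able to tell which copy was the correct one. A sketch:

    echo check | sudo tee /sys/block/md127/md/sync_action   # start a read-only consistency check
    cat /proc/mdstat                                        # progress shows up as "check"
    cat /sys/block/md127/md/mismatch_cnt                    # mismatch counter after the run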
