RAID 5 Grow Failure - How to Stop It?

  • Hi.


    I have searched the forum and been scouring the internet for weeks and am stuck.


    I am trying to grow my RAID 5 from 7 to 8 drives. The reshape has never progressed past 0.0% complete and is running at 0K/second. The file system has not been mounted during this entire time. The 7-drive array was working great before I tried to add another drive; I was just getting close to running out of space and wanted to grow it.


    I need the commands to stop the grow/reshape and just put the array back to 7 drives. I have rebooted the server several times and it always comes back in this state.


    Drive /dev/sdi is the newest drive.


    Every time, the process gets stuck with a random "### blocked for more than ### seconds" kernel message. It has been sitting for 2 weeks now and nothing has changed.
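    In case it helps with diagnosis, the full messages can be pulled back out of the kernel log; this is just grepping for the kernel's hung-task warning:

    Code
    dmesg -T | grep -i "blocked for more than"
    journalctl -k | grep -i "blocked for more than"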


    Yes, the md0/raid5 process is eating up 100% of my CPU.


    Please help... I have tried my best and just want to get my array back online with the data that is on the drives.


    Thanks in advance if you can help me.


    OMV 6

    2008 MacPro Cheese Grater chassis


    Code
    root@doghouse:~# cat /proc/mdstat
    Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
    md0 : active raid5 sdg[1] sdf[7] sda[4] sdh[2] sdb[8] sdc[3] sdd[6] sdi[5]
          82033511424 blocks super 1.2 level 5, 512k chunk, algorithm 2 [8/8] [UUUUUUUU]
          [>....................]  reshape =  0.0% (58364/13672251904) finish=10209618072.6min speed=0K/sec
          bitmap: 13/102 pages [52KB], 65536KB chunk
    
    unused devices: <none>


  • OK so I didn't want to answer earlier because there could be a bunch of reasons... but let me tell you what I would do in this situation:


    1. Make absolutely sure you have a backup; RAID operations are always dangerous. But in a case like this, where something clearly went wrong, this is even more important.


    If you don't have a backup already, make one now. You said that you unmounted the filesystem; this is actually not necessary for RAID operations. So in case you don't have a backup, I would mount the file system and make a backup. (Yes, mount it in the "reshaping" state that it is currently in.)
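    If mounting works, the backup could look something like this. (A sketch only; I am assuming the filesystem sits directly on /dev/md0 and that /mnt/backup is wherever your backup target lives, so adapt both paths.)

    Code
    mkdir -p /mnt/md0
    mount -o ro /dev/md0 /mnt/md0      # read-only, to be extra careful while the reshape is stuck
    rsync -aHAX --progress /mnt/md0/ /mnt/backup/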


    2. Now you said it got stuck on the reshape, so I understand that you did add the drive to the RAID array but did not yet use the

    • mdadm --grow /dev/md0 --array-size []

    command, or used the "grow" button in OMV.

    Please tell me if this assessment is wrong.


    3. The next step I would take is to try to remove the drive you added earlier to the array again.


    In the web interface the action would be: Storage -> Software RAID -> select the array in question -> hit "Remove" at the top -> select the appropriate drive -> confirm.


    If this is locked (greyed out), maybe because a reshape is in progress, try to do it over the CLI. The commands would be:

    • mdadm --manage /dev/mdX --fail /dev/sdX
    • mdadm --manage /dev/mdX --remove /dev/sdX

    If any of the commands fail, maybe try with a -f at the end to force it (perhaps because a reshape is happening right now).


    Note that this is the most dangerous part, as I don't know what state the array is in!!! There was clearly some sort of error, but I don't know what. Usually I would never do something like this during a reshape, but if you have already tried everything else, maybe this is the only way to go. Make sure you have a backup, as I don't know what will happen to the array.

    (This is also why I didn't want to give an answer at first, because I didn't want to be the guy that gave you instructions to brick your RAID array...)


    4. Now check the state of the RAID array; I think it should report:

    • Raid devices = 8
    • Total devices = 7

    If any sort of RAID process (like a recovery or reshape) started automatically, wait for this process to finish; with this many drives this might take a loooong time (also depending on the size).
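    A quick way to check those counts and whether anything is still running (using the /dev/md0 from your output):

    Code
    mdadm --detail /dev/md0 | grep -E 'Raid Devices|Total Devices|State'
    cat /proc/mdstat    # shows any running recovery/reshape and its progress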


    5. Now you want to set the raid devices back to 7; the command is:

    • mdadm --grow /dev/mdX --raid-devices=7

    Wait for any sort of RAID operations (like a recovery or a reshape) to finish.
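    To monitor that without retyping, something like this works (watch simply re-runs the command every few seconds; mdadm --wait blocks until the activity is done):

    Code
    watch -n 10 cat /proc/mdstat
    mdadm --wait /dev/md0    # alternatively, block until any recovery/reshape finishes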


    5.1 Considering that you used the OMV "grow" button, the array might already have been grown (even though it doesn't look like it from the status).

    If there is an error of the sort "this change will reduce the size of the array. use --grow --array-size first to truncate array. e.g. mdadm --grow /dev/md0 --array-size 3906767872", use the:

    • mdadm --grow /dev/md0 --array-size 3906767872

    command. Just use the size that mdadm suggested in its output.

    After doing that,

    • mdadm --grow /dev/mdX --raid-devices=7

    should just work.
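    Putting 5 and 5.1 together, the whole sequence would look roughly like this. (The array-size number here is just the one from the example error above; use whatever size mdadm actually prints for your array.)

    Code
    mdadm --grow /dev/md0 --array-size 3906767872   # only if mdadm asked for it, with the size it printed
    mdadm --grow /dev/md0 --raid-devices=7
    cat /proc/mdstat                                # then wait for the reshape back to 7 drives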


    6. Secure erase your drive. I have heard that drives that were once in a RAID array and are used in a RAID array again at a later time "could" cause catastrophic errors if not secure erased.
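    A full secure erase takes very long on drives this size; as far as I know, what usually matters to mdadm is that the old RAID metadata is gone, which a wipe of the signatures also achieves. (A sketch; this is destructive, so triple-check the device letter.)

    Code
    mdadm --zero-superblock /dev/sdi   # remove the old md metadata (destructive!)
    wipefs -a /dev/sdi                 # remove any other leftover filesystem/RAID signatures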


    7. Try to add the same drive again.
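    That would again be the usual add + grow pair, assuming the drive is still /dev/sdi after the wipe:

    Code
    mdadm --manage /dev/md0 --add /dev/sdi
    mdadm --grow /dev/md0 --raid-devices=8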



    If anything doesn't work as planned I am happy to help, but I can't guarantee anything...

    Edited once, most recently by elio_ (). Reason: Added step 5.1

  • ....thanks so much for at least a place to start.

    Quote

    command, or used the "grow" button in OMV.

    You are correct, I pressed the grow button in OMV.


    Quote

    (This is also why I didn't want to give an answer at first, because I didn't want to be the guy that gave you instructions to brick your RAID array...)


    At this point my server is sitting there not getting used, because the array reshape has locked the whole thing up completely. So I either need to start over or something. Two of my drives have warnings, but not complete failure. So if I do get it working again, I will probably swap those out one at a time and then try to grow it again... that is a big if.


    Thanks to anyone who takes the time to help out. This is a topic I haven't been able to track down anywhere. Sometimes I feel like I know what I am doing and sometimes I don't...

  • Hmm, so while I don't know what kind of warnings the drives have, there might absolutely be a correlation with the stuck reshape... which would be... not great. Especially with a RAID 5 array, which is not recommended at all for this many drives anyway.


    1. Do you have a backup? If so, restoring could be a faster (due to the unbearably long rebuild operations of RAID) and also less stressful option, considering that I don't know what the odds of success are in your case. (But then again, if you already have a backup, you might as well try.)

    Quote

    At this point my server is sitting there not getting used, because the array reshape has locked the whole thing up completely.

    2. In case you don't have a backup:

    So do you mean the RAID operation (the reshape, in this case) is locked up, or can you not mount and use the array at all?

    If you can mount the array and use it you should absolutely try to make a backup (at least of the important stuff).

    I know that you might not have the storage necessary, considering you have an array of over 80TB, but in this situation I don't know if fixing the array is going to work out, especially with 2 drives having some sort of errors, so a backup is crucial.


    3. Concerning your drives with errors: I know this doesn't help, but statistically drives (especially ones that are already failing) fail quite often during RAID operations due to the high load on them.


    4. You said that the array is "locked up". Well, there isn't per se a command to stop the reshape, just like there isn't a command to start it. The reshape just starts after adding the drive, and the only way out I can think of is to undo those steps and hope the array doesn't get bricked by removing the drive or by subsequent drive failures, as the array will certainly be in a "degraded" state.


    5. Once you have a backup, I would start working through my steps above.


    I also made an edit to the post above (step 5.1).


    Maybe some other people also have ideas about what to do...

    • Official Post

    SSH into OMV as root and try this command: echo max > /sys/block/md0/md/sync_max
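    For context, a sketch of what that knob looks like before and after (md0 as in your mdstat):

    Code
    cat /sys/block/md0/md/sync_max     # if this shows a sector count instead of "max", the reshape window is capped
    echo max > /sys/block/md0/md/sync_max
    cat /proc/mdstat                   # the reshape should start moving again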


    But in all honesty, 8 drives in a RAID 5 is suicidal. You know 2 drives have bad sectors and have failed to replace them; there's only one way this can go, and that's tits up!!


    The fact that the rebuild appears to have halted suggests it could be 'stuck' on a bad sector of one of those drives.
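    If you want to check that theory, smartctl from smartmontools will show pending/reallocated sectors; run it against each member drive (sdX is a placeholder):

    Code
    smartctl -a /dev/sdX | grep -Ei 'reallocated|pending|uncorrect'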

    Raid is not a backup! Would you go skydiving without a parachute?


    OMV 6x amd64 running on an HP N54L Microserver

  • Quote

    ssh into omv as root and try this command echo max > /sys/block/md0/md/sync_max


    I'm not sure why this worked, but it saved my hide...


    I had tried to change the bitmap to none after using the grow command, like a big dumb-dumb. Your command got things running again.
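    For anyone who finds this later: the bitmap change I attempted was along these lines (mdadm's grow mode manages the bitmap):

    Code
    mdadm --grow /dev/md0 --bitmap=none       # what I had run; drops the write-intent bitmap
    mdadm --grow /dev/md0 --bitmap=internal   # puts it back once the array is healthy again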
