Help request: Slow RAID 5 "Reshape"

  • Hi all. I've hit an issue and I'm very concerned about my family's data.

    Intro:
    I just bought another 4 TB IronWolf NAS drive to add to my system, which already has 4x 4 TB drives in RAID 5. When I added the drive and hit Expand in the OMV GUI, I noticed the speed was extremely slow, about 20 to 30 KB/s, which projected many, many months to complete. I let it run for 24 hours while monitoring it, and only became concerned after a full 24 hours at 23 KB/s. The entry under "SW RAID" has disappeared, although this command still shows the array and its progress:

    cat /proc/mdstat 



    Setup:
    4 (now 5) 4 TB drives: 4 on motherboard SATA ports, 1 on an LSI HBA in IT mode. The HBA is in a PCIe 3.0 x16 slot.
    CPU is about 30% utilized, RAM about 20%.
    I/O wait was about 60%.
    Motherboard: B450M-HDV R4.0


    Things tried:
    - I checked that write cache is enabled for all drives in the RAID
    - I have tried all the improvement steps here, with no improvement (https://www.cyberciti.biz/tips…resync-rebuild-speed.html)
    - I have tried the steps this OMV forum user used (booting into SystemRescueCd): initially rapid speeds, but then it settles at 200 to 300 K/sec (better, but still really terrible!) (RAID 5 growing error)
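
    The speed-related settings usually adjusted for md resync/reshape are the global limits in /proc/sys/dev/raid (speed_limit_min and speed_limit_max, in KiB/s per device) and, for RAID 4/5/6 arrays, the per-array stripe_cache_size. A minimal read-only sketch that reports their current values, assuming the array is md0 as in the mdstat output further down; the helper name md_speed_report and its optional directory arguments are hypothetical, added only so it can be tested:

```shell
#!/bin/sh
# md_speed_report MD [PROC_DIR] [SYS_DIR] (hypothetical helper, read-only):
# prints the md resync/reshape speed limits (KiB/s per device) and, if the
# array has one (RAID 4/5/6), its stripe cache size.
# PROC_DIR and SYS_DIR default to the real kernel locations; they are
# parameters only so the function can be pointed at a test copy.
md_speed_report() {
  md=$1
  proc=${2:-/proc/sys/dev/raid}
  sys=${3:-/sys/block}
  echo "speed_limit_min=$(cat "$proc/speed_limit_min")"
  echo "speed_limit_max=$(cat "$proc/speed_limit_max")"
  # stripe_cache_size only exists for RAID 4/5/6 arrays
  if [ -r "$sys/$md/md/stripe_cache_size" ]; then
    echo "stripe_cache_size=$(cat "$sys/$md/md/stripe_cache_size")"
  fi
}
```

    For example, run "md_speed_report md0" as root. These knobs only control how hard md pushes the reshape; they will not help if a drive, cable or controller is stalling.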

    Comments:
    - At 200 to 300 KB/s it projects 149 days, during which I have no NAS storage, no Portainer containers, etc.

    - To my horror, my backup NAS running rsync has not been backing up, so all my valuable data is at risk (at least until this completes!)

    Questions:
    - Has anyone experienced such slow speeds? From what I've read online, mine seem really bad. What else can I try?

    - If I remove the newly added drive, will I risk losing data? It's at 8.6% completion (it was fast at the beginning, now it's beyond slow).

  • Software RAID has reappeared in the GUI, and although I can see my SMB share names, I can't access my files yet.

    What's odd is that the I/O load on the drives doesn't sound that heavy, so I'm surprised I can't use the SMB folders while it "reshapes".

    I think having Portainer and SMB working while it reshapes would set my mind at ease, even if it takes weeks.

    root@openmediavault:~# cat /proc/mdstat
    Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
    md0 : active raid5 sdd[3] sdb[5] sdc[2] sda[6] sde[4]
          11720658432 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
          [=>...................] reshape = 8.6% (336347100/3906886144) finish=108203.7min speed=549K/sec
          bitmap: 15/30 pages [60KB], 65536KB chunk
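
    The finish= figure can be sanity-checked from the same line: the remaining work is 3906886144 - 336347100 = 3570539044 KiB, which at 549 K/sec is about 6.5 million seconds, or roughly 75 days (md's own 108203.7 min is about 75.1 days; the small gap likely comes from how md samples and rounds the rate). At 200 to 300 K/sec, the same remainder works out to roughly 138 to 207 days. A minimal sketch of that arithmetic, assuming the progress-line format shown above; mdstat_reshape_eta is a hypothetical helper name:

```shell
#!/bin/sh
# mdstat_reshape_eta (hypothetical helper): reads /proc/mdstat-style text on
# stdin, finds the first reshape/resync/recovery progress line and prints the
# done/total counters (KiB), the current speed and a naive ETA in days,
# computed as (total - done) / speed.
mdstat_reshape_eta() {
  awk '
    /(reshape|resync|recovery) =/ {
      # "(done/total)" counter, e.g. (336347100/3906886144)
      if (!match($0, /[(][0-9]+\/[0-9]+[)]/)) next
      split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
      # "speed=NNNK/sec": drop the 6-char "speed=" prefix and 5-char "K/sec"
      if (!match($0, /speed=[0-9]+K\/sec/)) next
      speed = substr($0, RSTART + 6, RLENGTH - 11) + 0
      if (speed > 0) {
        printf "done=%s total=%s speed=%dK/s eta_days=%.1f\n",
          n[1], n[2], speed, (n[2] - n[1]) / speed / 86400
        exit
      }
    }'
}
```

    For example: mdstat_reshape_eta < /proc/mdstat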

  • One more follow-up thought: does the filesystem usually get unmounted during a "reshape"?

    In the GUI, md0 is neither available nor mounted during the RAID expansion. It holds my data but also the Portainer config files, which perhaps explains why Portainer is down too.

    Is it safe to remount md0 during the reshape? I thought it was meant to happen in the background, but perhaps I'm wrong.

  • ks09aao added the label OMV 6.x

  • geaves

    - I have tried all the improvement steps here, with no improvement (https://www.cyberciti.biz/tips…resync-rebuild-speed.html)
    - I have tried the steps this OMV forum user used (booting into SystemRescueCd): initially rapid speeds, but then it settles at 200 to 300 K/sec (better, but still really terrible!) (RAID 5 growing error)

    Then this has to be a hardware issue: one of the SATA ports, the HBA card, the cables or the drives. You need to find the read/write rate of each drive, which could narrow down the issue. In the OMV thread you linked, I gave that user a suggestion on the first page to try nmon, and nixCraft (your first link) provides specific switches to use with iostat.
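
    If nmon or iostat are awkward to read, the same per-drive numbers can be taken straight from /proc/diskstats by sampling it twice. A minimal sketch, assuming the standard field layout (field 3 = device name, 6 = sectors read, 10 = sectors written, always in 512-byte units) and disks named sdX; diskstats_rate is a hypothetical helper name:

```shell
#!/bin/sh
# diskstats_rate BEFORE AFTER SECONDS [NAME_REGEX] (hypothetical helper):
# compares two saved copies of /proc/diskstats taken SECONDS apart and prints
# per-device read/write throughput in KiB/s. Only devices whose name matches
# NAME_REGEX (default: whole disks such as sda, sdb) are reported.
# Fields used: $3 = name, $6 = sectors read, $10 = sectors written (512 B each).
diskstats_rate() {
  pat=${4:-'^sd[a-z]+$'}
  awk -v iv="$3" -v pat="$pat" '
    NR == FNR { r[$3] = $6; w[$3] = $10; next }   # first file: baseline counters
    $3 ~ pat && ($3 in r) {
      printf "%s read=%.1fKiB/s write=%.1fKiB/s\n", $3,
        ($6 - r[$3]) * 512 / 1024 / iv, ($10 - w[$3]) * 512 / 1024 / iv
    }' "$1" "$2"
}
```

    For example: cat /proc/diskstats > /tmp/ds1; sleep 10; cat /proc/diskstats > /tmp/ds2; diskstats_rate /tmp/ds1 /tmp/ds2 10. During a reshape every member should be seeing reads and writes, so a drive that sits near zero, or one far slower than the others, is the one worth a closer look.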

    RAID is not a backup! Would you go skydiving without a parachute?


    OMV 6x amd64 running on an HP N54L Microserver

  • Thanks so much, geaves.

    So I've got an nmon screen grab here. It looks like one of my 4 CPU cores is working hard, but there is next to no activity on the disks.

    I'm not sure where to go from here. This suggests data isn't being written to the disks?

    I've had a go with iostat and attached a screen grab as well. Activity seems really low.

  • /dev/sdb is the new NAS drive. As it is empty and being added to the RAID array, I guess I was expecting low reads and higher writes.

    Is my assumption wrong? Do you think sdb is faulty based on these values?

  • If I run "nmon -d" I get the error "nmon: option requires an argument -- 'd'", so perhaps I misunderstood how to do this.


    If I run "nmon" and then press d to view disks, I get the attached view: no activity at all.

  • Yes, in the configuration I'm currently trying, all 5 drives are attached to the HBA.

    I've also tried:

    - 4 drives on motherboard SATA, 1x new HDD on the HBA

    - 4 drives on motherboard SATA, 1x old HDD on the HBA

    All these combinations still result in the high CPU load and no disk activity.

  • Yes, I had no issues at all prior to this. That said, this user seems to have had something similar (https://www.spinics.net/lists/raid/msg63372.html)

    Two of my old drives show no SMART errors in the GUI, but they do show identical "bad blocks", just like that user's did in the link above. I don't fully follow what he did to fix his, though. I think he froze the reshape, grew the array and then continued the reshape?
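
    To check the "identical bad blocks" pattern across members rather than by eye, the saved "mdadm --examine-badblocks" output for each drive can be compared. A minimal sketch, assuming each entry appears as a line of the form "<sector> for <count> sectors"; common_badblocks and the /tmp file names are hypothetical:

```shell
#!/bin/sh
# common_badblocks FILE... (hypothetical helper): each FILE is the saved output
# of `mdadm --examine-badblocks /dev/sdX` for one member drive. Prints every
# "<sector> for <count> sectors" entry that appears in ALL of the files.
common_badblocks() {
  nfiles=$#
  for f in "$@"; do
    # Normalise entries to "<sector> <count>" and de-duplicate per file,
    # so each file contributes at most one vote per entry.
    sed -n 's/^[[:space:]]*\([0-9][0-9]*\) for \([0-9][0-9]*\) sectors.*/\1 \2/p' "$f" | sort -u
  done | sort | uniq -c | awk -v n="$nfiles" '$1 == n { print $2 " for " $3 " sectors" }'
}
```

    For example: mdadm --examine-badblocks /dev/sda > /tmp/bb.sda; mdadm --examine-badblocks /dev/sde > /tmp/bb.sde; common_badblocks /tmp/bb.sda /tmp/bb.sde (substitute the drives that actually report entries).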

    My drives:



    Next steps:
    - Getting a new PCI SATA card today and will test with that; it will rule out the HBA as the cause
    - Will try swapping the existing SATA cables around, although they've been in place a while without issues

    - Will try other power rails on my PSU


    Questions:
    - If I remove the newly added drive, will the array be able to recover some or all of my data? Is there a way to "reverse" the reshaping that has occurred?
    - Any other troubleshooting you might suggest? Should I get a new motherboard? Or do you think it's the false bad blocks described above?

  • geaves

    Two of my old drives show no SMART errors in the GUI, but they do show identical "bad blocks"

    Have you been running scheduled SMART tests on each drive? All mine are set to run a short self-test at least once a month; this alone can give valuable information even if it doesn't show in the GUI.
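
    The same checks can be run from the command line with smartmontools: "smartctl -t short /dev/sdX" starts a short self-test, "smartctl -l selftest /dev/sdX" shows the results and "smartctl -A /dev/sdX" prints the attribute table. A minimal sketch that pulls the three attributes most tied to bad sectors out of "smartctl -A" output, assuming the usual ATA table layout with the attribute name in column 2 and RAW_VALUE in column 10; smart_sector_summary is a hypothetical helper name:

```shell
#!/bin/sh
# smart_sector_summary (hypothetical helper): reads `smartctl -A` output on
# stdin and prints the raw values of the sector-health attributes.
# Assumes the standard ATA attribute table: column 2 is ATTRIBUTE_NAME and
# column 10 is RAW_VALUE.
smart_sector_summary() {
  awk '
    $2 == "Reallocated_Sector_Ct" ||
    $2 == "Current_Pending_Sector" ||
    $2 == "Offline_Uncorrectable" { print $2 "=" $10 }'
}
```

    For example: smartctl -A /dev/sda | smart_sector_summary. Non-zero, and especially growing, pending or offline-uncorrectable counts are usually the ones worth acting on.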

    just like that user's did in the link above. I don't fully follow what he did to fix his, though. I think he froze the reshape, grew the array and then continued the reshape

    I've read through the link and I can't work it out either. A grow command will reshape an array anyway, so I don't see how you can stop a reshape but still initiate a grow.


    If I remove the newly added drive, will the array be able to recover some / all of my data?

    I have been going over this mentally and, TBH, I don't know. You could shut down, remove the drive and restart the server; the array could come back as inactive with the 4 drives, and it may then be possible to assemble it and have it come back as clean/degraded.

  • Have you been running scheduled SMART tests on each drive? All mine are set to run a short self-test at least once a month; this alone can give valuable information even if it doesn't show in the GUI.

    I confess I haven't; not sure why I missed this feature in OMV. I've just been checking the GUI now and then to make sure everything looks OK. I'll definitely enable it once I'm up and running again.

    I just tried all-new SATA cables, no joy. I'll try the PCI SATA card next and I'm praying it's that: all my family photos etc. were recently put on this server, and I had no idea the rsync job on my backup OMV server had broken. Gutted.

    Side note: it would help me to understand what exactly the mdadm reshape is looping on or stuck on, but there are no errors in the OMV syslog.

    If the "rebuild" is hanging because of those false-positive bad blocks, do you think the plan below is a good or a bad idea?

    Inspect which 2 drives show "bad blocks":

    sudo mdadm --examine-badblocks /dev/sdx


    Pause the reshape (I think?):

    sudo mdadm --wait /dev/md0


    Tell mdadm to ignore those drives' "bad blocks":

    sudo mdadm --zero-superblock /dev/sda

    sudo mdadm --zero-superblock /dev/sde


    Continue:

    sudo mdadm --grow --continue /dev/md0
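
    Two notes on the plan above. Per the mdadm man page, --wait blocks until any resync, recovery or reshape has finished rather than pausing it, and --zero-superblock overwrites the md superblock itself (the metadata that makes a drive an array member), not just its bad-block list. For the "what is it stuck on" question, the kernel exposes the reshape state under /sys/block/md0/md/. A minimal read-only sketch that prints the relevant attributes, assuming the array is md0; md_reshape_status and its optional directory argument are hypothetical:

```shell
#!/bin/sh
# md_reshape_status MD [SYS_DIR] (hypothetical helper, read-only): prints the
# md sysfs attributes that show what a resync/reshape is doing.
# SYS_DIR defaults to /sys/block; it is a parameter only for testing.
# Attributes that do not exist on this kernel/array are skipped.
md_reshape_status() {
  dir=${2:-/sys/block}/$1/md
  for attr in array_state sync_action sync_completed sync_speed reshape_position; do
    if [ -r "$dir/$attr" ]; then
      printf '%s=%s\n' "$attr" "$(cat "$dir/$attr")"
    fi
  done
}
```

    Running it a few minutes apart (for example "md_reshape_status md0; sleep 300; md_reshape_status md0") shows whether sync_completed and reshape_position are advancing at all, which separates "slow" from "stuck".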

  • geaves

    Tell mdadm to ignore those drives' "bad blocks":

    sudo mdadm --zero-superblock /dev/sda

    sudo mdadm --zero-superblock /dev/sde

    You can't do that on two drives within the same array.

    Inspect which 2 drives show "bad blocks":

    sudo mdadm --examine-badblocks /dev/sdx

    In essence, the bad blocks appear to have been replicated.


    Have a look at this. I have heard of this option to run with no bad-block list.


    Whatever you do, you cannot try, test or use an option on more than one drive within the array. I still think my last suggestion in #14 might be worth a shot.

  • Updates:

    - Tried the PCI SATA card: no difference, on both PCI ports.

    - Just tried booting without the new HDD attached: nothing shows under Software RAID in the OMV GUI.

    - With the 4 drives in, I do get the output below from the command line. Not sure why it says RAID 0 to RAID 5.



    What do you think I should do now? Is this encouraging, or not, since nothing appeared in the GUI?

  • geaves

    What do you think I should do now? Is this encouraging, or not, since nothing appeared in the GUI?

    Hm, that's good. Line 9 tells you the problem: the array is inactive, which is actually encouraging.


    OK, first stop the array; the output should confirm that md0 has been stopped:

    mdadm --stop /dev/md0

    Then assemble it from the four drives:

    mdadm --assemble --force --verbose /dev/md0 /dev/sd[abcd]
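
    Before forcing an assemble, it can be worth saving and comparing what each member's superblock records. --force exists to assemble even when some members' metadata looks out of date, which typically shows up as a lower Events count, and for an array that was mid-reshape the Reshape pos'n would be expected to agree across members. A minimal read-only sketch, assuming the "Events :" and "Reshape pos'n :" lines that "mdadm --examine" prints for 1.2 metadata; md_examine_summary and the /tmp file names are hypothetical. The sdX letters can also change between boots, so they are worth re-checking first.

```shell
#!/bin/sh
# md_examine_summary FILE... (hypothetical helper, read-only): each FILE is
# the saved output of `mdadm --examine /dev/sdX`. Prints one line per file
# with its Events counter and, if present, the Reshape pos'n value.
md_examine_summary() {
  for f in "$@"; do
    # "pos.n" matches "pos'n" without needing a quote inside the awk program
    awk -v name="$f" '
      /^ *Events :/        { ev = $3 }
      /^ *Reshape pos.n :/ { rp = $4 }
      END { printf "%s events=%s reshape_pos=%s\n", name, ev, (rp == "" ? "none" : rp) }' "$f"
  done
}
```

    For example: for d in sda sdb sdc sdd; do mdadm --examine /dev/$d > /tmp/ex.$d; done; md_examine_summary /tmp/ex.sd?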
