Help request, Slow RAID 5 "Reshape"

ks09aao · 8 May 2024

Hi All. I've hit an issue and I'm very concerned for my family's data.

Intro:
I just bought another 4 TB IronWolf NAS drive to add to my system. I already have 4x4TB in RAID 5. When I added the drive and hit expand in the OMV GUI, I noticed that the speed was extremely slow - about 20 to 30 KB/s!! This projected many, many months to complete. I let it run for 24 hours while monitoring it, and became concerned only after 24 hours at 23 KB/s. The entry under "SW Raid" has disappeared, although this command still shows that the array exists and is progressing:

Code

cat /proc/mdstat

Setup:
4 (now 5) 4TB drives. 4x are on motherboard SATA ports, 1x on an LSI HBA in IT mode. The LSI HBA card is in a PCIe 3.0 x16 slot.
CPU is about 30% utilized, RAM 20% or so.
IOWait% was 60% or so.
Mobo = B450M-HDV R4.0

Things Tried:
- I checked write-cache is enabled for all drives in RAID
- I have tried all the improvement steps here - no improvements (https://www.cyberciti.biz/tips…resync-rebuild-speed.html)
- I have tried the steps this OMV forum user used (boot to SystemRescueCD) - initially rapid speeds, but then it settles in at 200 to 300K/sec (better but still really terrible!) (thread: "RAID 5 growing error")
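
For reference, the knobs usually discussed for md resync/reshape speed live under /proc/sys/dev/raid and /sys/block/<md>/md. Below is a minimal sketch for printing their current values before changing anything. The function name and its second argument (a root prefix, used only so the function can be pointed at a test directory) are assumptions for illustration, not part of any tool, and the list is not taken from the linked article.

```shell
# Sketch: print the md speed-related tunables for one array.
# $1 = md device name (default md0)
# $2 = hypothetical root prefix for testing; leave empty on a real system
show_md_tunables() {
    md=${1:-md0}
    root=${2:-}
    for rel in \
        /proc/sys/dev/raid/speed_limit_min \
        /proc/sys/dev/raid/speed_limit_max \
        "/sys/block/$md/md/stripe_cache_size" \
        "/sys/block/$md/md/sync_action"
    do
        f="$root$rel"
        if [ -r "$f" ]; then
            # Value is whatever the kernel currently reports
            printf '%s = %s\n' "$rel" "$(cat "$f")"
        else
            printf '%s = (not available)\n' "$rel"
        fi
    done
}
```

Usage would be `show_md_tunables md0`. It only reads the files, so it should be safe to run while a reshape is in progress.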

Comments:
- At 200 to 300 KB/s it projects 149 days, during which time I have no NAS storage, no Portainer containers etc.

- To my horror, my backup NAS running rsync has not been backing up, so all my valuable data is at risk (at least until this completes!)

Questions:
- Has anyone experienced such slow speeds? Reading online my speeds seem really bad. What else can I try?

- If I remove the newly added drive, will I risk losing data? It's at 8.6% completion (was fast in the beginning, is now beyond slow).

ks09aao · 8 May 2024

Software raid has re-appeared in GUI, and although I can see my SMB folder share names, I cannot access my files yet.

What's odd is that the IO demands on the drives don't sound that demanding - so I'm surprised I cannot use SMB folders whilst it "reshapes".

I think having portainer / SMB working whilst it reshapes would set my mind at ease, even if it takes weeks

root@openmediavault:~# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
md0 : active raid5 sdd[3] sdb[5] sdc[2] sda[6] sde[4]
      11720658432 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
      [=>...................] reshape = 8.6% (336347100/3906886144) finish=108203.7min speed=549K/sec
      bitmap: 15/30 pages [60KB], 65536KB chunk
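
The finish= field is in minutes, which is hard to read at this scale. Here is a small sketch (the function name is made up) that reads mdstat text on stdin and converts the first finish= value it finds to days:

```shell
# Sketch: print the reshape/resync ETA in days from /proc/mdstat text.
# Assumes the "finish=<minutes>min" field format shown in the output above.
reshape_eta_days() {
    awk '
        /finish=/ {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^finish=/) {
                    minutes = $i
                    sub(/^finish=/, "", minutes)   # drop the key
                    sub(/min$/, "", minutes)       # drop the unit
                    printf "%.1f\n", minutes / 1440  # 1440 minutes per day
                    exit
                }
            }
        }
    '
}
```

Run as `reshape_eta_days < /proc/mdstat`. For the output above (finish=108203.7min) it prints 75.1, i.e. roughly 75 days at 549K/sec.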

ks09aao · 8 May 2024

One more follow-up thought. Does the filesystem usually get un-mounted during a "reshaping"?

In the GUI, md0 is neither available nor mounted during the RAID expansion. It contains my data but also the Portainer config files, which perhaps explains why Portainer is down too.

Is it safe to re-mount md0 during RAID reshaping? I thought it was meant to happen in the background, but perhaps I'm wrong.

geaves · 9 May 2024

Quote from ks09aao

- I have tried all the improvement steps here - no improvements (https://www.cyberciti.biz/tips…resync-rebuild-speed.html)
- I have tried the steps this OMV forum user used (boot to systemrescuecd) - initial rapid speeds, but then settles in at 200 to 300K/Sec (better but still really terrible!) (RAID 5 growing error)

Then this has to be a hardware issue: one of the SATA ports, the HBA card, the cables, or the drives. You need to find out the read/write rate of each drive; this could narrow down the issue. In the OMV thread you linked, I gave that user a suggestion to try nmon (on the first page of that thread), and the nixCraft article from your first link provides specific switches to use with iostat.

ks09aao · 9 May 2024

Thanks so much geaves.

So I've got an nmon screengrab here - looks like one of my 4 CPU cores is working hard, but there is next to no activity on the disks

I'm not sure where to go from here. This suggests data isn't being written to the disks?

I have had a go at iostat and attached a screengrab here as well. It shows really low activity.

geaves · 9 May 2024

Quote from ks09aao

I have had a go at IOstat and attached a screengrab here as well. Seems really low activity

From the iostat output, look at /dev/sdb compared to the others.

ks09aao · 9 May 2024

/dev/sdb is the new NAS drive. As it is empty and being added to the RAID array, I guess I was expecting low reads and higher writes.

Is my assumption wrong? Do you think sdb is faulty based on the values?

geaves · 9 May 2024

What's the output if you run nmon -d?

ks09aao · 9 May 2024

So if I run "nmon -d" I get an error that states "nmon: option requires an argument -- 'd'", perhaps I misunderstood how to do this.

If I run "nmon", and then press d to view disks, I get the attached view - no activity at all

geaves · 9 May 2024

Quote from ks09aao

If I run "nmon", and then press d to view disks, I get the attached view - no activity at all

Ah, TBH it's a long time since I've used this.

Is sdb attached to the HBA?

ks09aao · 9 May 2024

Yes, in the current config I am trying, all 5 drives are attached to the HBA

I've also tried

- 4 drives on motherboard sata, 1x new HDD on HBA

- 4 drives on motherboard sata, 1x old HDD on HBA

All these combinations still result in the high CPU load and no disk activity

geaves · 9 May 2024

Quote from ks09aao

All these combinations still result in the high CPU load and no disk activity

Then it has to be a drive, the cables, or even power, or it's related somehow to the HBA, as I'm assuming you had no issues prior to installing that.

ks09aao · 9 May 2024

Yes, I had no issues at all prior to this. Although, this user seems to have had something similar (https://www.spinics.net/lists/raid/msg63372.html)

Two of my old drives show no SMART errors in the GUI, but they do show identical "bad blocks" - again like that user did in the link above. I don't exactly follow what he did to fix his though; I think he froze reshaping, grew the array and continued reshaping?

My drives:

Code

Bad-blocks on /dev/sda:
672813792 for 72 sectors
5603920904 for 48 sectors
5661217456 for 40 sectors
7032769024 for 88 sectors
7032769160 for 16 sectors
7032769248 for 8 sectors
7032769312 for 24 sectors
7064954000 for 56 sectors
7064954256 for 8 sectors
root@openmediavault:~# mdadm --examine-badblocks /dev/sdb
Bad-blocks list is empty in /dev/sdb
root@openmediavault:~# mdadm --examine-badblocks /dev/sdc
Bad-blocks list is empty in /dev/sdc
root@openmediavault:~# mdadm --examine-badblocks /dev/sdd
Bad-blocks list is empty in /dev/sdd
root@openmediavault:~# mdadm --examine-badblocks /dev/sde
Bad-blocks on /dev/sde:
672813792 for 72 sectors
5603920904 for 48 sectors
5661217456 for 40 sectors
7032769024 for 88 sectors
7032769160 for 16 sectors
7032769248 for 8 sectors
7032769312 for 24 sectors
7064954000 for 56 sectors
7064954256 for 8 sectors
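
To confirm the two lists really are identical rather than eyeballing them, one option is to save each drive's output to a file and compare only the entries. This is a sketch; `badblock_entries`, `same_badblocks` and the file names are hypothetical helpers, not mdadm features:

```shell
# Keep only "<sector> for <count> sectors" entries, dropping the header
# line that names the device, and normalise whitespace.
badblock_entries() {
    awk '$2 == "for" && $4 == "sectors" { print $1, $3 }' "$1"
}

# Print "identical" if both saved outputs list the same entries in the
# same order, otherwise "different".
same_badblocks() {
    if [ "$(badblock_entries "$1")" = "$(badblock_entries "$2")" ]; then
        echo identical
    else
        echo different
    fi
}
```

For example: `mdadm --examine-badblocks /dev/sda > sda.txt`, the same for /dev/sde into sde.txt, then `same_badblocks sda.txt sde.txt`. Note that two drives with empty lists also compare as identical.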


Next Steps:
- Getting a new PCI SATA card today and will test with that - this will rule out the HBA as a cause
- Will try swapping the existing SATA cables around, although they have been there a while without issue

- Will try other power rails of my PSU

Questions:
- If I remove the newly added drive, will the array be able to recover some / all of my data? Is there a way to "reverse" the reshaping that has occurred?
- Any other troubleshooting you might suggest? Should I get a new motherboard? Or do you think it's the false bad blocks as above?

geaves · 9 May 2024

Quote from ks09aao

Two of my old drives show no smart errors in GUI, but they do show identical "bad blocks"

Have you been running scheduled SMART tests on each drive? All mine are set to run a short self-test at least once a month; this alone can give valuable information even if it doesn't show in the GUI.

Quote from ks09aao

again like that user did in the link above. I don't exactly follow what he did to fix his though, I think he froze reshaping, grew the array and continued reshaping

I've read through the link and I can't work it out either. A grow command will reshape an array anyway, so I don't see how you can stop a reshape but initiate a grow.

Quote from ks09aao

If I remove the newly added drive, will the array be able to recover some / all of my data?

I have been going over this mentally and, TBH, I don't know. You could shut down, remove the drive and restart the server; the array could come back as inactive with the 4 drives. It may then be possible to assemble the array, and it may come back as clean/degraded.

ks09aao · 9 May 2024

Quote from geaves

Have you been running scheduled smart tests on each drive, all mine are set to run a short self test at least once a month, this alone can give valuable information even if it doesn't show in the GUI

I confess I haven't - not sure why I missed this feature in OMV. I've just been checking in the GUI now and then to make sure everything looks ok. I will for sure be enabling that once I'm up and running again.

I just tried all new sata cables, no joy. Will try the SATA PCI card, praying it's that - all my family photos etc were recently put on this server, and I had no idea my backup OMV server Rsync had broken. Gutted.

Side note- It might help me to understand what mdadm reshape is looping / stuck on exactly. But no errors in "syslog" in OMV.

If the "rebuild" is hanging due to those false-positive bad-blocks - do you think the below plan is a good / bad idea?

Inspect which 2 drives show as "bad blocks"

sudo mdadm --examine-badblocks /dev/sdx

Pause the reshape - I think?

sudo mdadm --wait /dev/md0

Tell mdadm to ignore those drives "bad blocks":

sudo mdadm --zero-superblock /dev/sda

sudo mdadm --zero-superblock /dev/sde

CONTINUE

sudo mdadm --continue /dev/md0

geaves · 9 May 2024

Quote from ks09aao

Tell mdadm to ignore those drives "bad blocks":
sudo mdadm --zero-superblock /dev/sda
sudo mdadm --zero-superblock /dev/sde

You can't do that on two drives within the same array

Quote from ks09aao

Inspect which 2 drives show as "bad blocks"
sudo mdadm --examine-badblocks /dev/sdx

In essence the bad blocks appear to have been replicated

Have a look at this; I have heard of this option to use no bad-block list.

Whatever you do, you cannot try, test, or use an option on more than one drive within the array. I still think my last option in #14 might be worth a shot.

ks09aao · 9 May 2024

Updates:

- Tried PCI SATA card, no differences, tried both PCI ports.

- Just tried booting with the new HDD not attached - nothing in Software RAID OMV GUI.

- With 4 drives in, I do get the below from command line - not sure why it says raid 0 to raid 5.

What should I do now, do you think? Is this encouraging, or not, because nothing appeared in the GUI?

Code

sudo mdadm --detail /dev/md0
/dev/md0:
Version : 1.2
Raid Level : raid0
Total Devices : 4
Persistence : Superblock is persistent


State : inactive
Working Devices : 4


Delta Devices : 1, (-1->0)
New Level : raid5
New Layout : left-symmetric
New Chunksize : 512K


Name : openmediavault.local:Data
UUID : f9ae3f3f:c15e0166:857cbfaf:00825035
Events : 1847447


Number   Major   Minor   RaidDevice


-       8       32        -        /dev/sdc
-       8        0        -        /dev/sda
-       8       48        -        /dev/sdd
-       8       16        -        /dev/sdb


geaves · 9 May 2024

Quote from ks09aao

What should I do now do you think - is this encouraging, or not because nothing appeared in the GUI

Hm, that's good. Line 9 tells you the problem: the array is inactive, so that's encouraging.

OK -> mdadm --stop /dev/md0 (you should get mdadm stopped as the output), then:

mdadm --assemble --force --verbose /dev/md0 /dev/sd[abcd]
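
A cautious way to wrap these two commands is to check the State line of mdadm --detail before and after. This is a sketch only: `md_state` is a made-up helper that reads text on stdin and does not touch the array.

```shell
# Sketch: extract the value of the "State :" line from `mdadm --detail` text.
md_state() {
    awk -F' : ' '/^[[:space:]]*State :/ {
        sub(/[[:space:]]+$/, "", $2)   # trim trailing whitespace
        print $2
        exit
    }'
}
```

Used as `mdadm --detail /dev/md0 | md_state`, it prints inactive for the output posted above. Running it again after the forced assemble shows whether the state actually changed.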

ks09aao · 9 May 2024

It now appears as clean but degraded, feels like progress! Would I now hit "recover" in the GUI?

geaves · 9 May 2024

Quote from ks09aao

would I now hit "recover" in the GUI

NO!! You're not recovering anything.

The array is back in its original state, albeit without the new drive; that's why it's in a clean degraded state. Is the file system mounted?
