RAID 5 keeps resyncing

nemlehet4 · 30 November 2023

Hi All,

I have a RAID 5 array built from four 3 TB drives. It was working great for a while, but now it keeps resyncing:

Code

           Version : 1.2
     Creation Time : Thu Jun 16 17:56:25 2022
        Raid Level : raid5
        Array Size : 8790402048 (8383.18 GiB 9001.37 GB)
     Used Dev Size : 2930134016 (2794.39 GiB 3000.46 GB)
      Raid Devices : 4
     Total Devices : 4
       Persistence : Superblock is persistent

       Update Time : Thu Nov 30 10:43:17 2023
             State : clean, resyncing
    Active Devices : 4
   Working Devices : 4
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-symmetric
        Chunk Size : 512K

Consistency Policy : resync

     Resync Status : 45% complete

              Name : openmediavault:0
              UUID : 818c89e6:2533ab78:cdb133a6:b682d6dd
            Events : 134707

    Number   Major   Minor   RaidDevice State
       0       8       64        0      active sync   /dev/sde
       1       8       80        1      active sync   /dev/sdf
       2       8       32        2      active sync   /dev/sdc
       3       8       48        3      active sync   /dev/sdd
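
To watch whether the array is still resyncing without reading the whole report each time, the two relevant fields can be filtered out of the output above. This is only a sketch: `md_resync_summary` is a hypothetical helper name (not part of mdadm), and it assumes the `Label : value` layout shown in this listing.

```shell
#!/bin/sh
# Hypothetical helper: read `mdadm --detail` output on stdin and
# print only the "State" and "Resync Status" fields.
md_resync_summary() {
    awk -F' : ' '
        /^ *State :/         { state = $2 }
        /^ *Resync Status :/ { resync = $2 }
        END {
            # Resync Status is only present in the listing while a resync is running.
            if (state == "")  state = "unknown"
            if (resync == "") resync = "none"
            printf "state=%s resync=%s\n", state, resync
        }'
}
```

Typical use, as root, would be `mdadm --detail /dev/md0 | md_resync_summary`, assuming the array is `/dev/md0`.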


The resync takes roughly 6 hours to complete, which would be fine, but it eats 100% of the CPU and 7.8 GB of the 8 GB of RAM, making my NAS completely unusable. The resync also starts immediately after a reboot and seems to restart after it completes. I'm pretty new to Linux, so I don't have a clue where to start troubleshooting.

SMART shows no errors and generally good health for the drives.
I took them out one by one and ran some tests on my Windows machine, and they seem fine: no performance or response-time issues as far as I can see.
I doubt it has anything to do with it, but the issue started when the ISP replaced my modem and I had to completely reconfigure my network (assign the internal static IPs, open ports, etc.). OMV also had an update the same day, which I installed from the web GUI as usual; I didn't see any errors.

I tried to freeze or throttle the sync, but it doesn't seem to make a difference. I tried to stop it with:

Code

echo "idle" >  /sys/block/md0/md/sync_action

But it either doesn't work or the resync restarts immediately.
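
Writing `idle` to `sync_action` only interrupts the pass that is currently running; the kernel md documentation notes there is no guarantee that another resync will not be started automatically afterwards. The same file also accepts `frozen`, which is intended to hold the resync off until `idle` is written again. A minimal sketch, assuming the array is `md0` as in the command above; `md_set_sync_action` and the `MD_SYSFS_DIR` override are hypothetical names introduced here for illustration:

```shell
#!/bin/sh
# Hypothetical helper: write a sync action for an md array and print
# the state the kernel reports afterwards.
# MD_SYSFS_DIR is an assumed override; it defaults to the md0 path.
md_set_sync_action() {
    action=$1
    dir=${MD_SYSFS_DIR:-/sys/block/md0/md}
    case "$action" in
        idle|frozen) ;;   # only the two pause-related values are allowed here
        *) echo "unsupported action: $action" >&2; return 2 ;;
    esac
    printf '%s\n' "$action" > "$dir/sync_action" || return 1
    cat "$dir/sync_action"
}
```

Run as root, `md_set_sync_action frozen` would pause the resync while you check what else is loading the machine, and `md_set_sync_action idle` releases the freeze again. The array still needs the resync to finish eventually, so this is a diagnostic pause rather than a fix.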

As an alternative approach, I tried to throttle it with:

Code

sudo sysctl -w dev.raid.speed_limit_max=1000

This seemingly works: the reported speed goes down, but CPU and RAM utilization stay the same, so it doesn't really fix the problem.
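
For context, `dev.raid.speed_limit_max` caps the resync rate in KiB/s per device, while `dev.raid.speed_limit_min` (1000 by default) is the rate md tries to maintain even when the array is busy with other I/O. A small sketch for printing both values before and after a change; `md_speed_limits` and the `RAID_SYSCTL_DIR` override are hypothetical names used here for illustration:

```shell
#!/bin/sh
# Hypothetical helper: print the system-wide md resync speed limits (KiB/s per device).
# RAID_SYSCTL_DIR is an assumed override; it defaults to the procfs location
# behind the dev.raid.* sysctls.
md_speed_limits() {
    dir=${RAID_SYSCTL_DIR:-/proc/sys/dev/raid}
    for name in speed_limit_min speed_limit_max; do
        printf '%s=%s\n' "$name" "$(cat "$dir/$name")"
    done
}
```

After testing, `sudo sysctl -w dev.raid.speed_limit_max=200000` puts the ceiling back to the usual kernel default. The current rate itself appears as `speed=` in `/proc/mdstat` while a resync is running.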

I have a brief window of 2-3 minutes after a restart where I can still access my array. All files are there and everything seems to be intact, but once the RAM fills up, SMB keeps timing out and the array becomes unusable.

geaves · 30 November 2023

The continual resyncing would suggest a hardware issue rather than a drive issue; this could be cabling, the controller, or the power supply.

Basically, the drives are unable to stay in sync with each other.

nemlehet4 · 30 November 2023

I never thought about that... Cabling is easy enough to change, and I even have a spare PSU, but I don't really have a way to change the MOBO...

Do you think putting in a PCI SATA card would be a sufficient test to see if the controller failed?

geaves · 30 November 2023

Quote from nemlehet4

Do you think putting in a PCI SATA card would be a sufficient test to see if the controller failed?

Possibly. I'll tag chente, as he has been investigating PCI SATA cards and their chipsets.

The problem you have is that it's going to be very hit and miss: trial and error requires patience and notes on the process used.

chente · 30 November 2023

Quote from nemlehet4

but I don't really have a way to change the MOBO

Are you using the SATA ports on the motherboard? If so, you could do what you suggest: a PCIe-to-SATA adapter would tell you whether the problem goes away and whether it is the board that has failed. What motherboard is it?

If you have to buy an adapter, the best option is an LSI, but if you do not have an x8 or x4 slot available and need an adapter for a smaller slot, read what I said about these adapters in this post, which has links to more information: Why I chose an N100 over a Raspberry PI5

nemlehet4 · 1 December 2023

Thanks for all the tips, guys! I'm not quite sure what happened, but it seems to be a combination of things.

I was in the process of putting in the PCI SATA card, but I urgently needed a file from the drive that is not in the RAID array, so I disconnected the 4 RAID drives and turned on my NAS.
Weirdly enough, CPU utilization was still at 100%, which started bugging me... I checked what exactly was melting down my CPU and found this error:

task CrashUploader:2872 blocked for more than 120 seconds
The NAS was still painfully slow but actually usable, unlike when the RAID was also rebuilding.
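
The "blocked for more than 120 seconds" line is the kernel's hung-task warning: it reports a task stuck in uninterruptible sleep, which usually means it is waiting on I/O. A rough sketch for pulling such entries out of the kernel log, so the offending process name and PID stand out; `hung_tasks` is a hypothetical helper name, and it assumes the message wording shown above:

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print one line
# per hung-task warning as "<task name> pid=<pid> seconds=<threshold>".
hung_tasks() {
    sed -n 's/.*task \(.*\):\([0-9][0-9]*\) blocked for more than \([0-9][0-9]*\) seconds.*/\1 pid=\2 seconds=\3/p'
}
```

It could be fed from `dmesg | hung_tasks` or `journalctl -k | hung_tasks`. The name it prints is the kernel's short task name, so matching it to a container (Plex in this case) is still a manual step.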

I'm still not quite sure what this is, but it looks Plex server related. So I stopped the Plex container in Portainer, and it helped. I plugged the drives back in and the resync completed without issues.

So it looks like some update or crash hit during the resync and borked the system, throwing it into a loop of some sort.

I re-deployed Plex in Portainer from scratch and everything seems to be working now.
