RAID 5 keeps resyncing

  • Hi All,


    I have a RAID 5 array built from 4 drives of 3 TB each. It was working great for a while, but now it keeps resyncing.

    The resync takes roughly 6 hours to complete, which would be fine, but my problem is that it eats up 100% of the CPU and 7.8 GB of the 8 GB of RAM, making my NAS completely unusable. The resync also starts immediately after a reboot and seems to restart after it has completed. I'm pretty much a noob with Linux, so I don't have a clue how to start troubleshooting.

    SMART shows no errors and generally good health for the drives.
    I took them out one by one and ran some tests on my Windows machine, and they seem fine: no performance or response time issues as far as I can see.
    Although I doubt it has anything to do with it, the issue started when the ISP changed my modem and I had to completely reconfigure my network, assign the internal static IPs, open ports, etc. There was also an OMV update that same day, which I installed from the web GUI as usual; I didn't see any errors.

    I tried to freeze or throttle the sync, but it doesn't seem to make a difference. I tried to stop it with:

    Code
    echo "idle" >  /sys/block/md0/md/sync_action

    But either it doesn't work or the resync restarts immediately.
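
    For reference, here is a minimal sketch of the stop attempt above as a small helper. It assumes the array is md0, as in the command above, and that it runs as root. The function name `md_set_action` is made up for illustration. Besides "idle", the md driver also accepts "frozen", which is meant to keep a new resync from starting; whether that helps in this situation is untested, and a frozen array stays out of sync until the resync is allowed to run again.

```shell
#!/bin/sh
# Hypothetical helper: write an action to an md array's sync_action file
# and print what the kernel reports afterwards.
# $1 = action ("idle" or "frozen")
# $2 = path to the sync_action file (default assumes the array is md0)
md_set_action() {
    action="$1"
    sync_file="${2:-/sys/block/md0/md/sync_action}"

    # Only allow the two actions that pause a resync.
    case "$action" in
        idle|frozen) ;;
        *) echo "unsupported action: $action" >&2; return 1 ;;
    esac

    # Writing needs root; the redirect fails otherwise.
    printf '%s\n' "$action" > "$sync_file" || return 1

    # Read the state back to see whether the kernel kept it.
    cat "$sync_file"
}
```

    Calling `md_set_action frozen` and getting "resync" back a moment later would confirm that something keeps restarting the sync.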


    As an alternative approach I tried to throttle it with:


    Code
    sudo sysctl -w dev.raid.speed_limit_max=1000

    That seems to work, in that the reported speed goes down, but CPU and RAM utilization stay the same, so it doesn't really fix the problem.
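
    To see what the throttle actually changed, the sync state and the speed limits can be read back in one go. This is only a sketch: it assumes the array is md0, and the function name `md_sync_status` is made up. The two directory arguments exist only so the function can be pointed somewhere else.

```shell
#!/bin/sh
# Hypothetical helper: print the sync state of an md array and the
# system-wide resync speed limits.
# $1 = md sysfs directory (default assumes the array is md0)
# $2 = directory holding speed_limit_min / speed_limit_max
md_sync_status() {
    md_dir="${1:-/sys/block/md0/md}"
    raid_proc="${2:-/proc/sys/dev/raid}"

    # Per-array state: current action, progress and current speed.
    for f in sync_action sync_completed sync_speed; do
        if [ -r "$md_dir/$f" ]; then
            printf '%s: %s\n' "$f" "$(cat "$md_dir/$f")"
        fi
    done

    # System-wide limits, the same values that sysctl dev.raid.* sets.
    for f in speed_limit_min speed_limit_max; do
        if [ -r "$raid_proc/$f" ]; then
            printf '%s: %s\n' "$f" "$(cat "$raid_proc/$f")"
        fi
    done
}
```

    Running `md_sync_status` before and after the sysctl call shows whether `sync_speed` really drops to the new `speed_limit_max`. If it does and the CPU load stays the same, the load is probably coming from something other than the resync.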

    After a restart I have a brief window of 2-3 minutes in which I can still access my array. All the files are there and everything seems intact, but once the RAM fills up, SMB keeps timing out and the array becomes unusable.

    • Official Post

    Continual resyncing suggests a hardware issue rather than a drive issue; this could be the cabling, the controller or the power supply.


    Basically the drives are unable to stay in sync with each other

    Raid is not a backup! Would you go skydiving without a parachute?


    OMV 6x amd64 running on an HP N54L Microserver

  • I never thought about that... Cabling is easy enough to change, and I even have a spare PSU, but I don't really have a way to change the MOBO...

    Do you think putting in a PCI SATA card would be a sufficient test to see if the controller failed?

    • Official Post

    Do you think putting in a PCI SATA card would be a sufficient test to see if the controller failed

    Possibly. I'll tag chente, as he has been investigating PCI SATA cards and their chipsets.


    The problem is that this is going to be very hit and miss: trial and error requires patience, and you need to keep notes on each step you try.

    • Official Post

    but I dont really have a way to change the MOBO

    Are you using the SATA ports on the motherboard? If so, you could do what you suggest: a PCIe to SATA adapter would tell you whether the problem goes away, and therefore whether it is the board that has failed. What motherboard is it?

    If you have to buy an adapter, the best option is an LSI. If you do not have an x8 or x4 port available and need an adapter for a smaller port, read what I said about these adapters in this post, which has links to more information: Why I chose an N100 over a Raspberry PI5

  • Thanks for all the tips, guys! I'm not quite sure what happened, but it seems to have been a combination of things.

    I was in the process of putting in the PCI SATA card, but I urgently needed a file from the drive that is not in the RAID array. So I disconnected the 4 RAID drives and turned on my NAS.
    Weirdly enough, CPU utilization was still at 100%, so it started bugging me... I checked what exactly was melting down my CPU and found this error:

    task CrashUploader:2872 blocked for more than 120 seconds
    The NAS was still painfully slow, but actually usable, not like when the RAID was also rebuilding.
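
    For anyone hitting the same thing, here is a rough sketch of how messages like the one above can be pulled out of the kernel log. The function name `hung_tasks` is made up, and the pattern only assumes the message format shown above.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print one line
# per "blocked for more than N seconds" message: task name, PID, timeout.
hung_tasks() {
    sed -n 's/.*task \(.*\):\([0-9][0-9]*\) blocked for more than \([0-9][0-9]*\) seconds.*/\1 (pid \2, \3s)/p'
}

# Typical use (may need root, depending on the system):
#   dmesg | hung_tasks
#   journalctl -k | hung_tasks
# To list the processes using the most CPU right now:
#   ps -eo pid,stat,pcpu,comm --sort=-pcpu | head
```

    A task name that shows up here again and again is a good candidate to stop first, before suspecting the array itself.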


    I'm still not quite sure what this is, but it looks related to Plex Media Server. So I stopped the Plex container in Portainer, and that helped. I plugged the drives back in and the resync completed without issues.

    So it looks like some update or crash hit during the resync and borked the system, throwing it into a loop of some sort.

    I redeployed Plex in Portainer from scratch and everything seems to be working now.
