High system load: how to identify and resolve

  • I've installed "Armbian buster with Linux 5.4.49-rockchip64" on my nanopim4v2, straight to the SD card. I ran apt update and apt upgrade, then used the install script for ARM devices, and everything went fine.

    I've been able to log in to the WebUI fine. I created one user, then two shares: a temporary test share for SMB and one for Docker and any containers. I set the path for Docker, then installed Docker and Portainer, all using the WebUI. Everything works great.


    Now to the problem. I simply leave the system on (doing nothing) overnight, and the next morning I see the 'heartbeat' light flashing away. I wondered why, so I SSH'd in and... high system load, but low CPU usage. I don't know how to tackle this, but I've tried to include some info below. Let me know what else you might need in order to suggest something. Thanks, any help appreciated.


    BTW, this is a totally clean install. I've wiped the SD card and followed the ARM install guide, and the result is the same: it's fine initially, but after some time (hours) the system load ramps up and up.


    Code
    root@nanopim4v2:~# uptime
     18:56:38 up 2 days, 12:03,  1 user,  load average: 689.18, 685.17, 674.69
    Code
    root@nanopim4v2:~# ps -e v
      PID TTY      STAT   TIME  MAJFL   TRS   DRS   RSS %MEM COMMAND
        1 ?        Ss     1:24     86  1149 165654 10136  0.2 /sbin/init
        2 ?        S      0:00      0     0     0     0  0.0 [kthreadd]
        3 ?        I<     0:00      0     0     0     0  0.0 [rcu_gp]
        4 ?        I<     0:00      0     0     0     0  0.0 [rcu_par_gp]
        8 ?        I<     0:00      0     0     0     0  0.0 [mm_percpu_wq]
        9 ?        S      0:01      0     0     0     0  0.0 [ksoftirqd/0]
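    The `ps -e v` listing above shows each process's state in its STAT column. On Linux, the load average counts tasks that are runnable (R) or in uninterruptible sleep (D, usually waiting on disk or other I/O), so with the CPU mostly idle it can help to see how many processes are in each state. A minimal sketch that reads /proc directly; the function name `state_histogram` is just illustrative:

```shell
#!/bin/sh
# Count processes by their one-letter state (R, S, D, I, Z, ...).
# Each /proc/<pid>/stat line looks like "123 (comm) S ...": stripping
# everything up to the last ") " leaves the state letter as the first field.
# Only thread-group leaders are counted; the kernel's load average counts
# individual threads, so the numbers will not match it exactly.
state_histogram() {
    sed 's/.*) //' /proc/[0-9]*/stat 2>/dev/null |
        cut -d' ' -f1 | sort | uniq -c | sort -rn
}

state_histogram
```

    A large and growing D count next to an idle CPU would suggest tasks stuck waiting on I/O rather than a CPU hog.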

    nanopi m4v2 w/ 4xSATA HAT, 4GB SDCard Boot + Root (testing), 320GB HDD OMVdata, 2x 1TB HDD Data

    OMV 5.x

  • Hm... syslog has lots of nginx entries stopping and restarting?? And I cannot connect via the WebGUI any more, only via SSH. A reboot fixes it until it all happens again.
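    To see when the nginx stop/restart messages start, and whether they begin before or after the load climbs, a small sketch can count nginx-related syslog lines per hour. It assumes the classic rsyslog timestamp format ("Jul  9 05:49:01 host ...") and the default /var/log/syslog path; `nginx_per_hour` is a hypothetical helper name:

```shell
#!/bin/sh
# Count syslog lines that mention nginx, grouped by hour.
# Assumes each line starts with "Mon DD HH:MM:SS", so field 3 is the time.
# uniq -c only merges adjacent lines, which works here because syslog is
# written in chronological order.
nginx_per_hour() {
    file=${1:-/var/log/syslog}
    grep -i nginx "$file" |
        awk '{ print $1, $2, substr($3, 1, 2) ":00" }' |
        uniq -c
}
```

    For example, `nginx_per_hour /var/log/syslog.1` would do the same for the previous, rotated log file.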

  • Take a deeper look at this thread: 502 Bad Gateway nginx


    Greetings

    David

    "Well... lately this forum has become support for everything except omv" [...] "And is like someone is banning Google from their browsers"


    Only two things are infinite, the universe and human stupidity, and I'm not sure about the former.

    Upload Logfile via WebGUI/CLI
    #openmediavault on freenode IRC | German & English | GMT+1
    Absolutely no Support via PM!

  • Appreciate the feedback, thanks. I've now read through the suggested thread, but I don't see how it directly applies, tbh. I've tried some of the commands to check whether packages were missing, and everything seems fine. After a reboot everything works: I can access the WebUI, change settings, etc. But after what appears to be a random, undefined time period the system starts getting the high system loads.


    Since the last reboot, it's still running smoothly.

    Code
    root@npim4v2:~# uptime
     14:56:25 up 18:48,  1 user,  load average: 0.00, 0.06, 0.07


    I expect, however, that within the next 12 hours the system load will start to increase and then just run away. I just need to know where I should look to try to find the cause, and then obviously try to resolve it.
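    Because the onset is unpredictable, one option is to log the load average at a fixed interval so the moment it starts climbing can later be matched against syslog. A minimal sketch, assuming it is saved as loadlog.sh and that /root/loadlog.txt is an acceptable log location (both names are placeholders). It also records how many processes are in D state (uninterruptible sleep), which the Linux load average counts even though they use no CPU:

```shell
#!/bin/sh
# Hypothetical load logger. LOG and INTERVAL are placeholders; override them
# from the environment if needed.
LOG=${LOG:-/root/loadlog.txt}
INTERVAL=${INTERVAL:-60}

# Print one line: timestamp, 1/5/15-minute load averages, D-state count.
log_once() {
    read -r l1 l5 l15 _ < /proc/loadavg
    # The state letter is the first field after the last ") " in each stat file.
    d=$(sed 's/.*) //' /proc/[0-9]*/stat 2>/dev/null |
        awk '$1 == "D" { n++ } END { print n + 0 }')
    printf '%s %s %s %s D=%s\n' "$(date '+%F %T')" "$l1" "$l5" "$l15" "$d"
}

# Loop only when started as "sh loadlog.sh run"; otherwise just define log_once.
if [ "${1:-}" = "run" ]; then
    while :; do
        log_once >> "$LOG"
        sleep "$INTERVAL"
    done
fi
```

    It could be started in the background with `nohup sh loadlog.sh run >/dev/null 2>&1 &`. Keeping the log on the SD card rather than on one of the HDDs may be safer, in case the HDDs turn out to be where I/O is blocking.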

  • Quote: "I don't see how it directly applies, tbh."

    It looked similar to your logs, so I suggested taking a look at it.



    Quote: "But after what appears to be a random, undefined time period the system starts getting the high system loads."

    Do you by any chance see what is using the resources in top or htop?


    Greetings

    David

  • Quote: "It looked similar to your logs, so I suggested taking a look at it."

    Sorry, I really did appreciate the direction. While the error in syslog was the same, I was still able to access the WebUI etc. after a reboot, until the next runaway increase in system load. To me the two problems seem unconnected, with different symptoms, although it was certainly worth looking into, which I did. What I should have said is that, after reading that thread, I concluded it didn't apply to this problem.


    Quote: "Do you by any chance see what is using the resources in top or htop?"

    You can see the output from when I had the high system load in my first post above. I don't see anything standing out.

    The load averages were all in the 600s (nearly 700) and the CPU was mostly idle. The biggest user was top itself. Interestingly, now that my system is running okay, top only takes 1% of CPU, compared to the 6% seen when the load was abnormal.


    I could not run htop while the system load was abnormal; SSH just sat with a blinking cursor until I hit Ctrl+C. htop runs okay when the system is behaving.
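    A hanging htop together with a mostly idle CPU fits the picture of many tasks stuck in uninterruptible sleep (D state), which top does not rank highly because they use no CPU time. A minimal sketch that lists them straight from /proc, together with the kernel function each one is waiting in (/proc/<pid>/wchan); `blocked_tasks` is an illustrative name:

```shell
#!/bin/sh
# List processes in uninterruptible sleep (state D) as "pid<TAB>name<TAB>wchan".
# wchan is the kernel function the task is blocked in; some kernels only
# expose "0" there. Only thread-group leaders are checked.
blocked_tasks() {
    for dir in /proc/[0-9]*; do
        # The state letter is the first field after the last ") " in stat.
        state=$(sed 's/.*) //' "$dir/stat" 2>/dev/null | cut -d' ' -f1)
        [ "$state" = "D" ] || continue
        pid=${dir#/proc/}
        comm=$(cat "$dir/comm" 2>/dev/null)
        wchan=$(cat "$dir/wchan" 2>/dev/null)
        printf '%s\t%s\t%s\n' "$pid" "$comm" "${wchan:-?}"
    done
}

blocked_tasks
```

    As root, `echo w > /proc/sysrq-trigger` should also dump the blocked (D-state) tasks with kernel stack traces into the kernel log, readable with `dmesg`, which may show which device or filesystem they are waiting on.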

  • Okay... this morning I'm seeing the high load again. Here is the output from uptime, top and syslog. I cannot run htop.

    It ran fine for 2 days, a similar timeframe to last time.


    Bash
    root@npim4v2:~# uptime
     07:00:47 up 2 days, 10:53,  1 user,  load average: 52.26, 48.29, 38.36

    And 15 minutes later, the load averages are still increasing:

    Bash
    root@npim4v2:~# uptime
     07:17:57 up 2 days, 11:10,  1 user,  load average: 70.37, 66.41, 55.81


    I can't see anything hogging resources in top.


    In the last few minutes of the attached log, something starts to go wrong, but I don't know what. At about 05:49 things start going pear-shaped, followed shortly by the nginx issues.

    --syslog is attached--

    syslog.txt
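    To look only at the minutes around 05:49 in a syslog like this one, the lines can be filtered by time of day. A minimal sketch, assuming the classic rsyslog line format where the time is the third whitespace-separated field; `syslog_window` is a hypothetical helper:

```shell
#!/bin/sh
# Print lines from a classic rsyslog file ("Mon DD HH:MM:SS host prog: msg")
# whose time of day lies between FROM and TO, both given as HH:MM:SS.
# Zero-padded times compare correctly as plain strings. The date is ignored,
# so a file covering several days matches the window on each day.
syslog_window() {
    from=$1
    to=$2
    file=${3:-/var/log/syslog}
    awk -v from="$from" -v to="$to" '$3 >= from && $3 <= to' "$file"
}
```

    For example, `syslog_window 05:40:00 06:00:00 | grep -v nginx` would show what else was logged just before the nginx messages start.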

  • In some similar questions on serverfault.com, a user pointed out that the load average is not necessarily tied to actual CPU load: it reflects tasks requesting a CPU time slice, and even filesystem waits seem to be able to drive it up. To be honest, until now I did not know that either of these could cause a high load average. Yours also seems extremely high.
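    That matches how the Linux kernel computes the figure. Roughly every 5 seconds it counts n, the tasks that are either runnable (R) or in uninterruptible sleep (D, typically waiting on disk I/O), and folds that count into each of the three averages as an exponentially damped moving average, with τ = 60, 300 and 900 s for the 1-, 5- and 15-minute values (the kernel uses a fixed-point approximation of this):

```latex
L_{\text{new}} = L_{\text{old}}\, e^{-5/\tau} + n \left(1 - e^{-5/\tau}\right),
\qquad n = N_R + N_D,\quad \tau \in \{60, 300, 900\}\ \text{s}
```

    If n stays constant, L converges to n. So a 1-minute load sitting near 689 means roughly 689 tasks were runnable or blocked at each sample, far more than the six cores of the RK3399 in the NanoPi M4V2. With the CPU mostly idle, that points towards tasks piling up in D state, for example waiting on a disk or filesystem, rather than towards heavy computation.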


    Greetings
    David

  • Thanks davidh2k, an interesting read that led me to a few useful places. I'd seen that post but dismissed it because of the OS. On second look, though, some of the linked posts were very useful. Thanks.


    My action plan is as follows.

    1. Wipe the SD card and reinstall Armbian buster for the nanopim4v2, then leave it running for a few days to see whether the load stays normal or the problem persists.

    2. If step 1 passes, install OMV using the install script for ARM devices.

    3. If/when the system load increases, use some commands from these sources to gain further insight into the cause:

    --a. serverfault.com

    --b. brendangregg.com
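    For step 3, it may help to capture everything at once while the load is high, since interactive tools like htop may hang by then. A minimal sketch of such a snapshot; the function name and the default output path under /root are assumptions, vmstat is only used if installed, and /proc/pressure/io only exists if the kernel was built with pressure stall information (PSI) support:

```shell
#!/bin/sh
# Hypothetical snapshot helper: write load, memory, task-state and I/O
# information to one file and print the file name.
snapshot() {
    out=${1:-/root/snapshot-$(date +%Y%m%d-%H%M%S).txt}
    {
        echo "== date";    date
        echo "== loadavg"; cat /proc/loadavg
        echo "== meminfo"; head -n 5 /proc/meminfo
        echo "== task states (count, state letter)"
        sed 's/.*) //' /proc/[0-9]*/stat 2>/dev/null | cut -d' ' -f1 | sort | uniq -c
        echo "== D-state tasks (pid, name, wchan)"
        for f in $(grep -l '^State:[[:space:]]*D' /proc/[0-9]*/status 2>/dev/null); do
            dir=${f%/status}
            echo "${dir#/proc/} $(cat "$dir/comm" 2>/dev/null) $(cat "$dir/wchan" 2>/dev/null)"
        done
        if [ -r /proc/pressure/io ]; then
            echo "== io pressure"; cat /proc/pressure/io
        fi
        if command -v vmstat >/dev/null 2>&1; then
            echo "== vmstat 1 5"; vmstat 1 5
        fi
        echo "== kernel log (last 50 lines)"
        dmesg 2>/dev/null | tail -n 50
    } > "$out"
    echo "$out"
}
```

    Running `snapshot` as root once the load starts climbing, and again a little later, would give two files that can be compared or attached to the thread.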


    I'll post back after step 3, although I'm happy to read any other thoughts in the meantime. Thanks.

  • I was having the runaway heartbeat some time ago with a version-one M4 and finally figured out that I had scheduled a remote rsync to my backup server that coincided with Plex running its scheduled maintenance. I moved the schedule a couple of hours later and the problem went away.
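    To check whether anything scheduled on the host lines up with the time the load starts climbing, a minimal sketch can print the standard Debian cron locations and, where systemd is present, its timers. Jobs scheduled inside containers, such as Plex's own maintenance window, will not show up here and have to be checked in the application itself. `list_schedules` and `strip_comments` are illustrative names:

```shell
#!/bin/sh
# Print scheduled jobs from the usual Debian locations, without comment or
# blank lines, plus systemd timers if systemctl is available.
strip_comments() {
    sed -e '/^[[:space:]]*#/d' -e '/^[[:space:]]*$/d' "$1"
}

list_schedules() {
    echo "== /etc/crontab"
    if [ -r /etc/crontab ]; then strip_comments /etc/crontab; fi
    for f in /etc/cron.d/*; do
        [ -f "$f" ] || continue
        echo "== $f"
        strip_comments "$f"
    done
    if command -v crontab >/dev/null 2>&1; then
        echo "== current user's crontab (crontab -l)"
        crontab -l 2>/dev/null || true
    fi
    if command -v systemctl >/dev/null 2>&1; then
        echo "== systemd timers"
        systemctl list-timers --all --no-pager 2>/dev/null || true
    fi
}

list_schedules
```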

    System Backup Typo alert: Under the Linux section the command should be sudo umount /dev/sda1 NOT sudo unmount /dev/sda1

    Backup Data Disk to Backup Disk on Same Machine: In a Scheduled Job:rsync -av --delete /srv/dev-disk-by-uuid-f8814ed9-9a5c-4e1c-8830-426968c20ea3/ /srv/dev-disk-by-uuid-e67439d5-00a3-4942-bd5f-b84ab86aa850/ Don't forget trailing slashes, and BE CAREFUL. (HT: Getting Started with OMV5)

    Equipment - Thinkserver TS140, NanoPi M4 (v.1), Odroid XU4 (Using DietPi): PiHole

  • I missed this reply, thanks. This was on a fresh install; all I had done was install Docker and Portainer (stored on a dedicated spinning HDD) and create a shared folder on a second spinning HDD with an SMB share for it.


    I've had Armbian running for over a week with no problems. I even filled the SD card to 100% to see if it causes any problems. Everything is running smoothly as far as system load goes. Stage two is to install OMV using the install script for ARM devices and leave it as is, i.e. totally fresh except for changing the admin password, then wait for the system load to go crazy and investigate further, since it's not an Armbian issue.


    Thanks, I will update again as I discover more or need assistance :)


    Here is the output from Login, Uptime and df
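    Since a full SD card was one of the things tested, a quick way to flag filesystems whose space or inode usage is at or above a threshold is sketched here. It relies on the POSIX `df -P` output format; `-i` (inode usage) is a GNU coreutils option, and mount points containing spaces are not handled. `full_filesystems` and the 90% default are arbitrary choices:

```shell
#!/bin/sh
# Flag filesystems whose block or inode usage is at or above LIMIT percent
# (default 90). Output lines: "<blocks|inodes> <use%> <mount point>".
full_filesystems() {
    limit=${1:-90}
    # Column 5 is Capacity (blocks) or IUse% (inodes); "-" means not applicable.
    prog='NR > 1 { use = $5; sub(/%/, "", use)
                   if (use != "-" && use + 0 >= limit) print kind, use "%", $6 }'
    df -P | awk -v limit="$limit" -v kind=blocks "$prog"
    df -P -i 2>/dev/null | awk -v limit="$limit" -v kind=inodes "$prog"
}

full_filesystems
```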
