Upgraded CPU, load average monitoring parameters not updated

cwlucas41 · January 16, 2024

I recently upgraded the CPU in my Precision Tower 5810 from a Xeon E5-1603 v3 (4 cores, no HT) to an E5-2690 v4 (14 cores + HT). It's amazing what you can get for $25 on eBay.

The upgrade went great and it solved a performance issue I was having. Before upgrading, I was occasionally receiving loadavg (1min) > 8 and loadavg (5min) > 4 monitoring alerts.

However, after upgrading I still occasionally receive loadavg monitoring alerts. The puzzling part is that the alert thresholds are still loadavg (1min) > 8 and loadavg (5min) > 4, while I would expect them to be loadavg (1min) > 56 and loadavg (5min) > 28, because /proc/cpuinfo now shows 28 processors (14 cores x 2 threads).

The monitoring code (link) is:

Code

    if loadavg (1min) > {{ grains['num_cpus'] * loadavg_1min_mult | float(1.0) | round(1) }} for {{ loadavg_1min_cycles }} cycles then alert
    if loadavg (5min) > {{ grains['num_cpus'] * loadavg_5min_mult | float(1.0) | round(1) }} for {{ loadavg_5min_cycles }} cycles then alert
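For anyone following along, here is a rough sketch of the arithmetic behind those two lines. The multipliers (2 for 1min, 1 for 5min) are not quoted anywhere in this thread; they are inferred from the old thresholds of 8 and 4 on a 4-core CPU, and the function name is made up for illustration.

```python
def loadavg_threshold(num_cpus: int, multiplier: float) -> float:
    """Alert threshold as CPU count times a per-interval multiplier."""
    return round(num_cpus * float(multiplier), 1)


# Assumed multipliers, inferred from the thresholds reported in the post
# (8 / 4 cores = 2 for 1min, 4 / 4 cores = 1 for 5min).
MULT_1MIN = 2.0
MULT_5MIN = 1.0

for label, cpus in (("E5-1603 v3", 4), ("E5-2690 v4", 28)):
    one_min = loadavg_threshold(cpus, MULT_1MIN)
    five_min = loadavg_threshold(cpus, MULT_5MIN)
    print(f"{label}: {cpus} CPUs -> 1min > {one_min}, 5min > {five_min}")
```

With 4 CPUs this gives 8 and 4, and with 28 CPUs it gives 56 and 28, which matches the numbers in the post.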

I suspect what is happening is that grains['num_cpus'] is cached.
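One way to sanity-check that suspicion without touching Salt at all is to compare the CPU count implied by the alert threshold against what /proc/cpuinfo reports. This is only a sketch: the helper names are hypothetical, and the 1min multiplier of 2 is an assumption inferred from the old threshold of 8 on 4 cores.

```python
import re
from pathlib import Path


def count_processors(cpuinfo_text: str) -> int:
    """Count 'processor : N' entries, one per logical CPU."""
    return len(re.findall(r"^processor\s*:", cpuinfo_text, flags=re.MULTILINE))


def implied_cpus(threshold: float, multiplier: float) -> int:
    """CPU count the monitoring config must have been rendered with."""
    return round(threshold / multiplier)


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        live = count_processors(cpuinfo.read_text())
        # 8 is the 1min threshold seen in the alerts; 2.0 is the assumed multiplier.
        rendered = implied_cpus(8, 2.0)
        status = "stale" if rendered != live else "up to date"
        print(f"live CPUs: {live}, CPUs implied by threshold: {rendered} ({status})")
```

If the two numbers differ, the monitoring config was most likely rendered with an old CPU count.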

I was wondering if I can invoke this code (link) to update the grains info, but I could not find a way to execute it after reading the developer section of the website and searching on the forum.

Code

refresh_grains:
  module.run:
    - saltutil.refresh_grains:
      - refresh_pillar: True

This obviously isn't critical, but my understanding of the code is that the stale 'num_cpus' info is causing false positive monitoring alerts for me.

Any advice on how to correct this would be appreciated!

votdev · January 16, 2024

You can try running

Bash

# omv-salt stage run prepare
# omv-salt deploy run monit

cwlucas41 · January 16, 2024

Yup, that fixed it. Thanks!
