Monit "CPU wait usage" alerts on resume from sleep + resource limit matched fix

  • I'm in the process of migrating from unRAID to OMV + snapRAID + mhddfs and am trying to work out some minor problems. Anyone care to help?


    1) The first one is more of a validity check. I've been getting the following alerts every time the system resumes from sleep (using the autoshutdown plugin). They occur almost instantly after resuming, and the monitored state returns to normal after 30 seconds. This behaviour is also talked about in this thread.


    Quote

    Resource limit matched Service localhost


    Date: Tue, 28 Apr 2015 15:27:30
    Action: alert
    Host: skeelo.lan
    Description: cpu wait usage of 100.0% matches resource limit [cpu wait usage>95.0%]


    Quote

    Resource limit succeeded Service localhost


    Date: Tue, 28 Apr 2015 15:28:01
    Action: alert
    Host: skeelo.lan
    Description: 'localhost' cpu wait usage check succeeded [current cpu wait usage=9.4%]


    Even though in the aforementioned thread they talk about cpu load, I believe this is a measure of the cpu waiting on I/O, more specifically in this case waiting for the disks to be spun up and ready. The suggested solution (i.e. adding OMV_MONIT_SERVICE_SYSTEM_CPUUSAGE_WAIT=99 to /etc/default/openmediavault and reconfiguring monit) does not work either since setting OMV_MONIT_SERVICE_SYSTEM_CPUUSAGE_WAIT=99 will still trigger the cpu wait usage, as per the following (note the adjusted value of 99%).


    Quote

    Resource limit matched Service localhost


    Date: Tue, 28 Apr 2015 20:36:29
    Action: alert
    Host: skeelo.lan
    Description: cpu wait usage of 100.0% matches resource limit [cpu wait usage>99.0%]


    Therefore I came up with the following:


    I noticed monit was called with a delay of 30 seconds upon boot, so I figured there must be a way to retrigger the delay on resume. Thus I created a script in /etc/pm/sleep.d named, e.g., 99custom-monit (starting with "9" makes it run early after resume) which contains the following code:




    It simply stops all monitoring on sleep and restarts all monitoring 32 seconds after resume. My testing confirms I get no more alert messages. I only hope I'm not creating some other problem down the line, that's why I wanted to ask someone more knowledgeable about the validity of this workaround.
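
    The script itself did not survive on this page. As a rough illustration only, a hook matching the description above (stop monitoring on sleep, start it again 32 seconds after resume) could look like the sketch below. It assumes the standard pm-utils convention of passing the event name as the first argument and monit's "unmonitor all" / "monitor all" actions. The MONIT and RESUME_DELAY variables are hypothetical overrides added for testing, not OMV settings, and this is not necessarily the author's original code.

```shell
#!/bin/sh
# Hypothetical reconstruction of /etc/pm/sleep.d/99custom-monit.
# pm-utils calls each hook with the event name as $1.
# MONIT and RESUME_DELAY are illustrative overrides, not OMV settings.

handle_pm_event() {
    monit_cmd="${MONIT:-monit}"
    delay="${RESUME_DELAY:-32}"
    case "$1" in
        suspend|hibernate)
            # Going to sleep: stop monitoring every service.
            $monit_cmd unmonitor all
            ;;
        resume|thaw)
            # Waking up: re-enable monitoring after a delay, in the
            # background so the resume sequence is not held up.
            ( sleep "$delay" && $monit_cmd monitor all ) &
            ;;
    esac
}

handle_pm_event "${1:-}"
```

    The file has to be executable (chmod +x) for pm-utils to run it.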


    2) Monit, kind as she is, also alerts me of the space usage above 80% of every file system. Problem is, I don't need this info from every filesystem. I know some of them are full and am fine with it. Is there a way to disable the monitoring of a few specific file systems while keeping others still monitored? I already tried editing /etc/monit/conf.d/openmediavault-filesystem.conf, commenting out the lines I did not need, running omv-mkconf monit and restarting monit but my changes don't seem persistent.


    3) I don't know if this has been suggested before but when removing shared folders through the webGUI, it asks to delete the content of the folders recursively. I really think this is rather dangerous because a single moment of carelessness can cause a lot of problems. I think it would be better to handle this with a separate button or with a checkbox (default unchecked) to create the same behaviour. This requires a well-thought-out action on behalf of the user, instead of mindlessly clicking a button underneath a bunch of words in an alert box which most people tend to ignore most of the time.

  • UPDATE: There is a much cleaner method posted in my next post.


    Ok, I whipped something up for #2 also. Thought I'd share, maybe someone finds it useful. Don't blame me if your monitoring stops working though.


    So, first make a copy in another directory of /usr/share/openmediavault/mkconf/monit.d/filesystem so you can revert back to the original when necessary (forgot that one myself of course :thumbsup: ).


    Then replace the part from "#Monitor mounted file systems." with this:



    Note the long numbers on line 15: replace these with the identifiers of the file systems you want to ignore, and don't forget the asterisk before each number. The script still loops through all mounted filesystems, but for these specific file systems it writes a comment into monit's config file instead of a check command.


    execute


    omv-mkconf monit


    if you want, check out the resulting config file


    more /etc/monit/conf.d/openmediavault-filesystem.conf


    and then restart monit


    service monit restart


    Test and make a copy in another directory when working as expected because it might get overwritten when updating. Finally, enjoy your clean inbox!

  • I want to use this script to avoid the usage alert when I turn off the computer instead of putting it to sleep.
    I tried the above script written by marz, but it didn't work.
    The errors still persist.

    OMV v5.0
    Asus Z97-A/3.1; i3-4370
    32GB RAM Corsair Vengeance Pro

    Edited 5 times, last by tinh_x7 ().

  • @tinh_x7


    Sorry I didn't reply earlier. I didn't get a notification because the setting was turned off by default and I did not know. Now, my earlier post was for OMV 1.9. Could it be you are running 2.1?


    I recently upgraded to 2.1 and also noticed it didn't work anymore so here is the version for 2.1 (which is a much cleaner way that will probably work for 1.9 too).


    But first, back up the original to the root home folder:


    Code
    cd /usr/share/openmediavault/mkconf/monit.d
    cp filesystem ~/filesystem_2.1_ori


    Change the "#Monitor mounted filesystems." line in /usr/share/openmediavault/mkconf/monit.d/filesystem to:


    Code
    # Monitor mounted filesystems.
    xmlstarlet sel -t \
    -m "//system/fstab/mntent[not(contains(opts,'bind') or contains(opts,'loop') or contains(fsname,'your-fsname') or contains(fsname,'your-fsname'))]" \


    Replace "your-fsname" with the ridiculous number you get in your mailbox, or search your /media for the ones you want excluded. Add as many "or contains(fsname,'...')" terms as needed.



    Make config:


    Code
    omv-mkconf monit


    If you want, check out the resulting config file; it should not include the filesystems specified.


    Code
    more /etc/monit/conf.d/openmediavault-filesystem.conf


    and then restart monit


    Code
    service monit restart


    Again, you could make a copy in another directory to avoid losing the changes when updating.
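
    To double-check the result without paging through the file, a small helper along these lines may be handy. It is only a sketch: fs_excluded is a hypothetical name, and it simply greps the generated config for the fsname, assuming the default path shown above.

```shell
# Hypothetical helper: succeeds (exit 0) when the given fsname no longer
# appears in the generated monit config, exits 2 if the config is missing.
fs_excluded() {
    fsname="$1"
    conf="${2:-/etc/monit/conf.d/openmediavault-filesystem.conf}"
    [ -r "$conf" ] || return 2          # config missing or unreadable
    ! grep -qF -- "$fsname" "$conf"
}

# Usage (placeholder fsname, run after "omv-mkconf monit"):
#   fs_excluded your-fsname && echo "no longer monitored"
```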

  • @marz,


    Yes, I'm currently running the latest version of OMV v2.1.6.
    Is your code meant to exclude the filesystem that gives me the 'cpu wait usage' notification?
    When you said 'fsname', do you mean my UUID?


    I don't understand why OMV gives me this error even though my CPU is an i3 with 16 GB RAM.
    It only happens when it resumes from hibernation/sleep/off.


    Thanks.


  • No, the code I posted yesterday was for the "Resource limit matched Service fs_media_d70c9d42-7315-42d3-8e4b-9d16e1806b50" notification you get when one of your filesystems reaches > 80% usage. I use it to keep monitoring some filesystems while excluding others that I know are full and don't want a notification about every time I wake or power up OMV.


    The "CPU wait usage" notification is handled by the first block of code. That one still seems to work fine on 2.1.x, but maybe change the script name to 99custom-monit (I edited my post), although it shouldn't make a difference. If you made the script as described above, check whether it is executable with "ls -al"; if not, run "chmod +x 99custom-monit" and retry.



    Quote

    I don't understand why OMV gives me this error even though my CPU is an i3 with 16 GB RAM.
    It only happens when it resumes from hibernation/sleep/off.


    First, I am not an expert on this but this is the way I see it:


    It takes some time for the filesystems to become available after wake up/resume because mechanical drives are much slower than the CPU. CPU wait usage is the share of time the CPU sits idle while waiting for I/O to complete. That is why the wait usage is > 99% right after resume; it has nothing to do with how fast your CPU is. On a normal boot there is a 30 second delay before monit starts. My script emulates the same behaviour, but only on resume from sleep or hibernation.


    Also, monit checks at 30-second intervals, so only the first check after resuming is missed. You'll notice that the time between the "resource limit matched" and "succeeded" messages is always about 30 seconds.
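
    To see the same figure outside of monit, the iowait share can be derived from two samples of the first line of /proc/stat on Linux. The sketch below is only an approximation of the idea (iowait_pct is a hypothetical helper, and monit's own calculation may differ in detail): it divides the growth of the iowait counter by the growth of all CPU time counters between the two samples.

```shell
# Print the iowait share (in percent) between two "cpu ..." samples
# taken from the first line of /proc/stat.
# Fields: cpu user nice system idle iowait irq softirq steal ...
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            for (i = 2; i <= 9; i++) total += $i   # user .. steal
            t[NR] = total
            w[NR] = $6                             # iowait counter
        }
        END {
            d = t[2] - t[1]
            if (d > 0) printf "%.1f\n", 100 * (w[2] - w[1]) / d
            else print "0.0"
        }'
}

# Usage, sampling over one monit cycle:
#   a=$(head -n 1 /proc/stat); sleep 30; b=$(head -n 1 /proc/stat)
#   iowait_pct "$a" "$b"
```

    Run right after a resume while the disks are still spinning up, a value close to 100 would match the alert in the first post.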

  • I see. Your new code is for disk usage >80%.
    I'm looking for a solution that avoids the "cpu wait usage" / "localhost resource limit matched" alerts.


    I've tried this code under both sleep.d and shutdown.d, but it's not working for me:
    In my case, I should put the script in the shutdown dir, right?
    Maybe I'm missing the 'chmod +x' command.



  • No, for shutdown/power on events this notification should not appear by default. Put the script in sleep.d because it only checks suspend/resume events not shutdown/power on. I'm running 2.1.6 also and it seems to work fine.


    Do this and post your output please:


    Code
    cd /etc/pm/sleep.d
    ls -al

  • Looks OK. But I re-read your posts: you get these alerts when booting after a power-on? Are you sure it is not a resume from hibernation? I don't get those when shutting down and cold booting.


    You could just turn off the CPU usage notifications in the webGUI, but then you'd lose the monitoring of course.

  • I'm using 'suspend' mode.
    I just added the script to sleep.d like you said, and made it executable.
    Then I rebooted the server.


    Edit: The script didn't work when it resumed from suspend mode.
    Maybe it only works for sleep/hibernate mode.


    Code
    Description: cpu wait usage of 100.0% matches resource limit [cpu wait usage>98.0%]


    Code
    Description: 'localhost' cpu wait usage check succeeded [current cpu wait usage=0.5%]



    Here's my rtcwake code:


    Code
    rtcwake -m disk -s 61200


  • Hi there


    The answers above didn't solve my problem.


    But this works for me


    Code
    cd /etc/monit/conf.d
    nano openmediavault-system.conf


    I just added "for 10 cycles" to the last line:


    Code
    check system localhost
    if loadavg (1min) > 8 for 3 cycles then alert
    if loadavg (5min) > 4 for 3 cycles then alert
    if memory usage > 90% then alert
    if cpu usage (user) > 95% then alert
    if cpu usage (system) > 95% then alert
    if cpu usage (wait) > 95% for 10 cycles then alert


    To be honest I don't know if it is the right amount of cycles to wait, but it seems to work for me.


    No more mails are arriving after wake up.


    After the last update (31.01.16) the changes were gone and I had to make them again.
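
    Since the edit gets lost on updates, re-applying it can be scripted. The following is a minimal sketch, assuming GNU sed and the file layout shown above. add_wait_cycles is a hypothetical helper name, and it only touches the "cpu usage (wait)" line when that line has no "for N cycles" clause yet.

```shell
# Hypothetical helper: add "for N cycles" to the cpu wait rule of a monit
# config file, unless the rule already carries a cycles clause.
# $1 = config file, $2 = number of cycles (default 10). Needs GNU sed.
add_wait_cycles() {
    conf="$1"
    cycles="${2:-10}"
    sed -i -E "/cpu usage \(wait\)/{/for [0-9]+ cycles/!s/ then alert/ for ${cycles} cycles then alert/;}" "$conf"
}

# Usage (as root), followed by a syntax check and a restart:
#   add_wait_cycles /etc/monit/conf.d/openmediavault-system.conf 10
#   monit -t && service monit restart
```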



    Regards, Hellmuth


  • Just if anyone else is wondering: I can confirm this easy fix. It helped me too (Thanks Hellmuth!). I cut it down to 2 cycles and it seems to still be enough. No idea how long a cycle is...

  • @tinh_x7


    The reason the script in /etc/pm/sleep.d doesn't work for you is probably that you call the rtcwake command directly. I believe this bypasses pm-utils, so the script is not called on resume. However, there is a way to use pm-utils with rtcwake so that a wake-up timer gets set when calling pm-suspend or pm-hibernate. The script also does not work when powering up/down or when using systemd (as per this link).
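
    One possible way to combine the two, sketched here under assumptions: rtcwake's "-m no" mode only programs the RTC alarm without suspending, after which pm-hibernate (or pm-suspend) puts the machine to sleep through pm-utils, so the hooks in /etc/pm/sleep.d run. sleep_with_wake_timer, RTCWAKE and PM_SLEEP are hypothetical names used for illustration.

```shell
# Hypothetical wrapper: set the RTC wake-up alarm only, then sleep via
# pm-utils so that the hooks in /etc/pm/sleep.d are executed.
# $1 = seconds until wake-up. RTCWAKE and PM_SLEEP are test overrides.
sleep_with_wake_timer() {
    seconds="$1"
    ${RTCWAKE:-rtcwake} -m no -s "$seconds" && ${PM_SLEEP:-pm-hibernate}
}

# Usage (as root), equivalent in intent to "rtcwake -m disk -s 61200":
#   sleep_with_wake_timer 61200
```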


    @Hellmuth
    That would be the best solution IMHO if the number of cycles were configurable with an environment variable in /etc/default/openmediavault, like there is for LOADAVG (OMV_MONIT_SERVICE_SYSTEM_LOADAVG_1MIN_CYCLES=3). That way it would survive openmediavault updates. Maybe a feature request?


    @dudester
    A cycle is 30 seconds (see /etc/monit/monitrc). Since 30 seconds was not enough of a delay in my case (and apparently yours too), 2 cycles seems a good minimum value.
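
    The cycle length comes from the "set daemon" line in monitrc, so the effective alert delay is simply cycles times that value (2 x 30 s = 60 s, 10 x 30 s = 300 s). A small sketch for looking it up follows. cycle_seconds and alert_delay are hypothetical helpers, and monitrc is typically readable by root only.

```shell
# Print the monit cycle length in seconds from the "set daemon N" line.
cycle_seconds() {
    awk '$1 == "set" && $2 == "daemon" { print $3; exit }' \
        "${1:-/etc/monit/monitrc}"
}

# Print how long a condition must hold before "for N cycles" alerts.
# $1 = number of cycles, $2 = optional path to monitrc.
alert_delay() {
    secs=$(cycle_seconds "${2:-/etc/monit/monitrc}")
    [ -n "$secs" ] || return 1          # no "set daemon" line found
    echo $(( $1 * secs ))
}

# Usage (as root):
#   alert_delay 2     # prints 60 when the cycle length is 30 seconds
```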
