Good to see you have similar issues. Well, not good as such, but good to know there's someone else with the same problem.
Since my system is currently clogged, I started investigating again.
No high CPU usage, no processes hogging RAM or CPU. BUT the CPU I/O wait is high, at around 24%.
After about 10 minutes of being clogged (the Docker containers were not processing anything), I noticed my light going off (it is controlled by my smart home), so the system is "free" again and the load drops immediately. I did nothing except open top and iotop.
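While it is clogged like this, I could probably narrow it down further with something like the following (just a rough idea, interval and grep pattern are only examples):

vmstat 5                                      # "b" column = tasks blocked in uninterruptible sleep, "wa" = iowait
dmesg -T | grep -i "blocked for more than"    # any kernel hung-task messages, if some were logged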
top (clogged):
top - 22:19:09 up 13 days, 1:44, 2 users, load average: 21,82, 18,59, 12,38
Tasks: 254 total, 1 running, 204 sleeping, 0 stopped, 1 zombie
%Cpu(s): 2,4 us, 1,2 sy, 0,0 ni, 71,5 id, 24,6 wa, 0,0 hi, 0,2 si, 0,0 st
KiB Mem : 7643708 total, 3573692 free, 1095592 used, 2974424 buff/cache
KiB Swap: 7849980 total, 7440556 free, 409424 used. 6162440 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
25186 root 20 0 51940 15924 6508 S 8,9 0,2 0:21.98 /usr/bin/python3 /usr/sbin/iotop -oPa -d 2
25548 root 20 0 46628 3908 3208 R 1,3 0,1 0:00.23 top
1044 root 20 0 971472 2196 1372 S 0,7 0,0 42:33.04 /usr/sbin/collectd
20760 root 20 0 16376 3312 1920 S 0,7 0,0 23:40.97 docker-gen -watch -notify nginx -s reload /app/nginx.tmpl /etc/nginx/conf.d/default.conf
21073 kris 20 0 30912 8136 684 S 0,7 0,1 13:03.13 redis-server
21111 root 20 0 17192 5804 4240 S 0,7 0,1 19:12.24 docker-gen -watch -notify /app/signal_le_service -wait 15s:60s /app/letsencrypt_service+
10 root 20 0 0 0 0 I 0,3 0,0 28:59.16 [rcu_sched]
3189 mysql 20 0 2792216 271180 9148 S 0,3 3,5 543:14.88 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/pl+
3365 systemd+ 20 0 9420 1000 820 S 0,3 0,0 22:33.65 /usr/sbin/mosquitto -c /mosquitto/config/mosquitto.conf
31351 root 20 0 10744 1284 804 S 0,3 0,0 6:24.22 containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.l+
31436 kris 20 0 1266008 65756 8864 S 0,3 0,9 33:59.11 node-red
31614 root 20 0 0 0 0 S 0,3 0,0 21:31.91 [kdvb-ad-1-fe-0]
31616 root 20 0 0 0 0 S 0,3 0,0 21:34.00 [kdvb-ad-0-fe-0]
1 root 20 0 205144 5948 4096 S 0,0 0,1 4:15.22 /sbin/init
2 root 20 0 0 0 0 S 0,0 0,0 0:00.86 [kthreadd]
3 root 0 -20 0 0 0 I 0,0 0,0 0:00.00 [rcu_gp]
4 root 0 -20 0 0 0 I 0,0 0,0 0:00.00 [rcu_par_gp]
6 root 0 -20 0 0 0 I 0,0 0,0 0:00.00 [kworker/0:0H-kb]
8 root 0 -20 0 0 0 I 0,0 0,0 0:00.00 [mm_percpu_wq]
9 root 20 0 0 0 0 S 0,0 0,0 2:33.62 [ksoftirqd/0]
11 root 20 0 0 0 0 I 0,0 0,0 0:00.00 [rcu_bh]
12 root rt 0 0 0 0 S 0,0 0,0 0:01.94 [migration/0]
13 root rt 0 0 0 0 S 0,0 0,0 0:09.35 [watchdog/0]
14 root 20 0 0 0 0 S 0,0 0,0 0:00.00 [cpuhp/0]
15 root 20 0 0 0 0 S 0,0 0,0 0:00.00 [cpuhp/1]
16 root rt 0 0 0 0 S 0,0 0,0 0:08.65 [watchdog/1]
....
Healthy top:
top - 22:23:29 up 13 days, 1:48, 2 users, load average: 0,91, 9,94, 10,51
Tasks: 234 total, 1 running, 184 sleeping, 0 stopped, 1 zombie
%Cpu(s): 1,5 us, 1,0 sy, 0,0 ni, 97,4 id, 0,1 wa, 0,0 hi, 0,0 si, 0,0 st
KiB Mem : 7643708 total, 3621904 free, 1072232 used, 2949572 buff/cache
KiB Swap: 7849980 total, 7440556 free, 409424 used. 6211944 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
25186 root 20 0 51940 15948 6508 S 3,9 0,2 0:38.21 /usr/bin/python3 /usr/sbin/iotop -oPa -d 2
26148 root 20 0 46628 3856 3100 R 1,6 0,1 0:00.18 top
27181 root 20 0 1373624 177056 8588 S 1,3 2,3 76:33.29 /usr/bin/node /opt/pimatic-docker/node_modules/pimatic/pimatic.js restart
20760 root 20 0 16376 3576 1948 S 0,7 0,0 23:43.07 docker-gen -watch -notify nginx -s reload /app/nginx.tmpl /etc/nginx/conf.d/default.conf
21111 root 20 0 17192 5804 4240 S 0,7 0,1 19:13.83 docker-gen -watch -notify /app/signal_le_service -wait 15s:60s /app/letsencrypt_service+
10 root 20 0 0 0 0 I 0,3 0,0 29:00.03 [rcu_sched]
52 root 20 0 0 0 0 I 0,3 0,0 7:31.98 [kworker/3:1-eve]
883 root 20 0 1336100 41188 12124 S 0,3 0,5 51:00.64 /usr/bin/dockerd -H unix:///var/run/docker.sock
1044 root 20 0 971472 2196 1372 S 0,3 0,0 42:33.81 /usr/sbin/collectd
2683 root 20 0 327268 3324 2844 S 0,3 0,0 2:21.47 omv-engined
3189 mysql 20 0 2792216 271180 9148 S 0,3 3,5 543:15.55 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/pl+
8503 root 20 0 70688 7700 7492 S 0,3 0,1 1:26.15 /lib/systemd/systemd-journald
17613 root 20 0 254244 2292 1932 S 0,3 0,0 0:35.65 /usr/sbin/rsyslogd -n
21073 kris 20 0 30912 8136 684 S 0,3 0,1 13:04.31 redis-server
31585 kris 20 0 730952 20300 184 S 0,3 0,3 159:26.74 /usr/bin/tvheadend -C -c /config
31614 root 20 0 0 0 0 S 0,3 0,0 21:32.35 [kdvb-ad-1-fe-0]
31616 root 20 0 0 0 0 S 0,3 0,0 21:34.50 [kdvb-ad-0-fe-0]
1 root 20 0 205144 5948 4096 S 0,0 0,1 4:15.26 /sbin/init
2 root 20 0 0 0 0 S 0,0 0,0 0:00.86 [kthreadd]
3 root 0 -20 0 0 0 I 0,0 0,0 0:00.00 [rcu_gp]
4 root 0 -20 0 0 0 I 0,0 0,0 0:00.00 [rcu_par_gp]
6 root 0 -20 0 0 0 I 0,0 0,0 0:00.00 [kworker/0:0H-kb]
8 root 0 -20 0 0 0 I 0,0 0,0 0:00.00 [mm_percpu_wq]
9 root 20 0 0 0 0 S 0,0 0,0 2:33.63 [ksoftirqd/0]
....
How can the I/O wait be investigated further? I checked iotop (nothing special, not much writing). I'll let iostat 120 run overnight to see if there is something useful in it.
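Something like this might also be worth running in addition, to get timestamps and extended per-device stats into a log (the log path is just an example):

nohup iostat -xt 120 > /root/iostat-overnight.log 2>&1 &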
UPDATE:
Overnight I ran
while true; do date; ps auxf | awk '{if($8=="D") print $0;}'; sleep 30; done
(no clogging that night) and it caught a few processes stuck in D state (the rsync backup job) for about 15 minutes, but there was no high-load message this time, so that seems uncritical.
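For the next clog I might let a variant of that loop log to a file, so it is easier to correlate with the load afterwards (log path is again just an example):

while true; do { date; ps auxf | awk '$8 ~ /^D/'; echo; } >> /root/dstate.log; sleep 30; done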
iostat 120 was also running and shows output like this: https://pastebin.com/g85TLKgj