OK, I declared victory too fast. I now learned that the locked memory setting isn't the cause for rclone being killed (because rclone doesn't use locked memory). So, if you have any ideas, why rclone on OMV is being killed when it reaches about 4GB of memory usage, please let me know. (For more on this issue, see this thread in the rclone forum.
Beiträge von tophee
-
-
I am running rclone on my OMV to serve a cloud storage via SFTP and I have been struggling for months with rclone repeatedly being killed and I have been trying to trouble shoot this over on the rclone forum. We figured it had to do with memory usage, but since I have 24 GB of RAM in my OMV server, there isn't really any shortage of memory. So why does it get killed?
It looks like we've finally identified the reason:
Code
Alles anzeigen$ ulimit -a real-time non-blocking time (microseconds, -R) unlimited core file size (blocks, -c) 0 data seg size (kbytes, -d) unlimited scheduling priority (-e) 0 file size (blocks, -f) unlimited pending signals (-i) 95063 max locked memory (kbytes, -l) 3049767 max memory size (kbytes, -m) unlimited open files (-n) 1024 pipe size (512 bytes, -p) 8 POSIX message queues (bytes, -q) 819200 real-time priority (-r) 0 stack size (kbytes, -s) 8192 cpu time (seconds, -t) unlimited max user processes (-u) 95063 virtual memory (kbytes, -v) unlimited file locks (-x) unlimited
rclone is bumping into the max memory size. Even root has this limitation of around 2.9 GB.
So I'm wondering: why is OMV limiting memory usage in this way? And, more importantly: how do I change it?
-
I figured it out: I had it running as a user service, so I was able to stop it using systemctl --user stop rclone@pcloud.service...
I had triple-checked that there was no system service doing the restarting, but I had forgotten about user services.
So, what I don't quite understand is how the user service interacts with the scheduled task...
-
Here is what I found in the /var/log/syslog:
CodeFeb 28 18:49:07 server systemd[1275]: rclone@pcloud.service: start operation timed out. Terminating. Feb 28 18:49:07 server systemd[1275]: rclone@pcloud.service: Failed with result 'timeout'. Feb 28 18:49:07 server systemd[1275]: Failed to start rclone: make sure pcloud is served via sftp. Feb 28 18:49:09 server monit[1187]: 'server' mem usage of 94.6% matches resource limit [mem usage > 90.0%] Feb 28 18:49:10 server smbd[6312]: [2023/02/28 18:49:10.009073, 2] ../../source3/smbd/dosmode.c:137(unix_mode) Feb 28 18:49:10 server smbd[6312]: unix_mode(.) inherit mode 42770 Feb 28 18:49:13 server systemd[1275]: rclone@pcloud.service: Scheduled restart job, restart counter is at 578. Feb 28 18:49:13 server systemd[1275]: Stopped rclone: make sure pcloud is served via sftp. Feb 28 18:49:13 server systemd[1275]: Starting rclone: make sure pcloud is served via sftp...
-
Another thing that puzzles me is that the command shown in ps aux is not identical with the one I entered in the Scheduled Tasks UI, which is
export GOGC=50 && rclone serve sftp pcloud:Backup/ --addr :2022 --user ******* --pass *********** --log-file=/zfs/NAS/config/rclone/rclone.log --vfs-cache-mode writes --rc &
Notably, the password section is missing. Maybe ps just truncates long commands, I don't know, but even so, it also added a whole section: --config=/zfs/NAS/config/homedirs/christoph/.config/rclone/rclone.conf, which is weird. I have no idea where that comes from.
-
I have not created any cron job manually, but I have manually started the scheduled task shown in the OP. Does manually running a scheduled task have any particular side effects (such as the task becoming unstoppable?)
I tried killing cron/anacron but rclone keeps coming back:
Code$ sudo killall anacron anacron: no process found $ sudo killall cron $ ps aux | grep rclone christo+ 186545 0.2 0.1 761996 33252 ? Ssl 17:55 0:00 /usr/bin/rclone serve sftp pcloud:Backup/ --config=/zfs/NAS/config/homedirs/christoph/.config/rclone/rclone.conf --addr :2022 --vfs-cache-mode minimal --log-level INFO --log-file /zfs/NAS/config/rclone/rclone-pcloud.log --user christoph $ killall rclone $ ps aux | grep rclone christo+ 187792 1.0 0.0 759820 21756 ? Dsl 17:56 0:00 /usr/bin/rclone serve sftp pcloud:Backup/ --config=/zfs/NAS/config/homedirs/christoph/.config/rclone/rclone.conf --addr :2022 --vfs-cache-mode minimal --log-level INFO --log-file /zfs/NAS/config/rclone/rclone-pcloud.log --user christoph christo+ 187815 0.0 0.0 6216 624 pts/2 S+ 17:56 0:00 grep rclone
Based on that there not being any anacron process, I assume that it can't be responsible for restarting rclone, right? And since killing cron didn't stop the madness, cron is not responsible either. So what is?
-
So how do I stop the rclone process from being restarted?
-
I have scheduled rclone to run at reboot via the OMV UI like this:
Now I'm trying to stop the process and it is impossible. Whenever I dokillall rclone I can see that the process is gone via ps aux | grep rclone but a second or two rclone is back. I don't understand what is going on.
I saw in the documentation that the UI doesn't write directly to the crontab but in essence, I would still expect that adding a command to the scheduler will do the same as adding the same command into crontab, i.e. if it is scheduled to be executed at reboot, it will execute at reboot and never again until the next reboot.
OMV is clearly not doing that and I fail to understand what it is doing or how I can prevent it from doing so. I have even disabled the scheduled task in the UI but rclone is still being restarted.
Could anyone explain?
-
Yesterday I upgraded to openmediavault 6.3.1-1. No problem.
Today, from one second to the next, the server lost connectivity so I eventually had to restart it. Once it was back, I found this in `/var/logs/message`:
Code
Alles anzeigen2628 Feb 25 08:15:44 server openmediavault-update-smart-drivedb: Updating smartmontools drive database ... 2629 Feb 25 11:30:21 server kernel: [51615.748213] ------------[ cut here ]------------ 2630 Feb 25 11:30:21 server kernel: [51615.748216] NETDEV WATCHDOG: enp0s31f6 (e1000e): transmit queue 0 timed out 2631 Feb 25 11:30:21 server kernel: [51615.748230] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:477 dev_watchdog+0x277/0x280 2632 Feb 25 11:30:21 server kernel: [51615.748236] Modules linked in: wireguard curve25519_x86_64 libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel udp_tunnel tcp_diag udp_> 2633 Feb 25 11:30:21 server kernel: [51615.748292] sysfillrect usbserial sysimgblt parport_pc parport mei_me mei intel_pch_thermal mac_hid acpi_pad zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znv> 2634 Feb 25 11:30:21 server kernel: [51615.748335] CPU: 1 PID: 0 Comm: swapper/1 Tainted: P O 5.15.85-1-pve #1 2635 Feb 25 11:30:21 server kernel: [51615.748338] Hardware name: FUJITSU /D3417-B2, BIOS V5.0.0.12 R1.14.0 for D3417-B2x 10/24/2017 2636 Feb 25 11:30:21 server kernel: [51615.748339] RIP: 0010:dev_watchdog+0x277/0x280 2637 Feb 25 11:30:21 server kernel: [51615.748343] Code: eb 97 48 8b 5d d0 c6 05 26 82 4d 01 01 48 89 df e8 5e 53 f9 ff 44 89 e1 48 89 de 48 c7 c7 f0 a9 8a b7 48 89 c2 e8 cd 99 1c 00 <0f> 0b eb 80 e9 45 ca 25 00 0f 1f 4> 2638 Feb 25 11:30:21 server kernel: [51615.748345] RSP: 0018:ffffc28740114e70 EFLAGS: 00010282 2639 Feb 25 11:30:21 server kernel: [51615.748347] RAX: 0000000000000000 RBX: ffffa0f20d4e0000 RCX: ffffa0f746520588 2640 Feb 25 11:30:21 server kernel: [51615.748349] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffffa0f746520580 2641 Feb 25 11:30:21 server kernel: [51615.748350] RBP: ffffc28740114ea8 R08: 0000000000000003 R09: 0000000000000001 2642 Feb 25 11:30:21 server kernel: [51615.748352] R10: 0000000000ffff0a R11: 0000000000000001 R12: 0000000000000000 2643 Feb 25 11:30:21 server kernel: [51615.748353] R13: ffffa0f20174b680 R14: 0000000000000001 R15: ffffa0f20d4e04c0 2644 Feb 25 11:30:21 server kernel: [51615.748355] FS: 0000000000000000(0000) GS:ffffa0f746500000(0000) knlGS:0000000000000000 2645 Feb 25 11:30:21 server kernel: [51615.748357] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 2646 Feb 25 11:30:21 server kernel: [51615.748358] CR2: 00007f575572f024 CR3: 00000004ed410004 CR4: 00000000003726e0 2647 Feb 25 11:30:21 server kernel: [51615.748360] Call Trace: 2648 Feb 25 11:30:21 server kernel: [51615.748361] <IRQ> 2649 Feb 25 11:30:21 server kernel: [51615.748364] ? pfifo_fast_enqueue+0x160/0x160 2650 Feb 25 11:30:21 server kernel: [51615.748367] call_timer_fn+0x29/0x120 2651 Feb 25 11:30:21 server kernel: [51615.748372] __run_timers.part.0+0x1e1/0x270 2652 Feb 25 11:30:21 server kernel: [51615.748374] ? ktime_get+0x43/0xc0 2653 Feb 25 11:30:21 server kernel: [51615.748376] ? lapic_next_deadline+0x2c/0x40 2654 Feb 25 11:30:21 server kernel: [51615.748379] ? clockevents_program_event+0xa8/0x130 2655 Feb 25 11:30:21 server kernel: [51615.748382] run_timer_softirq+0x2a/0x60 2656 Feb 25 11:30:21 server kernel: [51615.748384] __do_softirq+0xd6/0x2ea 2657 Feb 25 11:30:21 server kernel: [51615.748388] irq_exit_rcu+0x94/0xc0 2658 Feb 25 11:30:21 server kernel: [51615.748390] sysvec_apic_timer_interrupt+0x80/0x90 2659 Feb 25 11:30:21 server kernel: [51615.748393] </IRQ> 2660 Feb 25 11:30:21 server kernel: [51615.748394] <TASK> 2661 Feb 25 11:30:21 server kernel: [51615.748395] asm_sysvec_apic_timer_interrupt+0x1b/0x20 2662 Feb 25 11:30:21 server kernel: [51615.748397] RIP: 0010:cpuidle_enter_state+0xd9/0x620 2663 Feb 25 11:30:21 server kernel: [51615.748401] Code: 3d 44 3f 3f 49 e8 77 ff 6d ff 49 89 c7 0f 1f 44 00 00 31 ff e8 b8 0c 6e ff 80 7d d0 00 0f 85 5e 01 00 00 fb 66 0f 1f 44 00 00 <45> 85 f6 0f 88 6a 01 00 00 4d 63 e> 2664 Feb 25 11:30:21 server kernel: [51615.748403] RSP: 0018:ffffc287400d7e38 EFLAGS: 00000246 2665 Feb 25 11:30:21 server kernel: [51615.748404] RAX: ffffa0f746530bc0 RBX: ffffe2873fd00000 RCX: 0000000000000000 2666 Feb 25 11:30:21 server kernel: [51615.748406] RDX: 0000000000001782 RSI: 000000002c13dcf6 RDI: 0000000000000000 2667 Feb 25 11:30:21 server kernel: [51615.748407] RBP: ffffc287400d7e88 R08: 00002ef1ba561902 R09: 00000000000c3500 2668 Feb 25 11:30:21 server kernel: [51615.748408] R10: 0000000000000007 R11: 071c71c71c71c71c R12: ffffffffb80d4420 2669 Feb 25 11:30:21 server kernel: [51615.748410] R13: 0000000000000006 R14: 0000000000000006 R15: 00002ef1ba561902 2670 Feb 25 11:30:21 server kernel: [51615.748413] ? cpuidle_enter_state+0xc8/0x620 2671 Feb 25 11:30:21 server kernel: [51615.748417] cpuidle_enter+0x2e/0x50 2672 Feb 25 11:30:21 server kernel: [51615.748420] do_idle+0x20d/0x2b0 2673 Feb 25 11:30:21 server kernel: [51615.748423] cpu_startup_entry+0x20/0x30 2674 Feb 25 11:30:21 server kernel: [51615.748425] start_secondary+0x12a/0x180 2675 Feb 25 11:30:21 server kernel: [51615.748429] secondary_startup_64_no_verify+0xc2/0xcb 2676 Feb 25 11:30:21 server kernel: [51615.748433] </TASK> 2677 Feb 25 11:30:21 server kernel: [51615.748434] ---[ end trace 92789cc296e4fa0f ]--- 2678 Feb 25 11:30:21 server kernel: [51615.839286] br0: port 1(enp0s31f6) entered disabled state 2679 Feb 25 11:30:24 server kernel: [51619.455212] e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None 2680 Feb 25 11:30:24 server kernel: [51619.455320] br0: port 1(enp0s31f6) entered blocking state 2681 Feb 25 11:30:24 server kernel: [51619.455324] br0: port 1(enp0s31f6) entered listening state 2682 Feb 25 11:30:29 server kernel: [51623.684311] br0: port 1(enp0s31f6) entered learning state 2683 Feb 25 11:30:33 server kernel: [51627.780289] br0: port 1(enp0s31f6) entered forwarding state 2684 Feb 25 11:30:33 server kernel: [51627.780294] br0: topology change detected, propagating 2685 Feb 25 11:43:12 server kernel: [52387.103542] br0: port 1(enp0s31f6) entered disabled state 2686 Feb 25 11:43:16 server kernel: [52390.756071] e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None 2687 Feb 25 11:43:16 server kernel: [52390.756169] br0: port 1(enp0s31f6) entered blocking state 2688 Feb 25 11:43:16 server kernel: [52390.756172] br0: port 1(enp0s31f6) entered listening state 2689 Feb 25 11:43:20 server kernel: [52394.760120] br0: port 1(enp0s31f6) entered learning state 2690 Feb 25 11:43:24 server kernel: [52398.860229] br0: port 1(enp0s31f6) entered forwarding state 2691 Feb 25 11:43:24 server kernel: [52398.860233] br0: topology change detected, propagating 2692 Feb 25 11:43:26 server kernel: [52400.950832] br0: port 1(enp0s31f6) entered disabled state
From what I can tell (which is almost nothing),
OMV started to update something (smartmontools?)which somehow caused the network connection to break down.Edit: sorry, I realized that the update was hours before the network issue. So I have no idea what cause the bridge to break down...
What happened? And what should I do about it?
-
OK, thanks so much for clarifying.
It looked like installing zerotier required more than just installing the package: https://www.zerotier.com/download/#downloadLinux
As for the warning about installing software: I assume that this applies regardless of whether I use the APT-tool plugin or the CLI, right?
-
A few weeks or so ago I installed ZeroTier on my OMV server via CLI (tried running it in a docker container, but couldn't get it to work). I installed it manually because my impression was that OMV doesn't support zerotier natively. But now I see a Zerotier update in my OMV updates
So, I realized I must be misunderstanding something because, to me, this looks like OMV does somehow support zerotier. But then it occurred to me that OMV might simply be showing all system updates, regardless of whether something was installed via OMV or not. Is that correct?
In that case, I just wonder whether there is anything wrong with installing something via cli and then updating it via the UI.
Related to that: I don't remember installing portainer och yacht via OMV (though it's possible I did) - I've been running OMV for some years now - but OMV does see the portainer and yacht instances in the UI. Does that mean that they will be kept up to date via the OMV update management? I'm asking because portainer has for some time been telling me in its UI that there is an update but there is none showing in the OMV update management. Perhaps it just takes some extra time? Or do I still have to do the somewhat tricky process of update portainer manually (because portainer can't update itself)?
-
Excellent! Thanks for explaining. No, my VMs are not similar and I do have 24 GB of RAM, so whatever savings ksm produces, I can probably afford not to have them. Will try disabling ksm and hopefully forget about it,
-
But all the kvm plugin does is install it as a dependency. If you want to disable it, just stop and disable the service.
The problem is, I don't know if I want to disable it. What does it do? I have two KVM virtual machines running. Do these need the ksm service?
-
A couple of weeks ago, some OMV system process started to use a lot of CPU:
I started looking into this today and it seems quite evident that the perpetrator is ksmd:
Code
Alles anzeigentop - 20:46:54 up 63 days, 22:42, 1 user, load average: 1.98, 2.29, 2.37 Tasks: 404 total, 2 running, 402 sleeping, 0 stopped, 0 zombie %Cpu(s): 3.4 us, 20.7 sy, 0.0 ni, 75.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st MiB Mem : 23826.3 total, 770.9 free, 19362.5 used, 3692.8 buff/cache MiB Swap: 7974.0 total, 6864.6 free, 1109.3 used. 1206.1 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 32 root 25 5 0 0 0 R 29.4 0.0 11308:54 ksmd 82374 libvirt+ 20 0 5701096 3.1g 4764 S 5.9 13.5 8083:01 qemu-sy+ 1886799 508 20 0 3977720 716708 0 S 5.9 2.9 2:47.94 java 1895221 christo+ 20 0 774788 119428 8428 S 5.9 0.5 2:56.07 io.shel+ 1963879 201 39 19 2428 1456 1028 S 5.9 0.0 0:00.77 bash 1992206 root 20 0 12324 4132 3240 R 5.9 0.0 0:00.01 top 1 root 20 0 168844 10180 5716 S 0.0 0.0 76:15.96 systemd 2 root 20 0 0 0 0 S 0.0 0.0 6:32.81 kthreadd
What can I do about this?
I found this: https://serverfault.com/a/1064801/399289 but I am somewhat reluctant to tweak these settings as I assume that OMV is using reasonable defaults...
-
Patience
LOL. yes, thats probably true in general, but in this case I was unsure what is going on. and wanted to understand it. So thanks for clarifying. I now realize that releases don't seem to be published on github at all, since https://github.com/openmediavault/openmediavault/releases is empty...
Anyway: one question regarding the latest release:
ZitatAdapted Samba vfs_fruit settings according to the wiki to better work with Mac OS X. The following environment variables have been introduced:
OMV_SAMBA_SHARE_FRUIT_VETOAPPLEDOUBLE (defaults to ‘no’)
OMV_SAMBA_SHARE_FRUIT_NFSACES (defaults to ‘no’)
OMV_SAMBA_SHARE_FRUIT_WIPEINTENTIONALLYLEFTBLANKRFORK (defaults to ‘yes’)
OMV_SAMBA_SHARE_FRUIT_DELETEEMPTYADFILES (defaults to ‘yes’)
I wonder wether there is anything I need to do with my SMB-settings which currently look like this:
I would assume that I can (should?) remove those settings that became redunant in the new release?
-
OK, I see. It's good to know that there are still use cases for AFP. I have had both AFP and SMB running side by side for some time now, but it caused some confusion because the shares had identical names (because they are sharing identical directories on the server) so I figured I should either rename them or at least only mount AFP on my macs. Then I read about SMB being faster etc, so I thought perhaps I should get rid of it alltogether. Hence my question.
I haven't gotten Spotlight to reliably index those SMB shares, which might actually have been the reason why I added AFP at some point. But since indexing still seems to fail, I guess it didnt help...
-
I see here that 6.0.36-1 i is out but it doesn't come up under "Update-Management". Since the release is marked as "stable", I'd exoect it to show up but perhaps only certain releases are pushed into the UI? Or am I missing something?
-
Any particular reason for using AFP? Even Apple recommends SMB...
-
look for the kernel you are running. If you don't reboot the running kernel is the same as before upgrading.
Yes, that's what the second screenshot above is about. I'm running 5.4.174-2-pve, which, I believe is a proxmox kernel. So the funny thing is that I was running a proxmox kernel on OMV6 without having the kernel plugin installed....
-
You should restart, if there are problems to solve the sooner the better
You have a point.
But the most acute argument is this one:
The kernel thing is a bit confusing, though. I believe I was running a proxmox kernel before the upgrade but after the upgrade the kernel-option wasn't even available anymore. Had to (re-?)install the kernel-plugin. But if using a new kernel requires a reboot, then I must still be running the (old) proxmox-kernel (despite the option having disappeared after the upgrade)... Indeed:
Funny... I'm almost afraid to reboot now...