The problem hasn't recurred in almost 3 weeks. Perhaps that's because I shifted the timing of a remote cron job that accesses one of the NFS shares by 5 minutes to avoid a clash with the local cron jobs. I guess I'll never know for sure.
I'm not running any services other than SMB and NFS.
I access the NFS shares at night for backup. One of the file systems starts its backup at 04:30, so it's quite possible that's the trigger that tips the system into out-of-memory. But it's still a reasonable use case, I think; nothing out of the ordinary. The backup client runs on a separate server and only accesses OMV via NFS.
A couple of things I'm looking at. From the syslog, at the moment the oom-killer runs, the categories of memory are listed like this. Yes, the total is 2.2GB, but that includes 834M listed as free and 484M listed as slab_reclaimable (which I understand is a kind of kernel cache, so it should be available for use). I also note that vm.overcommit_ratio is set to 50 (%); I think that's a default inherited from Armbian, and I don't know how it influences the point at which the oom-killer is invoked.
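As a sanity check on how vm.overcommit_ratio could matter: as far as I know the ratio is only enforced when vm.overcommit_memory=2 (strict accounting); with the default heuristic mode 0 it is ignored, although /proc/meminfo still reports a CommitLimit computed from it. A quick back-of-envelope for a 2GB HC2 with no swap (values are assumptions, not read from the box):

```shell
# Sketch: CommitLimit = SwapTotal + MemTotal * overcommit_ratio / 100
# (only enforced under vm.overcommit_memory=2)
ram_kib=$((2 * 1024 * 1024))   # assumed 2 GiB of RAM, in KiB
swap_kib=0                      # no swap configured
ratio=50                        # vm.overcommit_ratio default
commit_limit_kib=$((swap_kib + ram_kib * ratio / 100))
echo "$commit_limit_kib"        # 1048576 KiB = 1 GiB
```

So under strict accounting, the system would refuse allocations past roughly 1GB of committed memory, which is well below the physical 2GB.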
The other thing I will look at is the NFS mount options. As I just dropped in the new HC2 as a replacement for my old Zyxel, the mount options haven't been changed. Maybe they are not the best for this new platform? From /etc/fstab on the client side:
mandy:/export/jordan /media/jordan nfs rsize=8192,wsize=8192,timeo=14,intr,nolock,ro
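For what it's worth, those options look dated: `intr` has been a no-op since kernel 2.6.25, `timeo=14` (1.4 s) is quite aggressive, and rsize/wsize of 8192 is far below what modern clients negotiate on their own (often up to 1 MiB). A possible simplified entry, keeping the server name and paths from above but otherwise just a sketch to experiment with, not a known-good configuration:

```shell
# Hypothetical /etc/fstab entry: drop rsize/wsize/timeo so client and
# server negotiate their own values, drop the obsolete "intr", and pin
# the NFS version explicitly (nfsvers=3 matches nolock-style usage).
mandy:/export/jordan /media/jordan nfs nfsvers=3,nolock,ro 0 0
```

Whether larger negotiated rsize/wsize would increase or reduce memory pressure on the server side is exactly the kind of thing worth testing here.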
Thanks for your thoughts guys.
Joking aside, I'd really appreciate some help understanding the memory management approach in OMV5. I have a feeling that the lack of swap space is letting a short-term demand for RAM push this system into overcommit, which triggers the process killing. I'm desperately trying to understand the memory info in the syslog I posted, but it's tough going. I notice the 3 PHP-FPM processes are the heaviest individual users of virtual memory, at 150M in total, but that's still small potatoes. Maybe NFS is using memory for caching in a way that triggers overcommit.
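If the lack of swap turns out to be the issue, one experiment would be to add a small amount. On an SD-card-based HC2 running the flashmemory plugin, zram is usually preferred over an on-card swap file (Armbian ships armbian-zram-config for exactly this), but a plain swap file is the simplest test. The path and size below are illustrative only:

```shell
# Sketch: add a 512M swap file as a diagnostic experiment.
# On flash media, prefer zram (armbian-zram-config) to avoid wear.
fallocate -l 512M /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
# make it persistent across reboots:
echo '/swapfile none swap sw 0 0' >> /etc/fstab
```

If the OOM kills stop with even a little swap available, that would support the theory that a transient spike, not a steady leak, is pushing the system over.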
9 SMB/CIFS shares and 7 NFS shares, all configured via the OMV GUI. That doesn't seem excessive to me in principle, but maybe there is a cost I'm misunderstanding. I migrated this setup, with the same shares, from a much less powerful system with 512MB of memory and a very primitive Marvell SoC (Zyxel NSA325), so I'm puzzled why this use case should be a struggle for OMV on a 2GB HC2.
OOM = Out of Memory. So, you are doing something on your HC2 that is using too much RAM.
Thanks, I get that. But the point is that I'm just using it as an SMB+NFS file server, and this happened in the middle of the night. Many other people surely run a similar configuration; are they all hitting out-of-memory conditions?
I have a newly installed OMV5 on an Odroid HC2. I don't think there is anything out of the ordinary about it. It runs the openmediavault-flashmemory 5.0.7 plugin, installed by default, and serves a number of CIFS shares plus some NFS shares. After just under a week of stable 24x7 operation, the oom-killer popped up and killed rpc.statd and a salt-minion. I didn't notice initially, but a day later the oom-killer killed rpc.mountd and another salt-minion.
Both issues arose just after 04:30, and I notice that cron was running debian-sa1 at just that time, with armbian-truncate-logs 5 minutes earlier.
I've posted syslog here, showing the oom-killer processing.
Any ideas what is going on? I see that rpc.statd and salt-minion are both slightly RSS-heavy, but surely this isn't expected behaviour?
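To see which processes are actually RSS-heavy around 04:30 (rather than relying on the post-mortem oom-killer dump), something like the following could be run from a cron job every few minutes as a logging aid; the exact fields chosen are just a suggestion:

```shell
# Top memory consumers by resident set size (RSS):
ps -eo pid,rss,vsz,comm --sort=-rss | head -n 10

# Slab and commit accounting, to check whether slab_reclaimable
# really is reclaimable and how close Committed_AS is to CommitLimit:
grep -i -E 'slab|commit' /proc/meminfo

# Per-cache slab breakdown, sorted by cache size (needs procps):
slabtop -o -s c | head -n 15
```

Comparing a snapshot from just before 04:30 with one from just after should show whether the sadc/backup/log-rotation activity is what spikes memory use.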