oom-killer OMV5 on HC2

robjordan · 11 May 2020

I have a newly installed OMV5 on an Odroid HC2. I don't think there is anything out of the ordinary about it. It is running the openmediavault-flashmemory 5.0.7 plugin, installed by default, and serves a number of CIFS shares and some NFS shares. After just under a week of stable 24x7 operation the oom-killer popped up and killed rpc.statd and a salt-minion. I didn't notice initially, but one day later the oom-killer killed rpc.mountd and another salt-minion.

Both issues arose just after 04:30, and I notice that cron seemed to be running debian-sa1 at just that time, and armbian-truncate-logs 5 minutes earlier.

I've posted syslog here, showing the oom-killer processing.
https://gist.githubusercontent…0b458fc5e6b9908999/syslog
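One way to test the cron-timing hunch is to scan the syslog for oom-killer lines and list the cron jobs that started shortly before each one. The following is a minimal sketch, assuming the classic syslog timestamp prefix (`May 11 04:30:01`), an English locale for month names, and cron entries tagged `CRON[`. The sample lines are made up for illustration and are not taken from the posted log.

```python
from datetime import datetime, timedelta


def parse_ts(line, year=2020):
    """Parse the classic syslog prefix, e.g. 'May 11 04:30:01'.

    Syslog does not record the year, so it is supplied by the caller.
    """
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def cron_before_oom(lines, window_minutes=10):
    """For each oom-killer line, collect CRON lines from the preceding window."""
    window = timedelta(minutes=window_minutes)
    cron = [(parse_ts(l), l) for l in lines if "CRON[" in l]
    hits = []
    for l in lines:
        if "oom-killer" in l:
            t = parse_ts(l)
            nearby = [c for ct, c in cron if timedelta(0) <= t - ct <= window]
            hits.append((l, nearby))
    return hits


# Hypothetical sample lines, shaped like the situation described in the post.
SAMPLE = [
    "May 11 03:00:01 hc2 CRON[1100]: (root) CMD (some-earlier-job)",
    "May 11 04:25:01 hc2 CRON[1234]: (root) CMD (armbian-truncate-logs)",
    "May 11 04:30:01 hc2 CRON[1240]: (root) CMD (debian-sa1 1 1)",
    "May 11 04:30:40 hc2 kernel: someproc invoked oom-killer: gfp_mask=0x0",
]

for oom_line, cron_lines in cron_before_oom(SAMPLE):
    print(oom_line)
    for c in cron_lines:
        print("    preceded by:", c)
```

If the same jobs show up before every oom-killer event, that supports the timing theory, although it does not show which process actually consumed the memory.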

Any ideas what is going on? I see that rpc.statd and salt-minion are both slightly RSS-heavy, but surely this isn't behaviour to be expected?

ryecoaaron · 11 May 2020

OOM = Out of Memory. So, you are doing something on your HC2 that is using too much ram.

robjordan · 11 May 2020

Quote from ryecoaaron

OOM = Out of Memory. So, you are doing something on your HC2 that is using too much ram.

Thanks. I get that. But the point is that I'm just using it as an SMB+NFS file server, this happens in the middle of the night, and many other people surely have a similar configuration. Are they all getting out-of-memory conditions?

Adoby · 11 May 2020

No, they are not.

That is why it is likely that your system is running something special that other people's systems are not running. Or there is something wrong with your system.

When you say that you have "a number" of shares, EXACTLY what number are you talking about? If it is a very large number that may be the problem.

You say you have CIFS shares. I assume that is wrong? That what you actually have is SMB shares configured using the OMV GUI? (OMV calls it SMB/CIFS I think.)

I typically have at most 2-3 shared folders. And only one of those is actually shared using NFS or SMB.

robjordan · 11 May 2020

Thanks.

9 SMB/CIFS shares and 7 NFS shares, all configured via the OMV GUI. It doesn't seem excessive to me in principle, but maybe there is a cost I am misunderstanding. I migrated this setup, with the same shares, from a much less powerful system with 512MB memory and a very primitive Marvell SOC (Zyxel NSA325) so I'm puzzled why this use case should be a struggle for OMV on a 2GB HC2.

Adoby · 11 May 2020

I suspect that is more shares than most, but I have no idea if it is too many. I doubt it is.

But obviously something is very special/different/odd with your system...

Something you are not telling us?

robjordan · 11 May 2020

Joking aside, I'd really appreciate some help understanding the memory management approach in OMV5. I just have a feeling that lack of swap space is causing a short-term demand for RAM to push this system into overcommit, which triggers the process killing. I'm desperately trying to understand the memory info in the syslog I posted but it's tough going. I notice the 3 PHP-FPM processes are the heaviest individual users of virtual memory at 150M in total, but still, it's small potatoes. Maybe NFS is using memory to cache in a way that triggers overcommit.

ryecoaaron · 11 May 2020

That is not too many shares. And your Zyxel was probably not putting logs and other heavily written files in ram to reduce writes to flash media. Yes, that chews up ram but it lets your SD card not fail after a few weeks.

The systems connecting to your HC2 must all be doing so at the same time of night if you are OOM'ing the system. I have written 2.7 TB of data via borgbackup to a 2GB Nano Pi M4 since last night and it hasn't used any swap. You are running NO services other than SMB and NFS, and the clients aren't doing something like virus scanning the shares??

robjordan · 11 May 2020

I'm running no other services but SMB and NFS.

I am accessing the NFS shares at night for backup. One of the file systems does start its backup at 04:30, so it's quite possible that it may be the trigger that tips this over into out-of-memory. But it's still a reasonable use case I think, nothing out of the ordinary. The backup client is running on a separate server, only accessing OMV via NFS.

There are a couple of things I'm looking at. The syslog lists the categories of memory at the moment the oom-killer runs. Yes, the total is 2.2GB, but that includes 834M listed as free and 484M listed as slab_reclaimable (which I understand is a kind of kernel cache, so it should be available for use). I note that vm.overcommit_ratio is set to 50 (%). I think that's a default inherited from Armbian, and I don't know how it influences the point at which the oom-killer is invoked.
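As a rough sanity check on those figures, the arithmetic can be sketched as follows. According to the kernel documentation, vm.overcommit_ratio (documented default: 50) is only consulted when vm.overcommit_memory is 2 (strict accounting), where the commit limit is swap plus the given percentage of RAM. In the default heuristic mode the ratio is not used. The sketch assumes a nominal 2 GiB of RAM and no swap, neither of which is confirmed by the log, and the helper names are illustrative only.

```python
def parse_meminfo(text):
    """Return {field: value in kB} from /proc/meminfo-style text."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio):
    """Commit limit under vm.overcommit_memory=2, ignoring huge pages."""
    return swap_total_kb + mem_total_kb * overcommit_ratio // 100


def free_plus_reclaimable_kb(info):
    """Free memory plus reclaimable slab, as a crude 'could be freed' figure."""
    return info.get("MemFree", 0) + info.get("SReclaimable", 0)


# Figures quoted in the post (834M free, 484M slab_reclaimable), written in
# /proc/meminfo style purely for illustration.
quoted = parse_meminfo("MemFree: 854016 kB\nSReclaimable: 495616 kB\n")
print(free_plus_reclaimable_kb(quoted) // 1024, "MiB free or reclaimable")

# Assumed: nominal 2 GiB RAM, zero swap, ratio 50.
print(commit_limit_kb(2 * 1024 * 1024, 0, 50) // 1024, "MiB commit limit in strict mode")
```

On a live system, `parse_meminfo(open("/proc/meminfo").read())` gives the current values, and the `CommitLimit` and `Committed_AS` fields there show the kernel's own accounting. Note that the oom-killer fires when an allocation cannot be satisfied, which is a separate question from whether the commit limit was exceeded.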

The other thing I will look at is the NFS mount options. As I just dropped in the new HC2 as a replacement for my old Zyxel, the mount options haven't been changed. Maybe they are not the best for this new platform? From /etc/fstab on the client side:

mandy:/export/jordan /media/jordan nfs rsize=8192,wsize=8192,timeo=14,intr,nolock,ro
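For reviewing the options one by one, a small sketch that splits an fstab entry into its fields may help. The function name and the returned keys are illustrative. Per the nfs(5) man page, timeo is expressed in tenths of a second (so timeo=14 means 1.4 seconds), rsize and wsize are upper bounds that the client negotiates with the server, and intr has been deprecated since kernel 2.6.25.

```python
def parse_fstab_line(line):
    """Split one fstab entry into its first four fields and an options dict.

    Flag options (e.g. 'ro') map to True; key=value options keep the value
    as a string.
    """
    spec, mountpoint, fstype, opts = line.split()[:4]
    options = {}
    for opt in opts.split(","):
        key, sep, value = opt.partition("=")
        options[key] = value if sep else True
    return {"spec": spec, "file": mountpoint, "vfstype": fstype, "options": options}


entry = parse_fstab_line(
    "mandy:/export/jordan /media/jordan nfs "
    "rsize=8192,wsize=8192,timeo=14,intr,nolock,ro"
)
for key, value in entry["options"].items():
    print(key, value)
print("timeout in seconds:", int(entry["options"]["timeo"]) / 10)
```

Whether larger rsize/wsize values would change memory pressure on the server is not something this thread establishes; the sketch only makes the current settings easier to inspect.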

Thanks for your thoughts, guys.

robjordan · 30 May 2020

The problem hasn't recurred in almost 3 weeks. Perhaps because I shifted the timing of a remote cron job that accesses one of the NFS shares by 5 minutes, to avoid a clash with the local cron jobs. I guess I'll never know for sure.
