Barebones install of OMV 4 taking up all of my RAM

  • We recently fired Datto as our backup service, but we still have a Datto appliance that was bought and paid for two years ago. Rather than let this beefy piece of hardware go to waste, we decided to repurpose it as a NAS device using OMV. The specs are below:


    CPU: 2x Xeon E5-2650
    RAM: 128GB ECC DDR4
    System Drive: 2x 120GB SSD
    Storage Array: 12x 6TB NL-SAS
    No RAID controller
    2x 1GbE Intel NICs
    4x 10GbE Intel NICs


    My setup involved building a vanilla Debian 9 system first so that we could set up software (MD) RAID on the system drives. After the initial setup was complete, we installed OMV per the installation guide in the OMV manual. Ever since the install, the memory usage reported in Cacti has been up near 90% almost all the time, even when the unit is sitting idle after a fresh reboot, whereas the dashboard within OMV indicates that memory usage is near 0%. I also checked the memory utilization in top, from the console, and it matches what Cacti tells us.
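    For reference, a minimal sketch of the kind of two-disk MD RAID1 used for the system drives. The device names are assumptions; the real array was created during the Debian 9 install, not with these exact commands:

    # illustrative only -- /dev/sda1 and /dev/sdb1 are assumed partition names
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
    cat /proc/mdstat    # confirm the mirror is assembled and resyncing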


    The first time we tried to build this machine, something went awry and throughput on the NICs was abysmal, in addition to the memory issue that I'm posting about now. After that first build, we rebuilt it from scratch a second time, which fixed all the problems from the first build except the memory usage. Since this occurred on both builds, I initially chalked it up to something not reporting correctly and, since it was working otherwise, I didn't put too much thought into it. Now, however, our backup application is running a resource-intensive process that is failing due to lack of memory on the OMV system. This tells me that something is definitely eating up all of the RAM, but I can't even begin to guess at what it is, since it doesn't start until after OMV is installed.


    I'm not running ZFS or any other resource-intensive plugins. The only plugins I'm using are LUKS, Diskstats, SNMP, and OMV-Extras (needed for LUKS, I believe).


    Any input would be greatly appreciated. Aside from this one issue, I'm loving this product, so I really don't want to have to find another solution.

    • Official post

    Linux, by default, uses "free" memory to cache disk I/O. If an app needs more memory, some of the cache is dropped and the memory is handed to the app.


    So a Linux system always has close to 100% of its memory in use. That is normal. Depending on what tool you use, you can see this as either free or available RAM. Free RAM should be close to zero. Available RAM should be significantly more, or swapping to disk may start in order to free up RAM, and that slows the computer down a lot.


    You can use top or htop to see free and available RAM and also how much each app/thread is using. With a lot of Docker containers and plugins there may be issues with available RAM, but you would have to try hard or have very little RAM. 2 GB is fine for running several containers.
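    A quick way to see the difference between free and available RAM, using standard procps tools (nothing OMV-specific assumed):

    free -h                                       # the "available" column is what apps can actually claim
    grep -E 'MemFree|MemAvailable' /proc/meminfo  # same figures straight from the kernel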


  • I would tend to agree with that; however, running this same task with the same dataset on a Synology NAS with only 4GB of RAM doesn't have this problem.


    I'm not running Docker or VirtualBox or any other virtualization processes. The configuration is as basic as it gets.

  • Here's the header section from top, which I discovered is the same information shown on the Processes tab of the System Information menu in OMV.


    top - 13:47:26 up 28 days, 1:25, 1 user, load average: 5.15, 4.94, 5.11
    Tasks: 683 total, 19 running, 663 sleeping, 0 stopped, 1 zombie
    %Cpu(s): 0.0 us, 1.0 sy, 0.0 ni, 98.5 id, 0.5 wa, 0.0 hi, 0.1 si, 0.0 st
    KiB Mem : 13191414+total, 622828 free, 3321824 used, 12796948+buff/cache
    KiB Swap: 7840764 total, 7829612 free, 11152 used. 12684686+avail Mem




    The only process eating up resources at the moment is smbd, as I'm rerunning the job currently.


    Since making my post this morning, I did adjust the sysctl cache settings to vm.dirty_background_ratio = 5 and vm.dirty_ratio = 10. From my understanding, this should make the write caching less aggressive, which seems to be the issue here.
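    A sketch of how those two values can be applied at runtime and made persistent across reboots, assuming they are set by hand rather than through any OMV tooling:

    # apply immediately
    sysctl -w vm.dirty_background_ratio=5
    sysctl -w vm.dirty_ratio=10

    # persist across reboots
    cat >> /etc/sysctl.conf <<'EOF'
    vm.dirty_background_ratio = 5
    vm.dirty_ratio = 10
    EOF
    sysctl -p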

    • Official post

    This seems normal, even if I am unaccustomed to these large amounts of RAM. That is a beast of an OMV NAS. Most of the RAM is used and little is free, but most of the used RAM is still available if needed. This is just as it should be.


    If you get poor network performance, then lack of memory certainly doesn't seem to be the cause.


    Drivers (wrong or missing kernel modules) might be a problem.


    But before you assume that, test the network performance using iperf. Most likely iperf is already installed on your server. You just need a remote box also running iperf to test the speed between the computers.
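    A minimal test might look like the following. The hostname is a placeholder; iperf3 syntax is shown, and classic iperf uses the same -s/-c options:

    # on the remote box
    iperf3 -s

    # on the OMV server, replace REMOTE_HOST with the other box
    iperf3 -c REMOTE_HOST -t 30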


    Unless you are using the RAM for something important, like storing databases in RAM, you have very little to gain from fiddling with the caching parameters. You WANT a lot of caching. That helps performance of your server. And caching does not prevent any software from using RAM if it wants to. Caching has way lower priority than allocation of memory to anything else.


    If you have a UPS and a lot of RAM, you definitely want to cache more writes for longer, i.e. more aggressive caching. Cached writes can be grouped, queued and merged, and that means writes to the HDDs become much faster. If available RAM drops to 20-30% of total, you might want to relax it a bit, but you are very, very far from there. So if you really want to improve performance, make caching more aggressive, not less.
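    As an illustration of the "more aggressive" direction only (the values are examples, not a recommendation from this thread, and assume a UPS-backed box):

    # let more dirty data accumulate before writeback kicks in
    sysctl -w vm.dirty_background_ratio=10
    sysctl -w vm.dirty_ratio=40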

  • It is definitely a beastly piece of hardware, which is why this issue is so perplexing. It's also why we don't want to scrap it: it cost over $20k when it was purchased, so any amount of new life we can breathe into it is a good thing.


    Network performance is solid; I'm getting throughput that's on par with the rest of the devices connected to my SAN. That was all fixed after I reinstalled Debian/OMV. I only mentioned it as an example of how the issue was present across two fresh installs.


    I understand that caching is usually a good thing, since swapping is generally far slower, but right now I'm more concerned with this single process not completing normally. After digging further into the sysctl/cache settings, I found a setting that essentially limits the amount of RAM used for caching. As of right now, I've forced the system to keep half (64GB) of the RAM free; the change is reflected in Cacti, so I'm certain it's working. Performance of OMV does not appear to have been hampered at all by this.


    The task that is running (and failing) is a health check and compaction process used for Veeam backup jobs. The Veeam server runs a CRC check on the entire backup chain, and the failure happens when it gets to the oldest (largest) file in the chain, which is about 8TB. My current theory is that once Veeam starts working on that data file, OMV isn't able to clear out enough RAM quickly enough, causing the lack-of-resources errors I'm getting from Veeam. This is why I'm going down the avenue of forcing the cache to be smaller. I'll report back once the job completes. If you have any other ideas or areas I should look into, I'm certainly open to suggestions.

    • Official post

    OMV/Linux has no problem clearing caches fast enough.


    I would check the filesystems and/or the configuration of the software that fails.


    I don't think you can force the cache to be smaller. The cache is designed to use (almost) all free RAM.


    All you can do is force Linux to write dirty pages sooner, and that way make OMV slower and less efficient. But the pages are still kept in the read cache even after they are written.
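    The writeback knobs in question, and a way to watch dirty pages draining, using only standard kernel interfaces (nothing OMV-specific assumed):

    # current writeback thresholds and timers
    sysctl vm.dirty_background_ratio vm.dirty_ratio vm.dirty_expire_centisecs vm.dirty_writeback_centisecs

    # watch dirty/writeback pages being flushed to disk
    watch -n1 "grep -E 'Dirty|Writeback' /proc/meminfo"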

  • Indeed, the job failed last night with the same error, even after all the work I'd done to reduce the amount of caching.


    I was able to force the cache size to be smaller by adding "vm.min_free_kbytes = ####" to the /etc/sysctl.conf file (see attached graphic). But, as stated above, that didn't have any effect. The one change I did notice is that the read/write throughput became synchronous.
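    For completeness, this is roughly what that change looks like. The value shown is illustrative, chosen only to match the "keep half of 128GB free" figure mentioned earlier:

    # /etc/sysctl.conf -- illustrative value: 64 GB expressed in KiB
    vm.min_free_kbytes = 67108864

    # reload without rebooting
    sysctl -p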


    So now we're back to square one: why am I getting a lack-of-resources error on OMV when I can run the same process on a Synology without issue?


    Edit: Also, thanks again for all your help.
