OK - I was right - it looks like those 5.x versioned packages were remnants of the OMV 5 install that the migration did not clean up.
I ran "apt remove omvextras-unionbackend", which succeeded without error (and removed "openmediavault-unionfilesystems" too).
I then ran "apt purge omvextras-unionbackend openmediavault-unionfilesystems", which again produced no error.
I manually re-ran "omv-salt deploy run --no-color fstab" on the CLI and it now succeeds.
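For reference, the full sequence as run above was:
Code
apt remove omvextras-unionbackend            # also removed openmediavault-unionfilesystems
apt purge omvextras-unionbackend openmediavault-unionfilesystems
omv-salt deploy run --no-color fstab         # now completes without the Jinja error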
I expect this will be put to bed now.
Thanks for the tip!
-
Interesting - I am already on 6 - I'm running the OS from an mSATA drive (plugged into an adapter in my HP Gen7 Microserver's internal USB port). Transfer speeds are perfectly acceptable.
This has only happened since upgrading to 6 some months ago... is it possible something went awry during the migration, causing this message?
Here is some interesting output:
Code
root@omv:~# uname -a
Linux omv 5.10.0-21-amd64 #1 SMP Debian 5.10.162-1 (2023-01-21) x86_64 GNU/Linux
root@omv:~# dpkg -l|grep openmediavault
ii  omvextras-unionbackend           5.0.2    all  union filesystems backend plugin for openmediavault
ii  openmediavault                   6.3.8-1  all  openmediavault - The open network attached storage solution
ii  openmediavault-backup            6.1      all  backup plugin for OpenMediaVault.
ii  openmediavault-flashmemory       6.2      all  folder2ram plugin for openmediavault
ii  openmediavault-keyring           1.0      all  GnuPG archive keys of the OpenMediaVault archive
ii  openmediavault-mergerfs          6.3.5    all  mergerfs plugin for openmediavault.
ii  openmediavault-omvextrasorg      6.1.1    all  OMV-Extras.org Package Repositories for OpenMediaVault
ii  openmediavault-remotemount       6.1.1    all  Remote mount plugin for OpenMediaVault.
ii  openmediavault-resetperms        6.0.2    all  Reset Permissions
ii  openmediavault-sharerootfs       6.0.2-1  all  openmediavault share root filesystem plugin
ii  openmediavault-snapraid          6.1      all  snapraid plugin for OpenMediaVault.
ii  openmediavault-unionfilesystems  5.1.4    all  Union filesystems plugin for OpenMediaVault.
root@omv:~#
The "openmediavault-unionfilesystems" and "omvextras-unionbackend" packages look suspiciously like the wrong versions.
Now that I come to think of it, when I migrated from OMV 5 to 6 there were some wobbles, due to the mergerfs plugins I believe, and I had to look on the forums for some fixes. I cannot recall exactly what I did, unfortunately - I think there was a script I ran. Are those 5.x versioned packages now in the way, and should they be purged?
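A quick way to spot any leftover 5.x-era packages (a hedged sketch; adjust the grep pattern to taste):
Code
# list installed OMV-related packages whose version begins with 5.
dpkg-query -W -f='${Package} ${Version}\n' | grep -Ei 'openmediavault|omvextras' | awk '$2 ~ /^5\./'
-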
This is still a problem on my system. As I originally posted, *sometimes* (not always), after having run an "apt dist-upgrade", when revisiting the webUI I will be greeted by a warning that some changes need to be applied. When I attempt to apply them, it generates a 500 error.
For me it is when it tries to run "omv-salt deploy run --no-color fstab", and the errors relate to mergerfs or the unionfs plugin or something.
Code
Failed to execute command 'export PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin; export LANG=C.UTF-8; export LANGUAGE=; omv-salt deploy run --no-color fstab 2>&1' with exit code '1': omv: Data failed to compile:
----------
    Rendering SLS 'base:omv.deploy.fstab.15unionfilesystems' failed: Jinja error: 'NoneType' object has no attribute 'get_dict'
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/salt/utils/templates.py", line 497, in render_jinja_tmpl
    output = template.render(**decoded_context)
  File "/usr/lib/python3/dist-packages/jinja2/environment.py", line 1090, in render
    self.environment.handle_exception()
  File "/usr/lib/python3/dist-packages/jinja2/environment.py", line 832, in handle_exception
    reraise(*rewrite_traceback_stack(source=source))
  File "/usr/lib/python3/dist-packages/jinja2/_compat.py", line 28, in reraise
    raise value.with_traceback(tb)
  File "<template>", line 1, in top-level template code
  File "/usr/lib/python3/dist-packages/jinja2/sandbox.py", line 465, in call
    return __context.call(__obj, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/salt/loader.py", line 1235, in __call__
    return self.loader.run(run_func, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/salt/loader.py", line 2268, in run
    return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/salt/loader.py", line 2283, in _run_as
    return _func_or_method(*args, **kwargs)
  File "/var/cache/salt/minion/extmods/modules/omv_conf.py", line 44, in get
    return objs.get_dict()
AttributeError: 'NoneType' object has no attribute 'get_dict'
; line 1
---
{% set config = salt['omv_conf.get']('conf.service.unionfilesystems') %} <======================
{% for pool in config.filesystem %}
{% set poolmount = salt['omv_conf.get_by_filter'](
    'conf.system.filesystem.mountpoint',
    {'operator':'stringEquals', 'arg0':'uuid', 'arg1':pool.self_mntentref}) %}
[...]
---
OMV\ExecException: Failed to execute command 'export PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin; export LANG=C.UTF-8; export LANGUAGE=; omv-salt deploy run --no-color fstab 2>&1' with exit code '1': omv: Data failed to compile:
----------
    Rendering SLS 'base:omv.deploy.fstab.15unionfilesystems' failed: Jinja error: 'NoneType' object has no attribute 'get_dict'
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/salt/utils/templates.py", line 497, in render_jinja_tmpl
    output = template.render(**decoded_context)
  File "/usr/lib/python3/dist-packages/jinja2/environment.py", line 1090, in render
    self.environment.handle_exception()
  File "/usr/lib/python3/dist-packages/jinja2/environment.py", line 832, in handle_exception
    reraise(*rewrite_traceback_stack(source=source))
  File "/usr/lib/python3/dist-packages/jinja2/_compat.py", line 28, in reraise
    raise value.with_traceback(tb)
  File "<template>", line 1, in top-level template code
  File "/usr/lib/python3/dist-packages/jinja2/sandbox.py", line 465, in call
    return __context.call(__obj, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/salt/loader.py", line 1235, in __call__
    return self.loader.run(run_func, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/salt/loader.py", line 2268, in run
    return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/salt/loader.py", line 2283, in _run_as
    return _func_or_method(*args, **kwargs)
  File "/var/cache/salt/minion/extmods/modules/omv_conf.py", line 44, in get
    return objs.get_dict()
AttributeError: 'NoneType' object has no attribute 'get_dict'
; line 1
---
{% set config = salt['omv_conf.get']('conf.service.unionfilesystems') %} <======================
{% for pool in config.filesystem %}
{% set poolmount = salt['omv_conf.get_by_filter'](
    'conf.system.filesystem.mountpoint',
    {'operator':'stringEquals', 'arg0':'uuid', 'arg1':pool.self_mntentref}) %}
[...]
--- in /usr/share/php/openmediavault/system/process.inc:220
Stack trace:
#0 /usr/share/php/openmediavault/engine/module/serviceabstract.inc(62): OMV\System\Process->execute()
#1 /usr/share/openmediavault/engined/rpc/config.inc(174): OMV\Engine\Module\ServiceAbstract->deploy()
#2 [internal function]: Engined\Rpc\Config->applyChanges(Array, Array)
#3 /usr/share/php/openmediavault/rpc/serviceabstract.inc(123): call_user_func_array(Array, Array)
#4 /usr/share/php/openmediavault/rpc/serviceabstract.inc(149): OMV\Rpc\ServiceAbstract->callMethod('applyChanges', Array, Array)
#5 /usr/share/php/openmediavault/rpc/serviceabstract.inc(619): OMV\Rpc\ServiceAbstract->OMV\Rpc\{closure}('/tmp/bgstatusJ1...', '/tmp/bgoutputgz...')
#6 /usr/share/php/openmediavault/rpc/serviceabstract.inc(159): OMV\Rpc\ServiceAbstract->execBgProc(Object(Closure))
#7 /usr/share/openmediavault/engined/rpc/config.inc(195): OMV\Rpc\ServiceAbstract->callMethodBg('applyChanges', Array, Array)
#8 [internal function]: Engined\Rpc\Config->applyChangesBg(Array, Array)
#9 /usr/share/php/openmediavault/rpc/serviceabstract.inc(123): call_user_func_array(Array, Array)
#10 /usr/share/php/openmediavault/rpc/rpc.inc(86): OMV\Rpc\ServiceAbstract->callMethod('applyChangesBg', Array, Array)
#11 /usr/sbin/omv-engined(537): OMV\Rpc\Rpc::call('Config', 'applyChangesBg', Array, Array, 1)
#12 {main}
If I run "omv-salt deploy run --no-color fstab" on the CLI manually I get this:
Code
omv: Data failed to compile:
----------
    Rendering SLS 'base:omv.deploy.fstab.15unionfilesystems' failed: Jinja error: 'NoneType' object has no attribute 'get_dict'
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/salt/utils/templates.py", line 497, in render_jinja_tmpl
    output = template.render(**decoded_context)
  File "/usr/lib/python3/dist-packages/jinja2/environment.py", line 1090, in render
    self.environment.handle_exception()
  File "/usr/lib/python3/dist-packages/jinja2/environment.py", line 832, in handle_exception
    reraise(*rewrite_traceback_stack(source=source))
  File "/usr/lib/python3/dist-packages/jinja2/_compat.py", line 28, in reraise
    raise value.with_traceback(tb)
  File "<template>", line 1, in top-level template code
  File "/usr/lib/python3/dist-packages/jinja2/sandbox.py", line 465, in call
    return __context.call(__obj, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/salt/loader.py", line 1235, in __call__
    return self.loader.run(run_func, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/salt/loader.py", line 2268, in run
    return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/salt/loader.py", line 2283, in _run_as
    return _func_or_method(*args, **kwargs)
  File "/var/cache/salt/minion/extmods/modules/omv_conf.py", line 44, in get
    return objs.get_dict()
AttributeError: 'NoneType' object has no attribute 'get_dict'
; line 1
---
{% set config = salt['omv_conf.get']('conf.service.unionfilesystems') %} <======================
{% for pool in config.filesystem %}
{% set poolmount = salt['omv_conf.get_by_filter'](
    'conf.system.filesystem.mountpoint',
    {'operator':'stringEquals', 'arg0':'uuid', 'arg1':pool.self_mntentref}) %}
[...]
---
What is causing this? The server, services, mergerfs pool, etc. are all working perfectly despite this...
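One thing that might be worth checking (a hedged suggestion - I'm assuming OMV's stock tooling here): whether the config database still contains the conf.service.unionfilesystems section that this SLS file tries to read.
Code
# inspect the config database for a leftover unionfilesystems section
omv-confdbadm read conf.service.unionfilesystems
# and see which SLS fragments the fstab deploy renders
ls /srv/salt/omv/deploy/fstab/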
-
Most of the time (not always), after a manual apt upgrade or dist-upgrade (which completes without errors), when next visiting the webUI I am greeted with a yellow "Pending configuration changes" banner (despite not having made any changes myself).
If I hit the tick to apply them, after a few seconds I get an internal server error - specifically this:
Code
Failed to execute command 'export PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin; export LANG=C.UTF-8; export LANGUAGE=; omv-salt deploy run --no-color fstab 2>&1' with exit code '1': omv: Data failed to compile:
----------
    Rendering SLS 'base:omv.deploy.fstab.15unionfilesystems' failed: Jinja error: 'NoneType' object has no attribute 'get_dict'
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/salt/utils/templates.py", line 497, in render_jinja_tmpl
    output = template.render(**decoded_context)
  File "/usr/lib/python3/dist-packages/jinja2/environment.py", line 1090, in render
    self.environment.handle_exception()
  File "/usr/lib/python3/dist-packages/jinja2/environment.py", line 832, in handle_exception
    reraise(*rewrite_traceback_stack(source=source))
  File "/usr/lib/python3/dist-packages/jinja2/_compat.py", line 28, in reraise
    raise value.with_traceback(tb)
  File "<template>", line 1, in top-level template code
  File "/usr/lib/python3/dist-packages/jinja2/sandbox.py", line 465, in call
    return __context.call(__obj, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/salt/loader.py", line 1235, in __call__
    return self.loader.run(run_func, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/salt/loader.py", line 2268, in run
    return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/salt/loader.py", line 2283, in _run_as
    return _func_or_method(*args, **kwargs)
  File "/var/cache/salt/minion/extmods/modules/omv_conf.py", line 45, in get
    return objs.get_dict()
AttributeError: 'NoneType' object has no attribute 'get_dict'
; line 1
---
{% set config = salt['omv_conf.get']('conf.service.unionfilesystems') %} <======================
{% for pool in config.filesystem %}
{% set poolmount = salt['omv_conf.get_by_filter'](
    'conf.system.filesystem.mountpoint',
    {'operator':'stringEquals', 'arg0':'uuid', 'arg1':pool.self_mntentref}) %}
[...]
---
OMV\ExecException: Failed to execute command 'export PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin; export LANG=C.UTF-8; export LANGUAGE=; omv-salt deploy run --no-color fstab 2>&1' with exit code '1': omv: Data failed to compile:
----------
    Rendering SLS 'base:omv.deploy.fstab.15unionfilesystems' failed: Jinja error: 'NoneType' object has no attribute 'get_dict'
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/salt/utils/templates.py", line 497, in render_jinja_tmpl
    output = template.render(**decoded_context)
  File "/usr/lib/python3/dist-packages/jinja2/environment.py", line 1090, in render
    self.environment.handle_exception()
  File "/usr/lib/python3/dist-packages/jinja2/environment.py", line 832, in handle_exception
    reraise(*rewrite_traceback_stack(source=source))
  File "/usr/lib/python3/dist-packages/jinja2/_compat.py", line 28, in reraise
    raise value.with_traceback(tb)
  File "<template>", line 1, in top-level template code
  File "/usr/lib/python3/dist-packages/jinja2/sandbox.py", line 465, in call
    return __context.call(__obj, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/salt/loader.py", line 1235, in __call__
    return self.loader.run(run_func, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/salt/loader.py", line 2268, in run
    return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/salt/loader.py", line 2283, in _run_as
    return _func_or_method(*args, **kwargs)
  File "/var/cache/salt/minion/extmods/modules/omv_conf.py", line 45, in get
    return objs.get_dict()
AttributeError: 'NoneType' object has no attribute 'get_dict'
; line 1
---
{% set config = salt['omv_conf.get']('conf.service.unionfilesystems') %} <======================
{% for pool in config.filesystem %}
{% set poolmount = salt['omv_conf.get_by_filter'](
    'conf.system.filesystem.mountpoint',
    {'operator':'stringEquals', 'arg0':'uuid', 'arg1':pool.self_mntentref}) %}
[...]
--- in /usr/share/php/openmediavault/system/process.inc:197
Stack trace:
#0 /usr/share/php/openmediavault/engine/module/serviceabstract.inc(62): OMV\System\Process->execute()
#1 /usr/share/openmediavault/engined/rpc/config.inc(170): OMV\Engine\Module\ServiceAbstract->deploy()
#2 [internal function]: Engined\Rpc\Config->applyChanges(Array, Array)
#3 /usr/share/php/openmediavault/rpc/serviceabstract.inc(123): call_user_func_array(Array, Array)
#4 /usr/share/php/openmediavault/rpc/serviceabstract.inc(149): OMV\Rpc\ServiceAbstract->callMethod('applyChanges', Array, Array)
#5 /usr/share/php/openmediavault/rpc/serviceabstract.inc(588): OMV\Rpc\ServiceAbstract->OMV\Rpc\{closure}('/tmp/bgstatus3N...', '/tmp/bgoutputeO...')
#6 /usr/share/php/openmediavault/rpc/serviceabstract.inc(159): OMV\Rpc\ServiceAbstract->execBgProc(Object(Closure))
#7 /usr/share/openmediavault/engined/rpc/config.inc(192): OMV\Rpc\ServiceAbstract->callMethodBg('applyChanges', Array, Array)
#8 [internal function]: Engined\Rpc\Config->applyChangesBg(Array, Array)
#9 /usr/share/php/openmediavault/rpc/serviceabstract.inc(123): call_user_func_array(Array, Array)
#10 /usr/share/php/openmediavault/rpc/rpc.inc(86): OMV\Rpc\ServiceAbstract->callMethod('applyChangesBg', Array, Array)
#11 /usr/sbin/omv-engined(537): OMV\Rpc\Rpc::call('Config', 'applyChangesBg', Array, Array, 1)
#12 {main}
I can undo the "changes" (whatever they are!) and the problem seems to go away until some future time I do another apt update...
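For what it's worth, the banner is driven by the list of "dirty" modules OMV tracks between deploys; a hedged way to see what got flagged after an upgrade (file path per OMV 5/6, as far as I know):
Code
# modules OMV considers changed and pending a redeploy
cat /var/lib/openmediavault/dirtymodules.json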
-
Cheers...that seems to have got rid of it.
While searching around, I actually found references in config.xml to another couple of disks that I had connected to the server in the past but which are long gone now, so I removed those too.
Any reason why removing them in the GUI in the past hadn't really got rid of them in config.xml?
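In case it helps anyone searching for similar stale entries, something along these lines is a safe start (a hedged sketch; "OLDDISK" is a placeholder label - always back up config.xml first):
Code
cp /etc/openmediavault/config.xml /root/config.xml.bak   # safety copy
grep -n 'OLDDISK' /etc/openmediavault/config.xml         # locate stale references (placeholder label)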
-
Sorry to bump, but can anyone help with this, please?
-
I have retired a disk which was previously present in my OMV server. I did it all correctly: I went through the OMV web UI removing every reference to it, unmounted and deleted the filesystem, etc., then finally physically removed it from the server (it was connected by USB, if that makes any difference). I checked, and there is no mention of it in fstab and no mention of it anywhere in the OMV web UI (I checked Disks, SMART, filesystems, etc. - all gone as expected). I have since rebooted the server to be sure.
Ever since removing it I have been getting daily error messages from SMART e-mailed to me saying "FailedOpenDevice" for the now absent disk.
I can see why this is happening:
I still see an entry for it in /etc/smartd.conf and it is referenced twice in /etc/openmediavault/config.xml in relation to both "hdparm" and "smart".
Why are these entries still there?
I'm happy to go fiddling with the config files manually, but I don't believe I should have to - I would have thought the web UI ought to have done this?
What is the safest/most correct way to remove all references to this now-absent disk and stop these false alerts?
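If it does come to manual fixing, I assume the steps would be roughly the following (a hedged sketch - "OLDDISK" is a placeholder for the old device's name, the state names are as I understand stock OMV, and I'd back up config.xml first), though I'd still rather the web UI handled it:
Code
cp /etc/openmediavault/config.xml /root/config.xml.bak   # safety copy first
grep -n 'OLDDISK' /etc/openmediavault/config.xml         # locate the stale hdparm/smart blocks (placeholder name)
# after removing the stale blocks, ask Salt to regenerate the affected configs
omv-salt deploy run smartmontools
omv-salt deploy run hdparm
-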
Thanks for the tips.
I disabled the plugin and waited until the next reboot - it happened last night (twice, actually, a couple of hours apart). Unfortunately there is absolutely nothing in the logs (I checked syslog, kern.log, debug, messages) to tell me what it might be.
There is nothing at all right before the reboots other than regular log messages. There is then just a gap of a couple of minutes (the time it takes to reboot), then the regular startup messages start appearing. However, on the second reboot, right at the point of failure, some of the logs did have a few lines like:
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
... I have no idea what it means.
The system is backed by a UPS, so there is no power issue (unless the PSU itself is faulty, but it has already been replaced, as I said earlier).
No idea how to debug this now! Any ideas?
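(Side note for anyone following along: the ^@ characters are NUL bytes - typical of log blocks that were allocated but never written back before an abrupt reset, which points to a hard reset rather than a clean reboot. One thing I may try next, sketched here on the assumption the box has wired Ethernet: netconsole, which streams kernel messages over UDP to another machine so they survive the crash. All IPs/MACs below are placeholders.)
Code
# on a second machine, listen for the kernel messages:
nc -u -l 6666
# on the OMV box (placeholders: local IP/eth0, receiver IP/MAC):
modprobe netconsole netconsole=6665@192.168.1.10/eth0,6666@192.168.1.20/aa:bb:cc:dd:ee:ff
-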
I've been experiencing random spontaneous reboots once in a while from my HP Gen7 MicroServer running OMV 5. Sometimes the system enjoys uptime of many days or weeks, but then sometimes I will get a spontaneous reboot. It only ever occurs in the early hours of the morning (at various times between 1 and 6 am, not seemingly linked to any scheduled tasks I have knowingly set myself).
I am using the flashmemory plugin since my boot device/rootfs is a USB stick on the microserver's internal USB port.
I therefore cannot dig into the logs (syslog, kern.log, messages, etc.) to figure out what might be going on right at the point of the spontaneous reboots, since the folder2ram mechanism doesn't flush/retain them persistently at the point of the reboot (presumably because the reboot is spontaneous rather than graceful).
How can I continue to use the flashmemory plugin to avoid unnecessary wear, but at the same time make sure that at least syslog, kern.log and messages are consistently and persistently saved to physical media (ideally to a folder on the mergerfs array OMV is mainly tasked with serving)? Then I might be able to get some pointers as to why the system is sometimes unstable... at the moment I literally have no idea.
I have already replaced the original PSU (due to its fan becoming too noisy - I was getting the reboots periodically on the original one as well as its replacement). I've also run multiple memtest passes with no errors whatsoever, so I am not leaning towards a hardware-specific issue, especially since it only ever happens in the small hours. If it were hardware related, I would imagine the pattern would be much more random?
I was hoping the logs might give me a clue in case there is another problem or it's being caused by a task the system is performing that I didn't schedule myself.
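(One approach I'm considering, sketched on the assumptions that rsyslog is in use and that the pool is mounted where shown - both placeholders: have rsyslog write a second, synchronous copy of the key logs straight to the array, leaving folder2ram alone.)
Code
# /etc/rsyslog.d/10-persist-to-array.conf (hypothetical file/path)
# no '-' prefix before the path = synchronous writes, so the final
# lines have a better chance of surviving an abrupt reset
kern.*                    /srv/mergerfs-pool/logs/kern.log
*.*;auth,authpriv.none    /srv/mergerfs-pool/logs/syslog
# then: systemctl restart rsyslog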
-
Fair enough - I'm almost certain this behaviour persisted across the first reboot, though... it could have been rectified after re-refreshing the web interface following the first reboot... perhaps an old setting was still stuck in there. I certainly only had one line in my /etc/fstab for the pool mount, and the config.xml looked good... it's odd.
-
I may have found a potential bug/feature in the openmediavault-unionfilesystems (4.0.2) plugin...
I have a mergerfs pool (which I also export via NFS) of 3x drives - the create policy has always been (until today) "Existing path, least free space".
I accidentally filled up one of my drives earlier and subsequently realised I didn't really need any path preservation, so I decided to change the create policy via the web UI to "Most free space".
I did so, saved changes, stopped the NFS server, unmounted the mergerfs pool, remounted the mergerfs pool and restarted the NFS server.
All seemed well at first, but it appeared that the create behaviour had actually become "Least free space". The Web UI still reflected the correct chosen option.
I reported the bug to trapexit via his github: https://github.com/trapexit/mergerfs/issues/664 but we determined that actually it wasn't a problem with mergerfs itself.
I noticed 2x separate mergerfs processes were running - this behaviour persisted across the first reboot but went away once I manually killed both mergerfs processes and re-mounted.
Strangely, one of the mergerfs processes had the correct mount option "category.create=mfs", but the other (which had originally started a few minutes later) had "category.create=lfs" (an option I had not even chosen - remember: the original option I changed from was "eplfs").
Anyway, after killing both and rebooting, I now have the correct "mfs" behaviour.
It must be something to do with the plugin/web interface, since I try never to make manual changes to my OMV server, preferring to do everything via the web UI - but in this case it was not consistent...
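(If anyone wants to check for the same condition, a hedged pair of one-liners - adapt to your own pool's mount point:)
Code
ps -eo pid,lstart,args | grep [m]ergerfs   # two processes for one pool = the stale-mount condition
grep mergerfs /proc/mounts                 # check for duplicate mount entries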
-
For anyone interested: having now run the same setup as above for a good while with:
kernel 4.19.0-0.bpo.5-amd64
mergerfs version: 2.27.1
FUSE library version: 2.9.7-mergerfs_2.27.0
...I haven't experienced this issue.
Thanks trapexit!
-
BACKGROUND INFORMATION
Latest stable version of OMV:
Linux omv 4.19.0-0.bpo.4-amd64 #1 SMP Debian 4.19.28-2~bpo9+1 (2019-03-27) x86_64 GNU/Linux
Mergerfs pool containing 3x drives w/ ext4 filesystems - here's the fstab entry:
/srv/dev-disk-by-label-WD6TBBAY1:/srv/dev-disk-by-id-usb-WD_Elements_25A1_575833314434383746543045-0-0-part1:/srv/dev-disk-by-label-WD3TBBAY2 /srv/ecaf2ef9-aa68-47d9-99ad-ac21d64d8764 fuse.mergerfs defaults,allow_other,direct_io,use_ino,noforget,category.create=eplfs,minfreespace=10M 0 0
mergerfs (and FUSE) versions:
- mergerfs version: 2.26.2
- FUSE library version: 2.9.7-mergerfs_2.26.0
- fusermount version: 2.9.7
- using FUSE kernel interface version 7.27
There's a single shared folder called "media" in the root of the mergerfs pool.
NFS export:
/export/media *(fsid=1,rw,subtree_check,insecure,no_root_squash,anonuid=1000)
Relevant bit of df when things are working:
label-WD6TBBAY1:id-usb-WD_Elements_25A1_575833314434383746543045-0-0-part1:label-WD3TBBAY2 12T 5.8T 5.8T 51% /srv/ecaf2ef9-aa68-47d9-99ad-ac21d64d8764
/dev/sdb1 2.7T 89M 2.7T 1% /srv/dev-disk-by-label-WD3TBBAY2
/dev/sdd1 3.6T 3.2T 221G 94% /srv/dev-disk-by-id-usb-WD_Elements_25A1_575833314434383746543045-0-0-part1
/dev/sda1 5.5T 2.6T 2.9T 48% /srv/dev-disk-by-label-WD6TBBAY1
...and from "mount":
label-WD6TBBAY1:id-usb-WD_Elements_25A1_575833314434383746543045-0-0-part1:label-WD3TBBAY2 on /export/media type fuse.mergerfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other)
THE PROBLEM
I have had 2 random crashes of NFS - when I notice it, the mergerfs mount point has gone on the server too. Here's an example of the relevant snippet of /var/log/syslog:
May 22 00:06:25 omv kernel: [99245.846518] ------------[ cut here ]------------
May 22 00:06:25 omv kernel: [99245.846523] nfsd: non-standard errno: -103
May 22 00:06:25 omv kernel: [99245.846618] WARNING: CPU: 1 PID: 816 at /build/linux-tpKJY9/linux-4.19.28/fs/nfsd/nfsproc.c:820 nfserrno+0x65/0x80 [nfsd]
May 22 00:06:25 omv kernel: [99245.846620] Modules linked in: msr softdog cpufreq_powersave cpufreq_userspace cpufreq_conservative radeon edac_mce_amd kvm_amd ccp ttm rng_core drm_kms_helper kvm evdev drm irqbypass k10temp ipmi_si pcspkr ipmi_devintf ipmi_msghandler sg i2c_algo_bit sp5100_tco button pcc_cpufreq acpi_cpufreq fuse nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb crypto_simd cryptd glue_helper aes_x86_64 btrfs zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor uas usb_storage sd_mod raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod ohci_pci ata_generic ahci libahci pata_atiixp libata tg3 libphy i2c_piix4 scsi_mod ohci_hcd ehci_pci xhci_pci ehci_hcd xhci_hcd usbcore usb_common
May 22 00:06:25 omv kernel: [99245.846735] CPU: 1 PID: 816 Comm: nfsd Not tainted 4.19.0-0.bpo.4-amd64 #1 Debian 4.19.28-2~bpo9+1
May 22 00:06:25 omv kernel: [99245.846737] Hardware name: HP ProLiant MicroServerr, BIOS O41 10/01/2013
May 22 00:06:25 omv kernel: [99245.846762] RIP: 0010:nfserrno+0x65/0x80 [nfsd]
May 22 00:06:25 omv kernel: [99245.846767] Code: 13 05 00 00 b8 00 00 00 05 74 02 f3 c3 48 83 ec 08 89 fe 48 c7 c7 7a 15 83 c0 89 44 24 04 c6 05 1c 13 05 00 01 e8 0b e8 47 c8 <0f> 0b 8b 44 24 04 48 83 c4 08 c3 31 c0 c3 0f 1f 00 66 2e 0f 1f 84
May 22 00:06:25 omv kernel: [99245.846770] RSP: 0018:ffffb8c9c1607d98 EFLAGS: 00010282
May 22 00:06:25 omv kernel: [99245.846774] RAX: 0000000000000000 RBX: ffff9572947e1008 RCX: 0000000000000006
May 22 00:06:25 omv kernel: [99245.846777] RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffff957297c966a0
May 22 00:06:25 omv kernel: [99245.846780] RBP: ffff9572947e1168 R08: 0000000000000001 R09: 000000000000032f
May 22 00:06:25 omv kernel: [99245.846782] R10: ffffb8c9c63efd60 R11: 0000000000000000 R12: ffff95726f5020c0
May 22 00:06:25 omv kernel: [99245.846784] R13: ffff957272bc5cc0 R14: 00000000ffffff99 R15: ffff957294027780
May 22 00:06:25 omv kernel: [99245.846788] FS: 0000000000000000(0000) GS:ffff957297c80000(0000) knlGS:0000000000000000
May 22 00:06:25 omv kernel: [99245.846791] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 22 00:06:25 omv kernel: [99245.846794] CR2: 00007fa4b2c55000 CR3: 000000014020a000 CR4: 00000000000006e0
May 22 00:06:25 omv kernel: [99245.846796] Call Trace:
May 22 00:06:25 omv kernel: [99245.846828] nfsd_rename+0x1ca/0x2b0 [nfsd]
May 22 00:06:25 omv kernel: [99245.846855] nfsd3_proc_rename+0x9b/0x130 [nfsd]
May 22 00:06:25 omv kernel: [99245.846878] nfsd_dispatch+0xb1/0x240 [nfsd]
May 22 00:06:25 omv kernel: [99245.846930] svc_process_common+0x3bf/0x780 [sunrpc]
May 22 00:06:25 omv kernel: [99245.846969] svc_process+0xe9/0x100 [sunrpc]
May 22 00:06:25 omv kernel: [99245.846991] nfsd+0xe3/0x150 [nfsd]
May 22 00:06:25 omv kernel: [99245.846999] kthread+0xf8/0x130
May 22 00:06:25 omv kernel: [99245.847021] ? nfsd_destroy+0x60/0x60 [nfsd]
May 22 00:06:25 omv kernel: [99245.847026] ? kthread_create_worker_on_cpu+0x70/0x70
May 22 00:06:25 omv kernel: [99245.847032] ret_from_fork+0x22/0x40
May 22 00:06:25 omv kernel: [99245.847037] ---[ end trace b2717fa65f13ab36 ]---
May 22 00:06:32 omv collectd[1236]: statvfs(/srv/ecaf2ef9-aa68-47d9-99ad-ac21d64d8764) failed: Transport endpoint is not connected
May 22 00:06:42 omv collectd[1236]: statvfs(/srv/ecaf2ef9-aa68-47d9-99ad-ac21d64d8764) failed: Transport endpoint is not connected
May 22 00:06:46 omv monit[1226]: 'filesystem_srv_dev-disk-by-id-usb-WD_Elements_25A1_575833314434383746543045-0-0-part1' space usage 88.9% matches resource limit [space usage>85.0%]
May 22 00:06:46 omv monit[1226]: Device /srv/ecaf2ef9-aa68-47d9-99ad-ac21d64d8764 not found in /etc/mtab
This leads to "transport endpoint is not connected" errors on the OMV server, and clients attempting to read/write get input/output and/or stale file handle errors. If I reboot the OMV server, service is restored for a while - but it's happened twice within 72 hours.
I notice a similar problem on unraid's forums: https://forums.unraid.net/bug-…60-nfs-kernel-crash-r199/
I don't know if it's the kernel/NFS, FUSE or mergerfs at fault here.
How can I debug this? mergerfs is about the only thing built into OMV that I can find that suits my needs, and I HAVE to be able to export it over NFS, since it offers the best performance for my Kodi clients (and for other Linux software/services I have elsewhere on the network reading from/writing to the pool).
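(In the meantime, a hedged way to restore service without a full reboot when the FUSE endpoint dies - mount point as in my fstab above - while capturing state for a bug report:)
Code
dmesg | tail -n 50                                          # grab the kernel trace while it's fresh
fusermount -uz /srv/ecaf2ef9-aa68-47d9-99ad-ac21d64d8764    # lazy-unmount the dead FUSE endpoint
mount /srv/ecaf2ef9-aa68-47d9-99ad-ac21d64d8764             # remount from fstab
exportfs -ra                                                # re-export the NFS shares
systemctl restart nfs-kernel-server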