Posts by rodhull

    I may have found a bug (or unintended feature) in the openmediavault-unionfilesystems (4.0.2) plugin...


    I have a mergerfs pool (which I also export via NFS) of 3x drives - the create policy has always been (until today) "Existing path, least free space".


    I accidentally filled up one of my drives earlier and subsequently realised I didn't really need any path preservation, so I decided to change the create policy via the web UI to "Most free space".


    I did so, saved changes, stopped the NFS server, unmounted the mergerfs pool, remounted the mergerfs pool and restarted the NFS server.
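
    In concrete terms that was roughly the following - a sketch only, assuming OMV's standard nfs-kernel-server service name and the pool mountpoint from the fstab entry quoted further down; adjust to your own paths:

    # stop the NFS server so nothing holds the pool open
    systemctl stop nfs-kernel-server
    # unmount and remount the mergerfs pool so the new create policy is picked up from fstab
    umount /srv/ecaf2ef9-aa68-47d9-99ad-ac21d64d8764
    mount /srv/ecaf2ef9-aa68-47d9-99ad-ac21d64d8764
    # bring NFS back up
    systemctl start nfs-kernel-server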


    All seemed well at first, but the create behaviour had actually become "Least free space", even though the web UI still reflected the option I had chosen.


    I reported the bug to trapexit via his GitHub: https://github.com/trapexit/mergerfs/issues/664 but we determined that it wasn't actually a problem with mergerfs itself.


    I noticed 2x separate mergerfs processes were running - this behaviour persisted across the first reboot but went away once I manually killed both mergerfs processes and re-mounted.


    Strangely, one of the mergerfs processes had the correct mount option "category.create=mfs" but the other (which had originally started a few minutes later) had "category.create=lfs" - an option I had not even chosen (remember: the original option I changed from was "eplfs").
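
    If anyone wants to reproduce the check, this is the kind of thing that shows it - a sketch assuming a mergerfs version that exposes its runtime config via the .mergerfs control file (the xattr name is from the mergerfs docs; getfattr comes from the attr package):

    # list all running mergerfs processes with their start times and full command lines
    ps -eo pid,lstart,args | grep '[m]ergerfs'
    # ask the mount itself which create policy is actually in effect
    getfattr -n user.mergerfs.category.create /srv/ecaf2ef9-aa68-47d9-99ad-ac21d64d8764/.mergerfs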


    Anyway, after killing both, and rebooting I now have the correct "mfs" behaviour.
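
    To satisfy myself the pool was really doing "mfs", something like the following works - a sketch using the per-file xattrs described in the mergerfs docs (again needs getfattr, and the test filename is just an example):

    # create a file through the pool, then ask mergerfs which branch it was placed on
    touch /srv/ecaf2ef9-aa68-47d9-99ad-ac21d64d8764/policy-test
    getfattr -n user.mergerfs.basepath /srv/ecaf2ef9-aa68-47d9-99ad-ac21d64d8764/policy-test
    rm /srv/ecaf2ef9-aa68-47d9-99ad-ac21d64d8764/policy-test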


    It must be something to do with the plugin/web interface, since I try never to make manual changes to my OMV server, preferring to do everything via the web UI - but in this case what was actually mounted didn't match what I'd chosen...

    For anyone interested: having since run the same setup as above for a good while with:


    kernel 4.19.0-0.bpo.5-amd64
    mergerfs version: 2.27.1
    FUSE library version: 2.9.7-mergerfs_2.27.0


    ...I haven't experienced this issue.


    Thanks trapexit!

    BACKGROUND INFORMATION


    Latest stable version of OMV:
    Linux omv 4.19.0-0.bpo.4-amd64 #1 SMP Debian 4.19.28-2~bpo9+1 (2019-03-27) x86_64 GNU/Linux


    Mergerfs pool containing 3x drives w/ ext4 filesystems - here's the fstab entry:
    /srv/dev-disk-by-label-WD6TBBAY1:/srv/dev-disk-by-id-usb-WD_Elements_25A1_575833314434383746543045-0-0-part1:/srv/dev-disk-by-label-WD3TBBAY2 /srv/ecaf2ef9-aa68-47d9-99ad-ac21d64d8764 fuse.mergerfs defaults,allow_other,direct_io,use_ino,noforget,category.create=eplfs,minfreespace=10M 0 0


    mergerfs (and FUSE) versions:

    • mergerfs version: 2.26.2
    • FUSE library version: 2.9.7-mergerfs_2.26.0
    • fusermount version: 2.9.7
    • using FUSE kernel interface version 7.27


    There's a single shared folder called "media" in the root of the mergerfs pool.


    NFS export:
    /export/media *(fsid=1,rw,subtree_check,insecure,no_root_squash,anonuid=1000)
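
    In case it's relevant, the export itself can be sanity-checked from the server side with the standard NFS tools (nothing OMV-specific assumed here):

    # show what is currently exported and with which options
    exportfs -v
    # re-read /etc/exports after a change without restarting the whole server
    exportfs -ra
    # confirm what a client would see
    showmount -e localhost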


    relevant bit of df when things are working:


    label-WD6TBBAY1:id-usb-WD_Elements_25A1_575833314434383746543045-0-0-part1:label-WD3TBBAY2 12T 5.8T 5.8T 51% /srv/ecaf2ef9-aa68-47d9-99ad-ac21d64d8764
    /dev/sdb1 2.7T 89M 2.7T 1% /srv/dev-disk-by-label-WD3TBBAY2
    /dev/sdd1 3.6T 3.2T 221G 94% /srv/dev-disk-by-id-usb-WD_Elements_25A1_575833314434383746543045-0-0-part1
    /dev/sda1 5.5T 2.6T 2.9T 48% /srv/dev-disk-by-label-WD6TBBAY1


    ...and from "mount":
    label-WD6TBBAY1:id-usb-WD_Elements_25A1_575833314434383746543045-0-0-part1:label-WD3TBBAY2 on /export/media type fuse.mergerfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other)


    THE PROBLEM


    I have had 2 random crashes of NFS - when I notice it, the mergerfs mountpoint has gone on the server too. Here's the relevant snippet of /var/log/syslog from one of them:



    May 22 00:06:25 omv kernel: [99245.846518] ------------[ cut here ]------------
    May 22 00:06:25 omv kernel: [99245.846523] nfsd: non-standard errno: -103
    May 22 00:06:25 omv kernel: [99245.846618] WARNING: CPU: 1 PID: 816 at /build/linux-tpKJY9/linux-4.19.28/fs/nfsd/nfsproc.c:820 nfserrno+0x65/0x80 [nfsd]
    May 22 00:06:25 omv kernel: [99245.846620] Modules linked in: msr softdog cpufreq_powersave cpufreq_userspace cpufreq_conservative radeon edac_mce_amd kvm_amd ccp ttm rng_core drm_kms_helper kvm evdev drm irqbypass k10temp ipmi_si pcspkr ipmi_devintf ipmi_msghandler sg i2c_algo_bit sp5100_tco button pcc_cpufreq acpi_cpufreq fuse nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb crypto_simd cryptd glue_helper aes_x86_64 btrfs zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor uas usb_storage sd_mod raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod ohci_pci ata_generic ahci libahci pata_atiixp libata tg3 libphy i2c_piix4 scsi_mod ohci_hcd ehci_pci xhci_pci ehci_hcd xhci_hcd usbcore usb_common
    May 22 00:06:25 omv kernel: [99245.846735] CPU: 1 PID: 816 Comm: nfsd Not tainted 4.19.0-0.bpo.4-amd64 #1 Debian 4.19.28-2~bpo9+1
    May 22 00:06:25 omv kernel: [99245.846737] Hardware name: HP ProLiant MicroServerr, BIOS O41 10/01/2013
    May 22 00:06:25 omv kernel: [99245.846762] RIP: 0010:nfserrno+0x65/0x80 [nfsd]
    May 22 00:06:25 omv kernel: [99245.846767] Code: 13 05 00 00 b8 00 00 00 05 74 02 f3 c3 48 83 ec 08 89 fe 48 c7 c7 7a 15 83 c0 89 44 24 04 c6 05 1c 13 05 00 01 e8 0b e8 47 c8 <0f> 0b 8b 44 24 04 48 83 c4 08 c3 31 c0 c3 0f 1f 00 66 2e 0f 1f 84
    May 22 00:06:25 omv kernel: [99245.846770] RSP: 0018:ffffb8c9c1607d98 EFLAGS: 00010282
    May 22 00:06:25 omv kernel: [99245.846774] RAX: 0000000000000000 RBX: ffff9572947e1008 RCX: 0000000000000006
    May 22 00:06:25 omv kernel: [99245.846777] RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffff957297c966a0
    May 22 00:06:25 omv kernel: [99245.846780] RBP: ffff9572947e1168 R08: 0000000000000001 R09: 000000000000032f
    May 22 00:06:25 omv kernel: [99245.846782] R10: ffffb8c9c63efd60 R11: 0000000000000000 R12: ffff95726f5020c0
    May 22 00:06:25 omv kernel: [99245.846784] R13: ffff957272bc5cc0 R14: 00000000ffffff99 R15: ffff957294027780
    May 22 00:06:25 omv kernel: [99245.846788] FS: 0000000000000000(0000) GS:ffff957297c80000(0000) knlGS:0000000000000000
    May 22 00:06:25 omv kernel: [99245.846791] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    May 22 00:06:25 omv kernel: [99245.846794] CR2: 00007fa4b2c55000 CR3: 000000014020a000 CR4: 00000000000006e0
    May 22 00:06:25 omv kernel: [99245.846796] Call Trace:
    May 22 00:06:25 omv kernel: [99245.846828] nfsd_rename+0x1ca/0x2b0 [nfsd]
    May 22 00:06:25 omv kernel: [99245.846855] nfsd3_proc_rename+0x9b/0x130 [nfsd]
    May 22 00:06:25 omv kernel: [99245.846878] nfsd_dispatch+0xb1/0x240 [nfsd]
    May 22 00:06:25 omv kernel: [99245.846930] svc_process_common+0x3bf/0x780 [sunrpc]
    May 22 00:06:25 omv kernel: [99245.846969] svc_process+0xe9/0x100 [sunrpc]
    May 22 00:06:25 omv kernel: [99245.846991] nfsd+0xe3/0x150 [nfsd]
    May 22 00:06:25 omv kernel: [99245.846999] kthread+0xf8/0x130
    May 22 00:06:25 omv kernel: [99245.847021] ? nfsd_destroy+0x60/0x60 [nfsd]
    May 22 00:06:25 omv kernel: [99245.847026] ? kthread_create_worker_on_cpu+0x70/0x70
    May 22 00:06:25 omv kernel: [99245.847032] ret_from_fork+0x22/0x40
    May 22 00:06:25 omv kernel: [99245.847037] ---[ end trace b2717fa65f13ab36 ]---
    May 22 00:06:32 omv collectd[1236]: statvfs(/srv/ecaf2ef9-aa68-47d9-99ad-ac21d64d8764) failed: Transport endpoint is not connected
    May 22 00:06:42 omv collectd[1236]: statvfs(/srv/ecaf2ef9-aa68-47d9-99ad-ac21d64d8764) failed: Transport endpoint is not connected
    May 22 00:06:46 omv monit[1226]: 'filesystem_srv_dev-disk-by-id-usb-WD_Elements_25A1_575833314434383746543045-0-0-part1' space usage 88.9% matches resource limit [space usage>85.0%]
    May 22 00:06:46 omv monit[1226]: Device /srv/ecaf2ef9-aa68-47d9-99ad-ac21d64d8764 not found in /etc/mtab



    This leads to "transport endpoint is not connected" errors on the OMV server, and clients attempting to read/write get input/output and/or stale file handle errors. If I reboot the OMV server, service is restored for a while - but it's happened twice within 72 hours.
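
    I'm guessing something like the following would restore service without a full reboot (untested - a sketch assuming the dead FUSE endpoint can be lazily unmounted and the fstab entry is still in place):

    # stop NFS so it isn't serving from a dead mount
    systemctl stop nfs-kernel-server
    # lazily detach the stale mergerfs endpoint ("transport endpoint is not connected")
    fusermount -uz /srv/ecaf2ef9-aa68-47d9-99ad-ac21d64d8764
    # remount the pool from fstab and bring NFS back
    mount /srv/ecaf2ef9-aa68-47d9-99ad-ac21d64d8764
    systemctl start nfs-kernel-server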


    I notice a similar problem on unraid's forums: https://forums.unraid.net/bug-…60-nfs-kernel-crash-r199/


    I don't know if it's the kernel/NFS, FUSE or mergerfs at fault here.


    How can I debug this? mergerfs is about the only thing built into OMV that suits my needs, and I HAVE to be able to export it over NFS since it offers the best performance for my Kodi clients (and for other Linux software/services elsewhere on the network reading/writing to the pool).
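
    In the meantime, this is the sort of information I could gather next time it falls over - a sketch only; I'm assuming mergerfs accepts the standard FUSE -f/-d flags since it's FUSE-based, and the branches/options are copied from my fstab above:

    # is the mergerfs process still alive when the mount reports "transport endpoint is not connected"?
    pgrep -a mergerfs
    # kernel messages around the time of the crash
    dmesg -T | tail -n 100
    # temporarily run the pool in the foreground with FUSE debug output to catch what happens when it dies
    mergerfs -f -d -o defaults,allow_other,direct_io,use_ino,noforget,category.create=mfs,minfreespace=10M \
      /srv/dev-disk-by-label-WD6TBBAY1:/srv/dev-disk-by-id-usb-WD_Elements_25A1_575833314434383746543045-0-0-part1:/srv/dev-disk-by-label-WD3TBBAY2 \
      /srv/ecaf2ef9-aa68-47d9-99ad-ac21d64d8764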