Extremely weird UID/permission denied issue with MergerFS/NFSv4

  • Hello everyone. I've been debugging this for 4 days and have hit rock bottom... This is my last resort. I'll show what's going wrong first, then what I tried, and lastly all the configs I could find.


    TLDR Issue:

    - UIDs 1000 and 12345 have NO write permissions on a MergerFS shared folder (additionally exposed through NFS and SMB)

    - all other UIDs have correct/expected permissions

    I have absolutely no clue how and why that happened.


    Commands that are failing (/working):

    Code
    root@omv:~$ sudo --user=foo touch /export/media/foo # whyyyyyyyyyyyyyyyyyyy is it failing
    touch: cannot touch '/export/media/foo': Permission denied
    root@omv:~$ sudo --user=bar touch /export/media/bar && echo Success
    Success
    root@omv:~$ # if I were to create a user with UID 12345, that user would also fail to create any files here

    I want to note that changing a user's permissions on the shared folder in the OMV web interface resolves the issue, but it introduces an ACL entry that should NOT be needed according to the POSIX permission rules! Both users are in the www-data group, and the /export/media folder belongs to www-data:www-data with rwxrwsr-x permissions. Thus, user foo should be able to create a file.


    The issue also does not occur when I manually create a folder in e.g. /root or even / and give it the same ownership, permissions, and ACLs. It only affects /export/media.

    Configs

    Folder permissions and ownership of relevant folder

    UIDs, GIDs, and Group memberships of relevant users (foo and bar users created via OMV webinterface)

    Code
    root@omw:~$ id
    uid=0(root) gid=0(root) groups=0(root)
    root@omw:~$ id www-data
    uid=33(www-data) gid=33(www-data) groups=33(www-data)
    root@omw:~$ id foo
    uid=1000(foo) gid=100(users) groups=100(users),33(www-data)
    root@omw:~$ id bar
    uid=1001(bar) gid=100(users) groups=100(users),33(www-data)

    NFS exports (generated by OMV)

    Code
    root@omw:~$ cat /etc/exports
    # This file is auto-generated by openmediavault (https://www.openmediavault.org)
    # WARNING: Do not edit this file, your changes will get lost.
    
    # /etc/exports: the access control list for filesystems which may be exported
    #               to NFS clients.  See exports(5).
    /export/media 192.168.0.214(fsid=9b734414-6b87-4b2d-bc13-c1c1721600f3,rw,subtree_check,insecure)
    /export 192.168.0.214(ro,fsid=0,root_squash,no_subtree_check)

    FUSE conf

    Code
    root@omw:~$ cat /etc/fuse.conf
    # /etc/fuse.conf - Configuration file for Filesystem in Userspace (FUSE)
    
    # Set the maximum number of FUSE mounts allowed to non-root users.
    # The default is 1000.
    #mount_max = 1000
    
    # Allow non-root users to specify the allow_other or allow_root mount options.
    #user_allow_other

    MergerFS mount options

    Code
    root@omw:~$ mount | grep mergerfs
    mediastorage:a2c208ea-ee4c-49ee-8172-39a1431c6037 on /srv/mergerfs/mediastorage type fuse.mergerfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other)
    mediastorage:a2c208ea-ee4c-49ee-8172-39a1431c6037 on /export/media type fuse.mergerfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other)

    Subordinate UID/GID

    There is no SELinux or AppArmor configured/installed.

    idmapd.conf


    Permissions of /export/media in OMV

    ACL's of user foo and bar in OMV

    Permissions (NOT ACL) of user foo and bar in OMV

    NFS Share settings in OMV

    NFS settings in OMV

    SMB Settings (should not be relevant, since /export/media is only exposed via NFS)

    Dashboard of file systems

    System Information


    If you need more info, I'd be glad to provide it.

  • The only additional clue I have is that the issue is probably related to users that were created in the past (or rather, UIDs that have been used in the past).


    For example, I also created two other users with UIDs 100000 and 200000 (coming from an unprivileged container), and now any user that I create with these UIDs fails to create files in the aforementioned directory.


    What's weird is that it's not just any user that I create. I created the users with UIDs 1000, 100000, and 200000 a while back, and now their UIDs are "broken". However, I don't recall ever creating a user with UID 12345, yet that UID is also "broken".


    Freshly created users with different UIDs have no problem; their UIDs are not "broken" after they've been created. It also doesn't matter how I create the user: either through the OMV interface or through any permutation of groupadd+useradd commands with differing flags (-r, -M, -N, --uid, --gid, -d, -m, -s, ...).


    I also thought it might be related to the idmap daemon. I restarted that daemon, yet it still didn't resolve the issue.

  • Removing ACLs from /export/media using setfacl -b /export/media also does not solve the issue :/.

    I'm puzzled by the gap between how POSIX permissions should work and how they actually behave here...


    As evident from the outputs, even with all ACLs removed, the user foo with UID 1000, who is part of the www-data group, CANNOT create a file in the www-data:www-data directory with permissions rwxrwsr-x. Weirdly enough, chmod 0777 doesn't change the directory's permissions to 0777, but rather to 2777 for whatever reason...


    The issue persists on nested folders created manually in /export/media.

  • I think the upshot of the entire issue can be summarized by these lines...

    According to POSIX, this should happen:

    The property of a file indicating access permissions for a process related to the group identification of a process. A process is in the file group class of a file if the process is not in the file owner class and if the effective group ID or one of the supplementary group IDs of the process matches the group ID associated with the file. Other members of the class may be implementation-defined.

    This is precisely what is not happening for me :/
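
    To make the expectation concrete, here is a minimal sketch of the class check that the POSIX text describes, using the UIDs, GIDs, and mode shown earlier in the thread. The function names file_class and class_can_write are made up for illustration, and the sketch deliberately ignores ACLs, root's override, and anything MergerFS or NFS might do on top.

```shell
#!/bin/sh
# Classify a process against a file the way the POSIX definition reads:
# owner class first, then group class, otherwise "other".
# Arguments: process UID, space-separated process GIDs, file UID, file GID.
file_class() {
    fc_uid=$1; fc_gids=$2; fc_file_uid=$3; fc_file_gid=$4
    if [ "$fc_uid" -eq "$fc_file_uid" ]; then
        echo owner
        return
    fi
    for fc_gid in $fc_gids; do
        if [ "$fc_gid" -eq "$fc_file_gid" ]; then
            echo group
            return
        fi
    done
    echo other
}

# Succeed if the given class has its write bit set in an octal mode (e.g. 2775).
class_can_write() {
    cw_class=$1; cw_mode=$2
    case $cw_class in
        owner) cw_bit=$(( 0$cw_mode & 0200 )) ;;
        group) cw_bit=$(( 0$cw_mode & 0020 )) ;;
        *)     cw_bit=$(( 0$cw_mode & 0002 )) ;;
    esac
    [ "$cw_bit" -ne 0 ]
}

# Values from the thread: foo is uid 1000 with groups 100 (users) and 33 (www-data);
# /export/media is owned by 33:33 with mode rwxrwsr-x, i.e. 2775.
foo_class=$(file_class 1000 "100 33" 33 33)
if class_can_write "$foo_class" 2775; then
    echo "foo is in the $foo_class class and should be able to write"
else
    echo "foo is in the $foo_class class and should not be able to write"
fi
```

    By plain POSIX rules foo lands in the group class and the group write bit is set, which is why the "Permission denied" looks wrong on paper.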

  • Rebooting the OMV virtual machine fixed the problem for UID 1000, 12345, and 200000. But I'm none the wiser...

    The issue stayed for UID 100000 and additionally, UID 100001 now also has the problem 🤡.


    Rebooting AGAIN fixed the issue for UID 100001 immediately, but for 100000 the issue STILL persists... 🤡


    Rebooting a THIRD time, while not deleting a test-user with UID 100000, fixed the issue with UID 100000 as well, finally.

    Now, at least according to my tests, all UIDs are "fixed" again... 🤡


    I hope this thread will help someone else.

    Nonetheless, if people smarter than me figure out what caused it, I'd be super glad to know.

    • Is it idmapd?
    • Cached files/permission in MergerFS/FUSE? (EDIT after @Krisbee's reply: well, it seems to be this one after all!)
    • Bug in MergerFS?
    • Bug in OMV?
    • Bug in NFS?
    • Cached files/permissions in NFS?
    • Bug in the kernel (6.1.0-0.deb11.21-amd64)? (hopefully unlikely)

  • Krisbee, I see your reply now.


    Very interesting that an additional leading zero strips the setgid bit; I was confused why 0775 did not suffice, thanks!
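
    For anyone who wants to see the leading-zero behaviour for themselves, here is a small sketch that runs on a throwaway directory. It assumes GNU coreutils (chmod and stat -c, as shipped with Debian/OMV); the variable names are only for illustration.

```shell
#!/bin/sh
# Shows how GNU chmod treats the setgid bit on a directory.
demo_dir=$(mktemp -d)

chmod 2775 "$demo_dir"                       # rwxrwsr-x, like /export/media
chmod 0777 "$demo_dir"                       # four digits: GNU chmod keeps setgid on directories
mode_after_0777=$(stat -c %a "$demo_dir")

chmod 00777 "$demo_dir"                      # extra leading zero: setgid is cleared
mode_after_00777=$(stat -c %a "$demo_dir")

chmod g+s "$demo_dir"                        # set the bit again ...
chmod g-s "$demo_dir"                        # ... and clear it with the symbolic form
mode_after_symbolic=$(stat -c %a "$demo_dir")

echo "after chmod 0777:  $mode_after_0777"
echo "after chmod 00777: $mode_after_00777"
echo "after chmod g-s:   $mode_after_symbolic"

rmdir "$demo_dir"
```

    With GNU coreutils the first line should still show the setgid bit (2777) and the other two should show 777. The symbolic g-s form is probably the less surprising way to clear the bit.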


    In the end, I don't think it's an ACL issue, as stripping all ACLs with setfacl -b did not solve the issue, right?


    From a quick skim, the second link you provided does seem to explain the behaviour I am seeing! I will read it in more detail. I missed it while skimming the entire MergerFS documentation; I should've been more thorough!


    I'm also not sure why I didn't find the GitHub issue. Well, at least now I know where I should look first, and in more detail...


    Thanks a lot for that and the additional info!
