MergerFS pool becomes corrupted for an unknown reason


      I've made a MergerFS pool that spans the entire size of 8 drives. In this pool is a single parent folder called "Share," under which all my storage and media files reside in sub-folders. This "Share" folder is mapped as an SMB share, and it is also mapped to a Docker storage container to which my other Docker containers have access.

      I did a fresh install of OMV 4.x about a month ago and recreated this setup. After about a week, I noticed that my NZBGet container was giving Unrar errors indicating there was not enough free space to unpack downloads (Unrar error 5). Then ruTorrent started giving hash errors: whenever a new file finished downloading, the subsequent hash check reported missing pieces ("Hash check on download completion found bad chunks, consider using 'safe_sync'"). Eventually, I found that deleting the MergerFS pool and creating a new one resolved both problems. So I did that, updated the mappings for Docker, and all was right in the world again. This lasted about two weeks, and once again, overnight, my NZBGet and ruTorrent downloads are failing with the same issues.

      I don't want to continue this cycle of deleting and re-creating the pool every few weeks, and I don't know what triggers this corruption. Any ideas on how to identify the problem? I have run fsck on all the drives, with no filesystem errors found thus far.
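
      For completeness, the check on each drive was along these lines, repeated per disk (the label and device below are just examples):

      Source Code

      # unmount the branch so fsck can check it safely (example label/device)
      umount /srv/dev-disk-by-label-a1
      fsck -f /dev/sdc1
      # remount it afterwards
      mount /srv/dev-disk-by-label-a1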
    • trapexit wrote:

      The recreation "fixing" the issue doesn't make sense. mergerfs doesn't interact with your data. It's a proxy / overlay.

      Could you provide the settings you're using? It's really not possible to comment further without such details.

      Are you using the drives out of band of mergerfs? Are drives filling?

      Thanks for chiming in. It doesn't make sense to me, either, but I'm not sure what else could be the cause.

      The drives are used ONLY in the context of the pool; nothing writes data to any of the individual drives directly. Right now I can't download any new data to confirm that the pool is still filling per policy, but up to this point, all data was written across the drives exactly as I would expect based on the policy I selected.

    • Unfortunately, there is nothing I can do without additional information. If it's saying you're out of space then something is returning that. The only time mergerfs explicitly returns ENOSPC is when all drives become filtered and at least one reason was minfreespace.

      Next time an error occurs please gather the information as mentioned in the docs or at least `df -h`.
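
      At a minimum that means something along these lines, captured while the error is actually happening:

      Source Code

      mergerfs -v                # mergerfs and FUSE versions
      uname -a                   # kernel / distro
      df -h                      # pool and per-branch free space
      grep mergerfs /etc/fstab   # the mount options / policies in use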
    • trapexit wrote:

      Unfortunately, there is nothing I can do without additional information. If it's saying you're out of space then something is returning that. The only time mergerfs explicitly returns ENOSPC is when all drives become filtered and at least one reason was minfreespace.

      Next time an error occurs please gather the information as mentioned in the docs or at least `df -h`.

      I must be overlooking something because I can't seem to find instructions for capturing info in the mergerfs docs.

      Edit: For some reason, it appears as though data has begun accumulating on disk a4 exclusively instead of spreading across the disks as intended.

      Here is the output from df -h

      Source Code

      Filesystem Size Used Avail Use% Mounted on
      udev 3.9G 0 3.9G 0% /dev
      tmpfs 787M 29M 759M 4% /run
      /dev/sdi1 108G 8.0G 94G 8% /
      tmpfs 3.9G 0 3.9G 0% /dev/shm
      tmpfs 5.0M 0 5.0M 0% /run/lock
      tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
      tmpfs 3.9G 0 3.9G 0% /tmp
      b3:a1:a2:a3:a4:b4:b1:b2 53T 44T 9.0T 83% /srv/2e119a19-7664-4aee-bade-aa172f81a65d
      /dev/sdj1 220G 15G 205G 7% /srv/dev-disk-by-label-ssd
      /dev/sde1 3.6T 3.0T 667G 82% /srv/dev-disk-by-label-a2
      /dev/sda1 7.3T 6.0T 1.3T 83% /srv/dev-disk-by-label-b1
      /dev/sdc1 7.3T 6.0T 1.3T 83% /srv/dev-disk-by-label-a1
      /dev/sdb1 7.3T 6.0T 1.3T 83% /srv/dev-disk-by-label-b3
      /dev/sdg1 5.5T 4.5T 967G 83% /srv/dev-disk-by-label-b4
      /dev/sdd1 7.3T 6.2T 1.1T 86% /srv/dev-disk-by-label-a4
      /dev/sdf1 7.3T 6.0T 1.3T 83% /srv/dev-disk-by-label-b2
      /dev/sdh1 7.3T 6.0T 1.3T 83% /srv/dev-disk-by-label-a3
      /dev/sdk1 9.1T 6.1T 3.0T 67% /srv/dev-disk-by-label-a0
      /dev/sdl1 9.1T 6.1T 3.0T 67% /srv/dev-disk-by-label-b0
      folder2ram 3.9G 27M 3.9G 1% /var/log
      folder2ram 3.9G 0 3.9G 0% /var/tmp
      folder2ram 3.9G 1.2M 3.9G 1% /var/lib/openmediavault/rrd
      folder2ram 3.9G 1.4M 3.9G 1% /var/spool
      folder2ram 3.9G 14M 3.9G 1% /var/lib/rrdcached
      folder2ram 3.9G 16K 3.9G 1% /var/lib/monit
      folder2ram 3.9G 4.0K 3.9G 1% /var/lib/php
      folder2ram 3.9G 0 3.9G 0% /var/lib/netatalk/CNID
      folder2ram 3.9G 420K 3.9G 1% /var/cache/samba
      overlay 108G 8.0G 94G 8% /var/lib/docker/overlay2/68099a0c5263c839e83ab0c11e41f16d17b9d201d6cf11701f2c23ab6e6227db/merged
      overlay 108G 8.0G 94G 8% /var/lib/docker/overlay2/3002c53e239177ad4763ef146daf2d210fb33653d0c234aeb7562178ddd37f22/merged
      overlay 108G 8.0G 94G 8% /var/lib/docker/overlay2/cfb106e20749a59868102ff3e5b305282ae7d3c4290fea6b9b8e70bc36ebf9b3/merged
      overlay 108G 8.0G 94G 8% /var/lib/docker/overlay2/e3b3ab963f27de18bad0de89add2a24d98734fb34b3306a744562920b56dbbbb/merged
      overlay 108G 8.0G 94G 8% /var/lib/docker/overlay2/8ea79bc4014b12acdacb74db01a06130e9712c3076fcaabb69e48fecbfe0f3a1/merged
      overlay 108G 8.0G 94G 8% /var/lib/docker/overlay2/4af58910ec5218bb0f673cec41b6dce829faa12750d2a5faccd93bb0cf8f8c0d/merged
      overlay 108G 8.0G 94G 8% /var/lib/docker/overlay2/5d91da61359ef9e3a4a2e6c97e164870893d7760fe4d19aaf20043af376f3b61/merged
      overlay 108G 8.0G 94G 8% /var/lib/docker/overlay2/487d99959ec09c63add4ec7c7639defaff8beb6668e7ef57db1c5bc22db4215f/merged
      overlay 108G 8.0G 94G 8% /var/lib/docker/overlay2/94ab641e2a014280547be32e57e1052d82da51b3c0ef567fdf68db51acfbeb8b/merged
      overlay 108G 8.0G 94G 8% /var/lib/docker/overlay2/ade951dd0e8f5b7119cccddaac734b2a08db609474ad77b99e8d05f3a0b36edb/merged
      overlay 108G 8.0G 94G 8% /var/lib/docker/overlay2/510e6be772064fd62df2a5620ee3ddf7a415da2587fc2f125276a715459a41c7/merged
      overlay 108G 8.0G 94G 8% /var/lib/docker/overlay2/6a2740c45f0b77df46d3ff666a1311ae1e53bae714c6c9d9b43e5d5b3ee8ea36/merged
      overlay 108G 8.0G 94G 8% /var/lib/docker/overlay2/971f4fc964949623ec54ab137d84dd61882f866a3cac29614813aba8da59da3b/merged
      shm 64M 0 64M 0% /var/lib/docker/containers/50c005696bc78bc6acfd3bd29985d5e50fc0a581ad1af461ec7fdc2efe7c3720/mounts/shm
      shm 64M 8.0K 64M 1% /var/lib/docker/containers/a93171a356e68b6b1243ac7eb2c71e7ad5a6dd6b32bae16699cd9d96662f1ab5/mounts/shm
      shm 64M 0 64M 0% /var/lib/docker/containers/23654ac3d6abab9b53003b0aedd995d1c4f5bf2d952196dbc1a79645fc2a7631/mounts/shm
      shm 64M 4.0K 64M 1% /var/lib/docker/containers/3a947eb809954e56fd7389060b06a53661872b237b0688d50effb9c310b01aa4/mounts/shm
      shm 64M 0 64M 0% /var/lib/docker/containers/375fcb7c1f6cafd39dc23655ad557b6adb64d1f9605adc517be8ba0a796b9324/mounts/shm
      shm 64M 0 64M 0% /var/lib/docker/containers/9ab985236f39303f95e64a8cfa3c15e5b804b236bc5bb227f43a96b3406f1f39/mounts/shm
      shm 64M 0 64M 0% /var/lib/docker/containers/57129d214670c370355f2f82531b745105463deee86933c16c53bb0e9b94faa4/mounts/shm
      shm 64M 4.0K 64M 1% /var/lib/docker/containers/cdab31646db3d467e1ca5506eda210109bbb50e9b8fee08e7b7d40a5eac77877/mounts/shm
      shm 64M 0 64M 0% /var/lib/docker/containers/e31f357fa3738607b350a43cf9863d0e91afcbbe1b6092dce0f7e898d4709491/mounts/shm
      shm 64M 0 64M 0% /var/lib/docker/containers/a2d86ff87e3cac7ab7472990dd15ce8666e657160d92d2c786fc6e30bdb8101b/mounts/shm
      shm 64M 0 64M 0% /var/lib/docker/containers/1011c988d4f0f7369214f3d8072a95940ee3e1cd07010db33bc6d84bac043ada/mounts/shm
      shm 64M 0 64M 0% /var/lib/docker/containers/cc7f72e186bc456ce026124f34d44f4b87a4a995cca2909b6ef18ef57de189da/mounts/shm
      shm 64M 0 64M 0% /var/lib/docker/containers/623cdd76ef35c243c1758d12863577a203b6edd5e59b5b10aa40480bd12253be/mounts/shm


    • > I can't seem to find instructions for capturing info in the mergerfs docs

      github.com/trapexit/mergerfs#support
      github.com/trapexit/mergerfs#example


      > it appears as though data has begun accumulating on disk a4

      The drives seem to have similar % usage. Regardless... if it's going to one drive then it's probably due to your config (path preservation or a policy which is targeting that branch).
    • trapexit wrote:

      > I can't seem to find instructions for capturing info in the mergerfs docs

      github.com/trapexit/mergerfs#support
      github.com/trapexit/mergerfs#example


      > it appears as though data has begun accumulating on disk a4

      The drives seem to have similar % usage. Regardless... if it's going to one drive then it's probably due to your config (path preservation or a policy which is targeting that branch).

      I just deleted and re-created the mergerfs pool again, and immediately after remapping everything to the new pool, data can be downloaded and unpacked via NZBGet, and hash checks no longer fail in rTorrent/ruTorrent. The strange part is that even while the errors were occurring, the data itself was still being downloaded; it was only the unpacking/verification stages that were failing.

      I have no doubt that there is some issue external to mergerfs causing this behavior; I just don't know where to begin.
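
      If it helps, one thing I can check the next time it happens is which branch new files are actually landing on; if I'm reading the docs right, mergerfs exposes this through xattrs on files accessed via the pool (the file path below is just an example):

      Source Code

      # every underlying branch path that holds this file
      getfattr -n user.mergerfs.allpaths "/srv/<pool-mount>/Share/some-new-download.mkv"
      # or just the branch it was created on
      getfattr -n user.mergerfs.basepath "/srv/<pool-mount>/Share/some-new-download.mkv"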
    • Thank you @trapexit. I didn't want to waste your time poring through my data if I found another culprit on my own. However, so far I've been unable to make any progress, so here we are:

      I am not sure how to run strace against a running Docker container that is producing these errors. Is there a simple command I could run in the terminal that could be traced instead?
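
      From what I can tell, I could either attach strace from the host to the container's process, or just trace a manual copy through the pool; roughly like this (container name and paths are placeholders):

      Source Code

      # find the host-side PID of the process inside the container
      docker top nzbget

      # attach and log file-related syscalls until the error appears
      strace -f -p <PID> -e trace=file,desc -o /tmp/nzbget.strace

      # or trace a simple write through the pool
      strace -f -o /tmp/cp.strace cp /path/to/largefile "/srv/<pool-mount>/Share/test/"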

      Source Code

      mergerfs -v
      mergerfs version: 2.25.1
      FUSE library version: 2.9.8-mergerfs
      fusermount version: 2.9.7
      using FUSE kernel interface version 7.19

      Source Code

      uname -a
      Linux ratsvault 4.19.0-0.bpo.1-amd64 #1 SMP Debian 4.19.12-1~bpo9+1 (2018-12-30) x86_64 GNU/Linux

      Here is the mntent section in OMV's config.xml:

      Source Code

      <mntent>
        <uuid>fd292da7-e8e2-4f2f-b22d-c0ea72b51431</uuid>
        <fsname>ceb94f6f-2407-4c37-9eb3-a737c3af08cf</fsname>
        <dir>/srv/ceb94f6f-2407-4c37-9eb3-a737c3af08cf</dir>
        <type>fuse.mergerfs</type>
        <opts></opts>
        <freq>0</freq>
        <passno>0</passno>
        <hidden>1</hidden>
      </mntent>

      And the entry in fstab:

      Source Code

      /srv/dev-disk-by-label-b1:/srv/dev-disk-by-label-b2:/srv/dev-disk-by-label-b3:/srv/dev-disk-by-label-b4:/srv/dev-disk-by-label-a1:/srv/dev-disk-by-label-a2:/srv/dev-disk-by-label-a3:/srv/dev-disk-by-label-a4 /srv/ceb94f6f-2407-4c37-9eb3-a737c3af08cf fuse.mergerfs defaults,allow_other,use_ino,dropcacheonclose=true,category.create=mfs,minfreespace=20G 0 0
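
      To confirm the running mount matches this entry, I believe mergerfs exposes its live settings through a .mergerfs control file at the root of the pool, and each branch can be sanity-checked against minfreespace=20G:

      Source Code

      # options the kernel sees for the mount
      grep mergerfs /proc/self/mounts

      # runtime settings as reported by mergerfs itself
      getfattr -d -m . /srv/ceb94f6f-2407-4c37-9eb3-a737c3af08cf/.mergerfs

      # any branch with less than 20G available would be filtered out by minfreespace=20G
      df -BG /srv/dev-disk-by-label-[ab][1-4]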


    • gderf wrote:

      Your post of the fstab entry is truncated. Don't use nano for this, try cat instead

      Whoops, edited. And this wouldn't fit in my previous post.


      Source Code

      df -h
      Filesystem Size Used Avail Use% Mounted on
      udev 3.9G 0 3.9G 0% /dev
      tmpfs 787M 56M 731M 8% /run
      /dev/sdi1 108G 8.2G 94G 9% /
      tmpfs 3.9G 0 3.9G 0% /dev/shm
      tmpfs 5.0M 0 5.0M 0% /run/lock
      tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
      tmpfs 3.9G 0 3.9G 0% /tmp
      b1:b2:b3:b4:a1:a2:a3:a4 53T 44T 8.9T 84% /srv/ceb94f6f-2407-4c37-9eb3-a737c3af08cf
      /dev/sdj1 220G 117G 103G 54% /srv/dev-disk-by-label-ssd
      /dev/sdf1 3.6T 3.0T 668G 82% /srv/dev-disk-by-label-a2
      /dev/sdd1 5.5T 4.5T 967G 83% /srv/dev-disk-by-label-b4
      /dev/sda1 7.3T 6.0T 1.3T 83% /srv/dev-disk-by-label-b1
      /dev/sdh1 7.3T 6.2T 1.1T 86% /srv/dev-disk-by-label-a4
      /dev/sdc1 7.3T 6.0T 1.3T 83% /srv/dev-disk-by-label-b3
      /dev/sdb1 7.3T 6.0T 1.3T 83% /srv/dev-disk-by-label-b2
      /dev/sde1 7.3T 6.0T 1.3T 83% /srv/dev-disk-by-label-a1
      /dev/sdg1 7.3T 6.0T 1.3T 83% /srv/dev-disk-by-label-a3
      /dev/sdk1 9.1T 6.1T 3.0T 67% /srv/dev-disk-by-label-a0
      /dev/sdl1 9.1T 6.1T 3.0T 67% /srv/dev-disk-by-label-b0
      folder2ram 3.9G 26M 3.9G 1% /var/log
      folder2ram 3.9G 0 3.9G 0% /var/tmp
      folder2ram 3.9G 1.2M 3.9G 1% /var/lib/openmediavault/rrd
      folder2ram 3.9G 1.4M 3.9G 1% /var/spool
      folder2ram 3.9G 14M 3.9G 1% /var/lib/rrdcached
      folder2ram 3.9G 16K 3.9G 1% /var/lib/monit
      folder2ram 3.9G 4.0K 3.9G 1% /var/lib/php
      folder2ram 3.9G 0 3.9G 0% /var/lib/netatalk/CNID
      folder2ram 3.9G 420K 3.9G 1% /var/cache/samba
      overlay 108G 8.2G 94G 9% /var/lib/docker/overlay2/ade951dd0e8f5b7119cccddaac734b2a08db609474ad77b99e8d05f3a0b36edb/merged
      overlay 108G 8.2G 94G 9% /var/lib/docker/overlay2/487d99959ec09c63add4ec7c7639defaff8beb6668e7ef57db1c5bc22db4215f/merged
      overlay 108G 8.2G 94G 9% /var/lib/docker/overlay2/8ea79bc4014b12acdacb74db01a06130e9712c3076fcaabb69e48fecbfe0f3a1/merged
      overlay 108G 8.2G 94G 9% /var/lib/docker/overlay2/510e6be772064fd62df2a5620ee3ddf7a415da2587fc2f125276a715459a41c7/merged
      overlay 108G 8.2G 94G 9% /var/lib/docker/overlay2/cfb106e20749a59868102ff3e5b305282ae7d3c4290fea6b9b8e70bc36ebf9b3/merged
      shm 64M 0 64M 0% /var/lib/docker/containers/50c005696bc78bc6acfd3bd29985d5e50fc0a581ad1af461ec7fdc2efe7c3720/mounts/shm
      shm 64M 0 64M 0% /var/lib/docker/containers/375fcb7c1f6cafd39dc23655ad557b6adb64d1f9605adc517be8ba0a796b9324/mounts/shm
      shm 64M 0 64M 0% /var/lib/docker/containers/a2d86ff87e3cac7ab7472990dd15ce8666e657160d92d2c786fc6e30bdb8101b/mounts/shm
      shm 64M 0 64M 0% /var/lib/docker/containers/e31f357fa3738607b350a43cf9863d0e91afcbbe1b6092dce0f7e898d4709491/mounts/shm
      shm 64M 0 64M 0% /var/lib/docker/containers/57129d214670c370355f2f82531b745105463deee86933c16c53bb0e9b94faa4/mounts/shm
      overlay 108G 8.2G 94G 9% /var/lib/docker/overlay2/014f7b14e32e7c552a2e435388ad3d4a8989eb9fb759abe9c4688d2e886aad2b/merged
      overlay 108G 8.2G 94G 9% /var/lib/docker/overlay2/b56f93e33fd23ece3c5f8d89962b787c1c0ea137e58b48968778ddc62c9962b6/merged
      shm 64M 0 64M 0% /var/lib/docker/containers/4d716ff44ec7dc01f7c066b1e006fc55f206c6715ed02b7ac1d33263aa0c70fc/mounts/shm
      overlay 108G 8.2G 94G 9% /var/lib/docker/overlay2/526e0cd6461eed89758eea69b3770ea2b2eaa554093109e89e3e7715cf4ee901/merged
      shm 64M 4.0K 64M 1% /var/lib/docker/containers/5718d146e23e0f4c2e33bbb2b1c521c5f83ffc511566ed8480b5770e635662d3/mounts/shm
      overlay 108G 8.2G 94G 9% /var/lib/docker/overlay2/68ee4c513ddfe951547b6e1c47c0b91892ae75e0f8f4965aa472edb8b0c4d1f3/merged
      overlay 108G 8.2G 94G 9% /var/lib/docker/overlay2/c3fff96d65bdd15a24ebea982f39103f0d9284258aad5adc142582eee7099322/merged
      shm 64M 0 64M 0% /var/lib/docker/containers/901d6d631ea9e696d4240c569e04e969921f0829e3bb817b2fdc886d7d645918/mounts/shm
      overlay 108G 8.2G 94G 9% /var/lib/docker/overlay2/40d21743aca65595f31093efa78dd0a070a37cf4347dd5fdadb3828078c0549f/merged
      shm 64M 0 64M 0% /var/lib/docker/containers/0e4eb40f7911cb0a88a849ff09091f80b49948f3af2454474bbcc15bfda105dd/mounts/shm
      overlay 108G 8.2G 94G 9% /var/lib/docker/overlay2/3db9692b7bda1e69ed9f515929ffeb841d82c6d40a2524552d3fc4fefffa5995/merged
      shm 64M 4.0K 64M 1% /var/lib/docker/containers/0b3d306eac3dc115dc2d6174e49b5b8eaa324e97cf59f785d3b99002d19eaaf4/mounts/shm
      overlay 108G 8.2G 94G 9% /var/lib/docker/overlay2/10e5a635c8cec040fa2ffb921708587688ac1a78425a9128e7b3699cdac5c3bb/merged
      shm 64M 0 64M 0% /var/lib/docker/containers/cc8911ac1405d6c77522e5c5149f6115d3152b3e02b858776baf35f5e0ab31e1/mounts/shm
      shm 64M 0 64M 0% /var/lib/docker/containers/526d4a276ac6128a206da8f87b6a5c16caec4729e7c0d2d0228dc6dd746a2672/mounts/shm
      shm 64M 8.0K 64M 1% /var/lib/docker/containers/56acf421bf8688ef8ddbeee17dc08741eab729865fb0b4cbed5b5f87af425e0c/mounts/shm
    • @trapexit I played around in the terminal trying to find a command that would trigger an error and had no luck. Moves, copies, unpacking, etc. all worked fine, and the issues continued to occur only within the Docker containers.

      For the containers that were affected, I removed the mapping to the storage container and mapped the pool path directly instead, and it seems the problem may be resolved (at least for now; I have been downloading new media for a couple of days without issue). So either the Docker plugin isn't playing nicely with MergerFS, or vice versa.

      This works: mapping the pool path directly into each container.


      This doesn't: mapping the containers through the shared storage container.
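
      In docker-run terms, the difference between the two setups is roughly the following (image name, container names, and container-side paths are placeholders):

      Source Code

      # works: bind-mount the pool path straight into the application container
      docker run -d --name nzbget \
        -v /srv/ceb94f6f-2407-4c37-9eb3-a737c3af08cf/Share:/downloads \
        linuxserver/nzbget

      # fails intermittently: inherit the mount from a separate "storage" container
      docker run -d --name storage \
        -v /srv/ceb94f6f-2407-4c37-9eb3-a737c3af08cf/Share:/Share \
        alpine tail -f /dev/null
      docker run -d --name nzbget --volumes-from storage linuxserver/nzbget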

    • Well, I spoke too soon, because NZBGet has begun reporting the "out of space" errors again. I will have to do some more digging.

      Edit: After more research, disk quotas appear to be the issue. I have never set up any quotas since installing OMV, but I noticed NZBGet throwing an error about exceeding a disk quota. So I disabled quotas using `sudo quotaoff -a` and re-ran a few downloads that had failed to unpack only moments before; this time, all three downloads unpacked successfully. Is there a more permanent way to ensure that quotas do not get re-enabled by a reboot or some other system event? Can I simply remove the quota arguments in the mntent sections of the config.xml file?
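
      Beyond `quotaoff -a`, here is roughly what I plan to try in order to keep them off; the label is just an example, and I haven't confirmed yet whether editing config.xml alone is enough:

      Source Code

      # see whether quotas are active and what limits exist
      sudo repquota -a
      sudo quotaon -pa

      # turn them off for now
      sudo quotaoff -a

      # keep them off across remounts: remove the quota options
      # (usrquota/grpquota or usrjquota=.../grpjquota=...) from the
      # relevant entries in /etc/fstab (or wherever OMV writes them), then remount
      sudo mount -o remount /srv/dev-disk-by-label-a4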
