ZFS Mirror Pool creation fails on LUKS-encrypted devices

    • ZFS Mirror Pool creation fails on LUKS-encrypted devices

      I have a really weird problem setting up a zpool consisting of a two-HDD mirror.

      I'm working with:
      * openmediavault v4.1.12
      * linux kernel 4.18.6-1~bpo9+1
      * openmediavault-zfs v4.0.4 custom build (built using current master, with my "by-id" device discovery fix)
      * openmediavault-luks v4.0.3 custom build (using subzero79's upgrade from this thread)
      * two 10TB WD Red Pro drives

      So, my steps to create this pool were as follows:
      1. Two new LUKS devices have been created
      2. Both devices have been decrypted (and put into crypttab, but that's irrelevant)
      3. Both devices have been selected for a mirror configuration of the new zpool using "by-id" device mapping (in the list they appear as "dm-1" and "dm-2"; "dm-0" is another drive that I'm using in a "basic" zpool)

      The pool creation fails at the "zpool create" command:

      $ zpool create -m "/zfs-pools/vault-main-hdd-mirrorpool" "vault-main-hdd-mirrorpool" mirror dm-name-sdb-crypt dm-name-sdc-crypt
      cannot create 'vault-main-hdd-mirrorpool': one or more devices is currently unavailable

      What's weird about this problem is that right after that, the LUKS devices appear to be locked (encrypted) again.

      I've tried using both of these drives in "basic" pools, for both of them I was able to successfully create new pools, so the problem seems to be strictly related to mirrored vdevs.

      I've also tried to manually create the mirrored pool, and what I found out is that the creation fails only if I create the GPT label for these drives (which is what OMV-ZFS Plugin does automatically before invoking the "zpool create" command). If the GPT labels were not created, the pool creation command works just fine... I was able to use that pool to create a new volume, place files on that volume using shares etc...
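
      The manual workaround described above can be sketched roughly like this. It assumes both LUKS mappings are already unlocked and reachable under /dev/disk/by-id/ (full paths are used here instead of the short names). The create_mirror function and the DRY_RUN switch are hypothetical names introduced only for this example, and by default the command is printed rather than executed.

      ```shell
      #!/usr/bin/env bash
      # Sketch of the manual workaround: hand the unlocked LUKS mappings to
      # "zpool create" as whole devices, without creating a GPT label first.
      # Pool name, mountpoint and device names are the ones from this post;
      # create_mirror and DRY_RUN are hypothetical, added only for this example.
      create_mirror() {
          local pool="$1" mountpoint="$2"
          shift 2
          local cmd=(zpool create -m "$mountpoint" "$pool" mirror "$@")
          if [ "${DRY_RUN:-1}" = "1" ]; then
              printf '%s\n' "${cmd[*]}"   # dry run: only print the command
          else
              "${cmd[@]}"                 # really create the pool (needs root)
          fi
      }

      create_mirror vault-main-hdd-mirrorpool /zfs-pools/vault-main-hdd-mirrorpool \
          /dev/disk/by-id/dm-name-sdb-crypt /dev/disk/by-id/dm-name-sdc-crypt
      ```

      Running it with DRY_RUN=0 would execute the command instead of printing it, so that should only be done once the printed command looks right.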

      And now, the fun part: this problem is NOT reproducible on a virtual machine (or at least I was not able to reproduce it with my virtual setup). I created a VM with a simple OMV setup, upgraded it to the latest version, installed the same two plugins as on my bare-metal machine, and used the same drive layout (one SATA drive for the OS, one NVMe drive for the first, basic, encrypted pool, and two SATA drives that are supposed to emulate my WD Reds). However, I set them all to only 32GB of disk space (not really sure if that matters). Then I created the same LUKS devices, decrypted them, and... I was not able to reproduce the problem: the mirrored zpool was created without errors, both with and without GPT labels.

      As you can see, technically speaking, I already have a workaround for the problem (manual pool creation without partitioning the devices first), but I'm trying to wrap my head around this problem, understand it, and maybe push a patch for the ZFS plugin. As far as I understand, it's not actually required to create a GPT label before creating a pool, so maybe that's the way to go with this (however, I'm not a ZFS expert, and I couldn't find a definitive answer to that question in ZoL's documentation).

      So, does anyone have any idea what the hell is going on here, and what might be the cause of this problem? Honestly, I'm out of ideas about what to debug next, so any hint would be helpful.
    • I'll try to reproduce this. To do this on a VM you need to overprovision the disks; that is safe as long as you don't fill the disks with data.

      I think the issue is the GPT label creation; after I commented it out, it started to work.

      The strange thing is that I then reverted the changes and went back to the default master branch, destroyed the pool and labels, and creation worked this time.

      I found another bug: when deleting a pool, it seems an extra string is appended to the name of the device to clear. Instead of running

      zpool labelclear -f /dev/dm-1

      the plugin runs

      zpool labelclear -f /dev/dm-11
      New wiki
      chat support at #openmediavault@freenode IRC | Spanish & English | GMT+10
      telegram.me/openmediavault broadcast channel
      openmediavault discord server
    • subzero79 wrote:

      ...
      zpool labelclear -f /dev/dm-1

      this comes

      zpool labelclear -f /dev/dm-11
      I've already reported it here: github.com/OpenMediaVault-Plug…nmediavault-zfs/issues/50
      The problem is not limited to DM devices. The same thing happens when using e.g. NVMe drives.

      The biggest problem is that getDevDisks assumes the ZFS label has to be removed from the first partition (at least that's what e.g. "sda1" is).
      First, that's not always the case (with LUKS DM devices we were using the whole drive).
      Second, the same naming scheme is not valid for other devices (e.g. NVMe and apparently CCISS devices have a letter "p" before the partition number).
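
      To illustrate the naming point, here is a minimal shell sketch. It is not the plugin's code: partition_path is a hypothetical helper, and it only encodes the Linux convention that a "p" separator is used when the disk name ends in a digit. Device-mapper devices that were given to ZFS whole need no suffix at all, so they are deliberately not handled here.

      ```shell
      #!/usr/bin/env bash
      # Hypothetical helper (not part of openmediavault-zfs): build the path of
      # partition N on a whole-disk device. Linux inserts a "p" between the disk
      # name and the partition number when the disk name ends in a digit
      # (nvme0n1 -> nvme0n1p1, cciss/c0d0 -> cciss/c0d0p1); otherwise the number
      # is appended directly (sda -> sda1).
      partition_path() {
          local disk="$1" num="$2"
          case "$disk" in
              *[0-9]) printf '%s\n' "${disk}p${num}" ;;
              *)      printf '%s\n' "${disk}${num}" ;;
          esac
      }

      partition_path /dev/sda 1        # prints /dev/sda1
      partition_path /dev/nvme0n1 1    # prints /dev/nvme0n1p1
      ```

      Blindly appending "1" to every device name gives the right answer only for the first case, which matches the "/dev/dm-11" symptom quoted above.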

      But I don't think this is related to the main problem - I think I've already re-encrypted (as in, created a new LUKS device, not just locked & unlocked) my HDD drives and the creation problem still occurred.
      As we've already both said, the problem has to do with the GPT partition scheme creation - somehow, it collides with "zpool create" (but it is still a mystery to me why zpool creation locked my devices).

      I'll try to create a new VM with overprovisioned drives and see if that makes any difference.

      The post was edited 1 time, last by dziekon ().

    • subzero79 wrote:

      It seems the creation of the GPT is not necessary, whether it is a straight block device or a device-mapper device.
      This plugin needs to be completely redone; it has been patched over the years to keep it working.
      Removing the GPT creation from the RPC is pretty straightforward, so even though I agree that the plugin's current state is a mess, I could still quickly create a patch for that.

      However, are we 100% sure that this is the way to go? I really don't want to waste my time on a patch that won't be accepted because we didn't think of some edge case, or because it's not actually guaranteed to work that way.
    • dziekon wrote:

      However, are we 100% sure that this is the way to go?
      Not really. I would do a little bit more research. Maybe someone here who works as a sysadmin with ZFS in production on Solaris, BSD or Linux can provide more insight. The person who originally created the plugin backend (years ago) came from a Solaris environment; I haven't seen him in a while.

      cc @miras
    • dziekon wrote:

      However, are we 100% sure that this is the way to go?
      ZFS on Linux will create the GPT if it isn't there. We aren't changing older versions of the plugin, where it might act differently. So, I would say it can be removed.
      omv 4.1.12 arrakis | 64 bit | 4.15 proxmox kernel | omvextrasorg 4.1.11
      omv-extras.org plugins source code and issue tracker - github

      Please read this before posting a question and this and this for docker questions.
      Please don't PM for support... Too many PMs!
    • New

      Ok, it took me some time to prepare a fix for the GPT labeling problem. I took my time to prepare a more "future-proof" solution that also solves the devices referencing problems, and should be a solid foundation for future rework of the plugin. Feel free to review my code here: github.com/OpenMediaVault-Plug…penmediavault-zfs/pull/54

      So far, I've been able to confirm that the problem from the original post is now gone on my VM and real test machine. I was able to use HDD SATA drives, an NVMe drive and LUKS-encrypted devices without any problems in "basic" and "mirrored" configurations. I've also added a bunch of tests in the plugin's code itself, with more complex zpool setups.
    • New

      dziekon wrote:

      Feel free to review my code here
      That is an impressive amount of work. We appreciate all the help we can get :)
    • ZFS Mirror Pool creation fails on LUKS-encrypted devices

      @dziekon

      Great, an additional developer for the zfs plugin is more than welcome. Thanks for the work you have done, even though I don’t have any problem at the moment. ;)

      Regards Hoppel
      ---------------------------------------------------------------------------------------------------------------
      frontend software - tvos | android tv | libreelec | win10 | kodi krypton
      frontend hardware - appletv 4k | nvidia shield tv | odroid c2 | yamaha rx-a1020 | quadral chromium style 5.1 | samsung le40-a789r2
      -------------------------------------------
      backend software - debian | openmediavault | latest backport kernel | zfs raid-z2 | docker | emby | unifi | vdr | tvheadend | fhem
      backend hardware - supermicro x11ssh-ctf | xeon E3-1240L-v5 | 64gb ecc | 8x10tb wd red | digital devices max s8
      ---------------------------------------------------------------------------------------------------------------------------------------
