SnapRAID plugin seems to have some weird issues when testing it in VirtualBox

  • Hi

I have installed OMV on my new NAS (an N100 motherboard) and thought that testing the SnapRAID failure functionality (in VirtualBox) before it happens in real life might be a good thing to do.


Anyway, I hope you can shed some light on this issue; I am a bit concerned about getting stuck in some kind of GUI error.


It seems like there is some misconfiguration of the UUID and it gets stuck (it might be because it is running under VirtualBox and something to do with how the UUIDs are calculated? I have not had the time to test it under my ESXi host).


I am not that experienced in Linux, so bear with me if this is something obvious ;)


Test setup:
One OS disk, 0586bcd8-f392-43a5-bdff-5cc7195ab1c6, is the system disk, sdc.
Four data disks: 1 parity disk and 3 data drives, each with one file on them.
I removed one drive and replaced it with another empty virtual drive of the same size (tried twice; the second time I attached the replacement hard drive to another hard disk controller, same result).
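
For anyone reproducing the setup, a same-size empty VDI for the replacement drive can be created with VBoxManage (a sketch; the filename is a placeholder):

VBoxManage createmedium disk --filename Data3-new.vdi --size 2048   # size in MB; 2 GB matches the test drives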


The old drive had UUID 22c2376f-926a-4e7b-8e3f-08d787b112b4 (mounted as dev-disk-by-uuid-22c2376f-926a-4e7b-8e3f-08d787b112b4).

After removing the virtual drive, adding a new drive (same size) and starting up again, the disk is now empty (this drive had the 12b4-f3 file and it was gone), but it seems to have taken over the old UUID?
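
If a drive really has inherited the wrong UUID, this can be verified and, for ext4, re-randomized from the shell (a sketch; adjust the device name to your setup, and unmount the filesystem first):

blkid /dev/sdd1                # show the filesystem UUID the kernel actually sees
tune2fs -U random /dev/sdd1    # ext4 only: assign a fresh random UUID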



I stopped the cron job and removed the reference to the failed drive in the mergerfs configuration (this should not change anything, but then it is done).



snapraid check found one missing file, all OK.


snapraid fix found one missing file and restored it. OK, but under the old UUID, and it is working?


    Loading state from /srv/dev-disk-by-uuid-44ec43a8-ba47-4adf-96e5-727a71b9a954/snapraid.content...


UUID change for disk 'sdd-D3' from '22c2376f-926a-4e7b-8e3f-08d787b112b4' to '0586bcd8-f392-43a5-bdff-5cc7195ab1c6' (..c6 is the system disk UUID??)
It repaired the file (now accessible under the old drive UUID ..b4).
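
For reference, a fix run with a log file looks roughly like this (a sketch; -d restricts the fix to one disk, using the disk name from snapraid.conf):

snapraid -l fix.log fix             # restore all missing/broken files
snapraid -d sdd-D3 -l fix.log fix   # or limit the fix to a single disk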


ls for the UUIDs below works (..b4 was empty; now it has the file after the fix):
dev-disk-by-uuid-22c2376f-926a-4e7b-8e3f-08d787b112b4 Data3 (new drive, but with the same UUID as the old?)
dev-disk-by-uuid-4399e756-c8f0-47b4-b178-68cf948e981b mergerfs
dev-disk-by-uuid-44ec43a8-ba47-4adf-96e5-727a71b9a954 Data1
dev-disk-by-uuid-89167632-2c41-48b5-ab04-712ac2d48059 snapraid.parity
dev-disk-by-uuid-8a316de9-fbd4-4ff3-acdd-ed1eac677f0c Data2



Anyhow, below are the issues in the GUI after a SnapRAID failure test in OMV. The GUI states that ..b4 is missing (but it works?).



The ..b4 drive is shown as missing (as it should be, but it is still working with the new disk??).
I also cannot delete it or edit it (it comes back with an error); it never disappears, and I do not seem to be able to fix/delete it.



Notice that the ..b4 drive does not appear anywhere below, but it works in /srv/dev..

(I have also noticed that the disk IDs (sda, sda1, etc.) can switch places after each reboot; they do not keep their hardware order. This is probably not an issue when the UUID names are used.)

root@omv-vm1:~# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0    2G  0 disk
└─sda1   8:1    0    2G  0 part /srv/dev-disk-by-uuid-89167632-2c41-48b5-ab04-712ac2d48059
sdb      8:16   0    2G  0 disk
└─sdb1   8:17   0    2G  0 part /srv/dev-disk-by-uuid-44ec43a8-ba47-4adf-96e5-727a71b9a954
sdc      8:32   0   20G  0 disk
├─sdc1   8:33   0   19G  0 part /
├─sdc2   8:34   0    1K  0 part
└─sdc5   8:37   0  975M  0 part [SWAP]
sdd      8:48   0    2G  0 disk
└─sdd1   8:49   0    2G  0 part /srv/dev-disk-by-uuid-8a316de9-fbd4-4ff3-acdd-ed1eac677f0c
sde      8:64   0    2G  0 disk
└─sde1   8:65   0    2G  0 part /srv/dev-disk-by-uuid-4399e756-c8f0-47b4-b178-68cf948e981b
sr0     11:0    1 1024M  0 rom
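
To see the stable UUIDs next to the (unstable) device names directly, lsblk can print them itself; a sketch:

lsblk -o NAME,SIZE,FSTYPE,UUID,MOUNTPOINT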


Notice the reference to the ..b4 drive that works, and the reference to a non-existing ..c6.

root@omv-vm1:~# cat /etc/fstab
# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# systemd generates mount units based on this file, see systemd.mount(5).
# Please run 'systemctl daemon-reload' after making changes here.
#
#
# / was on /dev/sda1 during installation
UUID=0586bcd8-f392-43a5-bdff-5cc7195ab1c6 / ext4 errors=remount-ro 0 1
# swap was on /dev/sda5 during installation
UUID=b5a5ab08-4ee6-4d09-b794-5848aa2955ba none swap sw 0 0
/dev/sr0 /media/cdrom0 udf,iso9660 user,noauto 0 0
# >>> [openmediavault]
/dev/disk/by-uuid/89167632-2c41-48b5-ab04-712ac2d48059 /srv/dev-disk-by-uuid-89167632-2c41-48b5-ab04-712ac2d48059 ext4 defaults,nofail,user_xattr,usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv0,acl 0 2
/dev/disk/by-uuid/44ec43a8-ba47-4adf-96e5-727a71b9a954 /srv/dev-disk-by-uuid-44ec43a8-ba47-4adf-96e5-727a71b9a954 ext4 defaults,nofail,user_xattr,usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv0,acl 0 2
/dev/disk/by-uuid/8a316de9-fbd4-4ff3-acdd-ed1eac677f0c /srv/dev-disk-by-uuid-8a316de9-fbd4-4ff3-acdd-ed1eac677f0c ext4 defaults,nofail,user_xattr,usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv0,acl 0 2
/dev/disk/by-uuid/22c2376f-926a-4e7b-8e3f-08d787b112b4 /srv/dev-disk-by-uuid-22c2376f-926a-4e7b-8e3f-08d787b112b4 ext4 defaults,nofail,user_xattr,usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv0,acl 0 2
/dev/disk/by-uuid/4399e756-c8f0-47b4-b178-68cf948e981b /srv/dev-disk-by-uuid-4399e756-c8f0-47b4-b178-68cf948e981b ext4 defaults,nofail,user_xattr,usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv0,acl 0 2
# <<< [openmediavault]
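
As the fstab header itself notes, any manual edit here (for example commenting out the stale ..b4 line once nothing references it) should be followed by a reload; a sketch:

systemctl daemon-reload   # regenerate the systemd mount units from fstab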



root@omv-vm1:~# cat /etc/snapraid.conf
# This file is auto-generated by openmediavault (https://www.openmediavault.org)
# WARNING: Do not edit this file, your changes will get lost.
autosave 0
#####################################################################
# OMV-Name: sdb-D1 Drive Label:
content /srv/dev-disk-by-uuid-44ec43a8-ba47-4adf-96e5-727a71b9a954/snapraid.content
disk sdb-D1 /srv/dev-disk-by-uuid-44ec43a8-ba47-4adf-96e5-727a71b9a954
#####################################################################
# OMV-Name: sda-parity Drive Label:
parity /srv/dev-disk-by-uuid-89167632-2c41-48b5-ab04-712ac2d48059/snapraid.parity
#####################################################################
# OMV-Name: sdc-D2 Drive Label:
content /srv/dev-disk-by-uuid-8a316de9-fbd4-4ff3-acdd-ed1eac677f0c/snapraid.content
disk sdc-D2 /srv/dev-disk-by-uuid-8a316de9-fbd4-4ff3-acdd-ed1eac677f0c
#####################################################################
# OMV-Name: sdd-D3 Drive Label:
content /srv/dev-disk-by-uuid-22c2376f-926a-4e7b-8e3f-08d787b112b4/snapraid.content
disk sdd-D3 /srv/dev-disk-by-uuid-22c2376f-926a-4e7b-8e3f-08d787b112b4
exclude *.unrecoverable
exclude lost+found/
exclude aquota.user
exclude aquota.group
exclude /tmp/
exclude .content
exclude *.bak
exclude /snapraid.conf*
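
For a drive replacement, only the two paths in the sdd-D3 stanza need to point at the new drive; a sketch with a hypothetical replacement UUID written as <NEW-UUID>:

# OMV-Name: sdd-D3 Drive Label:
content /srv/dev-disk-by-uuid-<NEW-UUID>/snapraid.content
disk sdd-D3 /srv/dev-disk-by-uuid-<NEW-UUID>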


The ..b4 disk is here:

root@omv-vm1:/srv# ls -l
total 36
drwxrwxrwx 2 root root 4096 Feb 18 22:28 dev-disk-by-uuid-22c2376f-926a-4e7b-8e3f-08d787b112b4
drwxr-xr-x 3 root root 4096 Feb 18 20:30 dev-disk-by-uuid-4399e756-c8f0-47b4-b178-68cf948e981b
drwxr-xr-x 3 root root 4096 Feb 18 22:28 dev-disk-by-uuid-44ec43a8-ba47-4adf-96e5-727a71b9a954
drwxr-xr-x 3 root root 4096 Feb 18 20:30 dev-disk-by-uuid-89167632-2c41-48b5-ab04-712ac2d48059
drwxr-xr-x 3 root root 4096 Feb 18 22:28 dev-disk-by-uuid-8a316de9-fbd4-4ff3-acdd-ed1eac677f0c
-rw-r--r-- 1 root root 3569 Feb 18 20:47 fix.log
drwxrwxrwx 3 root root 4096 Feb 18 19:06 mergerfs
drwxr-xr-x 3 root root 4096 Feb 18 14:29 pillar
drwxr-xr-x 7 root root 4096 Feb 18 14:29 salt


..c6 is also here. Where is the ..b4 disk?? And what is the ..ba disk, the new disk's UUID?
root@omv-vm1:/srv# ls -l /dev/disk/by-uuid/
total 0
lrwxrwxrwx 1 root root 10 Feb 18 22:20 0586bcd8-f392-43a5-bdff-5cc7195ab1c6 -> ../../sdc1
lrwxrwxrwx 1 root root 10 Feb 18 22:20 4399e756-c8f0-47b4-b178-68cf948e981b -> ../../sde1
lrwxrwxrwx 1 root root 10 Feb 18 22:20 44ec43a8-ba47-4adf-96e5-727a71b9a954 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Feb 18 22:20 89167632-2c41-48b5-ab04-712ac2d48059 -> ../../sda1
lrwxrwxrwx 1 root root 10 Feb 18 22:20 8a316de9-fbd4-4ff3-acdd-ed1eac677f0c -> ../../sdd1
lrwxrwxrwx 1 root root 10 Feb 18 22:20 b5a5ab08-4ee6-4d09-b794-5848aa2955ba -> ../../sdc5
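
(An unknown UUID like ..ba can be identified from the shell; a sketch, and per the fstab above it should turn out to be the swap partition on sdc5:)

blkid /dev/sdc5   # expected to print the ..ba UUID with TYPE="swap"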

    • Official Post

Have you read the docs for MergerFS and SnapRAID? There's a section at the end of the SnapRAID document that describes failed drive recovery AND (after restoring to a new drive with SnapRAID) removing the old drive entry and adding the restored drive to a MergerFS pool. Give this drive recovery method a try. (The process order will allow for the removal of the "missing" drive at the end.)
    _________________________________________________________________________

Note that VirtualBox has limitations when it comes to removing and adding virtual hard drives. When a drive is removed, drive reordering in VB is normal behavior. Unfortunately, the same is true of real hardware: Linux assigns drive device names in the order the BIOS / UEFI presents them to the kernel. This is why OMV relies on drive UUIDs.

In VB, it's possible to remove a hard drive and bring it back in the exact same condition (with the original file system, UUID, etc.). In the real world there are very few reasons to physically remove a healthy drive and bring it back, and no reason to run a restoration of a healthy drive in SnapRAID. When a drive is removed in VB, the file that represents the "failed" virtual drive should be deleted (or, at a minimum, not added back to the virtual machine).
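
If you drive the VM from the command line, the detach-and-delete can be done with VBoxManage (a sketch; the VM name, controller name and port are placeholders for your own setup):

VBoxManage storageattach "omv-vm1" --storagectl "SATA" --port 3 --medium none   # detach the virtual disk
VBoxManage closemedium disk /path/to/failed-disk.vdi --delete                   # delete the backing file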


Lastly, if a missing drive is "referenced" in OMV's Filesystems window, it cannot be unmounted and deleted. When a drive is "referenced", that means something is still configured that is attempting to use the drive in some way (a shared folder, a monitoring process, etc.). The configuration item that is attached to the drive must be found and deleted / removed.


In your case, if the missing drive UUID is still configured in MergerFS, you won't be able to unmount and remove the drive from Filesystems. When the drive no longer has the "referenced" check mark in the Filesystems window, the missing drive can be unmounted / deleted.
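
One way to hunt for a leftover reference from the shell is to search OMV's configuration database for the stale UUID (a sketch, using the UUID from this thread):

grep 22c2376f /etc/openmediavault/config.xml   # show every config entry still naming the old drive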

  • Hi


Thanks for the reply. Yes, it might be some kind of weird issue in VirtualBox (or I did something wrong, but I have used the same image and cannot find anything that looks wrong).


Anyhow, I did succeed in replacing the drive when both drives were connected at the same time (one in use and the other just connected for later replacement, referred to below as the REPLACEMENT disk). I then failed the drive and used the replacement drive to recover the files. I also managed to remove the missing-filesystem error from the GUI. (All tests were done in VirtualBox.)


I have documented the process below.


    OMV Failed Disk Replacement procedure

1. Disable the cron job for SnapRAID (just in case; we do not want anything to happen while we are working on the fix). This can be done in the OMV GUI Scheduled Tasks tab.
2. SSH to OMV and cd /srv (just to be in a good location).
3. Add the physical replacement disk (and remove the old one). Or, if possible (the disk is still working and the files are intact/not corrupted), copy all the content from the failed disk to the new disk; then snapraid fix is not needed. (The copy could also go to a temporary location as a safeguard in case something goes wrong; this requires enough disk capacity.)
4. Format the REPLACEMENT disk (OMV GUI).
5. Create a filesystem on the REPLACEMENT disk (OMV GUI).
6. Using lsblk, validate that the REPLACEMENT disk is there with a unique UUID, and note down the UUID.
7. Replace the failed drive's UUID with the REPLACEMENT drive's UUID in /etc/snapraid.conf; also note the SnapRAID short name for the disk.
8. Run snapraid -d FailedDiskShortNameFromSnapraidConf -l fix.log fix. Example: snapraid -d sdd-D3 -l fix.log fix (see the command sketch after this list).
9. Validate that the files are located on the REPLACEMENT disk (and that the data is valid/not corrupted):
10. ls /srv/.....UUID
11. mergerfs:
a. Remove the failed drive UUID in the OMV GUI mergerfs config.
b. Add the new drive UUID in the OMV GUI mergerfs config.
c. Restart the mergerfs filesystem in the OMV GUI.

12. Validate that the files are there in mergerfs.

13. Do the same replacement-disk UUID mapping in the SnapRAID GUI; the disk entry will (in my case) be blanked out even though it is configured in snapraid.conf.

14. Remove the failed disk's UUID from fstab (comment it out with #).

15. Reboot.

16. In the OMV GUI (or shell), unmount the failed disk's filesystem.

17. Remove the failed disk's filesystem in the OMV GUI Filesystems tab; now it should be removed.

18. Do a snapraid check and snapraid diff and validate that everything seems correct (no missing SnapRAID disk error).

19. Do a snapraid sync.

20. Re-enable the snapraid sync cron job. This can be done in the OMV GUI Scheduled Tasks tab.
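
Put together, the shell side of the procedure looks roughly like this (a sketch from my test VM; the sdd-D3 disk name comes from my snapraid.conf, <NEW-UUID> is a placeholder, and the editor is your choice):

lsblk -o NAME,SIZE,UUID                # step 6: find the REPLACEMENT disk's UUID
nano /etc/snapraid.conf                # step 7: swap the failed UUID for the new one
snapraid -d sdd-D3 -l fix.log fix      # step 8: rebuild the missing files onto the new disk
ls /srv/dev-disk-by-uuid-<NEW-UUID>    # steps 9-10: confirm the restored files are there
nano /etc/fstab                        # step 14: comment out the failed disk's mount line
systemctl daemon-reload                # pick up the fstab change
snapraid check && snapraid diff        # step 18: verify the array
snapraid sync                          # step 19: resync parity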


    Best Regards

    /JR
