BTRFS volume with multiple subvolumes - how to setup shared folders

    • OMV 3.x


    • BTRFS volume with multiple subvolumes - how to setup shared folders

      Hi!

      I have a single disk formatted with BTRFS.
      I'm not using GPT; instead I have created several subvolumes, e.g. @pictures, @downloads, ...

      The question is now, what is the optimal configuration for defining shared folders?

      Normally I would mount the subvolume @pictures to a single directory, e.g. /home/pictures. But this is not possible in the WebUI.
      In the WebUI I can only select the device plus the directory @pictures that represents the subvolume @pictures.
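      On the CLI, what I mean would be something like this (the device name and target path are just examples):

      ```shell
      # mount only the @pictures subvolume at a dedicated path
      mount -o subvol=@pictures /dev/sdb /home/pictures

      # or permanently via /etc/fstab (filesystem UUID is a placeholder):
      # UUID=<fs-uuid>  /home/pictures  btrfs  defaults,subvol=@pictures  0  2
      ```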

      Any advice?

      THX
    • Some time ago I asked a similar question about the BTRFS features. As far as I know, OMV has only the standard features implemented. The other features have been requested and are still in progress.

      Maybe one of the mods can give us an update on the actual state?

      By the way, is anybody using BTRFS, and in what kind of config?

      @c.monty I hope you're OK with my question in your thread?
    • Hi everybody,
      I thought I'd give this a stab.

      c.monty, if I were you I would rethink your btrfs usage.
      I never use btrfs on raw devices. Even in version 4.0 there is a bug somewhere in the code that can lead to full data loss if a device fails, especially when using the btrfs RAID and disk-pooling features. I lost 2 TB of data like that. Now I usually partition the disks and put btrfs on the partitions. It works exactly the same, but at least I have a chance of recovery, and RAID works as expected.

      Now, I assume you are running a backport kernel, since stock OMV does not support btrfs.
      If so, then all btrfs features are available to you, but you need to do everything on the CLI. For what you want, though, it can probably be done via the WebUI. It works just as you want, the same as with any other folder. A btrfs subvolume looks and behaves just like a normal folder on the filesystem, except that it has some special features. But those features are only accessible on the CLI, so we can ignore them for now.
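      For example (the /media path below is a made-up UUID mount point), a subvolume is created and checked like this:

      ```shell
      # create a subvolume inside the mounted btrfs volume
      btrfs subvolume create /media/<uuid>/@pictures

      # verify - it looks like a folder, but is listed as a subvolume
      btrfs subvolume list /media/<uuid>
      ```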
      First mount the device in OMV's Filesystems page; you should see it as available there.
      This will give you a UUID mount point in /media.
      OMV mounts everything in /media using the UUID. As your volume is already set up and formatted, it has a UUID.
      Once you mount your volume, you can either do a little fancy setup by creating your own folder somewhere and dropping a symlink to the mounted volume in it, to give it a friendly, short name, or just go ahead and create your shares straight from the mount, just like you described.
      Go to OMV's SMB or NFS page and create a share using the mounted volume as the device and your subvolume as the folder. It is exactly the same as you did before, except the mount point is in a different place. Just to add a little point of contention here: why use subvolumes at all? As it stands now, except for some fancy new features that are mostly difficult to use since there is no UI for them, a subvolume is basically a normal folder in the FS. I stopped using them some time ago.
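      The symlink trick is just something like this (the UUID and the short name are examples):

      ```shell
      # give the long UUID mount point a short, friendly name
      ln -s /media/<uuid> /srv/nas
      # now /srv/nas/@pictures points at the subvolume
      ```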
      I just built out a new server on OMV 2.2.4 with backport kernel 3.16, specifically because I had several btrfs RAID-1 pools with data.

      I just finished moving all data from my RAID-1 pools into the new SnapRAID+MergerFS volume and am reusing the old disks by adding them to the new pool.
      AFAIK all my disks in SnapRAID are btrfs volumes.




      Sent from my phone
      omv 3.0.56 erasmus | 64 bit | 4.7 backport kernel
      SM-SC846(24 bay)| H8DME-2 |2x AMD Opteron Hex Core 2431 @ 2.4Ghz |49GB RAM
      PSU: Silencer 760 Watt ATX Power Supply
      IPMI |3xSAT2-MV8 PCI-X |4 NIC : 2x Realteck + 1 Intel Pro Dual port PCI-e card
      OS on 2×120 SSD in RAID-1 |
      DATA: 3x3T| 4x2T | 2x1T
    • Hello!

      Can you please share the documentation of the bug with BTRFS on raw devices you are talking about?

      My understanding of a subvolume is that it appears as a folder in the "root" volume.
      However, I can use several mount options, incl. compression, which makes a difference compared to writing data into the folders directly.
      Can you confirm?
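      To make clear what I mean (device and paths are examples), something like:

      ```shell
      # mount two subvolumes of the same filesystem with different options
      mount -o subvol=@pictures,compress=lzo /dev/sdb /home/pictures
      mount -o subvol=@downloads /dev/sdb /home/downloads
      ```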

      What are these "little fancy setup" things you are talking about? Could you please specify?

      THX
      Well, I am by no means an expert, and I have no documentation on the bug. All I have is a bad experience and some data loss, which I could replicate using a VM.
      I am not sure what is wrong with btrfs and raw devices, except that if you build RAID-1, for example, using raw devices, i.e. no GPT or MBR, and one of the disks fails, you can lose the whole pool along with all the data on it. It does not happen all the time, but I was able to replicate it 5 times out of the 10 times I tried. It seems that you are most likely to lose data if the first drive in the pool fails. By first drive I mean the drive you list first when creating the pool.
      You know, when you give the command mkfs.btrfs sda sdb sdc, etc.
      Well, in my experience, if any drive but sda fails on you, the data might still be available on the degraded RAID pool, but if sda fails, 5 out of 10 times the whole pool is gone and data recovery software cannot even see any data. It's like you are looking at a brand new disk, fresh from the factory.
      When you partition the disks first and use partitions like sda1, sdb1, sdc1 to build the pool, this does not happen. I purposely corrupted my VM drives and was still able to mount the pool degraded and copy data off it. And recovery software was able to see the data and recover it as well. Granted, there are times when nothing helps, but that is rare.
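      In other words, the safe variant I use now looks roughly like this (device names are examples):

      ```shell
      # partition each disk with GPT first ...
      parted -s /dev/sda mklabel gpt mkpart primary 1MiB 100%
      parted -s /dev/sdb mklabel gpt mkpart primary 1MiB 100%

      # ... then build the RAID-1 pool from the partitions, not the raw disks
      mkfs.btrfs -d raid1 -m raid1 /dev/sda1 /dev/sdb1
      ```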

      You are right, subvolumes are basically folders with special features like snapshots and the ability to be mounted like a separate filesystem. But essentially a subvolume is just a folder. I have moved from using any of the fancy btrfs features to just using it as any other FS. I still get the CoW and bit-rot protection without going crazy with special stuff.
      I have just built out a new OMV setup: 2x120 GB SSDs in RAID-1 for the system, plus 2x3 TB and 4x2 TB disks using SnapRAID and MergerFS.
      All disks are partitioned with GPT and formatted btrfs. SnapRAIDed with 2 parity drives, and merged into a single volume pool.
      Best part: I can pull any data drive at any time and read it on any PC capable of reading btrfs.
      I run sync via cron every night.
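      Just a nightly crontab entry (the schedule here is only an example):

      ```shell
      # /etc/crontab-style entry: run snapraid sync every night at 03:00
      0 3 * * * root /usr/bin/snapraid sync
      ```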
      So far so good. Looks good.
      Next step is to build out a couple of VMs for media streaming and recording, and I will be golden.

      Sent from my phone
    • Sorry to revive an old thread, but I've been doing a lot of looking around at ideas lately on how to design/setup my system and this thread provided value.

      Currently I'm running fairly solidly with 4x4TB disks formatted XFS and using SnapRAID/MergerFS. I'm converting over to encrypted disks, and will have 4x2TB and 4x4TB disks after the conversion. What I'm trying to understand, though, is whether it makes sense to forgo XFS and convert to BTRFS now. I've been reading, and it seems to be getting much praise as a more future-proof FS. But on this forum there seems to be a lot of talk of it not being stable enough, which is surprising, as the general consensus I've seen elsewhere is that BTRFS is stable and the new "hot" thing.

      Anyhow - to the point - what I'm curious about, @vl1969: do you see BTRFS as being stable enough to move to as my core FS? You mentioned it here, I think, and it sounds to have been working pretty solidly. If so, do I still need to drop down to the command line to create a BTRFS filesystem spanning multiple disks? What I've seen in the current form is that if I just format the disks for BTRFS and mount them (testing with the 4x2TB now), then drop to the CLI and run btrfs filesystem show, I end up with each disk being its own filesystem and no RAID1, it would appear. With the number of disks I have, should there be an issue trying to run it as RAID10?
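      From what I've read so far, a multi-disk pool would need something like this on the CLI (device names are assumptions on my side):

      ```shell
      # create one btrfs filesystem across four disks as RAID10
      mkfs.btrfs -d raid10 -m raid10 /dev/sdb /dev/sdc /dev/sdd /dev/sde

      # afterwards all four devices should show up under one label/UUID
      btrfs filesystem show
      ```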

      Last question around this: what are the base options included by OMV? You mentioned, I think, bit rot and CoW? Any others? I'd like to do it all through the GUI so I make sure I don't screw anything up. And I believe doing it this way I avoid the issue you spoke of, where that bug can appear when laying the FS on top of the raw disk vs. the partitioned disk. Forgive me if some of this is more in the realm of understanding BTRFS. Feel free to tell me to go RTFM! :)
      Unlike, for example, ZFS, almost all btrfs code lives inside the kernel. So it's not a great idea to use btrfs with something ancient (like 3.16, which can happen with Debian Jessie / OMV 3). Similar problem with the btrfs-progs package: the one you would use with OMV 3 (Debian Jessie) is horribly outdated, while the one usable with OMV 4 (Debian Stretch) is recent enough.

      btrfs.wiki.kernel.org/index.php/Status

      In other words: do some research and wait for OMV 4 release IF you still want to rely on it for data.

      Due to a couple of reasons I consider btrfs a good choice for the rootfs (since you can create snapshots and recover from updates that went wrong) and for some special multi disk setups (using the linear mode instead of RAID when there's no availability needed). For most other setups I would prefer ZFS instead.
      Thanks @tkaiser, this helps. OMV 4 has still been "beta" enough that I don't want to make the move yet. I've learned my lesson going bleeding edge on systems that may not be critical for day-to-day operations, but provide happy-family home services that can go down, and I never hear the end of it! ;)

      While I do understand ZFS a bit more, my main reason for considering BTRFS was the ability to dynamically grow/shrink things. My only big issue with using ZFS for my purposes is that if I end up adding another disk for growth, I can't just resize the pool. Or if I wanted to take 2 disks out and create a separate storage pool for some other purpose, I can't just rob from one without impact and move to another. Perhaps things have changed; I was on the ZFS train probably 4-5 years ago when I was running FreeNAS.
    • 1activegeek wrote:

      While I do understand ZFS a bit more, my main reason of considering BTRFS was the ability to dynamically grow/shrink things
      While this is really a lot easier compared to other solutions, it should be noted that the necessary re-balancing can take quite some time, especially with HDDs.

      I did a test just recently, so the stuff was still there in a terminal window. The following is a btrfs setup with metadata stored as RAID-1 and data linear (so simply spreading across the disks without redundancy). Adding another disk is done in no time, but the rebalance that is necessary almost all the time to really make use of the multi-disk setup took ages, given that we're talking about ~205 GB of data and three 120 GB SSDs (pretty fast ones):
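      (The add step itself is a one-liner; only the rebalance afterwards is expensive:)

      ```shell
      # adding the new disk to the mounted filesystem returns instantly
      btrfs device add /dev/sdc /srv/dev-disk-by-label-BTRFS_LINEAR
      ```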

      Source Code

      Label: 'BTRFS_LINEAR' uuid: 90634268-601f-405e-95f0-567897956a1f
          Total devices 3 FS bytes used 200.03GiB
          devid 1 size 111.79GiB used 102.04GiB path /dev/sda
          devid 2 size 111.79GiB used 103.03GiB path /dev/sdb
          devid 3 size 111.79GiB used 0.00B path /dev/sdc
      root@clearfogpro:~# time btrfs filesystem balance /srv/dev-disk-by-label-BTRFS_LINEAR
      Done, had to relocate 205 out of 205 chunks
      real 34m40.677s
      user 0m0.001s
      sys 11m58.495s
      root@clearfogpro:~# btrfs filesystem show
      ...
      Label: 'BTRFS_LINEAR' uuid: 90634268-601f-405e-95f0-567897956a1f
          Total devices 3 FS bytes used 199.98GiB
          devid 1 size 111.79GiB used 69.00GiB path /dev/sda
          devid 2 size 111.79GiB used 67.03GiB path /dev/sdb
          devid 3 size 111.79GiB used 68.03GiB path /dev/sdc
      Disclaimer: today I checked dmesg and got a lot of SATA/AHCI errors on this host (seems to be a cabling/contact problem with one SATA controller), so maybe this also negatively affected balance times above. I would test this with a realistic setup before making assumptions. But of course this applies to every storage approach: without testing, nothing is safe to use.
      Interesting. But is the filesystem unresponsive or unusable during the rebalancing? If it isn't available, that would be a huge issue, I imagine: a failure could happen at any time, say in the midst of a large transfer or while writing new content, and then this strikes and the system is unable to process it. I imagine that can't be the case.

      While I don't like the idea of having to take an extended time to wait for a re-balance, I do value the ability to adjust accordingly. Thanks for sharing those numbers, that gives some real world info to compare. Barring any snafu on SATA errors ;)

      The funny part of it all is - I usually try to plan for the most flexibility, and half the time I never end up needing/using it anyway. So maybe I should just bite the bullet and jump back on the ZFS bandwagon or just keep the status quo that's been working (XFS formatted with MergerFS/SnapRAID). I don't have massive critical data, and most of it is replaceable, just the annoyance factor of re-populating and configuring things more than anything.
    • 1activegeek wrote:

      But is the filesystem unresponsive or unusable during that time of rebalancing?
      You do all of this on a live filesystem, and the size growth also happens instantly (I added the device, had the share mounted over AFP on my MacBook, and checked one second later how large the macOS VFS layer reported the AFP share --> the new size was reported just one second later). I have done no performance testing yet (not even looked into tunables where you might adjust priorities and such) and I partially doubt that I would be able to produce useful numbers, since I'm currently only testing with a bunch of SSDs (searching for host-side limitations, having in mind that in most real-world scenarios HDDs will be the bottleneck).

      In theory, due to the btrfs design (CoW), nothing should happen except a performance drop, even if you make heavy use of the filesystem in parallel while it's internally rebalancing. But this would either need a lot of testing or simply ignorance (that's the mode I'm looking into, since the majority of OMV users here use mdraid for no reason and I want to come up with some tutorials on better storage topologies -- mini example)
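      (If you want to avoid a full rebalance, btrfs supports balance filters; e.g. the usage filter only relocates chunks below a fill threshold. The 50% value below is just an example:)

      ```shell
      # only relocate data chunks that are less than 50% full
      btrfs balance start -dusage=50 /srv/dev-disk-by-label-BTRFS_LINEAR
      ```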

      BTW: since we're also talking about ZFS, I recently learned that small implementation differences here and there can make huge differences in reality, and that you can't realistically test for everything. Simple example: ZoL (ZFS on Linux) still seems not to implement 'sequential resilvering' (as has been standard on Solaris for quite some time). That makes a huge difference in reality when RAID-Z has been running for a while: once the first disk dies and gets replaced, the ZoL resilvering process both takes ages and can easily kill the remaining disks of the same age, since the way it's implemented in Linux ends up with an almost purely (or let's call it worst-case) random I/O pattern.
    • Since there was another spare SSD flying around I repeated some tests.

      First I added the new 120 GB SSD as a whole to my BTRFS linear setup (metadata as RAID-1, since I want an intact filesystem even in case one of the disks fails, but data only in linear fashion, so zero redundancy -- that's IMO the best mode for media archives that people today mis-use RAID-5 for). Then I started a full rebalance again with a little more than 200 GB of data on the filesystem.
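      (For reference, such a mixed-profile setup is created at mkfs time; the device names here are examples:)

      ```shell
      # metadata mirrored (RAID-1), data linear/single - zero data redundancy
      mkfs.btrfs -m raid1 -d single /dev/sda /dev/sdb /dev/sdc
      ```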

      Took 44 minutes, but I've been warned:

      Source Code

      root@clearfogpro:/# time btrfs filesystem balance /srv/dev-disk-by-label-BTRFS_LINEAR
      WARNING:
      Full balance without filters requested. This operation is very
      intense and takes potentially very long. It is recommended to
      use the balance filters to narrow down the balanced data.
      Use 'btrfs balance start --full-balance' option to skip this
      warning. The operation will start in 10 seconds.
      Use Ctrl-C to stop it.
      10 9 8 7 6 5 4 3 2 1
      Starting balance without any filters.
      Done, had to relocate 212 out of 212 chunks
      real 44m26.745s
      user 0m0.001s
      sys 13m52.610s
      In my setup such a balance doesn't make much sense, since it doesn't really matter on which disk which chunk of data is lying around. But now everything is perfectly balanced. That's what 'btrfs fi show' is reporting:

      Source Code

      Label: 'BTRFS_LINEAR' uuid: 802946e5-9d30-4536-815d-df1759a76a34
          Total devices 5 FS bytes used 203.95GiB
          devid 1 size 100.61GiB used 40.03GiB path /dev/sda2
          devid 2 size 100.61GiB used 40.00GiB path /dev/sdc2
          devid 3 size 100.61GiB used 39.00GiB path /dev/sdd2
          devid 4 size 104.51GiB used 42.03GiB path /dev/sde2
          devid 5 size 111.79GiB used 51.00GiB path /dev/sdb
      Next test: removing /dev/sdb again and measuring how long this takes (less than 8 minutes):


      Source Code

      root@clearfogpro:~# btrfs device remove /dev/sdb /srv/dev-disk-by-label-BTRFS_LINEAR
      root@clearfogpro:~# btrfs fi show
      ...
      Label: 'BTRFS_LINEAR' uuid: 802946e5-9d30-4536-815d-df1759a76a34
          Total devices 4 FS bytes used 203.95GiB
          devid 1 size 100.61GiB used 52.03GiB path /dev/sda2
          devid 2 size 100.61GiB used 52.00GiB path /dev/sdc2
          devid 3 size 100.61GiB used 53.00GiB path /dev/sdd2
          devid 4 size 104.51GiB used 56.03GiB path /dev/sde2
      So it took 8 minutes to move 51 GB out of the way. But I monitored what was happening in parallel and there were a lot of writes to sdb too, so I believe this rebalancing could be optimized even further (please note that I'm on 4.13 here).


      Then I partitioned sdb again just like the other disks and added /dev/sdb2 back to the filesystem:

      Source Code

      Label: 'BTRFS_LINEAR' uuid: 802946e5-9d30-4536-815d-df1759a76a34
          Total devices 5 FS bytes used 203.95GiB
          devid 1 size 100.61GiB used 52.03GiB path /dev/sda2
          devid 2 size 100.61GiB used 52.00GiB path /dev/sdc2
          devid 3 size 100.61GiB used 53.00GiB path /dev/sdd2
          devid 4 size 104.51GiB used 56.03GiB path /dev/sde2
          devid 5 size 100.61GiB used 0.00B path /dev/sdb2
      Done within a second, and the filesystem growth was immediately reflected everywhere (I checked the size of the mounted AFP share on my MacBook; always in sync with the btrfs filesystem size on the server). This time I only want to rebalance my metadata (making use of redundancy), and this happens within seconds:


      Source Code

      root@clearfogpro:~# btrfs fi balance start -mconvert=raid1 /srv/dev-disk-by-label-BTRFS_LINEAR
      Done, had to relocate 2 out of 213 chunks