BTRFS from Rockstor

  • I like Rockstor, but its reliance on BTRFS quotas is starting to become a real problem. Essentially, BTRFS quotas absolutely annihilate performance: drop a load of snapshots with them enabled and the system is unusable for hours, while turning them off breaks Rockstor in the current versions.


    Whilst I could easily(ish) go to a straight CentOS/Debian install, where's the fun in that? (Also, I quite like the OMV GUI.)


    So what I'm about to attempt is to mount the BTRFS Raid6 created by Rockstor on OMV.


    Firstly, though, I'm building a 4.12 kernel for OMV before I let it anywhere near my disks (there are lots of RAID5/6 fixes in the newer 4.x kernels); I've already updated btrfs-progs.
    Then we can see how much *fun* getting the array to mount is going to be. ;)
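    For anyone wanting to do the same, here's a rough sketch of building a vanilla 4.12 as Debian packages (the helper package list is an assumption, adjust to taste):


    Bash
    # install build dependencies (an assumed list, adjust as needed)
    apt-get install build-essential bc fakeroot libncurses5-dev libssl-dev
    wget https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.12.tar.xz
    tar xf linux-4.12.tar.xz && cd linux-4.12
    cp /boot/config-"$(uname -r)" .config   # start from the running kernel's config
    make olddefconfig                       # accept defaults for any new options
    make -j"$(nproc)" deb-pkg               # build installable .deb packages
    dpkg -i ../linux-image-4.12*.deb        # install, then reboot into it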

  • Well, once it booted into 4.12 and I added the disks, the pool showed up multiple times, but mounting one of them from the GUI sorta worked (it did mount, but it breaks the Filesystems view):


    Code
    Failed to execute command 'export PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin; export LANG=C; btrfs-show-super '/dev/sdd' 2>&1' with exit code '127': sh: 1: btrfs-show-super: not found


    It looks like the newer btrfs-progs I upgraded to has deprecated btrfs-show-super:
    https://btrfs.wiki.kernel.org/…ge/btrfs-inspect-internal


    Edit:


    Think I managed to work around that



    Created a bash script @ /sbin/btrfs-show-super


    Bash
    #!/bin/bash
    # Shim for newer btrfs-progs: forward any arguments to the replacement subcommand.
    btrfs inspect-internal dump-super "$@"
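    The wrapper also needs to be executable before OMV can call it, and it's worth a quick sanity check against one of the pool members:


    Code
    chmod +x /sbin/btrfs-show-super
    btrfs-show-super /dev/sdd | head    # should print superblock fields again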
    • Official post

    Would you dare to delete one of the multiple entries for Pool1? :rolleyes:
    (I wouldn't!)


    Have you thought about getting a 4TB external drive, copying your data, and building OMV from scratch? (No upgraded kernels, packages, etc.) With the 4TB drive you could recreate your array, without the bandaids, and have the 4TB for backup as well.


    Just a thought.

  • The case still has free slots, although I'd probably have to move a couple of drives around (the HBA card is an older one that only takes up to 2TB; the onboard SATA ports have no such restriction).


    The problem is I like the fact that with BTRFS you can add/remove disks from a RAID array, hit re-balance and off you go. The only things I've done to OMV are updating the kernel for the 4.12 BTRFS RAID5/6 fixes and updating btrfs-progs.
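    For anyone curious, growing (or shrinking) the pool is roughly this - a sketch, with the device name and mount point as placeholders:


    Code
    btrfs device add /dev/sde /srv/pool      # add a new disk to the pool
    btrfs balance start /srv/pool            # spread existing data across all devices
    btrfs balance status /srv/pool           # watch progress
    btrfs device delete /dev/sde /srv/pool   # removal migrates data off the disk first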


    I've ripped out the Rockstor snapshots and got snapper working as well ;)
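    For reference, getting snapper going on a btrfs filesystem is basically this (a sketch; the config name and mount point are placeholders):


    Code
    snapper -c pool create-config /srv/pool           # register the mounted filesystem with snapper
    snapper -c pool create --description "baseline"   # take a manual snapshot
    snapper -c pool list                              # timeline/cleanup settings live in /etc/snapper/configs/pool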

    • Official post

    Well, you're the first that I know of who's running BTRFS in a RAID6 config.


    While I'm running BTRFS on a single disk, I won't adopt RAID1 (for auto-healing bitrot and other errors) until it's out of beta or listed as "OK" (-> BTRFS Status). I sincerely hope it's soon. (Also, I'd really appreciate a painless path from a RAID1 pair to a single disk.) Until then, and even afterward, I'll maintain full copies of my main server's data on a backup server and other PC's.


    Unless I missed something, they have the RAID 5 and 6 implementations listed as unstable. (I'm guessing you're aware of that.)
    ______________


    In any case, you should have a complete data backup. There are numerous horror stories on this forum about dead arrays and, I suspect, a BTRFS array wouldn't be an exception. A 4TB drive is not expensive, for the peace of mind.

  • Raid 1 is mostly stable; Raid 5 had a problem where the parity data could end up being wrong if you were unlucky, and there was a possibility of killing something on a restore.


    This array has been a RAID5, then a RAID1 once the parity issue became known, and now it's a RAID6. Now that the fixes are in to prevent it corrupting files on a restore, I'm comfortable enough that, even with the lack of checksums on parity, the gains in flexibility are worth the slight risk that a file in flight when something dies will be corrupted.
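    For reference, those profile changes are just rebalances with convert filters, along these lines (a sketch; the mount point and chosen profiles are placeholders):


    Code
    btrfs balance start -dconvert=raid6 -mconvert=raid6 /srv/pool   # convert data and metadata profiles in place
    btrfs filesystem usage /srv/pool                                # confirm the new profiles once the balance finishes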


    Even before the parity issue was known, I'd recovered the RAID5 after overzealous clicking of 'next' during a Proxmox install wiped one of the drives by accident.
    Whilst it was a 4-disk RAID1 (yes, you can do that on BTRFS; it just makes sure there are 2 copies of the data on different drives), it also survived the board dying and a transplant to another host.


    That said, the data I care about gets backed up in multiple locations; the rest is stuff I care about enough to try and protect against a drive failure, but if I lost it then, oh well, oops.

    • Official post

    1. Raid 1 is mostly stable


    2. Whilst it was a 4-disk RAID1 (yes, you can do that on BTRFS; it just makes sure there are 2 copies of the data on different drives), it also survived the board dying and a transplant to another host.


    3. That said, the data I care about gets backed up in multiple locations; the rest is stuff I care about enough to try and protect against a drive failure, but if I lost it then, oh well, oops.

    1. Well, "mostly stable" is not stable. While I really want the bitrot protection of BTRFS RAID1; if a drive failed, I wouldn't want hassles when running a broken mirror. The way I see it (wanting to have my cake and eat it to), a broken mirror is just a single disk. It should be that simple and, if it's not, a path to down grade to single disk operation is not too much to ask. (In a perfect world. :) )


    2. A four-disk RAID1. That's an intriguing idea. While I have a couple of 4TB drives, I'm not a fan of how they're designing them to "pack in more bits". (Which I believe will lead to ever-increasing bitrot issues as drives get larger.) 2TB drives are about the limit for the longevity of "polarized bits" on a magnetic platter, in current standard physical sizes. It's going to be interesting to see how long these large-capacity drives last. BTW: I noticed a "Whilst" in there. Are you in the UK?


    3. I'm with you on that. Currently, I have 4 full copies and a few months from now, one of them will be off site. (Which is another reason why I want bitrot protection on the top level copy.)

  • As far as I understood, it goes read-only, which is a pain but probably a fairly safe way to handle things.


    I'm not really sure the BTRFS "RAID" levels are that close to the traditional RAID equivalents; they just seem to define the number of copies, or how the parity is created, and BTRFS doesn't seem to be quite as constrained around the size/type of drives - see http://carfax.org.uk/btrfs-usage/. The "Home" server used to be in Colocker; I took it home when the board blew (and yes, I'm in the UK ;) ).


    It's actually the only machine using BTRFS for RAID; whilst I do have some other machines that use BTRFS filesystems, they're VMs and the RAID is handled by the RAID controllers in the servers (note: I'm not a fan of hardware RAID because of proprietary on-disk formats, but they're rented dedis, so I'm not about to replace their RAID controllers).


    Hopefully I might be in a position to play with Gluster/Ceph or some of the other distributed filesystems soon. I was also looking at pulling the disks out of RAID and using Minio's EC backend, but that depends on whether there is a stable utility to mount it as a more traditional filesystem for apps that don't understand object storage.

  • I'm not really sure the BTRFS "RAID" levels are that close to the traditional RAID equivalents

    They aren't. BTRFS, just like RAIDZ, uses redundancy in a reasonable way, unlike anachronistic RAID (especially all those attempts at the block-device layer, where you always end up syncing the total capacity instead of what's used: an empty 40TB mdraid needs a rebuild --> 24 hours of full-disk stress for nothing).

    • Official post

    What I'd really like on the home front would be the functional equivalent of BTRFS RAID1, with two drives in the mirror, in different machines. With checksums assigned to a file, along with some sort of established trust between two PC's, why couldn't the second copy of the file be on another PC?


    While it wouldn't be ideal at the enterprise level, something like that would be a great low-level distributed file system for home and small business use. If the primary server died, a backup server would be up-to-date, as of the last write operation, and bitrot free. It would also be nice if this sort of "pairing" could be done at the folder level as well, instead of block devices (or, as I've learned of late, at the partition level).


    On the other hand, if I was a dev, I'm sure I'd see the folly in all of this. :)

  • What I'd really like on the home front would be the functional equivalent of BTRFS RAID1, with two drives in the mirror, in different machines. With checksums assigned to a file, along with some sort of established trust between two PC's, why couldn't the second copy of the file be on another PC?

    For all the other readers not getting the real difference between a great BTRFS RAID-1 and the primitive/stupid mdraid RAID-1 variant:

    • BTRFS just like ZFS stores checksums of all written data (on HDDs even two copies at different locations to get some sort of additional redundancy), so once bit rot happened, it is in most situations able to detect it (when reading or when running regular scrubs). If the reason for bit rot is a defective disk or a cable/connection problem between controller and disk then a btrfs RAID-1 provides self healing functionality since then the checksum mismatch will not only lead to bit rot detection but also correction (the defective data on one disk being replaced with valid data from the other). If the problem is caused by a faulty controller or host where both disks are connected to then you're lost as usual since you then have either two bad copies (controller faulty) or two 'good' copies containing wrong data (host faulty)
    • mdraid's RAID-1 provides nothing like that. It's only about availability (and only covers the rare case when a single disk fails hard, which disks usually don't do; they die slowly and corrupt a lot of data in their last moments :) ) and provides zero data safety as well as zero data integrity functionality. It's just a waste of disks :)

    As written above: If you allow your data to be already corrupted at the host level (choosing inappropriate hardware for your storage, e.g. crappy Raspberries) checksums won't help you and BTRFS RAID-1 won't provide any self-healing capabilities either. So you either upgrade your hardware to be more reliable (at least ECC DRAM would be nice) or you drop the 'self healing' RAID-1 idea. With the latter it then gets easy, since you can just use a single disk with btrfs on it, create periodic snapshots and send them to another machine (btrfs send|receive). This removes some real-time protection (since data created/modified on the 1st host between two snapshots won't be on the 2nd host yet) but already adds some backup functionality due to versioning.
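    The snapshot-and-send idea in the last paragraph boils down to something like this (a sketch; paths, dates and the backup hostname are placeholders):


    Code
    btrfs subvolume snapshot -r /srv/data /srv/data/.snapshots/2017-10-01    # read-only snapshot
    btrfs send /srv/data/.snapshots/2017-10-01 | ssh backuphost btrfs receive /srv/backup
    # later snapshots can be sent incrementally against a common parent:
    btrfs send -p /srv/data/.snapshots/2017-10-01 /srv/data/.snapshots/2017-10-08 | ssh backuphost btrfs receive /srv/backup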

    • Official post

    RAID on different machines would be Ceph, for example.

    Actually, I looked into Ceph and GlusterFS. Both are, without doubt, the software end of enterprise solutions. But when they refer to hard drives or export locations as "bricks", I believe their intended purposes far exceed the storage capacity I need and the thickness of my wallet. Of the two, it appears that GlusterFS could be made to scale down to something that "might" be usable at home. On the other hand, both come with significant learning curves and additional complexity that are unwanted (daily operation, drive and data recovery procedures, etc.).


    This is why BTRFS seemed attractive. It's a drop-in replacement for EXT4 that can simply be "used" without the need for storage planning or configuration. On the other side of the coin, BTRFS comes with a lot of convenience and high-end features, if one chooses to use them. (Scrubbing, pooling, etc.)


    It would be great to see BTRFS come out of beta, in the RAID1 implementation (for bitrot scrubbing), but the news is not encouraging. It appears that fixes / patches have been made but they're stuck on the other side of a l-o-n-g development cycle. It might be a few years before they make their way out to the public.
    ______________________________________________________



    1. mdraid's RAID-1 provides nothing like that. It's only about availability (and only covers the rare case when a single disk fails hard, which disks usually don't do; they die slowly and corrupt a lot of data in their last moments :) ) and provides zero data safety as well as zero data integrity functionality. It's just a waste of disks :)


    2. As written above: If you allow your data to be already corrupted at the host level (choosing inappropriate hardware for your storage, e.g. crappy Raspberries) checksums won't help you and BTRFS RAID-1 won't provide any self-healing capabilities either. So you either upgrade your hardware to be more reliable (at least ECC DRAM would be nice) or you drop the 'self healing' RAID-1 idea.

    1. I completely agree. Beyond RAID's understood and documented deficiencies (in both hardware adapters and software implementations), there are other risks to consider as well. One risk factor that should be self-evident is that, if a component in a PC fails - a power supply is a worst-case example - all components in the PC are at risk, including a RAID array. I'd much rather spread my disks into different devices and replicate data, creating true backup, rather than put all my eggs in one (RAID) basket.


    2. After really looking at the limitations of various hardware platforms, I broke down and bought a purpose-built server (with ECC memory) a while back. It's at the top level, where I'd really like to have BTRFS bitrot protection. From there, data shares are being replicated down to other devices. The R-PI is but one of three devices under the top level.


    After testing the R-PI for years, I'm at least vaguely aware of some of its limitations. Used with Rsync, for replicating changes to shares where users are not involved, it seems to be OK in that role. But I have no illusions in believing that its data stores are perfect down to the bit. Restoring data from the R-PI, or trying to use it as a temporary file server, would be an absolute last resort.

  • 1. If a disk within a mdadm RAID1 is dying and the data is getting bad, you just pull out the disk (physically or via -f) and then you've got an OK copy. Tried and tested :D (quick sketch below)
    2. If you're worried about the data but don't have enough money for a distributed filesystem, then you just have to be good at making backups. That's my motto.
    Make daily, weekly and monthly backups so that when something happens you can revert to them.
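    The mdadm side of point 1 is just this (a sketch; array and partition names are placeholders):


    Code
    mdadm /dev/md0 --fail /dev/sdb1      # mark the dying disk as failed
    mdadm /dev/md0 --remove /dev/sdb1    # pull it out of the array
    mdadm --detail /dev/md0              # confirm the mirror keeps running degraded on the good copy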

  • 1. If a disk within a mdadm RAID1 is dying and the data is getting bad, you just pull out the disk (physically or via -f) and then you've got an OK copy. Tried and tested
    2. If you're worried about the data but don't have enough money for a distributed filesystem, then you just have to be good at making backups.

    Unfortunately you totally missed the point: it was about avoiding 'bit rot' and taking care of data integrity.


    RAID-1 as done by mdraid is a concept from last century, when we weren't able to do better. But as with everything that's anachronistic, outdated and simply bad, these days people love it and fear the way better options we have today :)

    • Official post

    1. If a disk within a mdadm RAID1 is dying and the data is getting bad, you just pull out the disk (physically or via -f) and then you've got an OK copy. Tried and tested :D
    2. If you're worried about the data but don't have enough money for a distributed filesystem, then you just have to be good at making backups. That's my motto.
    Make daily, weekly and monthly backups so that when something happens you can revert to them.

    I'm backed up very well. Since the bulk of my data store doesn't change much, I have 1 backup in "near" real time, 1 at the weekly level, and 1 more at an (ad hoc) two-to-three-month interval. (These are all independent devices, and the last two are shut off after changes are replicated.)


    But TK is right; backup doesn't do anything about silent bitrot at the top level of a replication chain. All backup gets you in that scenario is a backup of a corrupted file. I have data that goes W-A-Y back, and the longer data is stored unprotected, the greater the chance of silent corruption. Time is an enemy of long-term data storage.


    In some cases, it might not matter. (If a single pixel in a picture changed color, it might not even be noticed.) However, cumulative errors and the as-yet-unknown integrity of today's enormous hard drives (as they age) truly beg for a checksumming file system where integrity issues can at least be identified.


    Along those lines, if BTRFS RAID1 ever comes out of beta, I'll adopt it.
    (And I'm hoping the project might consider a data integrity feature that would check identical files / folders on different machines.)

  • Along those lines, if BTRFS RAID1 ever comes out of beta, I'll adopt it.


    Why do you consider btrfs RAID1 'beta'? Please see https://btrfs.wiki.kernel.org/…le_Devices#Current_status (which might already be horribly outdated again).


    My only problem with btrfs is that almost all of the code lives inside the kernel, so running outdated kernels is a no-go with btrfs. In some situations the same applies to the version of btrfs-progs (in other words: don't use btrfs at all with a default OMV 2 or 3 installation ;) )
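    A quick way to see what you're actually running before trusting it with a pool:


    Code
    uname -r           # kernel version - this is where most of the btrfs code lives
    btrfs --version    # btrfs-progs (userland) version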


    And if you fear the downsides of immature btrfs kernel/userland code due to relying on a Debian 'oldstable' system, choosing a zmirror instead is always an option.

    • Official post

    1. Why do you consider btrfs RAID1 'beta'? Please see https://btrfs.wiki.kernel.org/…le_Devices#Current_status (which might already be horribly outdated again).


    2. My only problem with btrfs is that almost all of the code lives inside the kernel, so running outdated kernels is a no-go with btrfs. In some situations the same applies to the version of btrfs-progs (in other words: don't use btrfs at all with a default OMV 2 or 3 installation ;) )


    3. And if you fear the downsides of immature btrfs kernel/userland code due to relying on a Debian 'oldstable' system, choosing a zmirror instead is always an option.

    1. According to the project page BTRFS Status, under "Reliability", RAID1 is "Mostly OK".
    (This page refers to kernel 4.13 and it appears I have 4.9.)
    When they talk about the loss of one disk in a mirror resulting in an irreversible read-only mode, that's a feature that could be accurately described as "beta".
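    (As I understand it, that's the case where a RAID1 missing a device will only come up with the degraded mount option, and on these older kernels a read-write degraded mount could effectively be a one-shot affair. Recovery looks roughly like this sketch, with the device, devid and mount point as placeholders:)


    Code
    mount -o degraded /dev/sdb /srv/pool       # mount the surviving half of the mirror
    btrfs replace start 2 /dev/sdd /srv/pool   # replace the missing devid with a fresh disk
    btrfs replace status /srv/pool             # watch the rebuild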


    2. "Almost all (BTRFS) code lives inside the kernel".
    That explains a few things I've been puzzling over... According to the errata on BTRFS RAID1, the fixes are done and in but there's not even a suggestion of when they'll be available. Based on what you said about kernel integration, the reason for the delay seems apparent. I imagine there's a code vetting process and a lengthy procedure before those fixes are integrated into a new kernel, and even longer before a new kernel finally makes it out to userland. Crappers...


    I'm trying not to let you scare me here (into demoting the OMV server or changing the file system). All I'm doing with BTRFS is using it as a single-disk filesystem. In the way of tools, I'm using BTRFS "scrub" - start, cancel, and status - nothing else. With that noted, you don't think BTRFS is safe?
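    For anyone following along, that amounts to roughly this (the mount point is OMV's usual label path and just a placeholder here):


    Code
    btrfs scrub start /srv/dev-disk-by-label-data    # kick off a background scrub
    btrfs scrub status /srv/dev-disk-by-label-data   # progress plus any checksum errors found
    btrfs scrub cancel /srv/dev-disk-by-label-data   # stop early if needed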


    3. I really didn't want to look at ZFS, because it's not a "drop-in" file system replacement and it gets into yet another set of complexities. But, since ZFS seems to be a standalone add-on, I may have to give it a closer look.

  • Unfortunately you totally missed the point: it was about avoiding 'bit rot' and taking care of data integrity.
    RAID-1 as done by mdraid is a concept from last century, when we weren't able to do better. But as with everything that's anachronistic, outdated and simply bad, these days people love it and fear the way better options we have today :)

    Nah, I didn't miss the point, just provided the other side of the coin. Bit rot can happen, and there is a possibility you won't recover from it.
    You have scrub with mdadm, ZFS and btrfs, so you do that weekly, and if something arises, get those backups out.
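    Scheduling the weekly runs is one line per filesystem in cron, e.g. in /etc/cron.d (a sketch; array and pool names are placeholders, and btrfs scrub slots in the same way):


    Code
    # m h dom mon dow user command
    0 3  *   *   0   root echo check > /sys/block/md0/md/sync_action   # weekly mdadm consistency check
    0 3  *   *   0   root zpool scrub tank                             # weekly ZFS scrub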



    For me, BTRFS is still not stable enough for us (POS systems), even though Oracle Linux offers it, even with RAID5/6 support (as of 7.4).
    I hope bcachefs gets some heavy updates and testing so that we can use it in the future.
    For now, for production systems, it's either mdadm/ext4, or ZFS if you have ECC and a very well-planned system (with a need for read/write caching).
