Btrfs + UnionFS + SnapRAID

  • I'm playing around with OMV for the first time in years. Needless to say, this project didn't have all these options back when it began... I was curious if anyone is running (or has played with) this type of setup?


    Basically, I have 4-5 HDDs, of varying size, formatted Btrfs. They're all pooled together with UnionFS - with the exception of one of the largest HDDs. Then, I have them set up with SnapRAID; one HDD is for parity and the others are data/content HDDs. The OMV server is holding MP4s.


    I'm not looking for snapshots or anything fancy. I just figured Btrfs can handle bit rot whenever a file is read. However, not all my movies are read all the time. Years can go by without watching a movie in my collection, so I figured SnapRAID could then be scheduled to run weekly or monthly, catching bit rot. I've lost a few movies over the years and had to dig through my box of DVDs in the basement and re-rip them...


    Am I over-complicating the mousetrap or is this not a horrible idea?

    • Official Post

    I just figured Btrfs can handle bit rot whenever a file is read. However, not all my movies are read all the time. Years can go by without watching a movie in my collection so I figured SnapRAID could then be scheduled to run weekly or monthly, catching bit rot.

    There is a difference between BTRFS and SNAPRAID when it comes to bitrot. A BTRFS scrub can detect bitrot, but unless there are two identical checksummed files (as in the BTRFS equivalent of RAID1), it can't do anything about it. Detected errors are uncorrectable.


    On the other hand, SNAPRAID can detect and correct bitrot errors on any protected drive. It works; tested.
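    To see the difference in practice, here's a minimal sketch of the two checks side by side (the mount path and layout are assumptions, not from this thread):

    # Btrfs: a scrub verifies checksums; on a single device it can only report errors
    btrfs scrub start -B /srv/dev-disk-by-label-media1
    btrfs scrub status /srv/dev-disk-by-label-media1

    # SnapRAID: scrub detects silent errors, fix repairs them from parity
    snapraid scrub
    snapraid -e fix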

  • So, given my current needs, what's the recommended way to proceed? I'm still in the testing/playing phase, so I can nuke the array and start over without issue. Forgot to mention that I like having the ability to see the files on the other disks even if I lose one from the pool.

    • Official Post

    So, given my current needs,

    Aggregating disks under a common mount point is one matter - that's a separate topic. As it is with most things, there are usually two or more ways to do the same thing.
    _____________________________________________


    Assuming your need is "bitrot detection and correction" in the preservation of older files, there are two options that I'm aware of that are supported by OMV:
    - A RAID1 or 10 implementation of a CoW filesystem like BTRFS or ZFS: (It is possible, BTW, to achieve 2 checksummed files in ZFS with a simple volume, using the copies=2 option. I don't know if this is possible with BTRFS.)
    In both cases, BTRFS and ZFS, two identical files are required to correct bit rot. This is done by overwriting the corrupt file with the 2nd copy that still matches its checksum. This works well, but the downside to this approach is the loss of 50% of disk real estate to house the 2nd copy of each file.
    - SNAPRAID:
    All that's required for bitrot protection, with SNAPRAID, is that the file is on a protected disk. SNAPRAID uses file checksums, the content file, and parity information to correct bitrot. The disk real estate requirement for SNAPRAID depends on the number of disks being protected, but it could be as little as 20%. There's also the added benefit of being able to recreate a failed disk, or to restore an older version of a file or folder (as it existed during the last SYNC).
    If using SNAPRAID, the best choices for a data disk file system are EXT4 and XFS. While others work, they have caveats. (SNAPRAID FAQ)
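    For reference, the ZFS copies option above is a one-liner (pool/dataset names are made up):

    zfs set copies=2 tank/media

    And a SNAPRAID setup boils down to a config file along these lines - a minimal sketch with hypothetical OMV mount paths (the openmediavault-snapraid plugin writes this file for you):

    # /etc/snapraid.conf
    parity /srv/dev-disk-by-label-parity1/snapraid.parity
    content /var/snapraid.content
    content /srv/dev-disk-by-label-data1/snapraid.content
    data d1 /srv/dev-disk-by-label-data1
    data d2 /srv/dev-disk-by-label-data2
    data d3 /srv/dev-disk-by-label-data3
    exclude *.tmp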


    I'm personally using ZFS in zmirrors (RAID1 equivalent) and SNAPRAID on separate servers. Both work well for detecting and correcting Bitrot.

  • There is a difference between BTRFS and SNAPRAID when it comes to bitrot. A BTRFS scrub can detect bitrot, but unless there are two identical checksummed files (as in the BTRFS equivalent of RAID1), it can't do anything about it. Detected errors are uncorrectable.
    On the other hand, SNAPRAID can detect and correct bitrot errors on any protected drive. It works; tested.

    It is the same between Btrfs and SnapRAID if they are configured the same.


    Both work.


    Bitrot protection should work in RAID5 as well.


    Go for BTRFS if you want something officially supported by the kernel and available in OMV without a plugin.


    Go for ZFS if you want something with longer experience in OMV (plugin). Inform yourself about the license issues.



    Greetings,
    Hendrik

    • Official Post

    Bitrot protection should work in RAID5 as well.

    There's a difference between error detection and correction.
    As a test, try using a sector editor from a live distro (something with LDE) and edit a single file in a BTRFS simple volume, or in the written area of a BTRFS RAID5 array. (A minor change.) A subsequent scrub will "detect" the error, but it won't correct it. Without the second file with a clean, matching checksum, it's an uncorrectable error.
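    For anyone who wants to reproduce this without a sector editor, something along these lines does the same job - device, mount point, and offset are placeholders, and this must only ever be done to a throwaway test volume:

    # unmount, then flip a few bytes somewhere in the device
    umount /mnt/test
    dd if=/dev/urandom of=/dev/sdX bs=1 count=16 seek=536870912 conv=notrunc
    mount /dev/sdX /mnt/test
    # on a simple volume: errors are detected but reported as uncorrectable
    # (it may take a few tries for the bytes to land on allocated file data)
    btrfs scrub start -B /mnt/test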

  • That's not correct.
    For error detection, a hash is sufficient; no RAID needed.
    For correction, RAID 5 will do.


    If your claim were true, RAID 5 would be pointless.



    Quote from Wikipedia

    RAID 5 consists of block-level striping with distributed parity. Unlike RAID 4, parity information is distributed among the drives, requiring all drives but one to be present to operate. Upon failure of a single drive, subsequent reads can be calculated from the distributed parity such that no data is lost.

    No disk editor experiments needed. Just read.

    • Official Post

    The Wikipedia article is about "whole drive" replacement using parity - this is what RAID5 does. It is not about single file silent bitrot detection and correction.


    A single RAID5 array, in either of the ZFS or BTRFS flavors, does not protect individual files from bitrot, or restore them to their original state, in a scrub. They can't. That would be like expecting that an MD5 hash from a known good file could somehow be used to correct that file, when a second, incorrect MD5 hash shows the file is corrupt. With a single file checksum that no longer matches, there's no way to reconstruct the data. Without a second file with a good checksum, to overwrite the file with the bad checksum, there is no repair. (A second file is available only in RAID1, RAID10, and other CoW array types where there are at least 2 copies of each file. There is the notable exception of a ZFS simple volume, where copies=2 can be used to correct bitrot.)


    I did the sector editor test years ago to prove (or not) that an inserted error can be corrected. While I did it with ZFS, it's easy enough to test a BTRFS simple volume and run a scrub. With a mirror (RAID1 equivalent, where 2 separate copies of the same file exist), errors are correctable. With a simple volume, the error will be detected, but it will be uncorrectable.


    The reason SNAPRAID can repair bitrot without a second file is that SNAPRAID stores file checksums and, with the information present in the content file (a complete inventory and metadata of all files on all disks), it can restore the file as it was (down to the bit), using the parity information for that individual file, as of the file's state during the last SYNC. ZFS and BTRFS can't do that because their RAID5 parity information is updated on the fly - there's no time reference.
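    As a concrete illustration (the file path is made up), a single damaged file can be pulled back to its last-synced state with the fix command's filter option:

    snapraid fix -f movies/SomeFilm.mp4   # restore one file as of the last sync
    snapraid -e fix                       # or: repair only blocks flagged with silent errors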

  • If bitrot happens on one drive, Btrfs can restore from the others plus the parity information, just like SnapRAID. Why not?
    What would be the point of RAID5 otherwise?


    Here you can inform yourself:
    https://blog.synology.com/how-…-against-data-corruption/


    In other words, if the recalculated checksum does not accord with the data checksum, a cross-check with its metadata checksum will be followed to see whether it is the file or data checksum that goes wrong. Once data corruption is detected, the system will try to repair the corrupt data by retrieving the redundant copy (RAID 5).

    • Official Post

    Because I take very few claims at face value, I've already informed myself with a sector editor test. Back when I looked closely at bitrot, I researched what was needed to detect it and correct it. Even then, I didn't trust the claims until I tested them. Since a zmirror (ZFS RAID1) did the job, I stopped there. (But I do find it interesting, in the referenced article, that Synology doesn't support scrubbing in RAID1 and 10, and there's no explanation why.)


    In any case, I didn't test the CoW equivalents of RAID5, 6, etc., due to a lack of disks, so I can't make any statement about those types of implementations. I do know, for a fact, that simple volumes in ZFS can't restore corrupted files (without copies=2). I also know that SNAPRAID can restore bitrot-corrupted files, as of the last sync, and use a bunch of different-sized disks, which is in line with what this thread is about.
    __________________________________________________________


    @rkillcrazy
    I think you have enough info to decide on whether to use SNAPRAID or not. SNAPRAID detects and corrects bitrot (tested), and it allows for the restoration of a disk, much like traditional RAID. Where it shines for the home user is that it also allows for the restoration of individual files, as of their state during the last SYNC command, and for the use of dissimilar-sized disks. An additional disk can be added in the GUI, at any time, with very little effort. If you're going to use SNAPRAID, look over the filesystems in the SNAPRAID FAQ. It would be best to use a file system that doesn't involve caveats.


    As far as common mount points (disk aggregation) go, there's nothing wrong with the UnionFS plugin if the correct storage policy is used.
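    (Under the hood, the UnionFS plugin typically uses mergerfs. For reference, a pool like the one described boils down to an fstab line along these lines - the labels and the "most free space" policy are examples, not a recommendation:)

    /srv/dev-disk-by-label-data1:/srv/dev-disk-by-label-data2:/srv/dev-disk-by-label-data3 /srv/mediapool fuse.mergerfs defaults,allow_other,use_ino,category.create=mfs 0 0
    # category.create=mfs sends each new file to the branch with the most free space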

  • @rkillcrazy
    I think you have enough info to decide on whether to use SNAPRAID or not.

    I hope I understood all this correctly. As far as I can tell, I'm not over-complicating the mousetrap. Please correct me if I'm wrong....


    Side note: I certainly didn't mean to start a debate/rift in the thread but I certainly respect all opinions and all parties who participated. In my mind, different strokes for different folks....

    • Official Post

    I hope I understood all this correctly. As far as I can tell, I'm not over-complicating the mousetrap. Please correct me if I'm wrong....

    Well, first and foremost, I was under the impression that you want to use what you have - that you wouldn't be buying disks to set up some implementation of RAID5. If that's the case, I believe you're good to go.


    This is what I'm basing that opinion on:
    1. Different sized disks are involved.
    2. You're storing media, i.e. large files, where it seems apparent that extra space is desirable.
    3. Since you've been bitten by it before, you want bitrot protection.
    And finally
    4. Does your setup make sense?


    I think it does. For your scenario, SNAPRAID would be my choice. Since SNAPRAID can take care of checksums and scrubs, the only thing I'd do differently would be the use of EXT4 or XFS on data drives and the parity disk.
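    (If you go that route: OMV's GUI handles the formatting for you, but for reference it amounts to the following - device names and labels are placeholders:)

    mkfs.ext4 -m 0 -L data1 /dev/sdb1   # -m 0: skip the 5% root reservation on a pure data disk
    mkfs.xfs -L parity1 /dev/sdc1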
    __________________________________________________________________________________________


    Side note: I certainly didn't mean to start a debate/rift in the thread but I certainly respect all opinions and all parties who participated. In my mind, different strokes for different folks....

    Nothing to worry about here, and it wouldn't be your fault in any case. This is your thread/question and, sometimes, a thread may leave the topic at hand. It happens - we're human. I will say, for my own information and curiosity, that bitrot correction in CoW RAID5 implementations needs to be tested.


    With SNAPRAID, just note that scrubs, fixing silent corruption, and other maintenance processes are not automatic. You'll need to schedule commands such as snapraid sync, snapraid scrub, and snapraid -e fix. These commands can be scheduled in the GUI using Scheduled Jobs, where the output can be E-mailed to you.


    I'd give the SNAPRAID manual at least a good once-over to understand what you may need.

  • With SNAPRAID, just note that scrubs, fixing silent corruption, and other maintenance processes are not automatic. You'll need to schedule commands such as snapraid sync, snapraid scrub, and snapraid -e fix. These commands can be scheduled in the GUI using Scheduled Jobs, where the output can be E-mailed to you.

    I cannot see my data changing more than once a week - at most. So, what, a nightly scrub and a weekly sync? Run the fix if/when scrub finds an issue?

    • Official Post

    I think a weekly sync is fine, and using the standard "snapraid scrub" command nightly is fine as well. I run snapraid -e fix before a new sync. (There's no penalty for running it if nothing is wrong.) If files were deleted during the period between the last sync and the -e fix command, a few errors will be generated. I expect this. (It's in the output, from which I get an E-mail.)


    Others may have a different routine, for different reasons, but there's really no "wrong" here, other than it needs to be done on a fairly regular basis to gain the benefit. For this reason, automating most of these processes, with E-mailed reports, made sense to me.


    Following are a few commands I used on a backup server. Since this server is now running cold (I turn it on periodically, back up, and shut it down), the schedule is disabled.
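    Roughly along these lines - the cadence and flags here are a representative sketch, and --force-empty as the unblocking command is an assumption, so check the SnapRAID manual:

    snapraid -e fix              # before the weekly sync: repair any silent errors found
    snapraid sync                # weekly: update parity and the content files
    snapraid scrub               # nightly: verify a portion of the array against checksums
    snapraid --force-empty sync  # only when a normal sync refuses to run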



    (If the normal sync command fails for some reason - deleted files or otherwise - the last command clears the log jam.)

  • If the hardware baseline is error-free (RAM, controllers, CPU, north bridges, hard drive PCBs, power supply, etc.), then:


    RAID 5 in a non-degraded state + a checksum (e.g. from Btrfs) + the data can detect and correct a one-block data error for a file, if implemented, and the code is something like:


    # Runnable sketch in Python; cs: checksum for the file, parity: one XOR block
    import hashlib
    from functools import reduce

    def xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    def scrub(blocks, parity, stored_cs):
        cs = lambda bs: hashlib.sha256(b"".join(bs)).digest()
        if cs(blocks) != stored_cs:
            # error in a data block or in the stored cs; one error max, so parity is good
            for i in range(len(blocks)):
                rebuilt = reduce(xor, blocks[:i] + blocks[i + 1:], parity)
                fixed = blocks[:i] + [rebuilt] + blocks[i + 1:]
                if cs(fixed) == stored_cs:
                    return fixed, parity, stored_cs   # data block corrected
            return blocks, parity, cs(blocks)   # no rebuild matched: the stored cs was bad
        if reduce(xor, blocks[1:], blocks[0]) != parity:
            return blocks, reduce(xor, blocks[1:], blocks[0]), stored_cs   # recalculate parity
        return blocks, parity, stored_cs   # everything consistent


    So with 3 data points:
    1) the data file
    2) the file checksum
    3) the parity block


    it is possible to detect and correct a one-block error:


    They (1: Btrfs data scrubbing and 2: RAID scrubbing) work together to mitigate the risk of silent data corruption and help you maintain a healthy storage system.


    SnapRAID, or the data protection functionality in e.g. FreeArc 0.666, can handle more than one defective data block in a dataset. But that kind of algorithm uses a lot of processor resources, which is why the SnapRAID authors wrote:
    "SnapRAID is mainly targeted for a home media center, with a lot of big files that rarely change."
