Backup and 'silent data corruption'-avoidance strategy

  • Hello,


    I'd like to set up a home NAS to serve my DVD library and handle secure backup of the data stored on the NAS.

    My budget is tight, but I have a Raspberry Pi 4 available, which means I'll have to settle for USB3 (which I've read tends to be a bit finicky on the RPi 4).


    My plan is to have the following setup:


    Hardware:

    RPI4

    2x external USB3 HDD (one 8 TB and one 4 TB)

    Drive 1: 8 TB

    Drive 2: 4 TB


    Software:

    OMV for general storage handling.

    Jellyfin for serving my DVD library.


    What I'd like to do is:
    1. Use the 8 TB drive as a main drive with some routine to periodically check for silent data corruption.

    2. Once a week, spin up the 4 TB drive and back up everything inside a folder on the 8 TB drive to it, with a new timestamp, removing deleted files (this would have to be less than 4 TB, of course).

    3. Periodically check the 4 TB drive for silent data corruption.

    4. Be able to unplug the 4 TB drive and read it directly on an entirely different system (e.g. with both drives formatted as ext4).
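
The periodic check in points 1 and 3 can be sketched as a checksum manifest: hash every file once, store the result, and re-hash later to spot files whose content changed without anyone touching them. This is only a minimal sketch. The function names are made up for illustration and are not part of OMV or any existing tool.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            full = os.path.join(folder, name)
            manifest[os.path.relpath(full, root)] = sha256_of(full)
    return manifest


def verify(root, manifest):
    """Return the relative paths that are missing or no longer match."""
    damaged = []
    for rel, expected in sorted(manifest.items()):
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or sha256_of(full) != expected:
            damaged.append(rel)
    return damaged
```

A mismatch only says that the content differs from what was recorded. A file that was edited on purpose will also show up, so the manifest has to be rebuilt for files that changed legitimately.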


    Is this possible with OMV? I've read a bit about SnapRAID, but the guides I've found say you have to choose a data drive and multiple parity drives. This sounds like the parity drives would only contain parity bytes, requiring rebuilding to read the original data.


    I know it's less storage efficient to have a full copy of the data, but I'd like to keep it as simple as possible.


    Thank you.

    Best regards.

  • "I've read a bit about SnapRAID, but the guides I've found say you have to choose a data drive and multiple parity drives. This sounds like the parity drives would only contain parity bytes, requiring rebuilding to read the original data."

    You have this backwards. One parity drive per several data drives. And yes, lost data can be recovered from the parity, but this is a manual operation.

    --
    Google is your friend and Bob's your uncle!


    OMV AMD64 7.x on headless Chenbro NR12000 1U 1x 8m Quad Core E3-1220 3.1GHz 32GB ECC RAM.

  • "You have this backwards. One parity drive per several data drives. And yes, lost data can be recovered from the parity, but this is a manual operation."

    In that case you wouldn't be able to do a full recovery after a full drive failure, as the parity data on a single parity drive cannot possibly contain all the information held on multiple data drives, assuming all drives are the same size.


    Is SnapRAID only useful for partial drive failures?

  • "In that case you wouldn't be able to do a full recovery after a full drive failure, as the parity data on a single parity drive cannot possibly contain all the information held on multiple data drives, assuming all drives are the same size.


    Is SnapRAID only useful for partial drive failures?"

    You are very mistaken. One parity drive will protect you from one data drive failure. But the protection isn't locked to any specific data drive.


    You may want to visit the SnapRAID home page and read up on it.


    • Official post

    With one parity drive you can recover from one full drive failure.

    With two parity drives you can recover from two full drive failures.


    ...and so on.


    A parity drive only stores the parity for a set of drives, not the full information for all drives in the set.


    https://en.wikipedia.org/wiki/Parity_bit
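
Why one parity drive is enough for one full drive failure can be shown with a toy XOR example. This is a simplified model of single parity as described in the parity article linked above, not SnapRAID's actual on-disk format. The "drives" are hypothetical 4-byte blocks.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data drives, each holding one 4-byte block.
drive_a = b"\x01\x02\x03\x04"
drive_b = b"\x10\x20\x30\x40"
drive_c = b"\xaa\xbb\xcc\xdd"

# The parity drive stores one block, no larger than a single data block.
parity = xor_blocks([drive_a, drive_b, drive_c])

# Drive B fails completely: rebuild it from the survivors plus parity.
rebuilt_b = xor_blocks([drive_a, drive_c, parity])
assert rebuilt_b == drive_b
```

The parity block does not hold the contents of all three drives. It only holds enough to fill in whichever single drive is missing, given that the other drives are still readable.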

  • Thanks, I see that I was wrong.


    I found this thread, which explains it to some extent.

    It does not, however, explain how you can still recover data from a lost drive after you've changed some sectors on the operational data drives since the last sync, as you're using the sectors on the data drives together with the parity data to recreate the lost data. I'll let that slide, though, and just believe that it works.


    As I only have two drives, is there a simpler solution than SnapRAID? What I'd like is a simple backup (rsync?) together with data integrity checks to guard against silent data corruption.
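
One way the weekly rsync backup could look is a timestamped snapshot folder per run, where unchanged files are hard-linked to the previous snapshot via rsync's `--link-dest` option. The sketch below only builds the command line. The paths and the function name are hypothetical, and it assumes the backup drive uses a filesystem with hard-link support such as ext4.

```python
import os
from datetime import datetime


def build_rsync_command(source, backup_root, now, previous=None):
    """Build an rsync command that writes a new timestamped snapshot.

    Files unchanged since `previous` are hard-linked instead of copied,
    so each snapshot looks complete but only new data uses extra space.
    """
    snapshot = os.path.join(backup_root, now.strftime("%Y-%m-%d_%H%M%S"))
    command = ["rsync", "-a"]
    if previous is not None:
        # --link-dest must be absolute (or relative to the destination).
        command.append("--link-dest=" + os.path.abspath(previous))
    # Trailing slashes: copy the contents of source into the snapshot.
    command += [source.rstrip("/") + "/", snapshot + "/"]
    return command
```

The resulting list can be passed to `subprocess.run(command, check=True)`. Files deleted from the source are simply absent from the new snapshot, and each snapshot is a plain directory tree that can be read on another system.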

    • Official post

    Using SnapRAID you will only be able to restore files you have the parity of. So files added or modified since the most recent sync will be lost. Unless you have a backup copy...
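
The effect of changes made after the last sync can be illustrated with the same kind of toy XOR model (again a simplification, not SnapRAID's real format). When a surviving drive has changed since the parity was computed, the positions that changed no longer rebuild correctly.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# State at the time of the last sync (hypothetical 4-byte "drives").
drive_a_at_sync = b"AAAA"
drive_b = b"BBBB"
parity = xor_blocks([drive_a_at_sync, drive_b])

# Drive A is modified afterwards, and no new sync is run.
drive_a_now = b"AAZA"

# Drive B fails: rebuilding uses the current drive A and the stale parity.
rebuilt_b = xor_blocks([drive_a_now, parity])

assert rebuilt_b != drive_b          # the rebuild is no longer exact
assert rebuilt_b[:2] == drive_b[:2]  # untouched positions still recover
assert rebuilt_b[3:] == drive_b[3:]
assert rebuilt_b[2] != drive_b[2]    # the modified position does not
```

In this model, recovery works for everything that still matches the synced state and fails only where the data changed after the sync.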


    As far as I know, there is currently no tool that does what you describe. There are backup tools with checksums that can be used to detect errors in backup sets, but not to fix them.


    I am writing, as a "hobby project", a very simple command line backup tool, "bitback", for local snapshot style backups with file level deduplication and file hashes. It can also scan for, detect and automatically fix bit rot in both directions, to/from the most recent backup copy. No idea when it will be released. Perhaps this year...


    The idea is to run bitback daily to back up and hash new or modified files in a timestamped full snapshot of the source filesystem. At the same time a certain percentage of the old files in both the source and the backup filesystems are "scrubbed": checked for bit rot and corrected if possible. For performance reasons (read speed during re-hash) the filesystems should be on different drives in the same computer, unless you have 10GbE or don't mind waiting and congestion on the network.
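
The "fix in both directions" idea can be sketched as follows. This is not bitback's implementation, which is unreleased. It is a minimal illustration with a made-up function name, and it assumes the file has not been modified on purpose since its hash was recorded.

```python
import hashlib
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def repair_pair(source, backup, recorded_hash):
    """Check both copies against the recorded hash and fix the bad one.

    Returns "ok", "repaired source", "repaired backup" or "unrecoverable".
    """
    source_ok = sha256_of(source) == recorded_hash
    backup_ok = sha256_of(backup) == recorded_hash
    if source_ok and backup_ok:
        return "ok"
    if backup_ok:
        shutil.copyfile(backup, source)  # source rotted, backup is good
        return "repaired source"
    if source_ok:
        shutil.copyfile(source, backup)  # backup rotted, source is good
        return "repaired backup"
    return "unrecoverable"               # both copies differ from the hash
```

The recorded hash acts as the tie-breaker. With two copies but no hash, a difference between them shows that something is wrong but not which copy to trust.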


  • This sounds great! It sounds exactly like what I'm looking for.

    Can I find you on e.g. GitHub, so I can find it when/if it's released?


    I think I'll go for something like scorch as a temporary solution.


    Thank you for your help Adoby and gderf.
