RAID is not for backup... But why?

  • OK, so I do realize RAID is not a backup, and every time I suggest it people go crazy :P


    But what is the alternative? Let's say I have 10 HDDs. The only option I have found in OMV so far is rsync. That is definitely a backup, but capacity-wise it's a really bad idea: if I have 10 HDDs, I end up with 5 usable ones.


    My initial thought was to use RAID5, which costs me only one HDD's worth of capacity.
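
    To put numbers on that trade-off, here is a minimal sketch that assumes ten equally sized disks. The function name and the scheme labels are made up for illustration; they are not OMV settings.

```python
def usable_disks(total_disks: int, scheme: str) -> int:
    """Return how many disks' worth of capacity is left for data.

    Assumes every disk has the same size.
    """
    if scheme == "mirror":   # each data disk has a full copy on a twin (e.g. via rsync)
        return total_disks // 2
    if scheme == "raid5":    # one disk's worth of capacity goes to parity
        return total_disks - 1
    if scheme == "raid6":    # two disks' worth of capacity goes to parity
        return total_disks - 2
    raise ValueError(f"unknown scheme: {scheme}")


for scheme in ("mirror", "raid5", "raid6"):
    print(scheme, usable_disks(10, scheme))
```

    This only compares capacity; it says nothing about which failures each scheme actually protects against.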


    Please enlighten me if there are better backup options that are less aggressive on capacity.


    Thanks!

  • Thank you for the detailed response.


    To be clear, I have very little personal content, and it will never reach 3TB. I am willing to sacrifice a disk for my important content and another disk for its backup.


    The rest of the content is purely the Plex library I intend to build. It will hopefully become huge, and I would not want to lose it... But worst case, all that content will be available online forever (unlike, for example, personal files, which can never be regenerated).


    In other words, I do care about backups, but definitely not as seriously as you clearly do. Having 14TB of disk space for 1.2TB of data seems outrageous to me and is definitely not the way to go (again, if it were personal data I would agree, but in my case it's just movies and music)... So I am looking for a way to decrease my chances of data loss without being too extreme.


    This is why I thought of RAID5: given my limited understanding of it, I was under the impression that if one disk fails, you can rebuild the array using the parity on the other disks. So I thought, "OK, good enough, it means I can tolerate 1 hard disk failure at a time," which is a great improvement over not tolerating any failure at all. Now, is a double-disk failure possible? Yes... Will it wreck my entire setup and send me back to 0? Yes... But I really do think it is low probability.


    Again, for the personal data I will dedicate a 3TB disk and another one mirroring it (aka true backup).


    I do understand that RAID is not a backup, and everyone seems to agree about this. I am really not trying to be stubborn, just explaining my situation to see whether you guys agree or can give me some better advice.


    Thank you!

  • As you say, RAID5 will allow you to tolerate 1 total disk failure. But is that the only thing that could go wrong?


    The most common cause of data loss, in my experience, is user error: accidental deletion or corruption. There is also bit rot, malware, multiple disk failures (which are very common when rebuilding an array after your single disk failure), other hardware failure, and all the "mechanical" issues (flood, fire, theft...).


    RAID does not protect your data at all against any of these. So basically, you are wasting 25% of your drive capacity to protect against a single type of failure, while ignoring the other types, when you could have full protection if you wasted 50%.


    It's like paying for cheap insurance that only protects your car if it gets hit from one side. You might as well not pay for insurance at all.

  • I do agree with you that it is "cheap insurance", so I might as well not get insurance at all. But at this point I do want an extra layer of protection against data loss, even if it's not perfect.
    However, the fact that my ENTIRE array could be lost if 2 disks fail does scare me, and I'm thinking that staying away from RAID might be safer than risking it....

  • Check out SnapRAID

    Looks interesting! The main advantage being worst case scenario I lose only one disk rather than the entire array.


    It is very poorly documented though. I could not find a nice post or YouTube video explaining SnapRAID properly (other than their official website).


    I'd appreciate your feedback if you are currently using it.

  • I use snapRAID here with one parity drive covering (currently) four data drives.


    One parity drive will protect you from one drive failure. It will not protect you from two simultaneous drive failures; for that you need two parity drives, and so on.


    In your example case of 10 data drives, it would be unwise to run with only one parity drive and the author recommends 2 parity drives in this use case. Having more than the recommended two won't hurt.
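
    Why one parity drive covers exactly one failure can be sketched with plain XOR parity. This is only an illustration of the principle on four tiny made-up "disks"; it is not SnapRAID's actual code, and the helper name is hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Four tiny stand-ins for data disks (all the same length).
data = [b"disk-one", b"disk-two", b"disk-3!!", b"disk-4??"]
parity = xor_blocks(data)

# One disk lost: XOR of the survivors and the parity gives it back.
lost = 2
survivors = [block for i, block in enumerate(data) if i != lost]
rebuilt = xor_blocks(survivors + [parity])
assert rebuilt == data[lost]

# Two disks lost: the same XOR only yields the two lost disks mixed together,
# so a single parity cannot separate them.
mixed = xor_blocks([data[0], data[1], parity])
assert mixed == xor_blocks([data[2], data[3]])
```

    With two disks missing there is one equation and two unknowns, which is why a second, differently computed parity is needed to survive a double failure.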


    I find the snapRAID documentation more than adequate. The manual is more than 22 web pages, and the FAQ has around 60 entries. What and where are you reading that leads you to conclude it's poorly documented?

    --
    Google is your friend and Bob's your uncle!


    OMV AMD64 7.x on headless Chenbro NR12000 1U 1x 8m Quad Core E3-1220 3.1GHz 32GB ECC RAM.

  • After writing my previous comment I found some links explaining SnapRAID, and I understand it better now. My opinion was based on a quick YouTube search (where there are tons of videos explaining RAID, but not SnapRAID). I do take it back: it is well documented, just not as well as RAID :P


    I do think SnapRAID is the way to go for me, with 2 parity disks. This should be more than what I need.


    Could you please explain something to me? They say that the parity disks must be at least as big as your biggest disk. So is it OK to use an EQUALLY big disk for parity? If I understand correctly, the only condition would be to not fill the data disks 100%? What should my cutoff be if I'm using 3TB disks?
    Thanks!

  • Thank you for all the replies. I do indeed feel like I understand the concept much better now.


    I fully agree with you that a backup, by definition, needs at least as much space as the data itself, and is best kept in another building, under different physical conditions, etc...


    However, given that I'm mostly dealing with movies and songs, I think this would be overkill. I think my best option is to go with SnapRAID and use 2 disks as parity. I mean, heck, I can cover 2 drive failures. If I get unlucky and get more failures, then that's it, too bad... and let the downloading start all over again :P Also, if I understand correctly, the worst case is that I lose the failed HDDs, not the entire setup... So I honestly do think it's a safe option for me.


    Regarding building the huge library, why exactly is it recommended not to keep the metadata on the boot drive? This would mean that I have to get a 500GB SSD for the Plex metadata AND something like a 32GB SSD for the OMV install (not the end of the world, just thinking out loud). I trust 32GB is more than enough for OMV, right?


    Again, thank you for everyone's replies. They all really did help!

  • Could you please explain something to me? They say that the parity disks must be at least as big as your biggest disk. So is it OK to use an EQUALLY big disk for parity? If I understand correctly, the only condition would be to not fill the data disks 100%? What should my cutoff be if I'm using 3TB disks?
    Thanks!


    I ran for quite some time leaving 100GB of free space on each of my 3TB data disks when they were used with a 3TB parity disk, and I didn't have any problems.



    I have since then upgraded the parity disk to 4TB so I don't need to worry about this anymore.


    Numerically, you should be OK if you don't fill your data drives beyond 97%.
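
    As a quick check of those two figures, here is the arithmetic as a small sketch. It assumes decimal units (1 TB = 10^12 bytes), as drive vendors use; the helper names are made up for illustration.

```python
TB = 10**12  # decimal terabyte, as printed on drive labels
GB = 10**9


def free_bytes(disk_bytes: int, fill_percent: int) -> int:
    """Bytes left free when a disk is filled to fill_percent."""
    return disk_bytes * (100 - fill_percent) // 100


def fill_percent(disk_bytes: int, free: int) -> float:
    """Fill level in percent when `free` bytes are left unused."""
    return (disk_bytes - free) / disk_bytes * 100


disk = 3 * TB
print("Free space at 97% full (GB):", free_bytes(disk, 97) // GB)
print("Fill level with 100 GB free (%):", round(fill_percent(disk, 100 * GB), 1))
```

    So leaving 100GB free on a 3TB disk is slightly more conservative than the 97% rule of thumb.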


  • I see. Going for a 4TB parity disk with 3TB data disks would be the safe bet. But then again, it's a waste of a whole TB... And either way I'd prefer to keep my disks at about 80% full, not more. I guess I'll just get the 4TB and not think about that wasted 1TB :P


    If I go for 2 parity drives (which is what I ultimately want to do), are the 2 parity drives mirror images of each other, or do they contain separate data? Not that it makes any difference for me, just trying to better understand how SnapRAID works.


    Thanks!

  • I'm not wasting that extra TB. I use the space to store stuff I don't need snapRAID to protect. Like my Plex Media Server Library, which is a poor candidate for use with snapRAID and fairly easy to regenerate if it gets damaged or lost.


    I'm fairly sure that the information held on two parity drives is different.
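
    To illustrate why two parities have to differ, here is a sketch of a RAID 6 style scheme: P is a plain XOR, Q is a weighted sum in the finite field GF(2^8). Whether SnapRAID's internals match this exactly is an assumption on my part; all function names are made up, and the "disks" are tiny byte strings.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8), reducing by the polynomial 0x11D."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(a: int, n: int) -> int:
    """Raise a to the n-th power in GF(2^8) by repeated multiplication."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a: int) -> int:
    """Inverse of a nonzero byte: a^254, because a^255 == 1 in GF(2^8)."""
    return gf_pow(a, 254)


def make_parity(disks):
    """Return (P, Q): P is the XOR of all blocks, Q weights block k by 2^k."""
    p = bytearray(len(disks[0]))
    q = bytearray(len(disks[0]))
    for k, block in enumerate(disks):
        coeff = gf_pow(2, k)
        for pos, byte in enumerate(block):
            p[pos] ^= byte
            q[pos] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def recover_two(disks, i, j, p, q):
    """Rebuild lost blocks i and j (None in `disks`) from P, Q and the survivors."""
    gi, gj = gf_pow(2, i), gf_pow(2, j)
    denom_inv = gf_inv(gi ^ gj)
    di, dj = bytearray(len(p)), bytearray(len(p))
    for pos in range(len(p)):
        a, b = p[pos], q[pos]
        for k, block in enumerate(disks):
            if k not in (i, j):
                a ^= block[pos]                         # a becomes d_i ^ d_j
                b ^= gf_mul(gf_pow(2, k), block[pos])   # b becomes 2^i*d_i ^ 2^j*d_j
        di[pos] = gf_mul(b ^ gf_mul(gj, a), denom_inv)
        dj[pos] = a ^ di[pos]
    return bytes(di), bytes(dj)


data = [b"movie-A", b"movie-B", b"movie-C", b"movie-D"]
p, q = make_parity(data)
assert p != q  # the two parities are not mirror images

damaged = [data[0], None, data[2], None]  # disks 1 and 3 have failed
assert recover_two(damaged, 1, 3, p, q) == (data[1], data[3])
```

    If Q were just a copy of P, the two equations would be identical and two lost disks could not be told apart.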


  • Oh, cool to know you can actually use that additional space. I thought the entire disk would be "dedicated" to being a parity disk.


    Why do you say the Plex library is a poor candidate for use with SnapRAID? I was planning to use SnapRAID on that specific library :P I know it can all be downloaded again, but I'm hoping to have a >20TB library, so it would be a pain to download it all over again...

  • Sure you can use the extra space. The snapRAID parity information is stored in a single huge file, not the entire partition.


    The Plex Media Server Library is a database made up of many, many small files and directories. Depending on how large your media collection is, it could be hundreds of thousands, even millions of small files. And there is a lot of change going on in there. Combined, the high frequency of change and the huge number of tiny files make it a poor choice for snapRAID. snapRAID is better suited to sets of relatively large files that change little or not at all. This keeps the time required to perform the sync and resync reasonable.


    Yes, recreating a Plex Library can take a long time, but this does not mean that the Plex server can't be used until the Library is fully rebuilt. You just won't have all the metadata available at once, so things like cover art pictures, descriptions, actor lists, related movies, etc. will fill in over time. The media list will be complete fairly quickly, and anything can be played once the server has the lists.


  • Ah, I think we had a misunderstanding. You are talking about the Plex METADATA library. I thought you were talking about the library itself (aka the movies). Yes, I totally agree with you.


    I was reconsidering the setup and had a thought. 1 parity disk allows for 1 disk failure, 2 parity for 2 disk failures. But what if the parity disk itself fails or gets corrupted? What happens then? I assume then I would just get a new HDD and rebuild the parity on it?

  • Yes, I was talking about the metadata being a poor choice for snapRAID. All my data (movies, etc.) is protected by snapRAID.


    If the parity drive fails or the parity file gets mangled, then you have no protection for your data until the parity drive is replaced and/or the parity info regenerated. If that were to happen here I would shut the machine down and leave it down for however long it took to get a new drive here - 2 days via Amazon Prime :)


  • I'm really glad I got introduced to SnapRAID; I really do think this is perfect for me :)


    Thanks for the explanation.
