ZFS Hardware requirements

  • I am a bit confused right now regarding some use cases for ZFS (or BTRFS). So I hope some of you can give some input.


    It is about my small home use NAS. Asrock J5005, 8 GB RAM, 2x16TB HDD. Mainly used as SMB/CIFS/Nextcloud File Server and Plex Media Server for < 10 users.
    So far my storage is configured as an mdadm stripe (RAID0), and I use some USB drives for backups. I'm also considering expanding it to a RAID5 with a third 16 TB HDD.


    Now, purely out of interest, I've been looking at the advantages of ZFS and wondering if it would make sense in my case. The ability to detect silent data corruption and to take snapshots sounds especially good. However, some say that it causes a lot of load and needs lots of RAM. Others say this is more likely to play a role with high user numbers. What do you think?

    • Official Post

    It would work fine as long as you don't enable deduplication. ZFS will use RAM as cache if available. Just remember that if you create a three-drive RAID-Z1 pool (the equivalent of mdadm RAID 5), you will have to add three more drives to expand it.

    omv 7.4.7-1 sandworm | 64 bit | 6.8 proxmox kernel

    plugins :: omvextrasorg 7.0 | kvm 7.0.14 | compose 7.2.3 | k8s 7.2.0-1 | cputemp 7.0.2 | mergerfs 7.0.5 | scripts 7.0.8


    omv-extras.org plugins source code and issue tracker - github - changelogs


    Please try ctrl-shift-R and read this before posting a question.

    Please put your OMV system details in your signature.
    Please don't PM for support... Too many PMs!

    • Official Post

    For two users I'm running an i3 with 12GB (not necessary - 8GB would be fine) with a Zmirror.


    While I'm using it for a backup, I'm running a second Zmirror with 4TB drives, using 4GB RAM and an Atom processor.
    As ryecoarron says, without dedup, hardware requirements are not as stiff as some claim they are.

  • It is the same as with older systems. I have a few OMV boxes with ZFS and 2 or 4 GB of RAM, and it works okay. If you use it for backup and, let's say, not more than two Docker containers (regular containers, not Oracle 12c EE :) ), you should be fine.

    • Official Post

    Don't you think ZFS may be some kind of overkill and I should better give SnapRAID (with EXT4 mdadm RAID5) a try?

    Personally, I would use a filesystem that you are comfortable with if things go badly. snapraid on mdadm would be weird unless you have a parity drive outside the array. I would think about whether you really need raid 5. snapraid with mergerfs to pool the drives is a very stable solution. Your availability is a little lower because you would have to restore the files using snapraid (or backup) if a drive failed. But all of your files on the non-failed drive would still be available.


  • Ok, I have read on and on. SnapRAID is more of a backup solution that also offers hash-based bit-rot protection, but it requires a parity disk anyway. Just like RAID5 or RAIDZ, it seems to me to be too much for my purpose. In any case I make a normal backup to an external HDD. Besides, the whole problem of bit rot has been put into perspective in my eyes. Bit rot actually occurs astronomically rarely, and every modern hard disk already has some precautions against data errors, such as ECC. So what added value does ZFS or BTRFS really bring to my application, especially since I don't even use ECC RAM? It seems to make more sense to simply run a RAM check from time to time. I tend to just stick with my mdadm RAID0 with EXT4 and make regular USB backups. Nevertheless, I wanted to share my conclusion here, because I am open to contrary opinions.

    • Official Post

    Bit rot actually occurs astronomically rarely

    OK, first realize that "astronomically" large numbers apply to modern-day hard drives. We're talking about terabytes these days, where 1 terabyte is a trillion bytes or 8 trillion bits. (Mathematicians, please chime in if I get the math wrong.) A good raw bit error rate (BER) is 1x10^-6, or 1 in a million. With error correction, 1x10^-6 can be improved to 1x10^-8 or even better, maybe 1x10^-9, but 1x10^-9 is still 1 error in a billion. In 1 terabyte of data, with a BER of 1x10^-9, you'd be looking at 8000 bit errors.
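
    The arithmetic above can be checked with a few lines of Python. This is only a sketch of the multiplication in this post; the function name and the list of BER values are illustrative, and the BER figures are the ones quoted here, not a drive specification.

```python
def expected_bit_errors(num_bytes: int, ber: float) -> float:
    """Expected number of bit errors in num_bytes of data at a given bit error rate."""
    return num_bytes * 8 * ber  # 8 bits per byte


ONE_TB = 10**12  # 1 terabyte = one trillion bytes

# BER values taken from the post above (illustrative, not measured).
for ber in (1e-6, 1e-8, 1e-9):
    errors = expected_bit_errors(ONE_TB, ber)
    print(f"BER {ber:.0e}: about {errors:,.0f} expected bit errors per terabyte")
```

    For a BER of 1x10^-9 this gives the roughly 8000 bit errors per terabyte mentioned above.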


    To reduce the chances of bit errors, there are robust error correction schemes in play, but when we're talking about trillions of operations, "stuff happens". (Static discharges, unknown ground issues, power problems, etc., can create infrequent bursts of errors.) And it only takes one (1) weak CMOS gate or a slightly dirty power supply output to generate an error on occasion, or even intermittent streams of errors. The sources of these issues can be difficult to impossible to find. Given the sheer "astronomical" numbers involved and the complexity of modern-day computers, "something" should be expected to go wrong at some point. It's just a matter of time.


    ECC RAM:
    ECC RAM keeps data in memory from being corrupted. A flipped bit in RAM might not affect hard drive data at all. If the flipped bit is in an executable, it might cause a program error, a kernel panic or something like that. When data from a drive is read into RAM, corruption of "read" data in RAM is not of serious consequence. Silent corruption could only be a factor when corrupted data is written to the disk. While ECC RAM reduces the chance of data corruption and is important in 24x7 server operations, serious corruption of data on a hard disk is more likely than the occasional "bit flip" in RAM.


    Hard drives:
    Compared to just 10 years ago, it should be noted that the areal densities of hard drive platters are off the charts. Disks are packing bits in tighter and tighter, increasing the chances of running into minute imperfections in the media. Of note, for a variety of reasons, magnetic bits can and do flip or simply go flat (no polarity). It happens and this type of corruption is silent. (Setting that aside.)
    Most people believe that hard drives fail all at once: that they work one day, and the next they don't. While a "light switch" failure can happen, it's fairly rare. Magnetic media (the platters) has a limited life, the mechanical parts (motor, actuator and heads) begin to fail, and the interface electronics are all counting down toward eventual failure. The net result is, typically, a hard drive that slowly and silently corrupts some of its content. Without some sort of detection scheme, the best that can be hoped for is that "reallocated sectors" are blank areas on the disk, and that SMART notifications are set up so that the user is warned that the drive is beginning to die. If these warning mechanisms are not used or are ignored, silently corrupted data may actually be backed up before the hard drive fails completely.



    Enter ZFS, BTRFS, SNAPRAID, etc.:
    These approaches use file checksums. When a checksum is calculated for a file and a later calculation on the same file (in a scrub) does not match, it's apparent that 1 or more bit errors have occurred. ZFS in a RAID 1 or 10 configuration replaces a corrupted file with a known good second copy. BTRFS does something similar. SNAPRAID uses parity information and its content file to recreate the uncorrupted original version of the file. The detection and (in some cases) correction of bit-rot is done in an operation called a "SCRUB". ((It's important to note that none of these detection/correction filesystems or schemes care about the low-level details or the mechanics of hard drives. They simply replace known corrupted files with verified clean versions.))
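
    To make the checksum idea concrete, here is a minimal file-level sketch in Python. It is not how ZFS, BTRFS or SNAPRAID are implemented (ZFS and BTRFS keep checksums per block inside the filesystem); it only illustrates "record a checksum, recompute it later, flag mismatches". The function names and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit RAM use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root (the 'known good' state)."""
    return {
        path.relative_to(root).as_posix(): file_checksum(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def scrub(root: Path, manifest: dict[str, str]) -> list[str]:
    """Recompute checksums and return the files that are missing or no longer match."""
    damaged = []
    for relative_name, expected in manifest.items():
        path = root / relative_name
        if not path.is_file() or file_checksum(path) != expected:
            damaged.append(relative_name)
    return damaged
```

    Detection is all this sketch does. Repair needs a second good copy (a mirror) or parity data, which is what the filesystems and SNAPRAID add on top.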


    So why is detection and correction important? If you have a slowly failing hard drive, SMART should detect and "eventually" reallocate a corrupted sector. (Hard drives have a few extra sectors reserved for this purpose.) The problem is, when the data from the bad sector is copied to a reserved good sector, there's no guarantee that the pre-existing data in the bad sector is not corrupt.
    Enter the "Scrub". When error-detecting filesystems or SNAPRAID are used, the corrupt file is deleted (ZFS or BTRFS) and copied from a 2nd good file in a RAID1-like configuration, or (SNAPRAID) recreated with parity data and a content file. In my experience, it is entirely possible that single-bit errors will be detected and corrected by a SCRUB operation before a bad sector is reallocated by SMART. This keeps data clean at the top level, on the primary server. When backing up, it's far better to back up known clean data than to take a chance that silently corrupted data is being replicated to a backup device.


    But there is a cost:
    In ZFS and BTRFS RAID1 type configurations, 50% of disk real-estate is lost. (ZFS works - tested.) It is claimed that BTRFS in RAID5 can recreate files from parity, but I have yet to actually verify bit-rot file recovery. (Eventually, I plan to test this.) In any case, 1 disk in 3 or more is lost to parity. SNAPRAID is similar and requires 1 disk (the largest of the disks) for parity, so bit-rot protection can be had for as little as 25% of a pool of protected disks. (SNAPRAID works - tested.)


    All of the above require a bit of reading and the setup of necessary housekeeping chores, with the exception of ZFS. ZFS, when it's set up in OMV, will run 1 scrub a month automatically. Doing the same with BTRFS would require command line scrub operations, but they can be set up in scheduled tasks and run from there. Most SNAPRAID housekeeping can be run manually in the GUI.


    SNAPRAID has many features in addition to bit-rot detection and correction. As it is with most RAID configurations, SNAPRAID can recreate entire hard drives but it goes beyond that with file and folder restoration capabilities.
    In the way of truly advanced features, ZFS and BTRFS can utilize "SNAPSHOTS" for retrieving past "versions" of a file as it existed in different points of time, or even restore an entire filesystem to an earlier point in time. (As an example, a filesystem can be rolled back to a time "before" a ransomware infection.)


    It all boils down to whether or not you feel your data is worth protecting. If it's a bunch of video files that you don't care about, probably not. If it's a collection of irreplaceable family photos, you might consider bit-rot protection AND backup. The reading and additional complexity involved are also a real consideration.
    _________________________________________________________


    Here's the bottom line (again - my opinion):
    Unless it's used with a gaming machine where disk I/O performance is the only consideration, I wouldn't use RAID0, *ever*. RAID0 is a ticking bomb. If either of your two disks fails, you lose it all. RAID0 is not appropriate for a NAS. If you don't want to spend the money on an extra drive for parity, I'd use the UnionFS plugin to pool your two drives together. In that way, if one disk fails, you'd still have the data on the 2nd disk. There's a performance hit but, for a home server, it shouldn't be a problem.


    Again, these are mostly opinions. Others may have opinions and information more applicable to your use case.

  • Thanks for your reply! This helps me to get a clearer picture of the whole thing. Yeah, the numbers of bits we're talking about on hard drives are already astronomical. I agree with you there. Nevertheless, it often sounds as if one is completely defenseless without bit-rot protection. It is hardly ever pointed out that every modern hard disk has its own protection mechanisms. I am not saying that bit-rot protection is useless. I am just wondering how much this extra protection brings in terms of results and cost/benefit for my small application. If I understand it correctly, ZFS/BTRFS/SnapRAID will report when a data error has been detected and corrected. Maybe you have some experience in this area? How often has this happened with which setup and in which time span?


    Another point is that it is often suggested that a file with a flipped bit is unusable junk. But honestly, what happens if, after years, a single bit is flipped in a 5 GB film, for example? In all probability nothing. Less likely, you get a short picture error. It is very unlikely that the film is no longer playable, but it is possible. It is similar with pictures. As a result, I would like to get a feeling for how likely it is that, in 10 years, even one picture in my collection of 20,000 will actually be ruined because there was no bit-rot protection.


    I see the problem that an unknowingly corrupt file propagates through backups and in the end every backup of the file is corrupt. Maybe there is a way to compare checksums between NAS and backup? That would probably take forever and put a strain on the disks. Or is there the possibility to use ZFS or BTRFS without parity drive to be easily informed when a bit error is detected? After all, the snapshot function also speaks very much for one of the file systems. I left that out of consideration in my last post. But I am still very sceptical about buying a parity drive for several hundred EUR.


    Let's come to RAID0. You describe this as a ticking time bomb, doing exactly what brings me and, in my opinion, many others to a distorted picture of actual probabilities of data loss. It is undisputed that the failure probability of a RAID0 is much higher. But you ignore the fact that if one of the hard disks fails, only that one disk is defective. So the probability of a single defective disk does not change at all. And my data is not irretrievably lost. I just can't read data from the RAID anymore. That is clear. Therefore I have backups. I have to restore about twice as much data as without RAID0. The bottom line for me is that using a RAID0 reduces the average availability of my data due to the higher probability of failure. But the risk of data loss is only slightly increased as long as the backup strategy is okay. In my case about 10 TB of less important data are simply backed up on external USB disks. 4 TB of more important data is doubled and the 1 TB of really important personal data is on 3 external disks and additionally encrypted at the cloud provider I trust. I can't see the ticking time bomb that is invoked on all sides.
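
    The availability argument can be put into numbers with a short sketch. It assumes independent disk failures and uses a made-up per-disk failure probability of 2% per year, purely as an example value; the function name is illustrative too.

```python
def raid0_failure_probability(p_disk: float, n_disks: int) -> float:
    """Probability that a RAID0 array fails, i.e. that at least one of its disks fails.

    Assumes each disk fails independently with probability p_disk
    over the period considered.
    """
    return 1.0 - (1.0 - p_disk) ** n_disks


P_DISK = 0.02  # hypothetical annual failure probability of one disk

for disks in (1, 2, 3):
    p_array = raid0_failure_probability(P_DISK, disks)
    print(f"{disks} disk(s): {p_array:.2%} chance per year of losing the array")
```

    With two disks the chance of having to restore from backup roughly doubles compared with a single disk, while the chance of any individual disk dying stays the same.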


    Ticking time bomb vs. comprehensive and final security. That's what it sounds like when you read through many posts in many forums. I question that just because such tendentious statements are mainly good for convincing gullible people. A reliable picture of the risk of a data loss is hardly possible. But that is what the questioner is actually interested in and what I am interested in.


    Looking forward to your answer!

  • Quote from HannesJo

    Maybe you have some experience in this area? How often has this happened with which setup and in which time span?

    I use ZFS, and with new disks it is normal to see no errors unless some hardware has problems (I have had SATA cable errors and SATA enclosure errors), so you can expect years without errors under normal conditions.


    A monthly resilver (strictly speaking, a scrub) is a good recommendation to check integrity at least once a month.

  • I use ZFS, and with new disks it is normal to see no errors unless some hardware has problems (I have had SATA cable errors and SATA enclosure errors), so you can expect years without errors under normal conditions.
    A monthly resilver (strictly speaking, a scrub) is a good recommendation to check integrity at least once a month.


    Ok but resilvering means I would need a parity disk, right? Is ZFS/BTRFS able to just let me know when there is an error and which disk and file is concerned? I could recover it from a backup then.


    Another idea would be to compare the files checksums on the NAS with the backup files using rsync -acnR every few months in addition to the normal incremental backups. However, if there is a discrepancy, I don't know which of the files is damaged. I would have to find that out myself.
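
    A rough Python equivalent of that comparison could look like the sketch below. The function names and the use of SHA-256 are assumptions for the example. Like the rsync dry run, it only reports files whose content differs between the two trees: it cannot say which copy is the damaged one, and files that were legitimately changed since the last backup will show up as well.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(nas_root: Path, backup_root: Path) -> list[str]:
    """Return relative paths that exist in both trees but differ in content."""
    mismatched = []
    for nas_file in sorted(nas_root.rglob("*")):
        if not nas_file.is_file():
            continue
        relative = nas_file.relative_to(nas_root)
        backup_file = backup_root / relative
        # Files missing from the backup are skipped; this sketch only compares pairs.
        if backup_file.is_file() and sha256_of(nas_file) != sha256_of(backup_file):
            mismatched.append(relative.as_posix())
    return mismatched
```

    Every file on both sides is read in full, so on a multi-terabyte NAS this is an occasional job, not something to run with every incremental backup.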


    For professional use the idea is of course nonsense. But for my small home NAS it might make more sense than buying a parity disk for 400 Euro. And let's say in one year there is a defect in a RAM. Then the data that was written to the HDDs afterwards can't be trusted anymore anyway. Then the parity disk has also brought nothing. I could invest thousands of Euros to protect myself against all this. And even then there is still a residual risk.


    I would like to assess the risks of data loss as objectively as possible and reduce them as far as possible. If there are convincing reasons to spend more money on this in one place, I will gladly do so. However, I would like to base this on cost/risk considerations and not on hearsay.

  • Ok but resilvering means I would need a parity disk, right? Is ZFS/BTRFS able to just let me know when there is an error and which disk and file is concerned? I could recover it from a backup then.

    Yes, ZFS detects and corrects the error itself, unless it is not recoverable. In that case ZFS tells you which file is damaged or whether the entire disk needs to be replaced:
    eg: https://www.ixsystems.com/comm…have-been-detected.81014/


    https://docs.oracle.com/cd/E23…/html/819-5461/gbbwa.html


    https://docs.oracle.com/cd/E23…819-5461/gbbvf.html#gbcus

    • Official Post

    If I understand it correctly, ZFS/BTRFS/SnapRAID will report when a data error has been detected and corrected. Maybe you have some experience in this area?

    Yes, I've seen error detections with SNAPRAID. (Corrections in SNAPRAID require a separate follow-up operation.)
    In the SNAPRAID case, I set up a backup server that, as of day 1, had very old disks (around 5 years old), just to see what would happen. It's an anecdotal test to be sure, but I have seen error corrections without a drive failure. I have yet to see ZFS corrections but I started that array with new disks. (The idea is to avoid issues at the top level.)


    It is hardly ever pointed out that every modern hard disk has its own protection mechanisms.

    Again, going back to BER, error correction schemes, etc.:
    Since we are dealing with astronomically large numbers, modern-day hard drives would be useless without the built-in error correction schemes. *Remember, 1 error in a million is far too many.* But, in the final analysis, the built-in correction is still not enough. CoW (copy-on-write) filesystems and SNAPRAID operate on top of the hardware as a last resort, verifying and maintaining clean data, while fixing errors that, with enough time, are inevitable.


    As a result, I would like to get a feeling for how likely it is that in 10 years in my collection of 20,000 pictures, only one of them will actually be scrapped because there was no Bitrot protection.

    In the basic concept, this is correct. In my understanding of the structure of some file types, a single bit error (bit-rot) might manifest, in a picture, as a color change in a pixel. If it's an ASCII file, a single letter may change. Such events really wouldn't be a big deal, but it depends on where the error lands. Again, this is a "silent" problem and time is not your friend. The reason I take it seriously is that I have files that go back to Windows 3.1. That's a long time.
    There are many out there who have had files that they thought were good, and that were backed up religiously. When actually checked, they found that their files were nothing but gibberish, merely trash data with a file name. The condition of files on a hard drive is an unknown and remains unknown without something that tracks their initial state and checks integrity from time to time.


    I'm interested in bit-rot for two reasons.
    1. Keeping data clean at the highest level (my primary server) so that backups to secondary devices are also clean.
    2. Pre-warning before a hard drive failure. If I see misc error corrections I would; (1) Stop replication and check my backups, preserving back up files in a known good state. (2) Eyeball SMART data and start running long tests to see if a drive is getting weak. (3) Depending on the findings, put a drive on order.
    The overall intent is keeping data clean while ensuring longevity and survival.


    Maybe there is a way to compare checksums between NAS and backup?

    There are enterprise filesystems that support this (clustering file systems, maybe CephFS), but implementing them may not even be possible on the hardware of the typical home enthusiast. That says nothing about the learning curve involved. The details of setting up and operating these filesystems may not be easy; certainly not as easy as using the plugins for ZFS or SNAPRAID within OMV's GUI.



    But you ignore the fact that if one of the hard disks fails, only that one disk is defective. So the probability of a single defective disk does not change at all. And my data is not irretrievably lost. I just can't read data from the RAID anymore.

    I'd recommend reading up on RAID0. If one disk is dead, the data on both is gone. Data is "striped" across two disks, so nothing can be salvaged. That means, the weakest of the two disks will decide how long the array lasts.
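
    A toy model shows why nothing can be salvaged. The sketch below stripes a byte string across disks in fixed-size chunks, the way RAID0 distributes data; the function names, the 4-byte chunk size and the sample data are made up for illustration (real arrays use much larger chunks, commonly 512 KiB in mdadm).

```python
def stripe(data: bytes, n_disks: int, chunk_size: int) -> list[list[bytes]]:
    """Distribute data round-robin across n_disks in chunks of chunk_size bytes."""
    disks: list[list[bytes]] = [[] for _ in range(n_disks)]
    for index, offset in enumerate(range(0, len(data), chunk_size)):
        disks[index % n_disks].append(data[offset:offset + chunk_size])
    return disks


def surviving_data(disks: list[list[bytes]], failed_disk: int) -> bytes:
    """Concatenate whatever is left on the disks that did not fail."""
    return b"".join(
        chunk
        for index, disk in enumerate(disks)
        if index != failed_disk
        for chunk in disk
    )


file_content = b"ABCDEFGHIJKLMNOP"
disks = stripe(file_content, n_disks=2, chunk_size=4)
print(disks)                                 # chunks alternate between the two disks
print(surviving_data(disks, failed_disk=1))  # every second chunk of the file is gone
```

    Any file larger than one chunk has pieces on both disks, so the surviving disk holds fragments, not intact files.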


    A reliable picture of the risk of a data loss is hardly possible.


    That's because storage technology is moving so fast that research on reliability and data loss doesn't exist for the latest tech. Along those lines, the Backblaze study is the best data set and summary info on consumer hard drive failure that I've found. Further, over the years, data center site admins have developed known good strategies for ensuring the continuity of customer data, which are always (always) based on a good backup plan.
    A solid backup plan has multiple copies of the data set, with (preferably) one copy off-site, and "versioning", which allows admins to go back in time to previous data states.


    Ticking time bomb vs. comprehensive and final security. That's what it sounds like when you read through many posts in many forums. I question that just because such tendentious statements are mainly good for convincing gullible people.

    I have to say, I find this amusing. On the forum, I've found a few types of new users; the inexperienced type, that experienced forum members try to guide, and there are the hardheaded type that learn things the hard way. I'm fine with both :) because they'll be living with the consequences of their choices. Again, we're talking about matters of opinion here, but I have enough experience to where I have a few notions of what works and what doesn't.


    When it comes to RAID, there are people who have been "lucky" and never had a failure. Maybe it's because they update their hardware and/or drives before a drive failure. There are numerous factors involved. And there are people who have had RAID failures, many without backup. You'll find those horror stories a-plenty if you search the forum using the right terms. Usually, they were RAID5 victims, using a RAID array without really understanding what it is, without setting up SMART notifications, etc. RAID0 is another matter altogether; it's in a class of its own.


    I call RAID0 the "insane array". Can you use it? Sure. One can take a great many risks with a solid backup plan, but that still won't change the fact that RAID0 is not suited for servers and (in my opinion) is an unnecessary risk. Nearly anyone on this forum, with experience, would tell you that. It has nothing to do with gullible users.

  • I think you misunderstood me in my basic criticism of reasoning. You don't have to explain the text about BER and astronomical data sets to me again. I had already agreed with you. I have not questioned a fundamental added value of the Bitrot Protection.


    You also misunderstood my explanation of RAID0. Of course, in case of a defect the data is no longer readable from the entire RAID. But using a suitable backup strategy I did not lose any data in the end. And you need the backup strategy anyway. A RAID can not replace it. Your representations are of course factually correct. But you create the image of an enormous risk of irretrievably lost data. And this does not happen if you use RAID0, but only if you don't have a suitable backup strategy. Of course I risk a bad availability with RAID0. Very important in the professional area. At my home less so.


    The fact that you find my criticism of tendentious, but factually poor argumentation amusing and that you divide people into these two groups here in the forum says nothing about me, but about you. As an electrical engineer I am a layman in the field of NAS/Storage/Servers, but I know exactly at what kind of argumentation I have to ask again in order to have an objective picture of my problem at the end of the story and not to make decisions because everyone says something is a ticking time bomb. But at least some points I was looking for were in your answers.


    I will probably resolve the RAID0, even if it is not a "ticking time bomb" for me, but probably just not worth it. I still don't see a parity drive as justified for my use case. But I will play around with ZFS and BTRFS just because of the bitrot detection and the snapshot functions. Thanks!

    • Official Post

    I think you misunderstood me in my basic criticism of reasoning.

    Maybe - it wouldn't be the first time. :)
    _______________________________


    It should be noted that many OMV users do not have a backup, and I believe the share is higher than 50%. While this is your thread, others will read it. Some of what is written is intended for them, in the hope that readers won't take unnecessary risks AND will maintain a solid backup so they don't end up back here. Some come back anyway, with dead or dying drives and no backup. Here's an example. At this time, it's at 10 forum pages and still going.
    _______________________________


    If you go the ZFS route, for bit-rot, use a "zmirror", which is RAID1, and take a look at setting up automated snapshots. It's a great feature of the filesystem. zfs-auto-snapshot is really nothing more than shell scripts and cron jobs, but that's a good thing. It upgrades well: there are no dependencies, so there's little to nothing to break.
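
    To give an idea of what such a snapshot job does, here is a minimal sketch in Python. It is not the zfs-auto-snapshot code: the "auto-" naming prefix, the dataset name and the function names are hypothetical, and the sketch only builds the `zfs snapshot` and `zfs destroy` command lines without running them.

```python
from datetime import datetime

PREFIX = "auto-"  # hypothetical naming prefix, not the one zfs-auto-snapshot uses


def snapshot_name(dataset: str, when: datetime) -> str:
    """Build a sortable snapshot name such as tank/data@auto-2024-01-31-0300."""
    return f"{dataset}@{PREFIX}{when:%Y-%m-%d-%H%M}"


def snapshots_to_prune(existing: list[str], keep: int) -> list[str]:
    """Return the oldest automatic snapshots beyond the newest `keep` ones."""
    ours = sorted(s for s in existing if s.split("@", 1)[-1].startswith(PREFIX))
    return ours[:-keep] if keep > 0 else ours


def plan_commands(dataset: str, when: datetime, existing: list[str], keep: int) -> list[list[str]]:
    """Commands a cron job could run: one new snapshot, then removal of old ones."""
    new_snapshot = snapshot_name(dataset, when)
    commands = [["zfs", "snapshot", new_snapshot]]
    for old in snapshots_to_prune(existing + [new_snapshot], keep):
        commands.append(["zfs", "destroy", old])
    return commands
```

    Manually created snapshots without the prefix are left alone, so a rollback point taken by hand is not pruned by the job.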


    BTRFS can do much of the same, but I don't know the ins and outs. I'm waiting for it to become a bit more stable before diving into "RAID1-like" mirror implementations for bit-rot protection. ((Some say it will scrub bit-rot in RAID5, but I'd have to thoroughly test that before I believe it.))

  • It should be noted that many OMV users do not have a backup, and I believe the share is higher than 50%. While this is your thread, others will read it. Some of what is written is intended for them, in the hope that readers won't take unnecessary risks AND will maintain a solid backup so they don't end up back here. Some come back anyway, with dead or dying drives and no backup. Here's an example. At this time, it's at 10 forum pages and still going.


    LOL okay, I see. Just read through some of those threads. Maybe the problem is exactly what I mentioned. Everybody is preaching how great this and that RAID + ZFS/BTRFS helps to protect your data. In contrast to this, it is rarely said "Take care of a backup, then we will talk about RAID".


    I have now set up a virtual OMV and play a bit with ZFS and BTRFS. Thanks for the nice discussion!
