Why I'm going to use btrfs

  • Hi all,


    after trying out practically every OMV-supported file system able to handle more than one HDD in a single volume, in my humble opinion only two reasonable file systems remain: ZFS, the non-Linux mother, and btrfs, her true Linux child.


    I will use btrfs in future, as it supports an implicit RAID5 or RAID6 like ZFS. And I can already hear the cry: "But the write hole..." Sorry, but the write hole is an issue not only for btrfs, but for ZFS as well, and in fact for *every* RAID (http://www.raid-recovery-guide.com/raid5-write-ole.aspx). I think it's not really difficult to understand: *any* redundant save needs at least two writes to disk to complete, and these must land on the platters involved at exactly the same time if a sudden power outage is to do no harm. But which existing OS can guarantee this? And even if the OS could, how can we be sure that our disks flush their buffered data at the same time? Or at least within the time they have left to do anything when power suddenly fails?
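    The two-writes argument above can be sketched in a few lines. This is a toy model of my own making, not any real RAID implementation: a 3-disk RAID5 stripe where the update is interrupted between the data write and the parity write, leaving parity silently inconsistent.

    ```python
    # Toy RAID5 stripe: two data blocks plus XOR parity.
    def parity(a: int, b: int) -> int:
        """RAID5 parity is the XOR of the data blocks."""
        return a ^ b

    # Initial consistent stripe.
    d0, d1 = 0b1010, 0b0110
    p = parity(d0, d1)
    assert p == d0 ^ d1          # stripe is consistent

    # Updating d0 needs TWO writes: the new data AND the new parity.
    d0 = 0b1111                  # write 1 lands on disk...
    # -- power fails here, before write 2 (the parity update) --

    # After reboot the stripe is silently inconsistent:
    print(p == parity(d0, d1))   # False: the write hole
    ```

    Reconstructing the lost block from this stale parity would yield garbage, which is exactly the failure mode the linked article describes.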


    So there is a write hole not only in theory, but in hardware. And there is only one solution: a UPS able to let the drives write out all the data they have received before shutting down. With today's drives buffering 100 MB or more, we cannot even rely on BBUs on drive controllers any more. They can only save data not yet sent to the drives; they cannot recover data already sitting in the drives' buffers.


    So, bottom line: to be safe, every setup with more than one drive operating in parallel needs uninterrupted power for as long as the writes last. A UPS for the whole system is the only real protection against write holes, regardless of which file system and drive-pool organisation we use.


    Btw: btrfs is used quite routinely for RAID5 and RAID6 setups by Synology and Rockstor. Both recommend using a UPS. Of course.


    Btw2: I actually don't believe the write hole can ever be fully closed, neither in btrfs nor in any other file system. It would require multiple disk writes and their confirmations to happen at the same instant, and I don't believe that is possible with a power failure in between.



    Thanks for reading my thoughts and


    Best regards,
    Der Jopa

    Private NAS: Asus P9D-I + Xeon E3-1220L CPU + 8GB ECC RAM + LSI SAS2108 HBA + 4x8TB BTRFS RAID 10 + OMV 4 booting from single SLC SSD
    Private experimental NAS: Asus P9D-I + i3-4130T CPU + 8GB ECC RAM + 3x10 BTRFS RAID5 + OMV 5 beta booting from single SLC SSD

  • to be safe, every setup with more than one drive operating in parallel needs uninterrupted power for as long as the writes last

    Nope, since the write hole is only a thing with parity RAID (be it mdraid, RAIDz, or btrfs' raid56). With mirrors it's not an issue, and with storage topologies that allow combining a bunch of mirrors (see here) this works really well. It also only affects operation in degraded mode, so you 'only' need to fear power losses once you have already lost a disk.

    I actually don't believe the write hole can ever be fully closed

    That's what a log device is for (an SSD with a supercapacitor). Fixed in mdraid for years now, and about to be fixed in btrfs raid56 sometime in the future.
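    The log-device idea boils down to journaling the whole stripe update before touching the disks. Here is a minimal sketch in the same toy model as above (the names and structure are mine, not mdraid's actual journal format): because the complete intended update is logged atomically first, a crash between the data and parity writes can always be repaired by replay.

    ```python
    journal = []          # stands in for the SSD log device

    def update_stripe(stripe: dict, new_d0: int) -> None:
        # 1. Log the complete intended update first (one atomic append).
        journal.append({"d0": new_d0, "p": new_d0 ^ stripe["d1"]})
        # 2. Only then touch the disks; a crash here is now harmless.
        stripe["d0"] = new_d0
        # -- pretend power fails here, before the parity write --

    def replay(stripe: dict) -> None:
        # On reboot, reapply every logged update in full.
        for entry in journal:
            stripe["d0"], stripe["p"] = entry["d0"], entry["p"]

    stripe = {"d0": 0b1010, "d1": 0b0110, "p": 0b1010 ^ 0b0110}
    update_stripe(stripe, 0b1111)       # interrupted mid-update
    replay(stripe)                      # the journal closes the hole
    print(stripe["p"] == (stripe["d0"] ^ stripe["d1"]))  # True
    ```

    The data and parity writes still are not simultaneous; they just become redoable, which is enough to close the hole.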

  • Hi Thomas,


    sorry, the link I included was bad - the "h" of "hole" was cut off somehow. So please look here and tell me whether you think the author of this article (and author of a disaster-recovery software) did not understand the write hole issue...


    The articles on jr-s.net are quite informative - not only the one you linked - and I'll pass them on to my co-admin at the company.


    BTW: I actually do fear power losses at the company, and some four or five years ago I urged the decision-makers to buy a UPS at least for our servers. A few months later a sudden outage killed the phone system, but not the servers' integrity. After power was back, most colleagues could continue their work from the local backups of Open/LibreOffice without any help. I only had to clean up the Windows server for the ERP, because it had blocked a client that was writing data when the outage occurred. I don't even want to think about what could have happened if the Windows server had lost power as well.


    Best regards


  • So please look here and tell me whether you think the author of this article (and author of a disaster-recovery software) did not understand the write hole issue...

    The write hole is explained properly, but he only talks about primitive/anachronistic mirrors (like mdraid's raid1), which no one in their right mind should use any more. With a btrfs raid1 or a ZFS mirror this won't happen, since checksums are in place and the file systems are CoW (copy-on-write). So even if one disk does not contain all the data, it's no problem: both file system and data will stay intact. But this requires correct write-barrier semantics (see the GitHub issue I linked to above).
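    Why CoW dodges the problem can be shown with another toy model (again my own simplification, not btrfs internals): data is never overwritten in place; a new copy is written elsewhere first, and only one final atomic pointer update makes it live. A crash at any point leaves either the old tree or the new tree, never a torn mix.

    ```python
    blocks = {1: "old consistent data"}
    superblock = {"root": 1}    # the one pointer that is flipped atomically

    def cow_update(new_data: str) -> None:
        blocks[2] = new_data    # write the new copy elsewhere first
        # -- a crash here leaves the old tree fully intact --
        superblock["root"] = 2  # single atomic commit makes it live

    cow_update("new data")
    print(blocks[superblock["root"]])   # "new data"
    # Had power failed before the commit, the root pointer would
    # still reference block 1, the old consistent data.
    ```

    The checksums then decide, per block, which mirror copy is the intact one after such a crash.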

  • Hi Thomas,


    thank you for your replies; they helped me understand the write hole issue a bit more deeply.


    Just one thing remains regarding this forum: can I mark my thread as solved myself, or must a moderator do it?


    Best regards



  • Maybe read your own link?


    "One more option to avoid a write hole id to use a ZFS which is a hybrid of a filesystem and a RAID. ZFS uses "copy-on-write" to provide write atomicity. However, this technology requires a special type of RAID (RAID-Z) which cannot be reduced to a combination of common RAID types (RAID 0, RAID 1, or RAID 5)."



    And tbh, in my experience it's harder, almost impossible, to kill a RAID-Z compared to a btrfs raid 5/6.



    Try the same with btrfs.
