Which energy efficient ARM platform to choose?

  • Hello Everyone,


    I am new to the game, but I have been reading up the various alternatives for a DIY NAS (in particular this thread and comments by @tkaiser in various forums all over the net).


    What I have figured out so far (mostly because @tkaiser and other people who seem to know very well what they are talking about stress these points relentlessly) is:
    1. Do not use 'classical' RAID configurations in a home/small-business setup ever (hardware or software)
    2. Use a checksummed file system (i.e. btrfs or ZFS - I don't think there are much alternatives, bcachefs maybe?)
    3. Use ECC RAM, if possible.
    4. Avoid hard drives connected via USB, and preferably use 'real' SATA connections.


    Now, my list of requirements (or preferences) for a NAS is:
    a) Small form factor (and preferably somewhat decently looking, so I can place it in the living room)
    b) Low energy consumption as the thing should run 24/7
    c) Silent (preferably passive cooling)


    Combining these two lists leaves me essentially with the two options @ekent already mentioned:
    I. RockPro64 with the NAS case and a decent PCIe SATA adaptor (i.e. using either a Marvell 88SE9235 or ASMedia ASM1062 chip, where the latter is much cheaper at least when ordering from Australia)
    II. Helios4


    My use case for this is a NAS for a home environment where I keep pictures, documents and such. I don't really have movies or music anymore (Spotify and Netflix do the trick). Also, I would like to use it as a backup location for my laptop, using restic, which in itself shares many traits with those next-generation file systems. Last, I plan to back up the NAS to cloud storage, again using restic (or something similar that integrates better with the filesystem of the NAS). Thus, I will probably end up having one or two 2 TB HDDs in the NAS (kind of depending on whether I should keep a local backup or not).
    Finally, I should mention that I am not the biggest fan of btrfs. I used it around 6 years ago on various workstations and only had trouble with that setup. (Back then OpenSUSE recommended btrfs in the installer.) The issue was somehow related to the snapshots btrfs automatically did, and I eventually ran out of disk space in /var. As this left my systems in an unusable, though recoverable, state several times, I switched back to ext4.



    My questions/concerns on all this are:
    - Can I use ZFS with either one of the boards mentioned above? I know that the Helios4 does not have the most powerful CPU (and it is 32bit), but there are apparently people that got ZFS running on a Raspberry Pi or Rock64 (see e.g. [1,2,3]). The RockPro64 has a much more powerful 64bit CPU and twice the RAM (4GB vs 2GB for the Helios4), but I am still unclear if that would be enough for ZFS. Obviously, a downside of the RockPro64 is that the RAM is non-ECC.
    - Thus, my 2nd question: In case ZFS can be used on these boards, should I choose ECC RAM over a more powerful CPU?
    - Is btrfs usable (or even recommended) for a NAS nowadays? (As said, my experience with btrfs hasn't been the best a few years back.)
    - I am still unsure of whether adding a 2nd HDD and setting up a ZFS/btrfs mirror is a good idea or just waste of an HDD. Any recommendations?
    - Last, which board to choose: RockPro64, Helios4 or something else?


    I am aware that ZFS on Linux, and especially on OMV, might mean that some tinkering is required. I don't see this as a problem, and I am fairly well accustomed to a Linux environment.


    Any hint or further insight, especially regarding ZFS on these ARM boards, is much appreciated!


    Cheers,
    Armin


    [1] https://www.raspberrypi.org/forums/viewtopic.php?t=165247
    [2] https://icicimov.github.io/blo…ZFS-NAS-ROCK64-NFS-Samba/
    [3] https://forum.armbian.com/topi…ab=comments#comment-53681

  • Any hint or further insight, especially regarding ZFS on these ARM boards, is much appreciated!

    With ZFS exactly the same things are important as with btrfs: you need storage that correctly implements flush/write barriers, which can be a real challenge when running on SBCs since the majority of them, and especially the most popular one (Raspberry Pi), are the 'most crappy storage setup possible' by definition.


    SBCs with quirky USB controllers combined with random external USB storage are a great recipe for silent data corruption with ancient filesystems like ext4, or total data loss with modern filesystems like ZFS or btrfs (users lost their ZFS pools years ago for the same reasons they still lose their btrfs filesystems today). But with PCIe-attached SATA and the controllers you mentioned, or native SATA on the Helios4, you shouldn't be affected by host controller issues, and when you use modern drives neither filesystem should be a problem.


    Neither filesystem needs ECC RAM (the 'scrub of death' is a myth), but if you love your data you had better use it. Wrt the amount of RAM needed there's confusion with ZFS (another myth talks about 1 GB RAM per TB of usable storage -- such formulas only apply if you want to use deduplication), but once you cap the ARC at a reasonable size even 1 GB RAM should be fine for normal operation. Things might be different when running scrubs or a resilver (if you opt for a zmirror), and this is something that needs testing, especially on usual SBCs with their small amounts of DRAM.
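
    For reference, capping the ARC with ZFS on Linux is just a kernel module parameter. A minimal sketch (the 256 MiB value is only an example, tune it to your board):

        # persistent: limit the ARC to 256 MiB via a modprobe option
        echo "options zfs zfs_arc_max=268435456" > /etc/modprobe.d/zfs.conf
        # or adjust it on a running system without reloading the module
        echo 268435456 > /sys/module/zfs/parameters/zfs_arc_max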


    But the latter is even true for ext4, at least when tons of hardlinks are involved (rsnapshot/backup target), since then e2fsck might need huge amounts of memory, and checking/repairing such an ext4 filesystem runs into out-of-memory errors or requires a huge swap partition and takes ages.
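
    If you end up in that situation, e2fsck can at least be told to spill its internal tables to disk instead of RAM via /etc/e2fsck.conf. A sketch (the directory is just an example and has to exist on a filesystem other than the one being checked):

        [scratch_files]
            directory = /var/cache/e2fsck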


    Now talking about software support: for 'obvious reasons' ZFS is not part of the Linux kernel, and as such you're either on your own building the kernel/modules yourself (maybe automating this with DKMS), or you choose a distro that takes care of this. With OMV on x86 there's the 'proxmox kernel' for this purpose (though no idea whether this works 100% flawlessly all the time); with ARM devices you're on your own. You're clearly entering expert territory and need to double-check with each and every kernel update whether your ZFS pools will still be accessible afterwards (and with ARM there's no GRUB fallback to the latest known working kernel either).
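
    On Debian-based ARM images the DKMS route looks roughly like this (package names as in Debian/Armbian; whether the module actually builds against your particular kernel is exactly the part you have to verify yourself):

        apt install linux-headers-$(uname -r) zfs-dkms zfsutils-linux
        # after every kernel update, check that DKMS rebuilt the module before rebooting
        dkms status zfs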


    For the latter reason I prefer to use btrfs on ARM, taking care to use at least kernel 4.4 and with more sophisticated features at least kernel 4.19 (then also being prepared for potential OMV integration issues and hoping for btrfs becoming a first class citizen in OMV5 and above).

    Neither filesystem needs ECC RAM (the 'scrub of death' is a myth), but if you love your data you had better use it. Wrt the amount of RAM needed there's confusion with ZFS (another myth talks about 1 GB RAM per TB of usable storage -- such formulas only apply if you want to use deduplication), but once you cap the ARC at a reasonable size even 1 GB RAM should be fine for normal operation. Things might be different when running scrubs or a resilver (if you opt for a zmirror), and this is something that needs testing, especially on usual SBCs with their small amounts of DRAM.


    But the latter is even true for ext4, at least when tons of hardlinks are involved (rsnapshot/backup target), since then e2fsck might need huge amounts of memory, and checking/repairing such an ext4 filesystem runs into out-of-memory errors or requires a huge swap partition and takes ages.

    @tkaiser Thanks a lot for the detailed explanations! This is very interesting and informative to get such insight.


    So, what I do not yet completely understand is why I should use ECC RAM. I have now read the statement "if you love your data, use ECC RAM" so many times, which simply quotes Matthew Ahrens talking about ECC memory for ZFS. While that statement is true, it is true for any file system and only refers to the fact that data can be corrupted when there is an undetected error in the RAM.


    What I figured is that these very abstruse scenarios in which people depict the complete loss of all data on a system using ZFS due to an undetected memory error (e.g. in the FreeNAS forum) are just bogus. But also the scenario of a partial data loss where the same file would be loaded over and over again into the same faulty memory location (see e.g. the ZFS on Debian guide) is just unlikely (simply because that is not how memory allocation, at least under Linux, works) and in no way worse than on any other file system. Also, as already mentioned by @tkaiser, the myth that ECC would be a strict requirement for ZFS has been debunked by several people who explain this whole issue in a sane way, see for instance: ZFS and non-ECC RAM and the video on Why Scrubbing ZFS Without ECC RAM Probably Won't Corrupt Everything.


    So, from what I understand, there is always a risk that a corrupted file gets written to disk due to an undetected error in the RAM (irrespective of the file system), and this holds true for any file that does not yet exist on the file system (i.e. be it a new file or a file that's opened, modified and then written back to disk). In fact, in the case of a NAS system, this memory error doesn't even need to be on the server; it may as well be on the client side - either way a corrupted file will be written to disk and ZFS has no way of detecting this. Now, ECC memory detects and corrects some of these memory errors and thus is a very good thing to have, as it mitigates the risk of corrupted data significantly. However, once a correct file is written to a checksummed file system (e.g. ZFS or btrfs), there is no way a memory error will corrupt that file (as outlined in the links above).


    I guess the issue boils down to evaluating the risk of having some faulty data (i.e. a number of files, but not the entire dataset) stored on disk due to a memory error.


    I am currently leaning towards the Helios4 option (due to the ECC RAM and the fact that the whole SoC is designed for the NAS use-case), but I am afraid I might not be able to use ZFS on that one. I could not find a single mention of ZFS on a Helios4, or any Marvell ARMADA 38x SoC for that matter, anywhere on the net.


    An interesting third alternative is the MACCHIATObin board, which can be used with ECC RAM, but it doesn't seem to be sold with ECC RAM by default (at least it's mentioned nowhere) and is significantly more expensive.

  • But I'm also experimenting with a backup utility that can provide automatic bitrot detection and correction on ext4.

    I have looked into that topic as well and found that there are basically two options:
    1) restic
    2) BorgBackup


    Both of them seem to do an excellent job and are very similar in the way they work. I went with restic (though not yet at full scale), which was based more on a gut feeling than on hard facts. Probably the fact that the original author of restic has done such an excellent job designing the tool, and seems generally very conscientious about the design decisions, played a role. Also, the fact that restic borrows many ideas from git, a tool I use extensively every day, has influenced my decision.

  • I took a quick look at the documentation for both. I couldn't see anything about bitrot detection and correction?

    OK, I think I know what you mean. So, no, from what I understand restic would not protect you from bitrot (i.e. a bit flip) in the original data, i.e. the data outside the restic repository. See this issue for details: https://github.com/restic/restic/issues/805


    However, the data stored in a restic repository itself is encrypted and signed. So you should be able to detect bitrot in the backup data easily when checking the repository. To correct it, you would then obviously need some kind of data redundancy. This is discussed here: https://github.com/restic/restic/issues/256, but obviously if your repository is stored on a checksummed file system with some kind of redundancy you could correct it that way as well.
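
    For the detection part, this is basically what restic's check command is for. A sketch (the repository path is just a placeholder):

        restic -r /srv/backup/restic check               # verify the repository structure
        restic -r /srv/backup/restic check --read-data   # also re-read and verify every pack file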


    @Adoby: Do you have any better suggestion for a backup software?

    So, what I do not yet completely understand is why I should use ECC RAM

    Since bit flips happen. We have a lot of servers in our monitoring and some of those with ECC RAM report correctable single-bit errors from time to time -- even those that survived a 72-hour memtest burn-in. Those without ECC RAM have no ability to report bit flips, and as such we can only speculate what happened there (if an application crashed, for example).
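
    On Linux those corrected/uncorrected counts are exposed via the EDAC subsystem (assuming an EDAC driver exists for the memory controller in question):

        # per-memory-controller counters of corrected (ce) and uncorrected (ue) errors
        grep . /sys/devices/system/edac/mc/mc*/ce_count /sys/devices/system/edac/mc/mc*/ue_count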


    The consequences of a bit flip range from 'nothing at all' through silent data corruption to an application or kernel crash:

    • Nothing at all if the flipped bit got detected and corrected by higher layers (nearly every protocol uses CRC mechanisms to detect corrupted data since hardware is not perfect)
    • Silent data corruption happens. Users without checksumming filesystems like ZFS or btrfs may never know, or will only find out when it's way too late; with those modern filesystems the next scrub will tell (and if redundancy was used it will be corrected automatically -- see the scrub example after this list)
    • Application or kernel crash is as bad as it sounds. So ECC RAM is also an investment in 'business continuity' or availability. And such crashes are often accompanied by silent data corruption or even data loss as well
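
    The scrub mentioned above is a single command on both filesystems (mount point and pool name are just placeholders):

        btrfs scrub start /srv/data && btrfs scrub status /srv/data
        zpool scrub tank && zpool status tank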

    I could not find a single mention of ZFS on a Helios4, or any Marvell ARMADA 38x SoC for that matter, anywhere on the net

    I did some tests with a RAIDz on my Clearfog Pro (with just 1 GB) back in 2017. Rather underwhelming, especially since my usual ZFS setups are big Xeon boxes with SAS disks and enterprise SSDs for ZIL and L2ARC.


    With a Helios4, 4 SATA disks and data integrity + 'self healing' in mind I would most probably choose a btrfs raid1 with 2 to 4 disks. Please note that the btrfs raid1 mode works totally differently from both mdraid1 and a zmirror: https://github.com/openmediava…01#issuecomment-466901395
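
    Just to illustrate what that looks like in practice (device names and mount point are placeholders, not a recommendation for your exact layout):

        # create a two-disk btrfs raid1 for both data and metadata
        mkfs.btrfs -m raid1 -d raid1 /dev/sda /dev/sdb
        # or add a second disk to an existing single-device filesystem and convert later
        btrfs device add /dev/sdb /srv/data
        btrfs balance start -dconvert=raid1 -mconvert=raid1 /srv/data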

    • Official Post

    @Adoby: Do you have any better suggestion for a backup software?

    No, not today.


    I would like some software that creates backups and at the same time creates checksums of the files. When the backups are updated, the old unchanged files are checked, and any that no longer match their checksum are automatically replaced if possible. Either the original file is replaced by the backup file, or the other way around. However, this would be extremely computationally intensive, so the checks would likely have to be partial. Say you check 4% of the files each day so that all files are checked at least once per month. This should greatly reduce the risk of bitrot in sets of files that don't change often, say collections of scanned family photos or private recordings. It would be less efficient for big sets of files that change and are replaced often.
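
    Just to illustrate the 'check a slice per day' part, a crude sketch with plain sha256sum (the paths and the 25-day split are made up, and this is nowhere near the automatic repair I would like):

        #!/bin/sh
        # verify a different ~4% slice of the checksum list each day of the month
        cd /srv/data || exit 1
        SLICE=$(( $(date +%-d) % 25 ))
        awk -v s="$SLICE" 'NR % 25 == s' .checksums.sha256 | sha256sum --check --quiet -
        # (re)build the list for new or changed files with:
        #   find . -type f ! -name .checksums.sha256 -exec sha256sum {} + > .checksums.sha256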


    Filesystems that create checksums can help a lot. But to provide enough redundancy you need to use something like RAID, and then you still need backups, so you are looking at keeping at least three copies of each file. And these filesystems create checksums on the fly, which may slow down the filesystem noticeably in some circumstances.


    Modern HDD/SSD firmware already corrects many errors, and ECC memory can also help.


    But I still would like extra protection, as I described above. Storing data on small, power-efficient SBCs with fast networking and SATA drives makes it possible to distribute the work somewhat. For instance, a backup storage SBC with a huge HDD could perhaps check more than 4% of the locally stored backup files daily without any detrimental effect on the data storage SBC that serves the original files, while the data storage SBC only checks 2% during the early morning hours. If the system could use backups on the LAN, then that would provide enough redundancy to fix errors detected while checking the files. And two copies of the files, together with checksums, would be enough to provide fully automated bitrot protection.


    As far as I know, nothing like this exists today. At least not for home users who worry about bitrot eating their scanned family photos.


    Not yet.

  • With a Helios4, 4 SATA disks and data integrity + 'self healing' in mind I would most probably choose a btrfs raid1 with 2 to 4 disks


    Disclaimer: I personally try to keep storage pools as small as possible (maybe because I deal with storage for a living and am constantly confronted with the downsides of really large storage setups like an fsck or RAID rebuild taking days or even weeks and such stuff).


    With btrfs and 32-bit systems like the Helios4 there seems to be an 8 TiB limitation, and in general ARM devices with their low amounts of DRAM can run into trouble when dealing with (checking/repairing) huge storage setups.

    Modern HDD/SSD firmware already corrects many errors, and ECC memory can also help


    But this is more or less unrelated to the 'bit rot' problem since ECC used inside storage entities or ECC RAM only try to ensure that a 1 is a 1 and a 0 is a 0 now. Data degradation on (offline) storage is not addressed here at all.


    IMO you should really have a look at btrfs. Set up two HC2s with btrfs, use one for the productive data and the other for backups. Then use btrbk to do the backup part (creating/maintaining/deleting snapshots at source and destination and transferring them efficiently via btrfs send/receive). Periodic scrubs on both HC2s will reveal corrupted data, which then needs to be transferred manually from one system to the other.
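
    A btrbk setup for that is a rather small config file. Roughly something like this (subvolume names, retention policy and the ssh target are just placeholders):

        # /etc/btrbk/btrbk.conf
        snapshot_preserve_min   2d
        snapshot_preserve       14d
        target_preserve         20d 10w *m

        volume /srv/data
          snapshot_dir  btrbk_snapshots
          subvolume photos
            target send-receive ssh://backup-hc2/mnt/backup/btrbk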

    • Official Post

    Disclaimer: I personally try to keep storage pools as small as possible (maybe because I deal with storage for a living and am constantly confronted with the downsides of really large storage setups like an fsck or RAID rebuild taking days or even weeks and such stuff).


    With btrfs and 32-bit systems like the Helios4 there seems to be an 8 TiB limitation, and in general ARM devices with their low amounts of DRAM can run into trouble when dealing with (checking/repairing) huge storage setups.

    Yes, running fsck on a 12 TB ext4 HDD with a lot of rsync snapshots and lots of hardlinks is not fun on an HC2... I tried. Once. Then I removed the drive and stuck it in my desktop Linux PC. It was done the next morning. Or you can reformat and restore from backups. If you have them...


    Luckily this doesn't happen often.

    I would like some software that creates backups and at the same time creates checksums of the files. When the backups are updated, the old unchanged files are checked, and any that no longer match their checksum are automatically replaced if possible. Either the original file is replaced by the backup file, or the other way around. However, this would be extremely computationally intensive, so the checks would likely have to be partial. Say you check 4% of the files each day so that all files are checked at least once per month. This should greatly reduce the risk of bitrot in sets of files that don't change often, say collections of scanned family photos or private recordings. It would be less efficient for big sets of files that change and are replaced often.

    OK, I think restic is almost there already. Obviously, some key components to protect against bitrot are missing, but I reckon the largest building blocks are there. In particular, restic is based on calculating SHA-256 checksums of all data that is backed up, for deduplication. So it knows the checksums of the original data, but on the next backup it skips all files whose modification time didn't change (again, more details in this GitHub issue).
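
    If you want restic to re-read (and thus re-hash) everything instead of trusting the modification time, there is a flag for that, at the cost of a much slower backup run (repository and data paths are again just placeholders):

        restic -r /srv/backup/restic backup --force /srv/data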

    • Official Post

    Well, it seems that currently, restic does not provide any bitrot detection or protection at all? Apart from that it looks like a really nice backup system.
    I currently use plain old versioned rsync snapshots between SBC servers in a LAN. Simple and easy. I would prefer something very similar, but with added fully automatic bitrot protection.

    Since bit flips happen. We have a lot of servers in our monitoring and some of those with ECC RAM report correctable single-bit errors from time to time -- even those that survived a 72-hour memtest burn-in. Those without ECC RAM have no ability to report bit flips, and as such we can only speculate what happened there (if an application crashed, for example).

    Thanks again for the elaborate explanations, they are much appreciated. I probably should have been a bit more precise in my statement, though. I am aware of the advantages of ECC RAM in general; ECC memory is, after all, essential for my daily work. I see those error messages occasionally on our local cluster (which of course has ECC RAM). It would be a real pain in the neck if I had to debug random crashes of my code running on 1000 cores because one of them experienced a memory error.


    My statement was more targeted at my home NAS use case, where I am dealing with a small SBC with 2 GB RAM and 2 TB disk space, which is quite different from the cluster I mentioned above with 2000 GB RAM and 500 TB disk space. Business continuity and availability are not that crucial in my home setup - it's all about the probabilities of a failure. But then again, I like my data, as it's mostly photos and memories, and it would be sad to lose them due to some memory error.



    Disclaimer: I personally try to keep storage pools as small as possible (maybe because I deal with storage for a living and am constantly confronted with the downsides of really large storage setups like an fsck or RAID rebuild taking days or even weeks and such stuff).


    With btrfs and 32-bit systems like the Helios4 there seems to be an 8 TiB limitation, and in general ARM devices with their low amounts of DRAM can run into trouble when dealing with (checking/repairing) huge storage setups.

    At the moment I don't need more than 2 TB (and that is already generous), so two disks, where one of them is there to enable the 'self-healing' capabilities of the file system, was what I had in mind. The Helios4 sounds indeed like a good option in that regard - I am only hesitating because my previous experience with btrfs wasn't really the best (as mentioned earlier), and ZFS on that board is starting to look less like an option (though I haven't totally given up on that thought).

  • Well, it seems that currently, restic does not provide any bitrot detection or protection at all? Apart from that it looks like a really nice backup system.

    Agreed! Restic is a very nice backup system, but currently has no bitrot protection.


  • btrfs. I have used it around 6 years ago ... OpenSUSE ... the snapshots btrfs automatically did and eventually ran out of disk space in /var

    Btrfs doesn't do snapshots on its own. I believe you're talking about OpenSUSE's snapper instead?


    Anyway: there's no need to create constant snapshots (if you do snapshots then choose a tool that automates everything for you and deletes unneeded snapshots based on a retention policy -- I already mentioned btrbk above). And experiences from 6 years ago are pretty much worthless (the work on btrfs started a decade ago!). Same with all the 'reports' on the net blaming btrfs for almost everything that could go wrong with storage (people are fine with ext4 and silent data corruption but if they use a great filesystem able to spot data corruption they get mad and blame the filesystem instead of crappy hardware or a failed mdraid below or whatever).

    Btrfs doesn't do snapshots on its own. I believe you're talking about OpenSUSE's snapper instead?
    ...



    And experiences from 6 years ago are pretty much worthless (the work on btrfs started a decade ago!).

    I agree, my reservations about btrfs are mostly based on FUD. Not the best way to make such decisions, but I can't dismiss them straight away either. This is part of the reason why I joined the conversation here and was hoping to get some more insight/arguments for or against certain filesystems and on which hardware they can be used.


    Making a case for ZFS is pretty straightforward due to the overwhelming number of articles that praise it as the best filesystem around (see e.g. this or this article).


    And yes, you are correct, the issues I had were related to OpenSUSE's snapper utility. If I remember correctly, I wasn't really running out of disk space; it was more related to the metadata, with the system therefore reporting that it was running out of disk space and thus could not mount the partition. In the meantime, I switched my main production systems from openSUSE to CentOS/Fedora and back to old-school file systems. I have started reading up on the btrbk utility, which seems to be a rather good approach to that problem.

    I agree, my reservations about btrfs are mostly based on FUD. Not the best way to make such decisions, but I can't dismiss them straight away either

    We're all affected by this, and I would guess the older we get, the more so ;)



    If things go wrong it's kind of a normal reaction to blame stuff that is new to you, unknown, or not yet fully understood. So instead of a failing/weak PSU dropping disks out of ZFS pools, OMV's proxmox kernel gets blamed; people take reports of btrfs filesystems failing on ReadyNAS boxes (just a symptom of data corruption at the mdraid/lvm layer below) as proof that btrfs' own raid implementation is not ready (while it's not even in use there); the same OMV users who happily live with mdraid's write hole putting their data at risk on a degraded RAID5/6 whine that they cannot use btrfs' raid56 mode because of the 'write hole'; and so on...


    And what's also a real problem wrt awareness is that users love to shoot the messenger. In situations with faulty hardware, traditional filesystems often do not report any error, and you only notice that something is wrong by accident: when unreliable hardware has not just silently corrupted data but destroyed filesystem structures, or when you realize that a lot of data is corrupted/lost only when it's way too late and the corruption has already spread into all backups (if they exist).


    With those modern 'checksumming' filesystems like btrfs and ZFS you're notified almost immediately that something's wrong. But users then blame the filesystem instead of realizing that there's something wrong with their storage hardware below. Users seem to be happy suffering from silent data corruption but hate being told that their hardware sucks.


    ZFS and btrfs sit in exactly the same boat here, but since ZFS is mostly used on server-grade hardware (relying on hardware components that do not suck™) while btrfs is also used on commodity hardware, reports of bizarre btrfs failures are quite normal and happen more often. In the end the FreeNAS guys, by inventing and spreading the myth that ZFS would need at least 8 GB of ECC DRAM, did something good, since users who feel bound to this requirement are more likely to buy hardware that does not suck™.

    • Official Post

    if you do snapshots then choose a tool that automates everything for you and deletes unneeded snapshots based on a retention policy

    Just want to mention that snapper does this. You can configure snapper to create snapshots on a time basis (every hour or whatever). Additionally, it will create a snapshot before and after any apt operation. Cleanup is also done based on configurable criteria. If you do not want time-based snapshots, you can disable them.
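
    For reference, the relevant knobs live in the snapper config (e.g. /etc/snapper/configs/root); the values below are just examples:

        TIMELINE_CREATE="yes"        # set to "no" to disable time-based snapshots entirely
        TIMELINE_CLEANUP="yes"
        TIMELINE_LIMIT_HOURLY="10"
        TIMELINE_LIMIT_DAILY="7"
        TIMELINE_LIMIT_MONTHLY="0"
        NUMBER_CLEANUP="yes"         # also prune the pre/post snapshots created around apt runs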
