Help me learn!


    • Help me learn!

      So I'm new to RAID, NAS and Linux but proficient in general Windows-based computing. I'm hoping some nice people would point me to certain things to try so I can learn anything considered must-know or at least good-to-know.

      I've installed OMV 3.0.89 and have it running fine with a share that I've successfully mapped as a network drive on my Windows PC.
      Setup is a PowerEdge R510 with a PERC H700 (I'll be switching this out for an IT-flashed H200). The OS is installed on a 64GB SSD, and I have a RAID0 (2x160GB HDD), RAID5 (4x1TB HDD), RAID6 (5x1TB HDD) and a hot spare (1x1TB HDD).

      Current hardware and config is all temporary for testing/learning. Once I have a backup solution I'll start on the real NAS (64GB SSD for OS and 8x10TB HDD in RAID6).

      I'm not expecting guides or deep posts, something like "Try X" or "Play around with x" will probably do...possibly maybe?
    • I'm aware using large drives is generally frowned upon due to long rebuild times and the high chance of another drive failing during a rebuild etc., but put simply I need a single large storage solution instead of constantly buying external hard drives.

      Not to nit-pick but it'll be 60TB in RAID6. If my needs go high enough it'll likely end up 10 drives in RAID6 with an extra 2 HS, but that's not in the foreseeable future. Appreciate your feedback though.
    • tkaiser wrote:

      Where/how do you backup this data?
      At the moment the main contender is a PowerVault TL4000 (they pop up every now and then) but I don't have a solid idea yet. For obvious reasons I won't be building a duplicate server for a mirror backup.
      Data will be going on my server but not leaving so I imagine the initial backup would be painful but after that it should be fine.

      Unless you mean my current backup, I unfortunately don't have one which is why I pushed myself to get something like this started.
    • 64GB SSD is too much, you will have a lot of wasted space. You could cut it to 16GB and be fine. Just look for the most reliable SSDs. I wasted an OCZ 32GB SSD on OMV: only about 8GB was used after I installed and ran Plex Media Server (the Plex database takes up a fair chunk, as I found out with an 8GB SSD), but it ran 24/7 for about 5 years and is still good now. I stopped using RAID as it didn't suit my needs and just back up over multiple drives, which came in real handy when 3x 3TB drives all failed at the same time, again after 5 years of continual usage.
      HP N54L Microserver, 20GB Intel SSD, 4GB RAM running OMV 3.X
      HP N54L Microserver, 20GB Intel SSD, 8GB RAM running OMV 3.X
      and loads of other PC's and NAS... OMV by far the best....
      (P.S. I hate Windows 10, 8.1, 8, 7 Vista, XP, 2K, ME, 98se, 98 and 95 - I have lost hours of my life to this windows virus)
    • MrT10001 wrote:

      64GB SSD is too much, you will have a lot of wasted space.
      I completely agree, but this is an SSD that's been in my drawer for the past 4 years so it's still a better use than an object taking up drawer space. I do agree about reliability though. This is a Vertex 4 by OCZ, who went out of business, so it'd be wise of me to invest in a smaller and more reliable drive. If I'm smart I'll grab 2 so I have an OS backup too.
    • Fliggle_Slaps wrote:

      For obvious reasons I won't be building a duplicate server for a mirror backup.
      And this is the first thing I would do. Of course not a stupid 1:1 mirror box but one with some added functionality.

      Fliggle_Slaps wrote:

      I imagine the initial backup would be painful but after that it should be fine.
      No, backup times are close to irrelevant, but restore times, and especially the time to be up and running again after a complete data loss (disaster recovery), are what matter.

      Your 60TB is an awful lot of data; RAID resync times will be horrible (especially with stupid RAID modes that also resync empty space), as will restore and disaster recovery times when trying to get the data off the tapes again.

      As the amount of data grew, we started adding a sync solution at every customer site over half a decade ago. LTO tape libraries are still used almost everywhere, but only for archiving any more. Just do the math: 60 TB at a theoretical restore rate of 140 MB/s (assuming no compression applied) comes to roughly 428,600 seconds without taking tape changes into account. That's already 5 full days; add to this the incident happening on a Saturday evening, needing to buy new hardware, things not working as expected (e.g. the LTO library failing), and we're talking realistically about 1.5 to 2 weeks.
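      The restore-time math above is easy to check with a few lines (figures taken from this post; 140 MB/s is the theoretical LTO5 maximum, so real restores will be slower):

```python
# Back-of-the-envelope restore time for 60 TB streamed from LTO5 tape
# at the theoretical maximum of 140 MB/s, ignoring tape changes.
data_bytes = 60 * 10**12   # 60 TB
rate = 140 * 10**6         # raw LTO5 streaming rate in bytes/s

seconds = data_bytes / rate
print(f"{seconds:,.0f} s = {seconds / 86_400:.1f} days")  # ~428,571 s, ~5 days
```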

      So better invest in a second box (120% storage capacity of the real filer), use the following search terms and forget about concepts from last century (RAID6 and tape changers):
      • ZFS, snapshot, scrub
      • RAIDZ2
      • zfs send/receive
      • znapzend (it does all the complex magic for you... fully automated)
    • tkaiser wrote:

      So better invest in a second box (120% storage capacity of the real filer), use the following search terms and forget about concepts from last century (RAID6 and tape changers)
      I absolutely preferred ZFS and RAIDZ but the RAM requirements are quite high. More importantly, the simple reason I was looking at tape storage is purely cost. My first idea was to have a second server for backup but it's just not feasible in my budget. If spending ~AU$800 on 128GB of ECC RAM + a second CPU to enable the B slots is highly beneficial then perhaps I could aim for that, but due to funds I may never reach that AND have a good backup solution.

      Even though you scared me with the recovery time of 60TB via tape, it still feels like my best option.
    • Fliggle_Slaps wrote:

      I absolutely preferred ZFS and RAIDZ but the RAM requirements are quite high.
      If you know what you're doing and are able to spot 'BS copy&pasted over and over again' then they're NOT. That applies to both the '1GB of DRAM per 1TB of storage' myth and the way more stupid 'ZFS scrub of death' myth trying to tell you that you can use ZFS only with ECC DRAM.

      What you describe is cold data with most probably a single client (you). ZFS allocates as much DRAM as possible for a single reason: to cache as much as possible so it can serve the few hundred clients accessing your enterprise-grade server. If you tell ZFS that there aren't hundreds of clients and that it's not about high-performance enterprise storage but 'just a bunch of bytes', you'll be pretty fine with 32GB of DRAM for your 60TB, as long as you're not using dedup and tell ZFS to behave a bit better (lowering the ARC cache, since with your usage pattern all caching is useless anyway).

      With ECC DRAM it's easy: if you love your data you invest in ECC DRAM; if you don't care that much about bit rot and data integrity you choose a setup that might corrupt your data or even crash your machine when bit flips happen at the wrong memory location. But that's not related to ZFS.

      BTW: Memory requirements for ZFS even decrease with more recent ZoL versions. Check the performance section here: github.com/zfsonlinux/zfs/releases/tag/zfs-0.7.0
    • I really appreciate all the info you're giving tkaiser. It's too late in the night to do research but I'll be going over your links tomorrow because I'd love to go for ZFS.

      I wouldn't bother filling my server with standard RAM. Though cost is an issue, the difference between ECC and standard isn't big enough to make me cut costs there. Buying a whole separate server AND 8 or more 10TB HDDs is a different story. I'm a simple man with grand dreams...but a thin wallet.
    • If you're going to be buying 10TB drives for a home media server, then not buying more RAM is skimping.

      60TB of storage is not "home user" level. Do you seriously need 60TB? If this is media for your personal use, then with SnapRAID and mergerfs you get a much more flexible and extensible system. Start small and expand as your needs increase.
    • Nibb31 wrote:

      not buying more RAM is skimping
      My original comment was based on my misunderstanding of the 1GB of RAM per 1TB of storage rule, meaning I'd need 128GB of RAM, which, including a second CPU, is very expensive. After I look into tkaiser's links I'll likely be upgrading to at least 32GB of RAM. I'd prefer not to get a second CPU just for more RAM.


      tkaiser wrote:

      You plan to buy such a thing used with at least one working/reliable drive inside and 48 LTO5 tapes for how much?
      The TL4000 seems to be around the same price as a second R510, but 20 LTO5 tapes (60TB) is a bit cheaper than a single 10TB HDD. To clear up my previous comments: any backup solution is expensive, but there is still a big difference in cost. A second server + 8 HDDs would be more than twice the cost of a tape solution.
      I think my best solution is to keep relying on my pile of external drives, get a second server going and use 4 of the 10TB drives in each server (one as a backup of the other). I was so locked into thinking about migrating the data to the server and stashing the externals for good that I didn't consider using both.

      Question: How important would you say parity drives are in the backup server? My view is that parity is VERY important in the live server but less so in the backup. I guess it comes down to how scared you are that both live and backup could fail at the same time.
    • Fliggle_Slaps wrote:

      20 LTO5 tapes (60TB)
      I would really do a lot more research first. LTO5 means 1.5TB per tape real storage capacity. When transparent compression was added to tape technology ages ago, some marketing dude thought 'Let's tell the clueless customers that we're able to store 200% of data on our tapes', assuming a 2:1 compression ratio for exactly no reason. Since the first marketing dude started with this bold lie, all the others followed of course.

      If your data is already compressed (e.g. movie formats) you won't see any further compression and only 1.5TB will fit on a tape. If your data is highly compressible (e.g. plain text like log files) then even 15TB might fit on a tape. But realistically you should plan for 1.5TB per tape if your data consists of huge files, and maybe 1.8TB if you're dealing only with small files (since the backup software streams the contents, you get less overhead compared to the representation in old(er) filesystems, which all implement a minimum block/chunk size).
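      As a quick sanity check of the capacity numbers above, here is a small sketch computing how many LTO5 tapes 60 TB needs at a few compression ratios (the 1.2:1 figure is only an assumed middle ground for mixed/small files):

```python
import math

data_tb = 60       # total backup size in TB
native_tb = 1.5    # LTO5 native (uncompressed) capacity per tape

# 1.0:1 = already-compressed data (movies), 2.0:1 = the marketing number;
# 1.2:1 is an assumed in-between value for mixed/small files.
for ratio in (1.0, 1.2, 2.0):
    tapes = math.ceil(data_tb / (native_tb * ratio))
    print(f"{ratio}:1 compression -> {tapes} tapes")
```

      So the "20 tapes for 60TB" plan only works out if the marketing 2:1 ratio held; with already-compressed media it's 40 tapes.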
    • tkaiser wrote:

      LTO5 means 1.5TB per tape real storage capacity
      I was going to look into that before making any decisions. The fact there were two numbers made it clear that each applied to a specific scenario. In that case the TL4000 would not handle my server's full growth down the line, even though that is far down the track. There's really no point in this option. I'll need to put more thought into continued use of externals while growing a server + backup server.

      Currently reading up on all of the suggestions in this thread, so it'll be some time before I make any educated progress. Will likely give a few different things a test to see what feels best.
    • Fliggle_Slaps wrote:

      In that case the TL4000 would not handle my server's full growth down the line, even though that is far down the track
      One last time wrt the tape library backup idea. Imagine your whole server is gone (stolen, fire, or just one of the many 'Oops, my array has gone' events this forum is full of, since people are neither willing to learn basic RAID principles nor want to spend effort on CONTINUOUS TESTING). In case you really need access to your backed-up data, this aspect of the problem -- access -- needs a lot of attention/thought.

      If you use a 'classical' tape library backup approach you do a full backup to tapes (streaming the data) and then save incrementally later. So a full restore of, let's say, 30 TB involves reading in 20 LTO5 tapes to get the initial full backup and then another few tapes for the later incremental changes (more modern backup software therefore allows creating 'synthetic full backups' from time to time). If you calculate with the theoretical maximum speed of 140 MB/s (raw LTO5 data rate) this is an awful lot of time, especially since tape changes also take time and you will most probably run into issues, so overall 'restore performance' will be way lower. In other words: if you're fine having no access to your data for 1 or 2 weeks, consider tape backup; otherwise better think about the alternatives.
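      Extending the raw streaming math with a per-tape change time shows why even the best case is measured in days; the 3-minute swap time here is an assumption for illustration, not a measured value:

```python
tapes = 20                 # LTO5 tapes holding the 30 TB full backup
tape_bytes = 1.5 * 10**12  # native capacity per tape, in bytes
rate = 140 * 10**6         # theoretical raw LTO5 rate, bytes/s
change_s = 180             # assumed seconds per tape swap (hypothetical)

stream_s = tapes * tape_bytes / rate
total_s = stream_s + tapes * change_s
print(f"{stream_s / 3600:.1f} h streaming + swaps = {total_s / 86_400:.1f} days")
```

      And that's the best case: before the incremental tapes, before any read errors, and at a streaming rate nobody reaches in practice.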

      A full restore might also be challenging for another reason: if your production server vanished (stolen/fire) you need a new system with at least the same storage capacity. If that's not possible you might want to restore only the 'more important' data from tape for now. If your backup software has a horrible user interface (which applies to almost all of them) it can get challenging to get back exactly this part of the data. So if you start to test this stuff, ensure that you can cope with your software's user interface too: how easy is it to restore certain parts of the backed-up filesystems to their latest state in a reasonable time? (I've seen soooo many stupid ways to implement this already.)

      Another funny fact: for tape backup software to be able to allow individual restores it needs so-called catalog files (more or less a database keeping track of every single file backed up, in which state, on which tape). If these catalog files aren't available any more (e.g. the server vanished: stolen, fire, ...) then good luck, since now, to access any individual file, the catalog first has to be rebuilt, which essentially means reading in the contents of all the tapes in the backup set (and you're back at the '1 or even 2 weeks without data' scenario).

      Even in large corporations it's often not understood that 'having all data backed up' (ass-covering mode) is something totally different from 'having all backed-up data easily and quickly accessible in case it's needed' (responsible mode).

      Little anecdote: a former customer of mine outsourced IT in the most irresponsible way imaginable. The new IT service provider implemented everything as an expensive SAN with triple redundancy (two local storage boxes mirrored and a third in another location behind a leased 2 Bit line). After I asked what happens when I do an 'rm -rf /' and they realized that this means a triple disaster (since RAID and mirroring are not related to data protection), they added 2 expensive EMC backup/archive boxes. Configured with 4xGbE interfaces, they talked about 'less than 36 hours for full disaster recovery' (50 TB of data, obviously assuming a 400 MB/s restore rate).

      I had to ask multiple times for a test, and when they delivered (a small restore from the remote location to a local server) we saw a restore rate of ~35 MB/s (so not 36 hours for a full restore but 16+ DAYS). That's what you get when you assume things will work as expected, believing in 'performance by specs' and not testing everything thoroughly (the problems there: the remote EMC box wasn't licensed to make use of the 4 NICs, and the multiple GbE trunks in between were all configured wrongly).
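      The gap between the promised and the measured numbers in this anecdote is easy to verify:

```python
data = 50 * 10**12  # 50 TB to restore

for label, rate in (("promised 400 MB/s", 400 * 10**6),
                    ("measured  35 MB/s", 35 * 10**6)):
    days = data / rate / 86_400
    print(f"{label}: {days:.1f} days")
```

      The promised rate works out to about 35 hours (hence the 'less than 36 hours' claim); the measured rate works out to the 16+ days actually faced.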


    • I hope people show their appreciation to you tkaiser, there aren't a lot of people willing to continually go into such depth. The real-life example was a good read too.

      You made a good point about a full backup sitting in cold storage, and then using the TL4000 to handle the subsequent incremental changes. That would allow for growth of the live server (if I ever reached max use of all 12 bays, it would be 12x10TB drives - 8 in use and 4 in parity).

      Anybody with enough experience in even general computing should know not to expect the theoretical maximum speed. It's comparable to software that isn't coded to make use of multiple threads, or is simply poorly optimized, meaning the CPU is very much under-performing. Or to somebody with a 100Mb/s internet connection who doesn't understand why their torrents aren't downloading at a good speed, or why the internet slows down during peak times. You always need a full understanding of not just the specs but the configuration and every factor in between.