Going to use RAID 6. Do I have this right?

  • Hello everyone. So an important moment is almost here regarding my NAS, and that moment is adding storage drives. I've been tinkering with OMV for a few weeks and have to say I'm pretty impressed and I'm definitely going to be keeping OMV as my OS for this box. Now comes the kind of scary part, adding storage drives...


    I've concluded that I'm going to be using RAID 6. I was originally just going to use JBOD, then decided some parity would be nice, so I settled on RAID 5, but I've read a few horror stories regarding large drives and read error rates, and basically RAID 5 is obsolete. I chose 6 for the double parity and hopefully better odds at recovering the array should a HDD fail on me. Anyhow, are these steps basically what I'd do to get up and running once I've installed the drives? I'm going to start off with five 8TB drives for now.


    1) Format the drives with ext4 (or should I opt for XFS instead?)


    2) Encrypt the drives with LUKS (will the drives automatically unlock upon reboot?)


    3) Create the RAID 6 array.


    One thing I've read mixed responses about is expanding. My NAS has 8 bays, but I'm starting off with 5 HDDs. When the time comes for me to add the last three drives, will I be able to do so and expand the array without much difficulty, or will I have to start from scratch and rebuild the entire array? Bear in mind I'm pretty *nix illiterate (trying to learn slowly though) and would much prefer a GUI to help me should any troubleshooting and/or maintenance issues arise.


    Also, should a drive fail, how would I go about swapping it? Pop in a new one, format, encrypt, then rebuild? Thanks, and sorry if a few of these questions seem, well... ignorant :D

    Case: U-NAS NSC-810
    Motherboard: ASRock - C236 WSI Mini ITX
    CPU: Core i7-6700
    Memory: 32GB Crucial DDR4-2133

    • Official Post

    The basic steps are:


    1. Insert your drives. For RAID 6, there must be a minimum of 4.
    2. Under <Storage>, <Physical Disks>, <Wipe> each drive that's going into the array. (DON'T wipe your boot drive.) Using the "quick" wipe option is fine. (There's no need to use "secure" wipe. If you do, you'll be waiting a while for completion.)
    3. Under <Storage>, <RAID Management>, click on <Create>. Select the level, "RAID6", and at least 4 drives.
    It's going to take a while for the first "sync" to complete. (If you stay on this page, you'll see the progress in percent.)
    4. Under <Storage>, <File Systems>, click on <Create>. Pick the file system you want to use on the RAID array. (The array will have its own device name, such as /dev/md0. That's what you'll be using in the device name field for the file system.)
    (Depending on what you do here, the array size, etc., formatting may take a while.)
    5. In the same location, <Storage>, <File Systems>, after the format is finished, click on <Mount>.


    In basic terms, that's about it. From there it's the creation of shares, configuring services, etc.
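
    (For reference, the rough command-line equivalent of steps 3 through 5 would look something like the lines below. This is only a sketch, assuming sda is the boot drive, sdb through sdf are the five data drives, you picked ext4, and /srv/raid6 is just an example mount point; the GUI does all of this for you and chooses its own paths.)


    mdadm --create /dev/md0 --level=6 --raid-devices=5 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
    mkfs.ext4 /dev/md0
    mkdir /srv/raid6
    mount /dev/md0 /srv/raid6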

    • Official Post

    I forgot to mention:


    It's very easy to add a drive and expand a RAID array. Even if OMV didn't support the process in the GUI, it's a two-command proposition on the command line.


    The following is a working example of those lines - mdadm commands:


    mdadm --add /dev/md0 /dev/sdf
    mdadm --grow /dev/md0 --raid-devices=5


    The first line adds a disk. The second line sets off a restriping operation which integrates the new disk and grows the array.
    - assumes sda is the boot drive.
    - sdb through sde make up a 4-drive array, md0.
    - sdf is the new drive added.
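
    One more note: growing the array only grows the block device. The file system on top of it still has to be expanded afterwards; for ext4 that would be something like:


    resize2fs /dev/md0


    - for XFS, the equivalent is xfs_growfs, run against the mount point rather than the device.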


    BTW: Doing this operation from the command line will not "break" the OMV GUI.
    ________________________________________________________


    If you use only the first line ("mdadm --add"), the new drive becomes a hot spare for the array.
    If a drive fails, the hot spare is automatically added to the array.
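
    (Assuming, as in the example above, that sdf was added but not grown into the array, a quick way to confirm the spare is in place:)


    mdadm --detail /dev/md0


    - sdf should show up at the bottom of the device list with the state "spare"; when a member drive fails, it switches to rebuilding automatically.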

  • Now comes the kind of scary part, adding storage drives...

    It would be scary if this were the last step you consider. After you set up everything, it's time for testing:

    • Does it work when you pull out one drive?
    • Does it work when you pull out two drives?
    • What happens when you insert a spare now?
    • What happens when you put back one of the 'failed' drives now?
    • How does your RAID cope with data corruption? Does it get detected?
    • Do the regular checks work?

    (And so on. Nobody at home does this; everyone simply trusts in 'it must work since I already spent so much money and effort on it', which turns the whole approach into an untested waste of energy and resources :) )
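
    (On the 'regular checks' point: Debian-based systems, OMV included, normally ship a monthly cron job that scrubs md arrays. It can also be kicked off by hand, roughly like this, assuming the array is md0:)


    /usr/share/mdadm/checkarray --all
    cat /sys/block/md0/md/mismatch_cnt


    - the second line shows the mismatch count once the check has finished; anything other than 0 deserves a closer look.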


  • Thanks, seems easy enough. Which file system would be best? From what I've been reading, ext4 seems to be the way to go. When should I encrypt the drives, though? Once the RAID is built?



    Easy enough, ty :)


    It would be scary if this were the last step you consider. After you set up everything, it's time for testing:

    • Does it work when you pull out one drive?
    • Does it work when you pull out two drives?
    • What happens when you insert a spare now?
    • What happens when you put back one of the 'failed' drives now?
    • How does your RAID cope with data corruption? Does it get detected?
    • Do the regular checks work?

    (And so on. Nobody at home does this; everyone simply trusts in 'it must work since I already spent so much money and effort on it', which turns the whole approach into an untested waste of energy and resources :) )

    That's true. I'll definitely test it by pulling out those two drives, I'm very curious :)

    Case: U-NAS NSC-810
    Motherboard: ASRock - C236 WSI Mini ITX
    CPU: Core i7-6700
    Memory: 32GB Crucial DDR4-2133

    • Official Post

    It would be scary if this were the last step you consider. After you set up everything, it's time for testing:

    • Does it work when you pull out one drive?
    • Does it work when you pull out two drives?
    • What happens when you insert a spare now?
    • What happens when you put back one of the 'failed' drives now?
    • How does your RAID cope with data corruption? Does it get detected?
    • Do the regular checks work?

    (And so on. Nobody at home does this; everyone simply trusts in 'it must work since I already spent so much money and effort on it', which turns the whole approach into an untested waste of energy and resources :) )

    I'll do this by the numbers, from the above. (The assumption in the following is that you used RAID6.)


    1. Yes. RAID6 will continue to run with up to two drive failures.
    2. Yes. After two failures you're still good and everything would operate as if nothing happened. Your array may run, literally, for months in this condition. (I've seen bonehead field admins do exactly that, with 1 failed drive, in a RAID5 array.) However, there's no margin left. If there's a 3rd failure, all is lost and recovery is not possible or, at least, not practical.
    At this point, with 2 failures, the state shown in <RAID Management> is "Clean, degraded". Data is still available. (See the status commands sketched after this list.)
    3. First, note that it's better to have a hot spare online, but sometimes there are not enough slots in the case to house that many drives.
    - In any case, if you insert "a spare" at this point, either by command line or GUI (in the GUI you'd use the "recover" button to add a drive), it is built into the array. The state is "Clean, degraded, recovering", with progress provided in percent. (It will take a while.) I'd advise adding two drives at once rather than doing them one at a time. Note that these operations are a serious stress test for older drives.
    4. If you're running a test and there's nothing wrong with the drive, shut down, insert the drive, wipe it, and use grow (if it's going in as a spare) or recover (if you're adding it back to a degraded array).
    5. RAID does not deal with data corruption. It uses parity to protect against drive failures. That's it. RAID presents what looks like a "single disk" to the operating system. Lastly, RAID is not good at detecting its own errors. Back in the day it had no mechanism for detecting and/or correcting the small number of errors that it will inevitably write to the array. (I don't know if software RAID is better.)
    - File hashing (calculating and storing checksums, etc.) is more of a file system function which is layered on top of RAID. In some cases, add-ons like SnapRAID add similar protection. A journaling file system (ext3/4, XFS, etc.) prevents common instances of file corruption (but not all). A CoW, or Copy on Write, file system seems to be what you're looking for to prevent "silent" file corruption. This is a complex subject, but a decent primer for it can be found here: CoW file systems
    6. Personally, I'm using BTRFS for a number of reasons. ZFS is great, but the learning curve is substantial and it requires gobs of RAM. (1GB of RAM per 1TB of storage.) If you're going to run "tests" on BTRFS, here's a place where you can see the equivalent of Windows' chkdsk (with options): BTRFSck. Frankly, I wouldn't do repairs without doing extensive research.
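
    (The states mentioned above can also be watched from the command line; a quick sketch, assuming the array is md0:)


    cat /proc/mdstat
    mdadm --detail /dev/md0


    - the first gives a one-line overview plus rebuild progress; the second shows the per-drive state, including "clean, degraded" and any spares.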



    I'm at home, I trust next to nothing, and I tend to test everything before I use it. (This includes OMV.) When M$ stopped supporting what I wanted to do at a reasonable cost (I refuse to pay $500 or more), I looked elsewhere and testing was what brought me to OMV.


    What I told you above, I tested in an OMV 3.0.77 VirtualBox VM with 7 virtual drives. I simulated drive failures by removing 2 drives before booting the VM. In any case, you can do the same. If you have a client with a decent CPU and a bit of extra memory, you can test OMV to your heart's content. If you want to know for sure, there's no better way.

    • Official Post

    Regarding encryption:
    I can't help you in this area but I'm tempted to do some VM testing to see how it works in the open source world.
    ______________


    While my experience in this is, by my own admission, way out of date:


    I've seen commercial whole drive encryption used at work, and I've actually recovered a lost security token (a software key). It was a freaking nightmare. It was those experiences that taught me not to encrypt my personal drives. Further, depending on how it's implemented, whole drive encryption might complicate fixing what would otherwise be simple file system issues.
    (I'm thinking about using a self booting rescue disk with a virus scanner, and other scenarios.)


    Encryption is only good for "physical" drive protection (i.e., to prevent someone from physically stealing your drives and the data on them). As far as I know, whole drive encryption doesn't provide much protection from network attacks where the OS has been compromised, which is much more likely.


    My advice? If you're worried about physically losing your data, lock your server in a closet.
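
    (That said, if you do go the LUKS route anyway: from what I understand, and I haven't tested this myself, LUKS would normally sit between the array and the file system, roughly like the lines below, with md0crypt being just an example mapper name.)


    cryptsetup luksFormat /dev/md0
    cryptsetup luksOpen /dev/md0 md0crypt
    mkfs.ext4 /dev/mapper/md0crypt


    - without a key file referenced in /etc/crypttab, or a passphrase typed at boot, the array will not unlock automatically after a reboot.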

  • My advice? If you're worried about physically losing your data, lock your server in a closet.

    This is what I did, but not in any ordinary closet. My OMV server is in my gun safe.

    --
    Google is your friend and Bob's your uncle!


    OMV AMD64 7.x on headless Chenbro NR12000 1U 1x 8m Quad Core E3-1220 3.1GHz 32GB ECC RAM.

    • Official Post

    This is what I did, but not in any ordinary closet. My OMV server is in my gun safe.

    You know, I was thinking about mentioning just that. :thumbup:


    They make safes these days with ethernet, power, etc. pass-throughs, but that seemed a bit beyond the scope of the thread.

  • I have seen those safes, and the maker also sells a kit for other safes that has pass-thrus for USB, ethernet, and power. The problem with that kit is that you have to cut a huge hole in the safe.


    http://www.cabelas.com/product…ixoC_Ubw_wcB&gclsrc=aw.ds


    All I needed was ethernet and power and I managed to get that thru an electrical 1/2in close nipple. The hole needed for that was only 1in dia, but it was still a lot of work to cut that hole as the safe walls are .250 in thick plate. I had to use a carbide tipped hole saw.

    --
    Google is your friend and Bob's your uncle!


    OMV AMD64 7.x on headless Chenbro NR12000 1U 1x 8m Quad Core E3-1220 3.1GHz 32GB ECC RAM.

    • Official Post

    You "drilled" it. Wow.


    I would have guessed it could be done with an oxy/acetylene torch but that would make a real mess of the inside, no matter which side was burnt.


    Gun safes can be quite large, but I take it you don't have problems with heat buildup?
    (If you're using an R-PI and a USB powered drive, they would run cool enough.)

  • Yep, drilled it. I had to stop many times after a few minutes because the drill body got too hot to hold.


    It's about 10 degrees warmer inside the safe compared to the outside. Doesn't seem to be a problem. The server is a low-power Mini ITX system with an Intel Avoton C2550 quad-core processor. It currently has five 3.5in hard drives in it, with room for three more. Current temps:


    HDDs: average 104F


    CPU: 110F

    --
    Google is your friend and Bob's your uncle!


    OMV AMD64 7.x on headless Chenbro NR12000 1U 1x 8m Quad Core E3-1220 3.1GHz 32GB ECC RAM.

    • Official Post

    It's about 10 degrees warmer inside the safe compared to the outside. Doesn't seem to be a problem. The server is a low-power Mini ITX system with an Intel Avoton C2550 quad-core processor. It currently has five 3.5in hard drives in it, with room for three more. Current temps:


    HDDs: average 104F


    CPU: 110F

    Those temps are well within any reasonable spec. It seems you got just the right combo for heat load and good performance.
    Good job.



    Sometimes the best solutions are "home brewed". :thumbup:

  • Full specs in case you're interested.


    Board: ASRock Rack C2550D4I
    RAM: 16GB ECC
    HDDs: 4x 3TB WD Red, 1x 4TB HGST
    SSD: (for OS only) 16GB Samsung 2.5in SATA, running OpenMediaVault 2.2.14
    Case: Silverstone DS380B

    --
    Google is your friend and Bob's your uncle!


    OMV AMD64 7.x on headless Chenbro NR12000 1U 1x 8m Quad Core E3-1220 3.1GHz 32GB ECC RAM.

  • ZFS is great, but the learning curve is substantial and it requires gobs of RAM. (1GB of RAM per 1TB of storage.)

    Nope. That was already wrong years ago, and it doesn't get better by being copied and pasted over and over again. You need huge amounts of RAM for dedup, but otherwise the less RAM you have, the smaller your ARC cache gets.
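
    (For what it's worth, the ARC ceiling can also be capped explicitly when RAM is tight; a sketch, and the figure is only an example:)


    options zfs zfs_arc_max=4294967296


    - one line in /etc/modprobe.d/zfs.conf, limiting the ARC to 4 GiB.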


    Wrt tests, I was talking about real-world tests (nothing simulated in a VM with 'virtual disks', that's just a waste of time). It's disks that start to slowly misbehave that matter; it's not about 'black or white' scenarios like booting a system with two virtual disks disabled. I've seen so many RAIDs failing over the last 2 decades (and cleaned up afterwards, trying to recover data from various sources since customers had failed with their backups -- same mistake: they never checked whether the stuff works as expected) that I simply don't trust random RAID setups any more without extensive real-world testing (e.g. pulling the disk just partially out of the hot swap slot, then pushing it back in again, then pulling it slightly out and watching the whole system fail).

    C2550D4I

    I hope it's not the B0 silicon?

  • Yep, it's the B0 stepping.


    It has a bit less than one year on it with it running 24/7. It had a three year warranty on it when I bought it new back in 2015, but the original board was replaced under warranty for a failed serial port.


    But that doesn't extend the warranty back to the original three years, so I have about one year left on it. I have a feeling the warranty will run out before it fails due to the known defect, and if so I'll have to eat the cost of a replacement board at that time.


    Anybody know a way to accelerate that failure mode? LOL.

    --
    Google is your friend and Bob's your uncle!


    OMV AMD64 7.x on headless Chenbro NR12000 1U 1x 8m Quad Core E3-1220 3.1GHz 32GB ECC RAM.

    • Official Post

    Nope. That was already wrong years ago, and it doesn't get better by being copied and pasted over and over again. You need huge amounts of RAM for dedup, but otherwise the less RAM you have, the smaller your ARC cache gets.

    Did you see the beginning of this thread? I get the distinct feeling that elastic is not following the latest developments in ZFS. I'll include myself in that assessment as well. And I still stand by my "NOOB" assessment regarding the ZFS learning curve and RAM considerations. Why? Because the various scenarios that ZFS can be applied to are vast. It's an enterprise solution. ZFS is "complex". Also, in the exploration process (remember, NOOBs here!) one might trigger a de-duplication process when mucking around on the CLI, so I'd still argue that skimping on RAM, as a basic requirement, is not a good idea.


    Wrt tests, I was talking about real-world tests (nothing simulated in a VM with 'virtual disks', that's just a waste of time). It's disks that start to slowly misbehave that matter; it's not about 'black or white' scenarios like booting a system with two virtual disks disabled.

    Again - NOOB alert!


    But when it comes to the basic behaviors of RAID, nothing I said in this thread is wrong. Where you're concerned, I get the sense that maybe you've run test scenarios in a lab environment. I can tell you that I have as well - where I attempted to structure scenarios and control as many variables as possible. Outside of a lab? For conclusive tests of computer hardware, I tend to trust Tom's Hardware. However, in a home environment, most do not have the luxury of extensive hardware lying on the bench to test the real thing, or even the time required for obtaining true empirical data. Alas, this leaves the NOOB (and many of OMV's developers, by the way) with few testing options outside of using VMs. So, if VM tests are "wasting time", I believe it's time well wasted.


    I completely understand that the real world has a way of producing odd, even bizarre, behaviors that basic tests cannot simulate and/or reproduce. However, in most cases, those oddball events are exceedingly rare. Again, I stand by the test scenario posted and my comments regarding the basic behaviors of RAID6 as being "typical" or "nominal". I believe that if elastic pulls drives, he'll see the same basic behaviors.
    Lastly, in other threads I've stated in no uncertain terms that I'm not a fan of RAID because, contrary to what many mistakenly believe, it's simply NOT backup. Herewith -> Thoughts on RAID



    However, you're right to send up a flag regarding pulling drives. So...
    __________________________________________________________________



    I'll definitely test it by pulling out those two drives, I'm very curious :)


    tkaiser raises valid points regarding pulling hard drives out of an array. Even if you have "hot swap" rated hardware, you're taking an unnecessary risk by pulling out a drive "hot". Drives simply do not fail in that way, where the interface and power disconnect all at once.
    So, if you want to see RAID6's drive recovery capabilities, do it like I did it in the VM. Shut down, disconnect a drive (or two), and power up. To add the drives back, shut down, plug them back in, and add them to the array after booting again. You might want to think about doing your testing before copying huge amounts of data onto the array. (Add a GB or two, maybe a few music files, for test purposes.) Otherwise, the process laid out above applies.
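    (A failure can also be simulated purely in software, without touching the hardware at all; a sketch, assuming the array is md0 and the member being "failed" is sdd:)


    mdadm /dev/md0 --fail /dev/sdd
    mdadm /dev/md0 --remove /dev/sdd
    mdadm /dev/md0 --add /dev/sdd


    - the first two lines mark the disk as failed and pull it out of the array; the last one adds it back and kicks off a rebuild.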
    tkaiser is also correct on the backup issue. RAID is not backup and may be giving you a false sense of security. If you truly want to keep your data, you need backup. In the link -> Thoughts on RAID , I state that I prefer full, platform-independent backup. It doesn't have to be expensive either. At the first level, I'm using an R-PI and a 4TB WD "My Passport" which is USB powered. Together they're a bit larger than two packs of cards, use around 12 to 15 watts, and they'll sit, unnoticed, behind your server.


    Give having good, tested backup some thought. (And note it's better to have backup before the disaster.)


    Let us know how it goes.

  • Where you're concerned, I get the sense that maybe you've run test scenarios in a lab environment.

    No, never, why? I'm talking about the 'real world'. I've seen RAIDs not coming back after a reboot because the proprietary RAID controller complained about the 'hot spare' not being available for whatever reason, thereby sabotaging the R in RAID (f*cking expensive and dog-slow MegaRAID controller from the time when these controllers were made by AMI). I've seen software RAIDs losing all their data without even outputting anything to dmesg. I've seen pretty nice Infortrend controllers with 24-bay RAIDs, but with only double redundancy, drop more than 2 disks that were on their HCL within a short time --> more than 2 disks dropped --> array lost. The only solution: a firmware upgrade on all 18 affected drives from a specific production batch (lesson learned: never ever buy large bunches of disks for a RAID; stack smaller arrays on different layers instead). I've dealt with SAS expanders with broken cabling that suppressed SMART readouts, and we only discovered this because performance was low as hell, at a time when approx. 7 percent of the data was already slightly corrupted.


    TL;DR: Test your RAID under real-world conditions (not by simulating a perfect world in a VM with virtualized disks), use good filesystems that are able to detect data corruption (regular scrubs), do one or better two backups, and TEST RESTORES REGULARLY. If you're not willing to do this, simply forget about RAID since it's just a waste of resources (it's not necessarily bad as long as you really understand what it does -- nothing useful at home since it ONLY provides availability but NO data safety or data integrity. Almost all RAID modes are just absolutely useless if you don't need 'business continuity' :) )
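
    (For btrfs, 'regular scrubs' boils down to scheduling something like the two lines below and actually reading the result; a sketch, with /srv/data as a placeholder mount point:)


    btrfs scrub start /srv/data
    btrfs scrub status /srv/data


    - the first kicks off a scrub in the background; the second reports progress and any checksum errors found.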

    • Official Post

    nothing useful at home since it ONLY provides availability

    You must not have young children :D:D They don't tolerate restoring from backup or having to switch to the backup server. That said, I don't run RAID on any of my servers at home anymore. I back up to another drive (btrfs with scheduled scrubs) in the same server nightly, to another server nightly, to an offsite drive every few weeks, and to LTO-6 tape every few months.
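
    (For the nightly drive-to-drive copy, a scheduled rsync is one simple way to do it; a sketch, with both paths being placeholders:)


    rsync -aH --delete /srv/data/ /srv/backup/data/


    - run from a scheduled job; -a preserves permissions, timestamps and ownership, -H keeps hard links, and --delete keeps the copy an exact mirror.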

    omv 7.0.4-2 sandworm | 64 bit | 6.5 proxmox kernel

    plugins :: omvextrasorg 7.0 | kvm 7.0.10 | compose 7.1.2 | k8s 7.0-6 | cputemp 7.0 | mergerfs 7.0.3


    omv-extras.org plugins source code and issue tracker - github


    Please try ctrl-shift-R and read this before posting a question.

    Please put your OMV system details in your signature.
    Please don't PM for support... Too many PMs!

    • Official Post

    I back up to another drive (btrfs with scheduled scrubs) in the same server nightly, to another server nightly, to an offsite drive every few weeks, and to LTO-6 tape every few months.

    Good Grief, ryecoaaron! I thought I was an*l (well, on the "conservative" side) with 3 levels of backup. You have 4!



    - I've seen RAIDs not coming back after a reboot because the proprietary RAID controller complained about the 'hot spare' not being available for whatever reason, thereby sabotaging the R in RAID (f*cking expensive and dog-slow MegaRAID controller from the time when these controllers were made by AMI).
    - I've seen software RAIDs losing all their data without even outputting anything to dmesg.
    - I've seen pretty nice Infortrend controllers with 24-bay RAIDs, but with only double redundancy, drop more than 2 disks that were on their HCL within a short time --> more than 2 disks dropped --> array lost. The only solution: a firmware upgrade on all 18 affected drives from a specific production batch (lesson learned: never ever buy large bunches of disks for a RAID; stack smaller arrays on different layers instead).
    - I've dealt with SAS expanders with broken cabling that suppressed SMART readouts, and we only discovered this because performance was low as hell, at a time when approx. 7 percent of the data was already slightly corrupted.
    TL;DR: Test your RAID under real-world conditions (not by simulating a perfect world in a VM with virtualized disks), use good filesystems that are able to detect data corruption (regular scrubs), do one or better two backups, and TEST RESTORES REGULARLY. If you're not willing to do this, simply forget about RAID since it's just a waste of resources (it's not necessarily bad as long as you really understand what it does -- nothing useful at home since it ONLY provides availability but NO data safety or data integrity. Almost all RAID modes are just absolutely useless if you don't need 'business continuity' :) )

    Again, Good Grief, tkaiser!
    While they were far from rare, I've never seen that many RAID failure scenarios. I have to say, in retrospect, I now have a bit more respect for the server jocks, for exhaustively testing a couple of specific RAID controllers, RAID SCSI bays, and a couple of drive brands and sizes. (All of it was very expensive, and most of it was purchased in bulk with very specific OEM guarantees for reliability, AFTER said testing was complete.) Standardized hardware must have made a real difference. What you're describing is something of a "serialistic support nightmare".
    ____________________________________________________________________


    In any case, in the essentials, I agree with both of you. (Hence the piece against using RAID at home, preferring multiple levels of backup instead.)
    For a single server, a journaling file system is the absolute minimum, with a CoW file system preferred (BTRFS or ZFS). Since I'm a NOOB and I use it myself, I recommended the most NOOB-friendly "Copy on Write" file system, BTRFS. From what I've read on it, I suspect that ZFS is better, but I just haven't had the time or (frankly) the inclination to get some practical hands-on experience.
    ____________________________________________________________________


    Regardless, I was focusing on the subject of this thread which was:

    I've concluded that I'm going to be using RAID 6.

    While stating that RAID is not backup, I answered elastic's questions on RAID6 processes, as best I could.
