Solved? OMV and software RAID 5

    • flmaxey:
      Sorry... please read your quote... I seem to have added my info there.... ;)

      flmaxey wrote:

      Regarding: "OMV on 7 computers being used for storage": (Are these repurposed consumer PC's?)

      This is the crux of the matter. I imagine that these boxes are the source/destination of all your client clone copies. Based on an earlier post, I'm guessing that the clone copies are organized alphabetically..??

      Here are a few questions for you:

      1. What is the total of your storage requirement, presently?
      2. What is the data to be stored, primarily? (Factory default drive clones? Rough size of each?)
      3. How many storage drives are in each OMV server?
      4. What file system are you using?
      5. Memory in the OMV boxes?
      (While I don't know the parameters, or their reasons for suggesting it, ZFS pros suggest 1GB of RAM per TB of storage, but that's in a file server / server farm application, which equates to a LOT of traffic from concurrent users.
      Without "file deduplication", the ZFS on Linux project recommends a minimum of 2GB. (See Hardware.) In your scenario, with OMV, I think you'll be fine with 4GB of RAM. OMV will work with 1GB (on Raspberry Pi's and other small boards), which would leave the remaining 3GB for ZFS.)

      Finally, how critical do you see this data to be? Meaning, if you lost some or all of it, what would be the consequence? (I imagine there may be "classes" involved: ?TB is critical, where ?TB is important but not critical.)
      ______________________________________________________________
      Hi again!
      Wow, you've really given your two penn'orth of info here.....


      After reading everything, and what others have said, I probably don't need a RAID scenario on my work server. I just need a way to pool the drives so that what I save on them is stored alphabetically... i.e., I don't need to move images around when one letter has more of them than another....


      BTW I do clone all my computers....and have backups of those clones....;)


      Because I am a one-man band, as it goes....;) I utilise everything I can in the way of drives and storage space... If a customer decides to go from a workstation to a laptop and doesn't want to do anything with the old computer, I then use it (if it is worth having) as a storage vessel..


      I will add that if I reuse such a computer, I check the motherboard for components that look like they are swelling....;) I redo the thermal paste, which I usually replace on all computers after a couple of years... I even replace the processor cooling system if I have something available...


      I am good at what I do, building computers and repairing them... but my real area of expertise is problem solving....


      I am always honest with my customers and say that if I don't have the particular solution to their problem, I will find it.... (if at all possible... and I'm already aware of how to solve most of the usual problems)


      No one person can be good at everything....


      I hate seeing purpose-built computers for sale where the seller has the nerve to call them games computers.... just had a new guy on the block claiming he sells games computers, who then proceeds to use a Corsair CX450 as a power source..... I even see compromises from known brands... Not condemning Corsair.... but never use a budget PSU in a games computer....


      Those that know me know that a games computer is something worth building well and not something one should compromise on....if you have a small budget...then no games computer....;)
      Your questions:
      1. I have 8TB in a 4x2TB software RAID 5 and that space is at its limit.... I was hoping to do a big jump in size so that I can concentrate (barring disk failures) on other things...
      I have just bought 5 x 4TB WD Reds (not all from the same place) for my work server.... I recently upgraded my VMware storage from 1TB to 2TB, so that is OK at the moment...
      The computer that I back my Clonezilla images up on is a purpose-built computer by me.... I have a tendency to move computers along when building new ones... the computer I am sitting at at the moment is my company one and is my latest build with good components.... my Clonezilla image backup computer is an old company one....;)


      2. When I buy in a new computer.... most often nowadays a laptop that my customer has chosen.... I start it, set it up for them, and install their software: Microsoft Office, if they have one.... set up a webmail client... antivirus protection, etc., and then create a purpose image of that computer... Now it depends on the computer... if it has an SSD or a standard drive around 256GB, then I do a complete image with recovery partitions (if available)... but on larger drives I just save the parts needed so that I can restore their computers.... the images vary in size up to about 37GB... so ZFS compression would save me space... I will add that my customers sign an agreement to the effect that I tell them what I have saved.... I also store sysprepped WIMs etc. on the Clonezilla backup server...
      I always have up-to-date versions of an OS, so that if a computer comes in with so much junk software in use (that customers have downloaded by mistake) and I can see that there are several different problems, then I usually recommend a restore.... I used to restore to factory defaults, but more often than not that is just a waste of time, so I restore to their OS and activate their license....


      3. The number of drives in each OMV server varies on what I have.... but most have at least 3 to 4TB and are sometimes a backup of a backup..... The Clonezilla server backup computer I am upgrading with extra 3TB drives.... I have 3x3TB at the moment and have two more available...


      4. All my computers apart from my Windows ones are using ext4..


      5. The memory in the OMV computers is all standard... well, some have memory with heatsinks... I never build even a standard computer with memory without heatsinks.... All have at least 4GB, but several are using DDR2 (PC2-6400). I would not be using dedup....


      If I lose a customer clone, it means I have to start from scratch... but like I said, I usually have updated WIMs of every OS from XP to Windows 10... so the clones save me time.... but it's not an end-of-the-world crisis....


      At the moment I am using the web user interface for my OMV computers and have everything saved as favourites in Firefox, Chrome, and IE.... I also use PuTTY. PuTTY is set up for all my computers and I can quickly access them via my company computer...

      flmaxey wrote:

      Really, as it seems from the thread, your considerations may be more about data organization than anything else. Of course, getting the right storage structure would help, and go a long way toward preventing a potential disaster.
      Data organisation is easier if you have drives pooled as one; otherwise you have to allocate one drive for A - F, another for G - K, etc., but the risk is that you will always have more of one letter than another..... so pooling drives solves this problem.... Another thing is, maybe A - F has 20GB left but the image you want to save is 35GB....;)

      tkaiser has opened my eyes to the need to restructure, and ext4 doesn't give the advantages of ZFS....


      As you can imagine, this undertaking is going to take time and I don't want to have to redo things for a while.... I always take a new backup of my system drives on all computers if I have upgraded the OS.... so I can reset in case of problems.....

      I am better at maintaining my hardware than my software and files, and need to improve on that..... BUT that doesn't mean I haven't got backups of all important files....

      bookie56
    • I'm out the door again, for a few days. I'll get back to you, on return.
      Until then:
      ________________________________________________________

      Repurposing older PC hardware - it's what I'm doing these days to get around buying new hardware on a regular basis. (I see nothing wrong with re-furb's. :D ) While I enjoy a PC game, from time to time, nearly any PC built in the last 10 years will run the older games. (C&C and others.)

      On a side note regarding "gaming computers";
      I'm stunned at both the realism in today's games and the beef in the hardware required to run them. GPU's are rivaling CPU's for processing power... It's simply amazing, and so are the costs. ($400+ GPU's .......)

      If I were you, I'd do the same thing for building file servers. Hardware lying around? Install OMV and use it. Outside of a data center (where ECC is a real requirement), consumer PC's make fine file servers.

      On the fix and repair thing - I've found in both commercial and consumer electronics, the majority of failures are power supplies or, otherwise, are power related. (I'm sure you're more than aware of that.) Beyond power issues, in PC's where hardware/software interaction can cause problems, things can get real interesting.
      (While I should drop weird issues and move on, I've spent many hours puzzling out "what happened".)
      ______________________________________________________

      Some thoughts on using any pooling technique:
      The convenience associated with using pooling techniques can be far offset by the agony of trying to recover from a failure. If trying to recover a single disk, in most non-CoW disk formats, there are plenty of utilities out there. When trying to recover a disk that was part of a pool, well, you "might" find something or you might not. In this respect mergerFS shines, because pooled disks are independent with their file systems intact, below the pool. In essence, you can bust a mergerFS pool and everything is still there. (But it might be a job to sort it all out, in finding where mergerFS stashed folders and files.)
      So, is it better to reorganize your data, or become dependent on a pooling technique, or create some combination of the two? I believe that's the question.
      ((With the above noted, with solid, regularly scheduled backup, many potential recovery nightmares simply fade away. With solid backup, you can pool to your heart's content.))
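
      For illustration, a minimal mergerFS pool in /etc/fstab might look something like the sketch below. (The mount points and policy options here are examples of mine, not a prescription for your setup.)

        # Pool three independent ext4 data disks under one mount point.
        # "mfs" (most free space) writes new files to the emptiest branch;
        # moveonenospc retries on another branch if a disk fills mid-write.
        /srv/disk1:/srv/disk2:/srv/disk3  /srv/pool  fuse.mergerfs  defaults,allow_other,use_ino,category.create=mfs,moveonenospc=true,minfreespace=20G  0  0

      Because each branch stays a plain ext4 disk, you can unmount the pool at any time and every file is still right there on its individual drive.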

      As noted in the last post, I believe you're in a use case that fits a (ZFS) RAID scenario. From there, it would be a matter of the vdev and disk sizes you're comfortable with. Personally, until they've been vetted over more time, I wouldn't use 8TB drives, period. I see 6TB as the absolute upper limit, with 4TB down to 2TB being preferred. Why? As drives get larger, failure is more likely. Trusting that much data to a huge drive becomes a risk / benefit trade-off that I don't think is worth taking. (But that's simply my opinion.)

      On the organization issue:
      Sorting images alphabetically is one way to organize. Have you given thought to organizing images by year, with monthly sub-dirs? (You could search on customer names, if you set the customer name as part of the image file name.) Such an approach would also give you an indication of when you could purge old data. That scheme could be further divided by image type (Win8, Win7, Vista, etc.). In any case, if you can come up with a better way to divide up the data store, one that's logical and intuitive to you, the benefits are obvious. Just some thoughts.
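
      As a rough sketch of the idea (the paths and names are made up, purely for illustration):

        # One directory per year, one per month; customer name in the file name.
        mkdir -p /srv/images/2017/{01..12}
        # e.g. an image saved as: /srv/images/2017/09/smith-laptop-win10.img
        # Purging a whole year later becomes a single (deliberate) command:
        rm -rI /srv/images/2013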

      I'll look closer at your answers and numbers when I get back.
      Good backup takes the "drama" out of computing
      ____________________________________
      OMV 3.0.88 Erasmus
      ThinkServer TS140, 12GB ECC / 32GB USB3.0
      4TB SG+4TB TS ZFS mirror/ 3TB TS

      OMV 3.0.81 Erasmus - Rsync'ed Backup Server
      R-PI 2 $29 / 16GB SD Card $8 / Real Time Clock $1.86
      4TB WD My Passport $119
    • flmaxey:

      bookie56 wrote:

      1. I have 8TB in a 4x2TB software RAID 5 and that space is at its limit.... I was hoping to do a big jump in size so that I can concentrate (barring disk failures) on other things... I have just bought 5 x 4TB WD Reds (not all from the same place) for my work server.... I recently upgraded my VMware storage from 1TB to 2TB, so that is OK at the moment...
      The computer that I back my Clonezilla images up on is a purpose-built computer by me.... I have a tendency to move computers along when building new ones... the computer I am sitting at at the moment is my company one and is my latest build with good components.... my Clonezilla image backup computer is an old company one....;)
      2. When I buy in a new computer.... most often nowadays a laptop that my customer has chosen.... I start it, set it up for them, and install their software: Microsoft Office, if they have one.... set up a webmail client... antivirus protection, etc., and then create a purpose image of that computer... Now it depends on the computer... if it has an SSD or a standard drive around 256GB, then I do a complete image with recovery partitions (if available)... but on larger drives I just save the parts needed so that I can restore their computers.... the images vary in size up to about 37GB... so ZFS compression would save me space... I will add that my customers sign an agreement to the effect that I tell them what I have saved.... I also store sysprepped WIMs etc. on the Clonezilla backup server...
      I always have up-to-date versions of an OS, so that if a computer comes in with so much junk software in use (that customers have downloaded by mistake) and I can see that there are several different problems, then I usually recommend a restore.... I used to restore to factory defaults, but more often than not that is just a waste of time, so I restore to their OS and activate their license....
      3. The number of drives in each OMV server varies on what I have.... but most have at least 3 to 4TB and are sometimes a backup of a backup..... The Clonezilla server backup computer I am upgrading with extra 3TB drives.... I have 3x3TB at the moment and have two more available...
      4. All my computers apart from my Windows ones are using ext4..
      5. The memory in the OMV computers is all standard... well, some have memory with heatsinks... I never build even a standard computer with memory without heatsinks.... All have at least 4GB, but several are using DDR2 (PC2-6400). I would not be using dedup....
      On a side note:
      Resetting to factory defaults, in the Windows world, is a "PITA". Generally it means hours of re-updating the OS, maybe adding a service pack back in, removing all the badger-ware, other software "offers", and similar junk. Then some sort of decent firewall and virus scanner must be loaded. And all of that is necessary just to get a PC into a usable state where app's can be loaded. Still more time is involved in that. Again, a royal PITA.
      So, what you're doing in cloning prebuilt, app-populated images for customers makes sense. I'm sure they appreciate it, and it's good for business.

      _______________________________________________________________

      On the storage requirement:
      Looking at your total array size (4x2TB = 8TB of raw disk = a 6TB array), you must have something between 5 and 6TB of data.

      BTW: I see 75% of available space filled, on an individual drive, drive pool, or array, as "full". Any number of things can happen where the remaining 25% gets filled quickly and causes trouble. ZFS, or any "copy on write" file system, does exactly that - copies on a write - so a reasonable chunk of free space is required. To my way of thinking, 25% free space defines "reasonable".
      I provision for storage starting at 25% fill, and start looking at expansion when the fill percentage exceeds 55 to 60%. Again, the 25 - 75% start and end points are just my opinion.
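
      If you want a quick way to watch those numbers, something like this one-liner works on any of the boxes (the by-label paths are OMV's usual mount points; adjust to suit):

        # Print any data drive that has crossed the 75% fill mark.
        df -h --output=target,pcent /srv/dev-disk-by-label-* | \
          awk 'NR>1 && int($2) > 75 {print $1, "is", $2, "full"}'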

      And I agree with your point that you may not need ZFS on your server. It's just a matter of shuffling or allocating your images in a way that makes sense to you, one that uses your available drive space well and, the most important part, gets backed up. But for the sake of discussion....

      Based on a 4x4TB drive array, which is what I consider the safe limit for RAIDZ1 (or the RAID5 equivalent), you'd have a 12TB array. Even if you stretched the array to 5x4TB disks (I believe that's risky), it would be 16TB. 6TB disks in the array might get you into a size that you might like, but in terms of a single drive failure that can cascade into an array failure, the risk really starts to climb. (So does the cost.) Regardless, ZFS will allow you to pool vdevs and achieve truly enormous pools; however, on any one server, I (personally) would be reluctant to put that many "data eggs" in one basket. It's a matter of risk trade-offs and what you're comfortable with.
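
      In case it helps, creating that 4x4TB RAIDZ1 pool is a one-liner. (The disk IDs below are invented; use the real /dev/disk/by-id names from your box rather than sdX letters, which can change between boots.)

        # One vdev of four 4TB drives; one drive's worth of parity, ~12TB usable.
        zpool create tank raidz1 \
          /dev/disk/by-id/ata-WDC_WD40EFRX_AAAA /dev/disk/by-id/ata-WDC_WD40EFRX_BBBB \
          /dev/disk/by-id/ata-WDC_WD40EFRX_CCCC /dev/disk/by-id/ata-WDC_WD40EFRX_DDDD
        zfs list tank    # shows the usable space, after parity
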
      _____________________________

      In practical terms:
      Since you have hardware available, I'd seriously consider dividing up your data store between two servers. How? If the store is divided based on image file dates, the system you have in place (alphabetical by customer name) could remain unchanged. I'd consider creating a second server as an "archive server" running OMV. Again, everything (directory names, structures, etc.) would be a duplicate of your current data store, but anything over 2 years old (maybe 3 years old?) would be moved to the "IMAGE-ARCHIVE" server. With the archived customer image folders shared to the network, in the event that an old customer needs a full restore, pulling the image onto the working server wouldn't take too long. (I'd make sure the Ethernet path between the servers is 1Gb/s, minimum. 100Mb/s would be too slow.)

      [Side note - finding cloned images with a specific date/time stamp can be done in the CLI with the "find" command. An easier way might be to use the find function in WinSCP and specify a before or after date. As an example, the mask * <2014-01-01 generates a list of files and folders stamped older than Jan 1st, 2014.]
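
      The CLI equivalent is roughly this (GNU find; the paths and host name are examples of mine):

        # List anything under the image store last modified before Jan 1st, 2014:
        find /srv/images -maxdepth 2 ! -newermt "2014-01-01" -print

        # Moving an old customer's folder to the archive server could then be
        # a one-shot rsync (hypothetical host and paths):
        rsync -av --remove-source-files /srv/images/smith/ root@image-archive:/srv/image-archive/smith/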

      On the organization end of it; symlinks could help with what you already have. You can, literally, drop in a symlink (a redirect to a folder on another drive or the root of the drive itself), and fill the second drive to near 100% capacity with the contents appearing to be in the first drive.

      drive 1 dir
      |------->(Client_Images)
      ....................|--------------> A - D (local dir on drive 1)
      ....................|--------------> E - G (symlink to drive 2. All files and sub-dirs of the folder E - G actually reside on drive 2.)

      (In the above, the link/shortcut would be E - G, and it would point to, for example, /srv/dev-disk-by-label-2TB (or the Debian drive name/path equivalent if the server is not OMV).)
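
      In command form, that's just the following (the by-label paths are examples; substitute your real mount points):

        # Create the real folder on drive 2, then link it into drive 1's tree:
        mkdir -p "/srv/dev-disk-by-label-2TB/E - G"
        ln -s "/srv/dev-disk-by-label-2TB/E - G" "/srv/dev-disk-by-label-1TB/Client_Images/E - G"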

      The only limitation to symlinks, for allocating data, is the imagination. The down side is, without coming up with a symlink scheme that's easy for you to remember and understand (or, otherwise, document), several symlinks can become unwieldy. Lastly, you'd need to keep an eye on each drive's fill percentage, or automate a report that notifies you of the fill percentage from time to time.
      OMV has a feature for setting up E-mail notifications, with the results generated from a command line. That's useful for a LOT of things. BTW: WinSCP will set symlinks on most Linux boxes, including your main server.
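
      As a sketch of that kind of report (it assumes OMV's outbound mail is already configured and that a mailx package is installed - both assumptions on my part):

        #!/bin/sh
        # /etc/cron.weekly/fill-report - mail a weekly drive fill report.
        df -h /srv/dev-disk-by-label-* | mail -s "Weekly drive fill report" you@example.com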

      _______________________________________________

      So, bottom line, it seems as if you're warehousing a good-sized chunk of data. Further, you have PC's (many of them consumer PC's) that will function as file servers but will not accommodate a lot of drives.
      (Your work server(s) may be exceptions.)

      I'm going on the assumption that your storage scenario is constrained by the following:
      - reasonable sized drives / drive costs
      - a reasonable number of disks in an array. (Also constrained by the number of drives a consumer PC can house.)

      Subject to my own biases, notions of data storage, safety, etc.:
      I'd recommend that you consider creating a second server, strictly for the storage of archived images. (Preferably those images that are old enough that they're not likely to be used again.) This server could be one of your OMV machines that will accommodate 3 or 4 drives. Other than shifting images to your archive server from time to time, and the occasional backup, it could even be powered off until you need it. If the box is off most of the time, but exercised once every few months, drive life can be quite long. (Note: don't use an SSD in a server that's powered off for extended periods.)

      While ZFS RAID is great for data preservation and self-healing if you have solid backup, whether you use ZFS RAID on one or both of these servers, for compression, would be your call. As you take ZFS into consideration, remember: in a RAID scenario, the cost of getting ZFS compression is at least one parity drive. Further, outside of intentional file duplication (copies=2), there's no point in using ZFS as the file system for a single drive. For a single-drive file system, there's nothing wrong with EXT4. If you wanted file checksum scrubbing on a single drive, which would make bit errors noticeable, I'd use BTRFS.
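
      For the single-drive BTRFS case, that checksum scrubbing looks roughly like this (the device name and mount point are examples):

        mkfs.btrfs /dev/sdb                   # format the archive drive
        mount /dev/sdb /srv/archive
        btrfs scrub start /srv/archive        # walk every block, verifying checksums
        btrfs scrub status /srv/archive       # report any checksum errors found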

      So what do you think?

      MergerFS and/or symlinks are still not out of the question. They can give you what you want, "pooling" with little to no downside. Give it some thought, as I'm out the door tomorrow. I'll be back next weekend.
      Good backup takes the "drama" out of computing
      ____________________________________
      OMV 3.0.88 Erasmus
      ThinkServer TS140, 12GB ECC / 32GB USB3.0
      4TB SG+4TB TS ZFS mirror/ 3TB TS

      OMV 3.0.81 Erasmus - Rsync'ed Backup Server
      R-PI 2 $29 / 16GB SD Card $8 / Real Time Clock $1.86
      4TB WD My Passport $119


    • flmaxey:

      A last note, since you have VMWare running:

      While I mentioned this before, you could build an OMV server in a virtual machine to do test operations. I have a virtual OMV build, with seven 5GB virtual disks, that I've used for testing ZFS / mdadm RAID / mergerFS, and for limited RAID failure scenarios. (While small, lots of disks can prove out a number of RAID and pooling concepts.)
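
      (If you'd rather not attach virtual disks, plain files work as throwaway vdevs for the same experiments. A sketch, with made-up paths:)

        # Four 5GB sparse files stand in for drives:
        truncate -s 5G /var/tmp/vdisk1.img /var/tmp/vdisk2.img /var/tmp/vdisk3.img /var/tmp/vdisk4.img
        zpool create testpool raidz1 /var/tmp/vdisk1.img /var/tmp/vdisk2.img /var/tmp/vdisk3.img /var/tmp/vdisk4.img
        # Simulate a drive failure, then a replacement:
        truncate -s 5G /var/tmp/vdisk5.img
        zpool offline testpool /var/tmp/vdisk1.img
        zpool replace testpool /var/tmp/vdisk1.img /var/tmp/vdisk5.img
        zpool status testpool    # watch the resilver complete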

      Similarly, you could set up a 4x12GB disk ZFS RAIDZ1 array, resulting in 36GB of usable space, to test ZFS compression. Then copy a single 30GB drive image onto it and see what you get. If ZFS compression doesn't give back the cost of the parity drive, 12GB or more, using ZFS solely for compression would not be very compelling.
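
      The same file-backed trick makes that test cheap (again, the paths and image name are placeholders):

        truncate -s 12G /var/tmp/z1.img /var/tmp/z2.img /var/tmp/z3.img /var/tmp/z4.img
        zpool create ztest raidz1 /var/tmp/z1.img /var/tmp/z2.img /var/tmp/z3.img /var/tmp/z4.img
        zfs set compression=lz4 ztest
        cp /path/to/sample-clone-image.img /ztest/
        zfs get compressratio ztest   # e.g. "1.30x" would mean ~30% saved
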
      Good backup takes the "drama" out of computing
      ____________________________________
      OMV 3.0.88 Erasmus
      ThinkServer TS140, 12GB ECC / 32GB USB3.0
      4TB SG+4TB TS ZFS mirror/ 3TB TS

      OMV 3.0.81 Erasmus - Rsync'ed Backup Server
      R-PI 2 $29 / 16GB SD Card $8 / Real Time Clock $1.86
      4TB WD My Passport $119
    • bookie56:

      flmaxey wrote:

      On a side note: Resetting to factory defaults, in the Windows world, is a "PITA". Generally it means hours of re-updating the OS, maybe adding a service pack back in, removing all the badger-ware, other software "offers", and similar junk. Then some sort of decent firewall and virus scanner must be loaded. And all of that is necessary just to get a PC into a usable state where app's can be loaded. Still more time is involved in that. Again, a royal PITA.
      So, what you're doing in cloning prebuilt, app-populated images for customers makes sense. I'm sure they appreciate it, and it's good for business.

      _______________________________________________________________
      Updates are always a pain, even on Clonezilla images, if they haven't been needed for a while... BUT customers are hopeless at keeping their software available... most of my customers can't remember what they have done with their version of Microsoft Office etc.... so I like saving them the pain of fixing those problems...

      I like your idea of archiving older images to another computer...I can fix that....

      I will test the compression and get back to you on that one....



      flmaxey wrote:

      Based on a 4x4TB drive array, which is what I consider the safe limit for RAIDZ1 (or the RAID5 equivalent), you'd have a 12TB array. Even if you stretched the array to 5x4TB disks (I believe that's risky), it would be 16TB. 6TB disks in the array might get you into a size that you might like, but in terms of a single drive failure that can cascade into an array failure, the risk really starts to climb. (So does the cost.) Regardless, ZFS will allow you to pool vdevs and achieve truly enormous pools; however, on any one server, I (personally) would be reluctant to put that many "data eggs" in one basket. It's a matter of risk trade-offs and what you're comfortable with.

      _____________________________
      I first bought 4x4TB drives and then, because I wasn't sure what I would be using, bought a fifth one.... but if 4x4TB is the way to go... then I have a spare, and I am sure I can find a use for that....lol



      flmaxey wrote:

      The only limitation to symlinks, for allocating data, is the imagination. The down side is, without coming up with a symlink scheme that's easy for you to remember and understand (or, otherwise, document), several symlinks can become unwieldy. Lastly, you'd need to keep an eye on each drive's fill percentage, or automate a report that notifies you of the fill percentage from time to time.

      OMV has a feature for setting up E-mail notifications, with the results generated from a command line. That's useful for a LOT of things. BTW: WinSCP will set symlinks on most Linux boxes, including your main server.

      _______________________________________________
      As I said before, I have been using PuTTY in Windows for access, but have used WinSCP before and will refresh my memory on that...

      I don't mind the idea of symlinks and will follow your examples..



      flmaxey wrote:

      While ZFS RAID is great for data preservation and self-healing if you have solid backup, whether you use ZFS RAID on one or both of these servers, for compression, would be your call. As you take ZFS into consideration, remember: in a RAID scenario, the cost of getting ZFS compression is at least one parity drive. Further, outside of intentional file duplication (copies=2), there's no point in using ZFS as the file system for a single drive. For a single-drive file system, there's nothing wrong with EXT4. If you wanted file checksum scrubbing on a single drive, which would make bit errors noticeable, I'd use BTRFS.


      So what do you think?

      MergerFS and/or symlinks are still not out of the question. They can give you what you want, "pooling" with little to no downside. Give it some thought, as I'm out the door tomorrow. I'll be back next weekend.
      Yes, I hear what you are saying, but it is the data integrity checksumming in ZFS that tkaiser pointed out as a big plus for storing files.... but if you know a better way with ext4, I am listening.... and I have read that it is still early days for BTRFS... BUT if you think there is a scenario with BTRFS that will work for me.... I am listening....

      If I understand what tkaiser said regarding ZFS data integrity checksums.... ZFS creates them as you save the original files, and when you do a backup that would show up any discrepancies between the two versions of the files... that is of course a big plus.... Of course, the checksum created is only as good as the original file's data....;)


      bookie56