Tuning of ext4 on LVM and RAID.

  • Okay, I researched the whole topic and came to the following conclusions:


    1. The RAID setup created with the mdadm tools is fine by default, and the RAID data area starts at 2 MB. The only thing I do not like is the default chunk size of 512k. If you want to create the array yourself, use the following commands:


    (Assume you have a 4-disk RAID5 (/dev/sdb to /dev/sde) and you want to name it "storage".)

    Code
    mdadm --create /dev/md0 -e 1.2 -n 4 -c 128 -l 5 -N storage /dev/sdb /dev/sdc /dev/sdd missing


    This creates a degraded array with a 128k chunk size, which should be a good compromise between small files and large files. It delivers 90+ MB/s throughput end to end. If you use 512k it will still work, but it may be slower with smaller files.
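
    If you want a rough sequential read check of the bare array, a minimal sketch (not a real benchmark; the block size and count are arbitrary, and iflag=direct just bypasses the page cache):

    Code
    dd if=/dev/md0 of=/dev/null bs=1M count=4096 iflag=direct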


    After you have created the degraded array, you then need to add the last disk to complete the protected array:

    Code
    mdadm /dev/md0 --add /dev/sde


    At this point, the RAID array will resync the parity information. You can monitor the status of the reconstruction with cat /proc/mdstat.
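
    For example (the 5-second refresh interval for watch is just a suggestion):

    Code
    cat /proc/mdstat
    # or refresh automatically while the resync runs:
    watch -n 5 cat /proc/mdstat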


    /Edit: See this other post about tuning the RAID stripe_cache_size: http://forums.openmediavault.org/viewtopic.php?f=11&t=1417
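
    For reference, stripe_cache_size lives in sysfs; the value below is only an example (larger values use more RAM, and the setting is not persistent across reboots), see the linked thread for what fits your system:

    Code
    # show the current value (in pages per device) for the array from this example
    cat /sys/block/md0/md/stripe_cache_size
    # example value only -- adjust to your RAM
    echo 4096 > /sys/block/md0/md/stripe_cache_size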


    Then create an LVM physical volume (which also uses a 1MB offset for the data area) on the RAID device from the WebGUI. You need to install the LVM2 plugin first.
    Create one volume containing the whole space.
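
    If you prefer the command line over the WebGUI, a minimal sketch of the same steps could look like this (the volume group and volume names are just examples):

    Code
    pvcreate /dev/md0                      # physical volume on the RAID device
    vgcreate storage /dev/md0              # volume group named "storage" as an example
    lvcreate -l 100%FREE -n data storage   # one logical volume using the whole space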


    Afterwards, create an ext4 file system also consuming the whole space (you can also choose to use less, whatever is applicable and appropriate to your situation). We now need to inform ext4 about the underlying RAID architecture, so that it can optimize its writes. This is an important tuning step.


    We will tune two options:

    • stride
    • stripe-width


    Stride tells ext4 how many ext4 blocks (of 4096 bytes each) fit into one chunk. So stride = chunk size in KB / 4. In our example that is 128/4 = 32.
    The stripe-width tells ext4 how many strides fit into one full RAID stripe, i.e. how many blocks ext4 needs to write to put one chunk on every physical, active disk. So in a RAID5 array, we multiply the stride value by the number of active disks. The number of active disks is the number of disks in the RAID minus 1 (one disk's worth of capacity holds parity), so it is 3 in our example. The stripe-width is then 32*3 = 96.
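
    As a small sanity check, the arithmetic can be scripted; the values below are the ones from this example:

    Code
    CHUNK_KB=128                            # chunk size in KB
    DISKS=4                                 # total disks in the RAID5
    STRIDE=$((CHUNK_KB / 4))                # 128/4 = 32
    STRIPE_WIDTH=$((STRIDE * (DISKS - 1)))  # 32*3 = 96 (one disk's worth is parity)
    echo "stride=$STRIDE stripe-width=$STRIPE_WIDTH"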


    The following command will set the parameters to the filesystem:


    Code
    tune2fs -E stride=32,stripe-width=96 -O dir_index /dev/mapper/UUIDofext4fspartition


    If you used the default 512k chunk size, then the following command line will tune your filesystem correctly:


    Code
    tune2fs -E stride=128,stripe-width=384 -O dir_index /dev/mapper/UUIDofext4fspartition
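
    You can verify that the values were stored in the superblock (the device path is the same placeholder as above):

    Code
    tune2fs -l /dev/mapper/UUIDofext4fspartition | grep -i 'stride\|stripe'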


    Okay, and now let's tune the mount options of your FS.


    Open /etc/fstab with whatever editor you prefer (e.g. nano) and add the following to the mount options:


    data=writeback,noatime,nouser_xattr


    These options should be used by home users. They avoid journaling the data (only metadata is journaled), avoid writing metadata on every read of a file (noatime), and avoid extended attributes. Most likely you will never use the latter. If you want absolutely rock-solid data integrity, you should not enable the data=writeback part. If you are using this at home as your home NAS, the worst that can happen is that the files written last become corrupted in case of a power failure. The filesystem is still intact, but that data itself may be corrupted. That is normally not an issue for home users, as data being written at the moment of a power failure is usually recoverable from other sources.


    After all that, do a final reboot and your performance should be good :)
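
    After the reboot you can double-check that the mount options are active, for example:

    Code
    # shows the options the kernel is actually using for your ext4 filesystems
    cat /proc/mounts | grep ext4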

    Everything is possible, sometimes it requires Google to find out how.

  • Hi SerErris!
    The hints for the stride and stripe-width options are very interesting.
    Don't you think that it would be very useful if OMV automatically tuned the ext4 filesystem when it is created on top of a RAID5 or RAID6 array (and when the RAID array is grown)?

  • Hello there!


    First off, thank you for taking the time to write up all of this. I'm new to OMV and I have a few questions:


    1. What do all the different options after "mdadm --create /dev/md0" mean? For example, what does "-e 1.2" mean? What does "-l 5" do?
    I'm guessing "-n 4" is the number of drives in the RAID and "-c 128" is obviously the chunk size.


    2. If I create the raid with the OMV webgui, can I still use the tweaks for stride and stripe-width?


    3. I tried to tune my filesystem by editing fstab. After I did, OMV didn't boot anymore. I guess I did something wrong. Could you post your fstab so I know what it should look like?

  • ScaryCanary:


    1. The options are the short forms of the mdadm options. I do not like to use the long form, to save typing. But here is the long form of the same command:


    Code
    mdadm --create /dev/md0 --metadata=1.2 --raid-devices=4 --chunk=128 --level=5 --name=storage /dev/sdb /dev/sdc /dev/sdd missing


    Where

    • metadata specifies the metadata format (1.2 is the current one). I specified it just to make sure it uses the latest one.
    • raid-devices is the total number of disks, including parity disks
    • chunk is the chunk size (stripe size)
    • level specifies the raid level (raid5 here)
    • name simply sets the name of the raid array to create (here storage)


    2. Yes, you can still use it.


    The command is in the original post.


    3. No issue, here we go:


    The relevant entry of my /etc/fstab is:

    Code
    # >>> [openmediavault]
    UUID=ec1576a1-90c8-47a5-bd11-f5bf89dee05e /media/ec1576a1-90c8-47a5-bd11-f5bf89dee05e ext4 defaults,nouser_xattr,noatime,data=writeback,usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv0 0 2


    mauromol,
    yes, I think it could be useful, at least for stride and stripe-width. The other parameters are left at their defaults for a good reason: the mount options should be chosen for the purpose you want to use the system for. That is the main reason they are set the way they are at the moment. The default mount options support the broadest possible use case, which is perhaps not the best-optimized one for your specific scenario.


    However I am not developing OMV, but simply support here in the forums.


    If you use the defaults, then it is a single tune2fs command to make ext4 aware of the settings. That is not too much work.

    Everything is possible, sometimes it requires Google to find out how.

  • Quote from "SerErris"


    However I am not developing OMV, but simply support here in the forums.


    If you use the defaults, then it is a single tune2fs command to make ext4 aware of the settings. That is not too much work.


    Yes, I know, but I was lucky to find your post in the forum :)
    We may open a ticket in Mantis to add this as a feature request...

  • SerErris,


    1. Is it possible to do that without losing existing data on a RAID5?
    2. Does the resyncing of the parity again take >24 hours, like the first time when I built the RAID5 (4 x 2TB disks)?


    Thanks

  • Quote

    tune2fs -E stride=128,stripe-width=384 -O dir_index /dev/mapper/UUIDofext4fspartition


    Is the ext4 tuning bit relevant for RAID1?
    ...so I could do this with the default settings:

    Code
    tune2fs -E stride=128,stripe-width=128 ...
  • Hi,


    ante_de:
    1. No, it is not possible to run those mdadm commands without losing data, because they will delete all data (luckily mdadm will refuse the command if there is an existing RAID5 on the disks). You can still use the tune2fs command and the mount options.
    2. You cannot do this chunk-size tuning on a running RAID5, so that point does not apply to your setup. However, every time you create a RAID5 it will sync for 24 hours or however long it takes.


    anubis:
    No, the ext4 tuning itself is not relevant for RAID1. You do not need the stride and stripe-width parameters at all. From the filesystem's perspective, RAID1 is exactly like a single disk, no difference. The tuning is only for RAID0, RAID4, RAID5, RAID6 and RAID10.

    Everything is possible, sometimes it requires Google to find out how.

  • Thank you for posting this guide.
    It is very useful for me since I am just about to build a 6-disk RAID6 array for my new OMV setup.


    My only concern is about starting with a missing drive. What is the purpose of this?


    Will this also work if I define my RAID with all 6 drives from the start, using:

    Code
    mdadm --create /dev/md0 -e 1.2 -n 6 -c 128 -l 6 -N datavol /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg


    Or should I use:

    Code
    mdadm --create /dev/md0 -e 1.2 -n 6 -c 128 -l 6 -N datavol /dev/sdb /dev/sdc /dev/sdd /dev/sde missing missing
  • Dunno what you should use, but I can tell you my experience creating a 6-disk RAID6 array with mdadm.


    I used

    Code
    mdadm --create /dev/md0 --metadata=1.2 --raid-devices=6 --chunk=128 --level=6 --name=storage /dev/disk/by-id/scsi-SATA_XX /dev/disk/by-id/scsi-SATA_XX /dev/disk/by-id/scsi-SATA_xx /dev/disk/by-id/scsi-SATA_xx /dev/disk/by-id/scsi-SATA_xx missing


    Then after the build, I had to manually remove the "spare" and attach it back to the RAID array:


    Code
    mdadm /dev/md0 --remove /dev/sdf


    then

    Code
    mdadm --add /dev/md0 /dev/sdf


    It started building the array.
    Then later I did the mdadm add again for the last missing drive.
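
    If you run into the same spare issue, mdadm --detail shows which device is flagged as a spare before you remove it (md0 as in the example above):

    Code
    mdadm --detail /dev/md0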


    Maybe some *nix guru can elaborate on whether that is because of the oldish version of mdadm in OMV?



    Start with missing drives? Maybe someone is planning to reuse existing drives that still have data on them; this way you can transfer the data over while the array is in a degraded state (with "missing"), and once the data is safe you can mdadm-add the now empty drive to the array.



    So in your case, if you have all drives available at the start, use --create with all drives (sda sdb sdc sdd sde sdf or whatever your drives are); in my example, create uses the physical disk names.

    _-<=[' OMV 1.0.12(Kralizec) ']=>-_
    3.14-0.bpo.1-rt-amd64 kernel
    RAiD 6
    ASuS P5Q Pro
