md0_raid6 + IO wait 50% + load avg 9.0 = ???

  • I'm having an odd problem with my OMV 2.1 box: the load average is pegged at 9.0 or greater and the box is extremely slow. About half the CPU time is tied up in I/O wait (wa is 49.5) and the other half is idle (48.0). No single process is taking more than 3% of the CPU; in fact, md0_raid6 is the top CPU consumer at only 1% or 2%.


    I don't see any evidence the array is rebuilding, unless I don't understand how to interpret the output of mdadm (a quick /proc/mdstat check is shown below the mdadm output).


    Any thoughts from the resident experts?


    top - 18:32:10 up 10:41, 3 users, load average: 9.34, 9.13, 9.07
    Tasks: 131 total, 1 running, 130 sleeping, 0 stopped, 0 zombie
    %Cpu(s): 0.0 us, 2.0 sy, 0.0 ni, 48.0 id, 49.5 wa, 0.0 hi, 0.5 si, 0.0 st
    KiB Mem: 7865624 total, 7683392 used, 182232 free, 331456 buffers
    KiB Swap: 4789244 total, 0 used, 4789244 free, 6790720 cached
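
    For what it's worth, the Linux load average counts tasks stuck in uninterruptible (D-state) sleep as well as runnable ones, which is how the load can sit above 9 while the CPU is mostly idle. A generic one-liner to list the D-state tasks (nothing OMV-specific about it):

    # ps -eo state,pid,comm | awk '$1=="D"'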


    # mdadm -D /dev/md0
    /dev/md0:
    Version : 1.2
    Creation Time : Sat Sep 26 10:14:10 2015
    Raid Level : raid6
    Array Size : 19534435840 (18629.49 GiB 20003.26 GB)
    Used Dev Size : 3906887168 (3725.90 GiB 4000.65 GB)
    Raid Devices : 7
    Total Devices : 7
    Persistence : Superblock is persistent


    Update Time : Tue Jun 14 18:58:13 2016
    State : active
    Active Devices : 7
    Working Devices : 7
    Failed Devices : 0
    Spare Devices : 0


    Layout : left-symmetric
    Chunk Size : 512K


    Name : omv-6th:omv18T (local to host omv-6th)
    UUID : 363e30b4:3db6c23d:5c6d3a5a:47907af9
    Events : 196


    Number Major Minor RaidDevice State
    0 8 64 0 active sync /dev/sde
    1 8 80 1 active sync /dev/sdf
    2 8 96 2 active sync /dev/sdg
    3 8 128 3 active sync /dev/sdi
    4 8 144 4 active sync /dev/sdj
    5 8 160 5 active sync /dev/sdk
    6 8 176 6 active sync /dev/sdl
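
    If the array were rebuilding or resyncing, /proc/mdstat would also show a progress bar; here it just reports the array as active. (The bracketed line below is only an illustration of what a rebuild in progress looks like, not my actual output.)

    # cat /proc/mdstat
    ... [>....................] recovery = 0.5% (19534432/3906887168) finish=482.1min ...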

    First exposure to UNIX was a Slackware install from fifty-eight 3.5" floppy disks created on an RS/6000 running AIX 3.1.2. I was surrounded by X-Windows, the Motif GUI, NFS, distributed file systems, and Token Ring, and I "upgraded" to Windows 3.1.

  • iotop shows me this. Looks like jbd2 (the ext4 journal thread) is busy with I/O all the time.


    Total DISK READ: 0.00 B/s | Total DISK WRITE: 593.75 K/s
    TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
    2214 be/3 root 0.00 B/s 0.00 B/s 0.00 % 96.69 % [jbd2/md0-8]
    3109 be/4 root 0.00 B/s 39.06 K/s 0.00 % 14.12 % [nfsd]
    3111 be/4 root 0.00 B/s 46.88 K/s 0.00 % 13.67 % [nfsd]
    3110 be/4 root 0.00 B/s 42.97 K/s 0.00 % 10.08 % [nfsd]
    3115 be/4 root 0.00 B/s 39.06 K/s 0.00 % 9.99 % [nfsd]
    3112 be/4 root 0.00 B/s 42.97 K/s 0.00 % 9.56 % [nfsd]
    3108 be/4 root 0.00 B/s 50.78 K/s 0.00 % 8.62 % [nfsd]
    3114 be/4 root 0.00 B/s 50.78 K/s 0.00 % 8.19 % [nfsd]
    3113 be/4 root 0.00 B/s 42.97 K/s 0.00 % 7.39 % [nfsd]
    371 be/3 root 0.00 B/s 0.00 B/s 0.00 % 0.03 % [jbd2/sda1-8]
    1 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % init [2]
    2 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kthreadd]
    3 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/0]
    6 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/0]
    7 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [watchdog/0]
    8 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/1]
    10 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/1]
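
    jbd2/md0-8 is the journal thread for the ext4 filesystem on md0, so something is generating a steady stream of small synchronous writes that force journal commits. By default ext4 commits its journal every 5 seconds (tunable with the commit=N mount option), but an fsync/sync from a writer forces a commit immediately. To see how hard the array itself is being hit, and what options the filesystem is mounted with, something like this should do (iostat comes with the sysstat package):

    # iostat -x 1 /dev/md0
    # mount | grep md0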


  • Well, well, well. I stopped NFS and bingo, my system load dropped to 1.0.


    Why was NFS killing me here? I had unmounted all remote filesystems.


    Looks like I have more debugging to do.
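
    One theory, assuming a client was still writing to this box: NFS exports default to sync, which means every client write has to be on stable storage before the server replies, and that forces a journal flush per write. That would line up with jbd2 being pegged while each nfsd thread only writes a few KB/s. The export options can be checked with:

    # exportfs -v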


    • Official post

    What kind of system? Are you mounting NFS manually or using the remoteshare plugin? The array doesn't seem to be rebuilding, but OMV does use lazy formatting for ext4, so that might be an issue, especially on an array that size.
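
    If lazy init is still running you can see it; the kernel does the deferred inode-table zeroing in a thread named ext4lazyinit, so it shows up in iotop or with something like:

    # ps ax | grep '[e]xt4lazyinit'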


  • I have yet to figure out why NFS was causing this, but I've only started debugging the issue. The NFS mount was between two OMV systems, but they were at different patch levels. I believe one was at 2.2 and the other at 2.1, though I'm not 100% positive because I don't know how to determine the version of OMV on the box.


    • Official post

    When you log in, the system information panel tells you the OMV version. Otherwise, dpkg -l | grep openm will.
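
    For example (the version number in this sample output is just illustrative):

    # dpkg -l | grep openm
    ii  openmediavault  2.2.4  all  openmediavault - The open network attached storage solution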


    You didn't say what type of system, but I would also install omv-extras and use the Install backport kernel button. The 3.16 backport kernel will give you better drivers.


  • Thanks, I found the version on the system information panel. One was 2.1 Stoneburner unpatched (i.e., a fresh install) and the other was 2.2.4 Stoneburner. Both systems have outstanding patches, so I'm going to get them to the same level and then dig into my NFS settings. I did a basic mount with no options, which is probably not a good idea (an example with explicit options is at the end of this post).


    My goal here is to rsync the two systems each night so I have an extra copy of all my data. I started out running rsync over the wire with both ssh and rsh as transports, but it left my slower system CPU-bound. Then I switched to rsync over NFS, which performed beautifully, but overnight it started causing problems.
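
    For reference, my "basic mount" was just the bare command with no options. An explicit version with common options would look something like this (the hostname and paths are placeholders):

    # mount -t nfs -o vers=3,proto=tcp,hard,intr otherbox:/export/data /mnt/backup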


    • Official post

    Why not set up an rsync job on the source system and enable the rsync server/create a module on the destination system? That is what I do, and no mounts are necessary :) Works very well.
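
    Roughly, this is what it looks like under the hood (the module name and paths here are just examples). On the destination, the rsync server gets a module in /etc/rsyncd.conf:

    [backup]
        path = /media/backup-disk/backup
        read only = false

    The nightly job on the source then pushes to it using rsync daemon syntax (note the double colon: no ssh and no mount involved):

    # rsync -av --delete /media/data-disk/data/ destbox::backup/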

