md0_raid6 + IO wait 50% + load avg 9.0 = ???

  • I'm having an odd problem with my OMV 2.1 box: the load average is pegged at 9.0 or greater and the box is extremely slow. About half the CPU time is tied up in I/O wait (wa is 49.5) and the other half is idle (48.0). No single process is taking more than 3% of the CPU; in fact, md0_raid6 is the top CPU consumer at only 1% or 2%.


    I don't see any evidence the array is rebuilding, unless I don't understand how to interpret the output of mdadm (a quick /proc/mdstat check is shown below the mdadm output).


    Any thoughts from the resident experts?


    top - 18:32:10 up 10:41, 3 users, load average: 9.34, 9.13, 9.07
    Tasks: 131 total, 1 running, 130 sleeping, 0 stopped, 0 zombie
    %Cpu(s): 0.0 us, 2.0 sy, 0.0 ni, 48.0 id, 49.5 wa, 0.0 hi, 0.5 si, 0.0 st
    KiB Mem: 7865624 total, 7683392 used, 182232 free, 331456 buffers
    KiB Swap: 4789244 total, 0 used, 4789244 free, 6790720 cached
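
    For what it's worth, the Linux load average counts tasks stuck in uninterruptible (D-state) sleep as well as runnable ones, which is how the load can sit above 9 while the CPU is mostly idle. A generic one-liner to list the D-state tasks (nothing OMV-specific about it):

    # ps -eo state,pid,comm | awk '$1=="D"'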


    # mdadm -D /dev/md0
    /dev/md0:
    Version : 1.2
    Creation Time : Sat Sep 26 10:14:10 2015
    Raid Level : raid6
    Array Size : 19534435840 (18629.49 GiB 20003.26 GB)
    Used Dev Size : 3906887168 (3725.90 GiB 4000.65 GB)
    Raid Devices : 7
    Total Devices : 7
    Persistence : Superblock is persistent


    Update Time : Tue Jun 14 18:58:13 2016
    State : active
    Active Devices : 7
    Working Devices : 7
    Failed Devices : 0
    Spare Devices : 0


    Layout : left-symmetric
    Chunk Size : 512K


    Name : omv-6th:omv18T (local to host omv-6th)
    UUID : 363e30b4:3db6c23d:5c6d3a5a:47907af9
    Events : 196


    Number Major Minor RaidDevice State
    0 8 64 0 active sync /dev/sde
    1 8 80 1 active sync /dev/sdf
    2 8 96 2 active sync /dev/sdg
    3 8 128 3 active sync /dev/sdi
    4 8 144 4 active sync /dev/sdj
    5 8 160 5 active sync /dev/sdk
    6 8 176 6 active sync /dev/sdl
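
    If the array were rebuilding or resyncing, /proc/mdstat would also show a progress bar; here it just reports the array as active. (The bracketed line below is only an illustration of what a rebuild in progress looks like, not my actual output.)

    # cat /proc/mdstat
    ... [>....................] recovery = 0.5% (19534432/3906887168) finish=482.1min ...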

    First exposure to UNIX was a Slackware install from fifty-eight 3.5" floppy disks created on an RS/6000 running AIX 3.1.2. I was surrounded by X-Windows, the Motif GUI, NFS, distributed file systems, and Token Ring, and I "upgraded" to Windows 3.1.

  • iotop shows me this. Looks like jbd2 (the ext4 journal thread) is busy with I/O all the time.


    Total DISK READ: 0.00 B/s | Total DISK WRITE: 593.75 K/s
    TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
    2214 be/3 root 0.00 B/s 0.00 B/s 0.00 % 96.69 % [jbd2/md0-8]
    3109 be/4 root 0.00 B/s 39.06 K/s 0.00 % 14.12 % [nfsd]
    3111 be/4 root 0.00 B/s 46.88 K/s 0.00 % 13.67 % [nfsd]
    3110 be/4 root 0.00 B/s 42.97 K/s 0.00 % 10.08 % [nfsd]
    3115 be/4 root 0.00 B/s 39.06 K/s 0.00 % 9.99 % [nfsd]
    3112 be/4 root 0.00 B/s 42.97 K/s 0.00 % 9.56 % [nfsd]
    3108 be/4 root 0.00 B/s 50.78 K/s 0.00 % 8.62 % [nfsd]
    3114 be/4 root 0.00 B/s 50.78 K/s 0.00 % 8.19 % [nfsd]
    3113 be/4 root 0.00 B/s 42.97 K/s 0.00 % 7.39 % [nfsd]
    371 be/3 root 0.00 B/s 0.00 B/s 0.00 % 0.03 % [jbd2/sda1-8]
    1 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % init [2]
    2 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kthreadd]
    3 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/0]
    6 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/0]
    7 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [watchdog/0]
    8 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/1]
    10 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/1]
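
    jbd2/md0-8 is the journal thread for the ext4 filesystem on md0, so something is generating a steady stream of small synchronous writes that force journal commits. By default ext4 commits its journal every 5 seconds (tunable with the commit=N mount option), but an fsync/sync from a writer forces a commit immediately. To see how hard the array itself is being hit, and what options the filesystem is mounted with, something like this should do (iostat comes with the sysstat package):

    # iostat -x 1 /dev/md0
    # mount | grep md0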


  • Well, well, well. I stopped NFS and bingo, my system load dropped to 1.0.


    Why was NFS killing me here? I had unmounted all remote filesystems.


    Looks like I have more debugging to do.
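
    One theory, assuming a client was still writing to this box: NFS exports default to sync, which means every client write has to be on stable storage before the server replies, and that forces a journal flush per write. That would line up with jbd2 being pegged while each nfsd thread only writes a few KB/s. The export options can be checked with:

    # exportfs -v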


    • Official post

    What kind of system? Are you mounting NFS manually or using the remoteshare plugin? The array doesn't seem to be rebuilding, but OMV does use lazy formatting for ext4, so that might be an issue, especially on an array that size.
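
    If lazy init is still running you can see it; the kernel does the deferred inode-table zeroing in a thread named ext4lazyinit, so it shows up in iotop or with something like:

    # ps ax | grep '[e]xt4lazyinit'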


  • I have yet to figure out why NFS was causing this, but I've only started debugging the issue. The NFS mount was between two OMV systems, but they were at different patch levels. I believe one was at 2.2 and the other at 2.1, though I'm not 100% positive because I don't know how to determine the version of OMV on the box.


    • Official post

    When you log in, the system information panel tells you the OMV version. Otherwise, dpkg -l | grep openm will.
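
    For example (the version number in this sample output is just illustrative):

    # dpkg -l | grep openm
    ii  openmediavault  2.2.4  all  openmediavault - The open network attached storage solution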


    You didn't say what type of system, but I would also install omv-extras and use the Install backport kernel button. The 3.16 backport kernel will give you better drivers.


  • Thanks, I found the version on the system information panel. One was 2.1 Stoneburner unpatched (i.e., a fresh install) and the other was 2.2.4 Stoneburner. Both systems have outstanding patches, so I'm going to get them to the same level and then dig into my NFS settings. I did a basic mount with no options, which is probably not a good idea (an example with explicit options is at the end of this post).


    My goal here is to rsync the two systems each night so I have an extra copy of all my data. I started out running rsync over the wire with both ssh and rsh as transports, but it left my slower system CPU-bound. Then I switched to rsync over NFS, which performed beautifully, but overnight it started causing problems.
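
    For reference, my "basic mount" was just the bare command with no options. An explicit version with common options would look something like this (the hostname and paths are placeholders):

    # mount -t nfs -o vers=3,proto=tcp,hard,intr otherbox:/export/data /mnt/backup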


    • Official post

    Why not set up an rsync job on the source system and enable the rsync server/create a module on the destination system? That is what I do, and no mounts are necessary :) Works very well.
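
    Roughly, this is what it looks like under the hood (the module name and paths here are just examples). On the destination, the rsync server gets a module in /etc/rsyncd.conf:

    [backup]
        path = /media/backup-disk/backup
        read only = false

    The nightly job on the source then pushes to it using rsync daemon syntax (note the double colon: no ssh and no mount involved):

    # rsync -av --delete /media/data-disk/data/ destbox::backup/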

