High IO-Wait while copping files with samba

mannebk · 16. April 2017

Hi Folks,

Im runing OMV2 (didnt upgrade since installe because back then there was a bug that broake the system while having automated updates on)

Code

ii  openmediavault                     2.2.4                              all          Open network attached storage solution
ii  openmediavault-downloader          2.1                                all          OpenMediaVault downloader plugin
ii  openmediavault-extplorer           1.2                                all          OpenMediaVault eXtplorer plugin
ii  openmediavault-forkeddaapd         2.0                                all          OpenMediaVault forked-daapd (DAAP server) plugin
ii  openmediavault-keyring             0.4                                all          GnuPG archive keys of the OpenMediaVault archive
ii  openmediavault-omvextrasorg        2.13.2                             all          OMV-Extras.org Package Repositories for OpenMediaVault
ii  openmediavault-openvpn             1.1                                all          OpenVPN plugin for OpenMediaVault.
ii  openmediavault-remoteshare         1.1                                all          remote share plugin for OpenMediaVault.
ii  openmediavault-virtualbox          1.3                                all          VirtualBox plugin for OpenMediaVault.

Thats my System.

Here is a picture of CPU usage out of OMV GUI:

As you could see, IO-Wait is the main CPU usage.

The System is a HP Pro Liant Micro Gen 8 Server.

I do not use the onboard soft raid controler for the data drives. Only the OMV OS runs of a single 2,5 disc thats conected with the on board SATA (raid) controler.
For the data drives: I use a HP P420 hardware raid controler powering 2pc 10TB drives and 2 1TB SSDs as cache and 1GB RAM.

Ive upgradedt the Stock HP Micro Server with the biggest CPU and 16GB RAM.

What caused the load you saw in the pictuer was 3 different SMB coppy jobs only reading form the OMV server.

so I did run IOTOP and found that jbd2/sdb1-/8 is causing the load.

also while VMs are online, VirtaulBox (also the clients are "sleeping") causes a very hi IO Wait

Code

Total DISK READ:      47.90 M/s | Total DISK WRITE:     794.67 K/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
  289 be/3 root        0.00 B/s  249.08 K/s  0.00 % 20.09 % [jbd2/sdb1-8]
62287 be/4 Manne      47.90 M/s    0.00 B/s  0.00 %  5.02 % smbd -D
    1 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % init [2]
    2 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kthreadd]
    3 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/0]

And also I see a flush-8:16:

Code

Total DISK READ:      46.99 M/s | Total DISK WRITE:    1419.60 K/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
 1390 be/4 root        0.00 B/s    0.00 B/s  0.00 % 80.02 % [flush-8:16]
62287 be/4 Manne      46.99 M/s    0.00 B/s  0.00 %  5.27 % smbd -D
    1 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % init [2]
    2 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kthreadd]
    3 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/0]
    6 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/0]
    7 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [watchdog/0]
    8 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/1]

and, I was not able to catch it, but while all virtualbox clients where off, I had jbd2/sdb1-8 causing a load of 99,9% in line 1 and smb 99,9% in line 2

for just 4 SMB files transfers read only from OMV

This is what TOP gives me.

Code

top - 12:17:52 up 10 days, 15:15,  2 users,  load average: 0,36, 0,58, 1,16
Tasks: 166 total,   2 running, 163 sleeping,   0 stopped,   1 zombie
%Cpu(s):  0,7 us,  0,7 sy,  0,0 ni, 96,5 id,  2,0 wa,  0,0 hi,  0,2 si,  0,0 st
KiB Mem:  16428680 total, 16241696 used,   186984 free,   356996 buffers
KiB Swap: 12683260 total,        0 used, 12683260 free, 15503696 cached
  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
62287 Manne     20   0  125m  29m  18m R   5,7  0,2  78:48.63 smbd
32822 root      20   0 54964  11m 3740 S   1,0  0,1   0:21.66 iotop
26607 openmedi  20   0  105m  27m 2848 S   0,7  0,2   0:21.37 php5-fpm
38865 openmedi  20   0 87580 7084 3028 S   0,7  0,0   0:00.63 php5-fpm
   54 root      20   0     0    0    0 S   0,3  0,0   2:02.28 kworker/0:2
 2699 root      20   0 26120 1876 1016 S   0,3  0,0  17:08.72 cmasm2d
 3736 vbox      20   0  275m  15m 7176 S   0,3  0,1  19:34.01 VBoxSVC
    1 root      20   0 10656  784  648 S   0,0  0,0   0:06.46 init
    2 root      20   0     0    0    0 S   0,0  0,0   0:00.00 kthreadd
    3 root      20   0     0    0    0 S   0,0  0,0   5:07.50 ksoftirqd/0
    6 root      rt   0     0    0    0 S   0,0  0,0   0:17.36 migration/0
    7 root      rt   0     0    0    0 S   0,0  0,0   0:03.81 watchdog/0
    8 root      rt   0     0    0    0 S   0,0  0,0   0:05.20 migration/1
    9 root      20   0     0    0    0 S   0,0  0,0   0:00.00 kworker/1:0
   10 root      20   0     0    0    0 S   0,0  0,0   0:09.02 ksoftirqd/1

Alles anzeigen

any suggestions?

Thanks
Manne

bvrulez · 10. Februar 2019

I also have a high IO and this is probably due to a faulty drive.

mannebk · 11. Februar 2019

maybe with you, but I am sure, my drives are just fine.

first, they where brand new
second, they do duty since years now with no change or fail
third, i see the same on a proxmox machine on a hetzner server as well.

ryecoaaron · 11. Februar 2019

HP p420 raid cards have very picky settings especially with SSDs that can kill performance. I would look into that.

mannebk · 11. Februar 2019

thanks for that hint,
i found this

Zitat von T_VBO

T_VBO Jun 2, 2016 at 8:52 AM
Hello,
This is why is better to use SSD without "ssd internal cache" or most simple.... Server's SSD (datacenter ssd's) like SM/PM863
DC S3510, ...
I do a check with Hp P420 and Samsung 850 Pro ssd's.
The best performances is with smart array cache enabled(FBWC) AND hard drive cache enabled too.

Alles anzeigen

I had the physical drive write cache on disabled.
And by some reason my HP Smart Cache Array for my spinning discs was "off". now its aktivated again. maybe due to chaning the SSD once, and forgot to reaktivate while waiting for migration ill have a look how it performs now. if you dont hear back from me, this fixed it.

Ah, I recall, I added 2 10tb discs, changed from raid 1 to 10, and that rebuild did take plenty of time. so i forgot totally to change my disc size from 9 to 18 tb and reactivate my cache.

thanks folks anyway, since the initial report was with hp smartcache active, as it was before I added thos 2 new discs.

mannebk · 11. Februar 2019

first tests indicate that a 100% write cache is not optimal config.

my hp smart array cache is 2pc of 500gb ssds in raid 0

i do see an io wait on reading from omv at 75mb/s
i nearly dont see io wait on writing to omv at 103mb/s

also i did test this today with a spinning disc client.

Testfile was an 8gb video file

i just repeted it with my ssd powerd notebook quadcore

the perfmon shows an allmost sleeping notebook sucking data in at about 83mb/s from my spinning 18TB array (4x 10tb raid 10)

pushing the data back to the omv array, it runns at about 90% of 1gb link speed with 103mb/s

the io wait increases dramatically if i suck several diferent large file from the array at the same time. (not copy in a row but rather several copy jobs at the verry same moment)

but when i push data from 2 clients to my OMV i now see speeds of about close to 115mb/s thats about the maxed out smb speed for 1gb links. (need to get my 2gb back, but the netgear switches have a firmware bug)

And its with allmost no io wait now, for writing, but now i have about 8-10% soft-irq... ?

As well as having a CPU load spike when the jobs finish. see picture

cheers manne

Jetzt mitmachen!