Errors/Slowness when transferring over ethernet

winskitech · 19. September 2016

My NAS system is an HP Z400 desktop with the following specs:
- Xeon (Dual Core) 2.53 GHz processor
- 16 GB RAM
- 2 x 2 TB WD Red HD's
- 2 x 2 TB WD Black HD's
- 3ware 9650SE-8LPML Raid Controller

All four drives are in a RAID 5 array. I'm booting ESXi 6.0 off a thumb drive. My OMV installation is a thin-provisioned VM that I gave one core and 5 TB of disk space.

I have several shares created. For my first big transfer I copied about 60 GB into one of the shares over SMB from an external HD hooked up to my Ubuntu machine. The transfer started at 20 MB/s, then dropped to 0, went up to 5, back down to 0, up to 5 again, and then I got a timeout error. I tried again with several smaller folders and got timeout errors as well. I also tried transferring via SCP, SFTP and FTP. Same problem every time: connection errors.

To troubleshoot further, I tried an SMB transfer from a Win 10 machine and got "Error 0x8007045D: The request could not be performed because of an I/O device error." A transfer via SCP (using WinSCP) gave me timeout errors again.

I directly attached the HD to the machine and everything's transferring fine but I'd like to get this network transfer issue figured out. Can anyone shed some light on what's going on here?

Any help is appreciated.

votdev · 19. September 2016

Any useful information in the syslog?

winskitech · 19. September 2016

Several of these:

Sep 19 16:04:15 HoHomv monit[813]: 'HoHomv.local' loadavg(5min) of 4.5 matches resource limit [loadavg(5min)>1.0]
Sep 19 16:04:15 HoHomv monit[813]: 'HoHomv.local' loadavg(1min) of 4.9 matches resource limit [loadavg(1min)>2.0]
Sep 19 16:04:12 HoHomv ool openmediavault-webgui: Unknown or unsupported timezone [tz=US/Eastern]
Sep 19 16:04:14 HoHomv omv-engined[21377]: Unknown or unsupported timezone [tz=US/Eastern]
Sep 19 16:02:38 HoHomv collectd[3309]: rrdcached plugin: rrdc_update (/var/lib/rrdcached/db/localhost/cpu-0/cpu-user.rrd, [1474315346:110356], 1) failed with status -1.
Sep 19 16:02:38 HoHomv collectd[3309]: Filter subsystem: Built-in target `write': Dispatching value to all write plugins failed with status -1.

winskitech · 19. September 2016

After doing some Googling I'm still not sure what the "Built-in target 'write'" error is. It looks like the "resource limit" error may be linked to not having enough room for the OS, but I'm not sure how that's possible, since the OS partition is 42 GB and only 1.18 GiB of it is in use. I just changed my timezone to a specific city-based time zone, so hopefully that clears up the timezone message.

Could this be an ethernet driver issue?

votdev · 20. September 2016

Try to fix the RRD database with the omv-firstaid command.

JanN · 20. September 2016

To me it sounds like an ESXi problem...

BR
Jan

winskitech · 20. September 2016

votdev, I'll run that when I have some time later today.

JanN, what are you thinking as far as ESXi problems go? Is this an issue with storage or maybe a network driver?

winskitech · 20. September 2016

votdev, here's my results:

root@omv# omv-firstaid
Checking all RRD files. Please wait ...
All RRD database files are valid.
Action failed -- Other action already in progress -- please try again later

I'm researching what that means.

winskitech · 20. September 2016

So after I ran omv-firstaid, my logs are now getting spammed with these two lines:

Sep 20 13:12:16 omv collectd[3309]: Filter subsystem: Built-in target `write': Dispatching value to all write plugins failed with status -1.
Sep 20 13:12:16 omv collectd[3309]: rrdcached plugin: rrdc_connect (unix:/var/run/rrdcached.sock) failed with status 2.

When I say spammed I mean creating several pages of log files per minute. I'll be looking for a way to fix this or stop the log file ballooning.

winskitech · 20. September 2016

So I found this thread: Mass collectd, rrdcached Errors in syslog

I opened my collectd.conf and found three mount points: "/" and my two other data stores. None of that looked out of place. To keep the logs from getting out of hand I ran monit stop collectd.

winskitech · 20. September 2016

Ok, just for fun I decided to run sudo omv-firstaid again, and the process ran through fine this time. Surprised at this result, I ran monit restart collectd and watched the syslog file. No new instances of the rrdcached error message came up. Then I accessed a share via Samba and transferred 2 GB of music files, which ran quickly. Pleasantly surprised at this result, I dropped in 6 GB of music files. After about 1.5 GB the transfer slowed to a crawl, then the connection timed out.

Checking syslog I'm getting the following:

Sep 20 15:22:40 omv monit[813]: 'omv.local' cpu wait usage of 97.3% matches resource limit [cpu wait usage>95.0%]
Sep 20 15:22:40 omv monit[813]: 'omv.local' loadavg(5min) of 3.0 matches resource limit [loadavg(5min)>1.0]
Sep 20 15:22:40 omv monit[813]: 'omv.local' loadavg(1min) of 2.8 matches resource limit [loadavg(1min)>2.0]
Sep 20 15:22:45 omv monit[813]: 'nginx' failed protocol test [HTTP] at INET[127.0.0.1:80] via TCP -- HTTP: Error receiving data -- Resource temporarily unavailable
Sep 20 15:22:45 omv monit[813]: 'nginx' trying to restart
Sep 20 15:22:45 omv monit[813]: 'nginx' stop: /bin/systemctl
Sep 20 15:22:52 omv collectd[17586]: rrdcached plugin: rrdc_update (/var/lib/rrdcached/db/localhost/rrdcached/operations-receive-flush.rrd, [1474399359:200], 1) failed with status -1.
Sep 20 15:22:52 omv collectd[17586]: Filter subsystem: Built-in target `write': Dispatching value to all write plugins failed with status -1.
Sep 20 15:22:52 omv collectd[17586]: rrdcached plugin: rrdc_update (/var/lib/rrdcached/db/localhost/rrdcached/operations-write-updates.rrd, [1474399359:45], 1) failed with status -1.
Sep 20 15:22:52 omv collectd[17586]: Filter subsystem: Built-in target `write': Dispatching value to all write plugins failed with status -1.
Sep 20 15:22:52 omv monit[813]: 'nginx' start: /bin/systemctl
Sep 20 15:23:22 omv monit[813]: 'omv.local' 'omv.local' cpu wait usage check succeeded [current cpu wait usage=28.7%]
Sep 20 15:23:22 omv monit[813]: 'omv.local' loadavg(5min) of 2.9 matches resource limit [loadavg(5min)>1.0]
Sep 20 15:23:22 omv monit[813]: 'omv.local' loadavg(1min) of 2.1 matches resource limit [loadavg(1min)>2.0]
Sep 20 15:23:22 omv monit[813]: 'nginx' connection succeeded to INET[127.0.0.1:80] via TCP
Sep 20 15:23:52 omv monit[813]: 'omv.local' loadavg(5min) of 2.6 matches resource limit [loadavg(5min)>1.0]
Sep 20 15:23:52 omv monit[813]: 'omv.local' 'omv.local' loadavg(1min) check succeeded [current loadavg(1min)=1.3]
Sep 20 15:24:22 omv monit[813]: 'omv.local' loadavg(5min) of 2.4 matches resource limit [loadavg(5min)>1.0]
Sep 20 15:24:52 omv monit[813]: 'omv.local' loadavg(5min) of 2.1 matches resource limit [loadavg(5min)>1.0]
Sep 20 15:25:22 omv monit[813]: 'omv.local' loadavg(5min) of 1.9 matches resource limit [loadavg(5min)>1.0]
Sep 20 15:25:53 omv monit[813]: 'omv.local' loadavg(5min) of 1.7 matches resource limit [loadavg(5min)>1.0]
Sep 20 15:26:23 omv monit[813]: 'omv.local' loadavg(5min) of 1.6 matches resource limit [loadavg(5min)>1.0]
Sep 20 15:26:53 omv monit[813]: 'omv.local' loadavg(5min) of 1.4 matches resource limit [loadavg(5min)>1.0]
Sep 20 15:27:23 omv monit[813]: 'omv.local' loadavg(5min) of 1.3 matches resource limit [loadavg(5min)>1.0]
Sep 20 15:27:53 omv monit[813]: 'omv.local' loadavg(5min) of 1.2 matches resource limit [loadavg(5min)>1.0]
Sep 20 15:28:23 omv monit[813]: 'omv.local' 'omv.local' loadavg(5min) check succeeded [current loadavg(5min)=1.1]

So there are more resource limit error messages, a different rrdcached error message, and a bunch of loadavg messages. It feels like there's a buffer that's filling up, and once it hits the limit everything times out.
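To see at a glance how high the load and I/O wait got during a failed transfer, the monit lines above can be pulled out of syslog and summarised. The following is a minimal sketch, assuming the monit line format shown in this post; the regular expression and the function names are made up for illustration and are not part of OMV or monit.

```python
import re

# Matches monit resource-limit lines in the format pasted above, e.g.
#   Sep 20 15:22:40 omv monit[813]: 'omv.local' loadavg(5min) of 3.0 matches resource limit [...]
# The pattern is an assumption based on those sample lines only.
LIMIT_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+\S+\s+monit\[\d+\]:\s+'[^']+'\s+"
    r"(?P<metric>.+?) of (?P<value>[\d.]+)%? matches resource limit"
)


def parse_monit_limits(lines):
    """Return (timestamp, metric, value) tuples for monit resource-limit lines."""
    hits = []
    for line in lines:
        m = LIMIT_RE.match(line)
        if m:
            hits.append((m.group("stamp"), m.group("metric"), float(m.group("value"))))
    return hits


def peak_by_metric(hits):
    """Return the highest value seen for each metric."""
    peaks = {}
    for _, metric, value in hits:
        peaks[metric] = max(value, peaks.get(metric, value))
    return peaks


if __name__ == "__main__":
    sample = [
        "Sep 20 15:22:40 omv monit[813]: 'omv.local' cpu wait usage of 97.3% matches resource limit [cpu wait usage>95.0%]",
        "Sep 20 15:22:40 omv monit[813]: 'omv.local' loadavg(5min) of 3.0 matches resource limit [loadavg(5min)>1.0]",
        "Sep 20 15:22:40 omv monit[813]: 'omv.local' loadavg(1min) of 2.8 matches resource limit [loadavg(1min)>2.0]",
    ]
    for metric, peak in peak_by_metric(parse_monit_limits(sample)).items():
        print(f"{metric}: peak {peak}")
```

On Linux the load average also counts tasks blocked in uninterruptible I/O, so a load of 3 or more on a single-core VM together with a "cpu wait" reading near 97% is at least consistent with processes waiting on the disk rather than with a busy CPU.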

winskitech · 20. September 2016

Here's my collectd.conf if that will help:

PIDFile "/var/run/collectd.pid"
Hostname "localhost"
FQDNLookup true
LoadPlugin syslog
<Plugin syslog>
LogLevel info
</Plugin>
LoadPlugin rrdcached
<Plugin rrdcached>
DaemonAddress "unix:/var/run/rrdcached.sock"
DataDir "/var/lib/rrdcached/db/"
CreateFiles true
CollectStatistics true
</Plugin>
LoadPlugin unixsock
<Plugin unixsock>
SocketFile "/var/run/collectd.socket"
SocketGroup "root"
SocketPerms "0660"
</Plugin>
LoadPlugin cpu
LoadPlugin df
<Plugin df>
MountPoint "/"
MountPoint "/media/57c40fe1-f0fc-4e43-934f-7c2cfa55bd99" (RAID 5 Datastore)
MountPoint "/media/9444843044841760" (External hard drive)
IgnoreSelected false
</Plugin>
LoadPlugin interface
<Plugin interface>
Interface "eth0"
IgnoreSelected false
</Plugin>
LoadPlugin load
LoadPlugin memory

winskitech · 20. September 2016

A thought: is it because the CPU on the OMV machine maxes out even though it's nowhere near maxed out on my ESXi host? I guess I could see OMV "maxing out" its CPU (I only gave the VM 1 of the 2 cores I have on my ESXi host) and that killing a transfer after a short period of time?

I'm just spitballing here, I have a little bit of general Linux under my belt but a total OMV noob.
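One way to tell a CPU that is busy computing from one that is stuck waiting on disk is to sample /proc/stat inside the VM while a transfer is running. The sketch below is only an illustration: it assumes the standard Linux field order on the aggregate "cpu" line (user, nice, system, idle, iowait, irq, softirq, steal), and the function names are hypothetical.

```python
import os
import time

IOWAIT_INDEX = 4  # assumed field order: user nice system idle iowait irq softirq steal


def parse_cpu_line(stat_text):
    """Return the aggregate 'cpu' counters from /proc/stat text as a list of ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(x) for x in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Share of elapsed CPU time spent in iowait between two samples."""
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas[:8])  # guest time is already included in user/nice
    if total <= 0 or len(deltas) <= IOWAIT_INDEX:
        return 0.0
    return 100.0 * deltas[IOWAIT_INDEX] / total


def sample_iowait(interval=5.0, path="/proc/stat"):
    """Read the counters twice, `interval` seconds apart, and return iowait in percent."""
    with open(path) as f:
        before = parse_cpu_line(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_cpu_line(f.read())
    return iowait_percent(before, after)


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    print(f"iowait over 1 s: {sample_iowait(1.0):.1f}%")
```

If this reports a high iowait percentage during a transfer while user and system time stay low, the single core is probably not the bottleneck, and the storage path (controller, datastore, virtual disk) would be the more likely place to look.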

winskitech · 20. September 2016

The mystery deepens. I tried rsync'ing from my Ubuntu machine to OMV, since it was the only method I hadn't attempted yet.

First attempt:
rsync -avp 2013\ Tapes/ user@IP address:/media/57c40fe1-f0fc-4e43-934f-7c2cfa55bd99/Music
user@IP address's password:
Could not chdir to home directory /home/user: No such file or directory
sending incremental file list
rsync: failed to set times on "/media/57c40fe1-f0fc-4e43-934f-7c2cfa55bd99/Music/.": Operation not permitted (1)
Some files transferred

sent 492,271,376 bytes received 1,713 bytes 16,687,223.36 bytes/sec
total size is 1,964,934,730 speedup is 3.99
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1183) [sender=3.1.1]

Second attempt:
rsync -avp --info=progress2 2013\ Tapes user@IP address:/media/57c40fe1-f0fc-4e43-934f-7c2cfa55bd99/Music
user@IP address's password:
Could not chdir to home directory /home/user: No such file or directory
sending incremental file list
All files transferred
sent 1,965,443,038 bytes received 5,441 bytes 10,710,890.89 bytes/sec
total size is 1,964,934,730 speedup is 1.00
------------------------------------------------------

I added the --info=progress2 flag and set it to copy the source folder itself over (no trailing slash), and that was enough to transfer with no errors. One thing I did notice is that my transfer speed went down as the rsync job progressed: it started at 77 MB/s and ended at around 10 MB/s. Also, there were no entries in the syslog file the entire time the transfer ran.
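The summary lines rsync printed for the two attempts above can be cross-checked with a little arithmetic. This is a rough sketch that assumes rsync's usual definitions: speedup is the total size divided by the bytes sent plus received, and the bytes/sec figure is those same bytes divided by the elapsed time. The function name and dictionary keys are made up for illustration.

```python
def rsync_summary(sent, received, rate, total_size):
    """Derive elapsed time, speedup and average rate from rsync's closing summary.

    sent, received and total_size are in bytes; rate is in bytes per second.
    """
    transferred = sent + received
    return {
        "elapsed_s": transferred / rate,       # approximate wall-clock duration
        "speedup": total_size / transferred,   # >1 means rsync skipped data
        "mb_per_s": rate / 1e6,                # average rate in MB/s (decimal)
    }


if __name__ == "__main__":
    # Figures copied from the two rsync runs quoted above.
    runs = {
        "first attempt": (492_271_376, 1_713, 16_687_223.36, 1_964_934_730),
        "second attempt": (1_965_443_038, 5_441, 10_710_890.89, 1_964_934_730),
    }
    for name, figures in runs.items():
        s = rsync_summary(*figures)
        print(f"{name}: {s['elapsed_s']:.1f} s, "
              f"speedup {s['speedup']:.2f}, {s['mb_per_s']:.1f} MB/s average")
```

By these figures, the first attempt moved only about 492 MB of the 1.96 GB in roughly 30 seconds (speedup 3.99), which suggests most of the data was already present at the destination. The second attempt moved the full 1.96 GB in roughly three minutes, an average of about 10.7 MB/s, which is close to the speed seen at the end of the run rather than the 77 MB/s seen at the start.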

winskitech · 21. September 2016

So it looks like everything but rsync is giving timeout errors, and the rsync transfer speed isn't staying steady. That last test was from the HD in my Ubuntu machine to my OMV VM. Next I'll try my USB 3.0 external drive on a machine that supports USB 3.0, then rsync that over.

Ideally I'd like more options than rsync but my Googling the syslog error messages hasn't gotten me anywhere.

winskitech · 21. September 2016

Here's my test data:
- One folder roughly 2 GB in size
- 15 subfolders (music albums)
- 279 files

Here are my test conditions:
Test 1: Transfer the data via rsync from the HD on my Ubuntu machine to my OMV VM
Test 2: Transfer the data via rsync from USB 3.0 HD plugged into another machine to my OMV VM

Results:
Test 1: Starting speed around 70 MB/s, Ending speed around 10 MB/s
Test 2: Starting speed around 55 MB/s. Ending speed around 15 MB/s

Conclusion: I'm guessing the slowdown has to do with the way the data is written to the disk, not with the protocol. I'm also guessing this could be tuned. That's something I'd like to tackle, but I'd like to be able to share via SCP and Samba first.
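The guess that the disk side is responsible can be tested without involving the network at all, by writing a large file locally on the OMV VM and timing each block. The sketch below is one possible way to do that, not a tuned benchmark: it calls fsync after every block so the page cache cannot hide the real write speed, and the function name, block size and block count are arbitrary choices for illustration.

```python
import os
import sys
import time


def timed_write(path, chunk_mb=64, chunks=32):
    """Write `chunks` blocks of `chunk_mb` MiB to `path`, syncing after each one.

    Returns the write rate of every block in MiB/s, in order.
    """
    block = os.urandom(1024 * 1024) * chunk_mb
    rates = []
    with open(path, "wb") as f:
        for _ in range(chunks):
            start = time.perf_counter()
            f.write(block)
            f.flush()
            os.fsync(f.fileno())  # force the block out to the storage device
            elapsed = time.perf_counter() - start
            rates.append(chunk_mb / elapsed if elapsed > 0 else float("inf"))
    return rates


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass a path on the volume to test, e.g. a scratch file inside a share.
    for i, rate in enumerate(timed_write(sys.argv[1]), start=1):
        print(f"block {i:2d}: {rate:7.1f} MiB/s")
```

With the defaults this writes 2 GiB, about the size of the test folder. If the per-block rate collapses locally in the same way the network transfers did, the network and the transfer protocol are probably not the cause. Remember to delete the scratch file afterwards.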

Any suggestions or solutions for any of these issues are appreciated.

winskitech · 21. September 2016

So after the rsync transfers I just decided to try scp...and that worked. And so did ftp and samba...so I have no idea what's going on. I mean I'm happy that it seems to be working now, but I don't know what was blocking it from working in the first place so if it breaks again I won't know how to fix it.

During the samba transfer and the scp transfer I'm still getting the 'computer name' loadavg (Xmin) of X.X matches resource limit [loadavg(Xmin)>X.X] error messages but it finishes. Also, during the Samba transfer the transfer speed would fluctuate between 20 MB/s and under 1 MB/s.

My two questions remain. Question #1: Why was I getting all of these error messages when transferring with any method other than rsync? And why would it be working now? Question #2: What transfer speeds should I be seeing? Are the speed fluctuations I'm seeing normal?

votdev · 21. September 2016

Sorry, I have no answers to your questions.

JanN · 23. September 2016

Quote from winskitech

JanN, what are you thinking as far as ESXi problems go? Is this an issue with storage or maybe a network driver?

There is obviously a malfunction that generates a lot of log entries, which eats resources. Normally even a VM on the described hardware should be able to do this job without nearly freezing. That still makes me think that it is an ESXi problem with I/O and/or the NIC driver and/or CPU scheduling or whatever.
Did you try a bare-metal installation of OMV on your machine? If that works as expected, you can use the VirtualBox plugin to run other VMs on the box with OMV (Debian) as the host system. I run two VirtualBox VMs with Windows 7 on my much slower N54L MicroServer this way...

BR
Jan

winskitech · 23. September 2016

Jan, I don't seem to be getting errors now on any transfer protocol I use. I guess omv-firstaid worked, even though I was still getting errors for a while after running the command.
