Hi,
This is probably a topic a lot of people run into. Yes, it's not the newest system, but I still think (and hope) that something is wrong:
System:
- Intel(R) Xeon(R) CPU E3-1245 v5 @ 3.50GHz
- 32GB ECC memory, no min/max ZFS settings
- MSI C236A, latest BIOS
- SATA controller ASM1166 (I also tried the onboard connectors only, with the same result)
I started with a ZFS pool consisting of SMR + CMR disks. I thought this was the problem, since a lot of people recommend getting rid of SMR, so I replaced the SMR disks and the pool is now CMR only.
Today:
root@omv-neu:~# zpool status
  pool: NAS-Daten
 state: ONLINE
  scan: resilvered 2.71T in 09:47:14 with 0 errors on Fri Nov 24 08:11:26 2023
config:

        NAME                                          STATE     READ WRITE CKSUM
        NAS-Daten                                     ONLINE       0     0     0
          raidz1-0                                    ONLINE       0     0     0
            ata-WDC_WD40EFRX-68N32N0_WD-WCC7K2JXNNR6  ONLINE       0     0     0
            ata-ST4000VN008-2DR166_ZDH2WSEQ           ONLINE       0     0     0
            ata-ST4000VN008-2DR166_ZDH2WRSB           ONLINE       0     0     0
            ata-ST4000VN006-3CW104_ZW603QV2           ONLINE       0     0     0
Now, the "error" is still the same:
- e.g. copying 37 GB (files of 3 GB each) from one dataset to another: it starts fast, drops to 110 MB/s and then continues at 30 MB/s, with some drops to well below 10 MB/s. Sometimes it stops for multiple seconds. Overall it took 21 minutes (about 30 MB/s on average).
- I can't see any SMART errors (all green)
- SMART temperature below 30 °C
- I used a script to measure before (with 2 SMR disks, WD 4 TB EFAX). With them replaced by ST4000VN008... disks, it got better, but there is still a major drop after a few GB:
/usr/bin/time -f "%e" sh -c 'dd if=/dev/zero of=/srv/NAS-Daten/20GB.img bs=20M count=1000 2> /dev/null'
with SMR+CMR mix: 196 s, CMR only: 64 s
- no dmesg errors
- no compression activated, no dedup
- no high temperatures reported by sensors
- the same happens if I copy large amounts of data over 1 Gb Ethernet
- bad SATA cables: I don't think so, because it starts out fine
- bad cooling: the HDD cages have fans, as do the CPU and the case
-> How can I determine the problem?
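To put the two dd timings above into throughput terms, here is a minimal sketch of the arithmetic. The helper name dd_mib_per_s is made up for illustration. It assumes the run wrote exactly bs x count MiB (20 MiB x 1000 here) and that the elapsed time from /usr/bin/time covers the whole write. Since the dd call uses no sync option, part of the data may still have been in RAM when the timer stopped, so the real disk rate could be lower.

```shell
# Hypothetical helper: average throughput of a dd run in MiB/s.
# Arguments: block size in MiB, block count, elapsed seconds.
dd_mib_per_s() {
    awk -v bs="$1" -v count="$2" -v secs="$3" \
        'BEGIN { printf "%.1f\n", (bs * count) / secs }'
}

dd_mib_per_s 20 1000 196   # SMR+CMR mix
dd_mib_per_s 20 1000 64    # CMR only
```

This gives roughly 102 MiB/s for the SMR+CMR mix and 312.5 MiB/s for CMR only. With GNU dd, adding conv=fdatasync would include the final flush in the measured time.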
As the system is quite low on free space, I am considering buying new disks soon**. But as long as I don't know what the problem with the system is, I am postponing this.
- Mainboard issue: the data transfer starts out promising but collapses after 10 GB or so, so probably not a mainboard/CPU/bridge problem?
- SATA controller issue: I also used the onboard SATA ports without any major change. It would be strange if both the onboard and the PCIe controller showed the same odd behavior.
- Temperature problem: not the disks, I think (SMART values OK)
- Replace the raidz1... no idea
- **Get rid of the raidz1 and replace it with a mirror of two Seagate Exos X20, 18 TB
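Since free space is mentioned as being low, a quick sketch of the fill-level arithmetic may help put that in numbers. It uses the alloc and free values that zpool iostat reports for this pool in this post (10.9T and 3.63T). pool_used_pct is a hypothetical helper name, and both inputs are assumed to be in the same unit.

```shell
# Hypothetical helper: percentage of pool space in use.
# Arguments: allocated space, free space (same unit, e.g. TiB).
pool_used_pct() {
    awk -v alloc="$1" -v free="$2" \
        'BEGIN { printf "%.0f\n", 100 * alloc / (alloc + free) }'
}

pool_used_pct 10.9 3.63
```

This prints 75. zpool list shows the same figure in its CAP column, next to a FRAG column for free-space fragmentation, which may be worth checking because a fuller, more fragmented pool can write more slowly.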
zpool iostat averaged over 60 s while copying the 37 GB of test data, taken directly after starting the copy:
root@omv-neu:/srv/NAS-Daten# zpool iostat -vly 60 1
                                                capacity     operations    bandwidth     total_wait    disk_wait     syncq_wait    asyncq_wait  scrub   trim
pool                                          alloc   free   read  write   read  write   read  write   read  write   read  write   read  write   wait   wait
--------------------------------------------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
NAS-Daten                                     10.9T  3.63T    252  1.19K  25.8M   111M   87ms   14ms   13ms   13ms  853ms  852ns   73ms    1ms      -      -
  raidz1-0                                    10.9T  3.63T    252  1.19K  25.8M   111M   87ms   14ms   13ms   13ms  853ms  852ns   73ms    1ms      -      -
    ata-WDC_WD40EFRX-68N32N0_WD-WCC7K2JXNNR6      -      -     82    151  8.59M  28.1M   15ms   41ms    6ms   36ms  201ms    2us    9ms    5ms      -      -
    ata-ST4000VN008-2DR166_ZDH2WSEQ               -      -     80    384  8.28M  27.6M   13ms    5ms    6ms    4ms  150ms  480ns    7ms  601us      -      -
    ata-ST4000VN008-2DR166_ZDH2WRSB               -      -     54    399  4.56M  28.1M   22ms    4ms    8ms    3ms  120ms  432ns   14ms  243us      -      -
    ata-ST4000VN006-3CW104_ZW603QV2               -      -     34    280  4.37M  27.5M  539ms   28ms   53ms   26ms     1s  384ns  478ms    2ms      -      -
--------------------------------------------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
The same copy command after approx. 5 minutes, to see if anything changed:
root@omv-neu:~# zpool iostat -vly 60 1
                                                capacity     operations    bandwidth     total_wait    disk_wait     syncq_wait    asyncq_wait  scrub   trim
pool                                          alloc   free   read  write   read  write   read  write   read  write   read  write   read  write   wait   wait
--------------------------------------------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
NAS-Daten                                     10.9T  3.60T    755  1.10K  53.7M  63.0M   32ms   14ms    9ms   13ms   11ms  432ns   23ms    1ms      -      -
  raidz1-0                                    10.9T  3.60T    755  1.10K  53.7M  63.0M   32ms   14ms    9ms   13ms   11ms  432ns   23ms    1ms      -      -
    ata-WDC_WD40EFRX-68N32N0_WD-WCC7K2JXNNR6      -      -    184    148  12.2M  16.1M   12ms   53ms    7ms   47ms    1us  576ns    4ms    5ms      -      -
    ata-ST4000VN008-2DR166_ZDH2WSEQ               -      -    195    331  11.8M  15.1M    8ms    4ms    6ms    4ms    1us  384ns    2ms  237us      -      -
    ata-ST4000VN008-2DR166_ZDH2WRSB               -      -    229    333  14.9M  16.1M    6ms    3ms    5ms    3ms  925ns  384ns    1ms  142us      -      -
    ata-ST4000VN006-3CW104_ZW603QV2               -      -    145    312  14.8M  15.8M  130ms   18ms   21ms   17ms   44ms  384ns  109ms  715us      -      -
--------------------------------------------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
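To compare the disks in the two samples above more easily, the per-disk write disk_wait can be pulled out and normalised to milliseconds. This is only a sketch: disk_write_wait_ms is a made-up helper name, and it assumes the 17-column layout of zpool iostat -vly shown in this post (write disk_wait in field 11) and that the member disks are listed by their ata-... names.

```shell
# Hypothetical helper: print "<disk> <write disk_wait in ms>" for each
# ata-* line of `zpool iostat -vly` output read from stdin.
disk_write_wait_ms() {
    awk '
    function to_ms(v,    n) {
        n = v + 0                          # numeric prefix, e.g. "36ms" -> 36
        if (v ~ /ns$/) return n / 1000000
        if (v ~ /us$/) return n / 1000
        if (v ~ /ms$/) return n
        if (v ~ /s$/)  return n * 1000
        return -1                          # "-" or an unknown unit
    }
    $1 ~ /^ata-/ { printf "%s %g\n", $1, to_ms($11) }
    '
}

# Disk lines from the first sample in this post:
disk_write_wait_ms <<'EOF'
ata-WDC_WD40EFRX-68N32N0_WD-WCC7K2JXNNR6 - - 82 151 8.59M 28.1M 15ms 41ms 6ms 36ms 201ms 2us 9ms 5ms - -
ata-ST4000VN008-2DR166_ZDH2WSEQ - - 80 384 8.28M 27.6M 13ms 5ms 6ms 4ms 150ms 480ns 7ms 601us - -
ata-ST4000VN008-2DR166_ZDH2WRSB - - 54 399 4.56M 28.1M 22ms 4ms 8ms 3ms 120ms 432ns 14ms 243us - -
ata-ST4000VN006-3CW104_ZW603QV2 - - 34 280 4.37M 27.5M 539ms 28ms 53ms 26ms 1s 384ns 478ms 2ms - -
EOF
```

For the first sample this prints 36 for the WD40EFRX and 4, 3 and 26 for the three Seagate disks. On a live system the same function could be fed directly, e.g. zpool iostat -vly 60 1 | disk_write_wait_ms.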
bonnie++ with the ZFS layout above, plus zpool iostat taken during the bonnie++ run:
bonnie++ -u root -r 1024 -s 16384 -d /srv/NAS-Daten/ -f -b -n 1 -c 4
Version  2.00       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
omv-neu      16G::4            238m  24 67.4m  12            953m  48 476.1   9
Latency                         25064us    2532ms               389ms     606ms
Version  2.00       ------Sequential Create------ --------Random Create--------
omv-neu             -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                  1  1024   0 +++++ +++  1024   0  1024   0 +++++ +++  1024   0
Latency                 157ms       3us     196ms     210ms      17us     153ms
1.98,2.00,omv-neu,4,1701438657,16G,,8192,5,,,243517,24,69014,12,,,976010,48,476.1,9,1,,,,,79,0,+++++,+++,71,0,73,0,+++++,+++,73,0,,25064us,2532ms,,389ms,606ms,157ms,3us,196ms,210ms,17us,153ms
with
root@omv-neu:/srv/NAS-Daten# zpool iostat -vly 120 1
                                                capacity     operations    bandwidth     total_wait    disk_wait     syncq_wait    asyncq_wait  scrub   trim
pool                                          alloc   free   read  write   read  write   read  write   read  write   read  write   read  write   wait   wait
--------------------------------------------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
NAS-Daten                                     10.9T  3.61T    536  2.56K  47.5M   245M   13ms   54ms    5ms    8ms    6ms     4s    7ms  814us      -      -
  raidz1-0                                    10.9T  3.61T    536  2.56K  47.5M   245M   13ms   54ms    5ms    8ms    6ms     4s    7ms  814us      -      -
    ata-WDC_WD40EFRX-68N32N0_WD-WCC7K2JXNNR6      -      -    145    224  15.9M  62.0M   15ms  237ms    6ms   47ms    8ms     5s    8ms    6ms      -      -
    ata-ST4000VN008-2DR166_ZDH2WSEQ               -      -    157    791  15.3M  59.3M   10ms   39ms    5ms    5ms    8ms     3s    4ms  416us      -      -
    ata-ST4000VN008-2DR166_ZDH2WRSB               -      -    114    814  8.17M  62.1M   15ms   35ms    6ms    4ms  452us     3s    9ms  253us      -      -
    ata-ST4000VN006-3CW104_ZW603QV2               -      -    119    796  8.16M  62.1M   13ms   35ms    5ms    4ms    7ms     3s    8ms  246us      -      -
--------------------------------------------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
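The CSV line that bonnie++ printed further up carries the same throughput figures as its table, in K/sec. A small sketch to convert them to MiB/s follows. The field positions (12 = block write, 14 = rewrite, 18 = block read) are inferred from this particular line by matching it against the table, not taken from the bonnie++ documentation, and bonnie_csv_mib is a hypothetical helper name.

```shell
# Hypothetical helper: convert the sequential block figures of a
# bonnie++ CSV line (read from stdin) from K/sec to MiB/s.
# Field numbers are an assumption based on the line in this post.
bonnie_csv_mib() {
    awk -F, '{
        printf "block write: %.1f MiB/s\n", $12 / 1024
        printf "rewrite: %.1f MiB/s\n",     $14 / 1024
        printf "block read: %.1f MiB/s\n",  $18 / 1024
    }'
}

printf '%s\n' '1.98,2.00,omv-neu,4,1701438657,16G,,8192,5,,,243517,24,69014,12,,,976010,48,476.1,9,1,,,,,79,0,+++++,+++,71,0,73,0,+++++,+++,73,0,,25064us,2532ms,,389ms,606ms,157ms,3us,196ms,210ms,17us,153ms' | bonnie_csv_mib
```

This gives 237.8, 67.4 and 953.1 MiB/s, which lines up with the 238m, 67.4m and 953m in the table and confirms that the rewrite figure is the low one.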