less than 30 Mb/s ZFS Raidz2 - Analyze

maltejahn · 24. November 2023

Hi,

This is probably a topic a lot of people have. Yes, it's not the newest system, but I still think (and hope) that something is wrong:

System:

- Intel(R) Xeon(R) CPU E3-1245 v5 @ 3.50GHz

- 32GB ECC memory, no min/max ZFS settings

- MSI C236A, latest BIOS

- SATA controller ASM1166 (I also tried the onboard connectors only, with the same result)

I started with a ZFS pool consisting of SMR + CMR disks. I thought this was the problem, as a lot of people recommend getting rid of SMR, so I replaced the SMR disks and the pool is now CMR only.

Today:

Code

root@omv-neu:~# zpool status
  pool: NAS-Daten
 state: ONLINE
  scan: resilvered 2.71T in 09:47:14 with 0 errors on Fri Nov 24 08:11:26 2023
config:

        NAME                                          STATE     READ WRITE CKSUM
        NAS-Daten                                     ONLINE       0     0     0
          raidz1-0                                    ONLINE       0     0     0
            ata-WDC_WD40EFRX-68N32N0_WD-WCC7K2JXNNR6  ONLINE       0     0     0
            ata-ST4000VN008-2DR166_ZDH2WSEQ           ONLINE       0     0     0
            ata-ST4000VN008-2DR166_ZDH2WRSB           ONLINE       0     0     0
            ata-ST4000VN006-3CW104_ZW603QV2           ONLINE       0     0     0


Now, the "error" is still the same:

- e.g. copying 37 GB (files of 3 GB each) from one dataset to another: it starts fast, drops to 110 MB/s and then continues at 30 MB/s with some drops to << 10 MB/s. Sometimes it stops for multiple seconds. Overall it took 21 minutes (about 30 MB/s)

- can't see any SMART errors (all green)

- SMART temperature below 30 °C

- I used a script to measure before (with 2 SMR disks, WD 4 TB EFAX). With them replaced by ST4000VN008... disks, it got better, but there is still a major drop after a few GB

Code

   /usr/bin/time -f "%e" sh -c 'dd if=/dev/zero of=/srv/NAS-Daten/20GB.img bs=20M count=1000 2> /dev/null'
Result: 196 s with the SMR+CMR mix, 64 s with CMR only

- no dmesg errors

- no compression activated, no dedup

- no high temperatures reported by sensors

- the same happens if I copy large amounts of data over 1 Gb Ethernet

- bad SATA cables: I don't think so, because it starts well

- bad cooling: HDD cages have fans, CPU + housing also
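
As a sanity check on the figures in this list, the average rates can be recomputed from size and elapsed time. This is only a minimal sketch: the helper name `rate` is made up for illustration, the copy is taken as 37000 MB, and the dd run as 1000 x 20 MiB.

```shell
# rate SIZE SECONDS -> average throughput with one decimal place.
# SIZE can be MB or MiB; the result is in that same unit per second.
rate() {
    awk -v size="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", size / secs }'
}

rate 37000 $((21 * 60))   # 37 GB dataset-to-dataset copy in 21 minutes
rate 20000 64             # dd, 1000 x 20 MiB, CMR only
rate 20000 196            # dd, 1000 x 20 MiB, SMR+CMR mix
```

That works out to roughly 29 MB/s for the copy and roughly 312 MiB/s and 102 MiB/s for the two dd runs. Keep in mind that the dd test only writes, while a dataset-to-dataset copy reads from and writes to the same four disks at once, so the two numbers are not directly comparable.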

-> How can I determine the problem?

As the system is quite low on free space, I am considering buying new disks soon**. But as long as I don't know what the problem with the system is, I will postpone this.

- Mainboard issue: the data transfer starts promisingly but collapses after 10 GB or so, so probably no mainboard/CPU/bridge problem?

- SATA controller issue: I also used the onboard SATA ports without any major change. It would be strange if both the onboard and the PCIe controller did the same strange thing

- temperature problem: not the disks, I think (SMART values OK)

- replace raidz2.... no idea

- **get rid of raidz2 and replace it with a mirror of two Seagate Exos X20, 18TB

zpool iostat averaged over 60 s while copying the 37 GB of test data, directly after starting the copy:

Code

root@omv-neu:/srv/NAS-Daten# zpool iostat -vly 60 1
                                                capacity     operations     bandwidth    total_wait     disk_wait    syncq_wait    asyncq_wait  scrub   trim
pool                                          alloc   free   read  write   read  write   read  write   read  write   read  write   read  write   wait   wait
--------------------------------------------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
NAS-Daten                                     10.9T  3.63T    252  1.19K  25.8M   111M   87ms   14ms   13ms   13ms  853ms  852ns   73ms    1ms      -      -
  raidz1-0                                    10.9T  3.63T    252  1.19K  25.8M   111M   87ms   14ms   13ms   13ms  853ms  852ns   73ms    1ms      -      -
    ata-WDC_WD40EFRX-68N32N0_WD-WCC7K2JXNNR6      -      -     82    151  8.59M  28.1M   15ms   41ms    6ms   36ms  201ms    2us    9ms    5ms      -      -
    ata-ST4000VN008-2DR166_ZDH2WSEQ               -      -     80    384  8.28M  27.6M   13ms    5ms    6ms    4ms  150ms  480ns    7ms  601us      -      -
    ata-ST4000VN008-2DR166_ZDH2WRSB               -      -     54    399  4.56M  28.1M   22ms    4ms    8ms    3ms  120ms  432ns   14ms  243us      -      -
    ata-ST4000VN006-3CW104_ZW603QV2               -      -     34    280  4.37M  27.5M  539ms   28ms   53ms   26ms     1s  384ns  478ms    2ms      -      -
--------------------------------------------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----

Same copy command after approx. 5 minutes, to see if something changed:

root@omv-neu:~# zpool iostat -vly 60 1
                                                capacity     operations     bandwidth    total_wait     disk_wait    syncq_wait    asyncq_wait  scrub   trim
pool                                          alloc   free   read  write   read  write   read  write   read  write   read  write   read  write   wait   wait
--------------------------------------------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
NAS-Daten                                     10.9T  3.60T    755  1.10K  53.7M  63.0M   32ms   14ms    9ms   13ms   11ms  432ns   23ms    1ms      -      -
  raidz1-0                                    10.9T  3.60T    755  1.10K  53.7M  63.0M   32ms   14ms    9ms   13ms   11ms  432ns   23ms    1ms      -      -
    ata-WDC_WD40EFRX-68N32N0_WD-WCC7K2JXNNR6      -      -    184    148  12.2M  16.1M   12ms   53ms    7ms   47ms    1us  576ns    4ms    5ms      -      -
    ata-ST4000VN008-2DR166_ZDH2WSEQ               -      -    195    331  11.8M  15.1M    8ms    4ms    6ms    4ms    1us  384ns    2ms  237us      -      -
    ata-ST4000VN008-2DR166_ZDH2WRSB               -      -    229    333  14.9M  16.1M    6ms    3ms    5ms    3ms  925ns  384ns    1ms  142us      -      -
    ata-ST4000VN006-3CW104_ZW603QV2               -      -    145    312  14.8M  15.8M  130ms   18ms   21ms   17ms   44ms  384ns  109ms  715us      -      -
--------------------------------------------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----


bonnie++ with the ZFS arrangement above, plus zpool iostat during the bonnie++ run

Code

bonnie++ -u root -r 1024 -s 16384 -d /srv/NAS-Daten/ -f -b -n 1 -c 4
Version  2.00       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
omv-neu      16G::4            238m  24 67.4m  12            953m  48 476.1   9
Latency                       25064us    2532ms               389ms     606ms
Version  2.00       ------Sequential Create------ --------Random Create--------
omv-neu             -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                  1  1024   0 +++++ +++  1024   0  1024   0 +++++ +++  1024   0
Latency               157ms       3us     196ms     210ms      17us     153ms
1.98,2.00,omv-neu,4,1701438657,16G,,8192,5,,,243517,24,69014,12,,,976010,48,476.1,9,1,,,,,79,0,+++++,+++,71,0,73,0,+++++,+++,73,0,,25064us,2532ms,,389ms,606ms,157ms,3us,196ms,210ms,17us,153ms
zpool iostat during the bonnie++ run:



root@omv-neu:/srv/NAS-Daten# zpool iostat -vly 120 1
                                                capacity     operations     bandwidth    total_wait     disk_wait    syncq_wait    asyncq_wait  scrub   trim
pool                                          alloc   free   read  write   read  write   read  write   read  write   read  write   read  write   wait   wait
--------------------------------------------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
NAS-Daten                                     10.9T  3.61T    536  2.56K  47.5M   245M   13ms   54ms    5ms    8ms    6ms     4s    7ms  814us      -      -
  raidz1-0                                    10.9T  3.61T    536  2.56K  47.5M   245M   13ms   54ms    5ms    8ms    6ms     4s    7ms  814us      -      -
    ata-WDC_WD40EFRX-68N32N0_WD-WCC7K2JXNNR6      -      -    145    224  15.9M  62.0M   15ms  237ms    6ms   47ms    8ms     5s    8ms    6ms      -      -
    ata-ST4000VN008-2DR166_ZDH2WSEQ               -      -    157    791  15.3M  59.3M   10ms   39ms    5ms    5ms    8ms     3s    4ms  416us      -      -
    ata-ST4000VN008-2DR166_ZDH2WRSB               -      -    114    814  8.17M  62.1M   15ms   35ms    6ms    4ms  452us     3s    9ms  253us      -      -
    ata-ST4000VN006-3CW104_ZW603QV2               -      -    119    796  8.16M  62.1M   13ms   35ms    5ms    4ms    7ms     3s    8ms  246us      -      -
--------------------------------------------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----


maltejahn · 24. November 2023

I "found" another top... atop:

sdd+sde -> busy

sdd: WDC WD40EFRX-68N32N0

sde: ST4000VN006-3CW104
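
The same two disks also stand out in the `zpool iostat -vly` samples earlier in the thread, where their write disk_wait is far higher than that of the two ST4000VN008 disks. A rough sketch for picking such disks out automatically; `slow_disks` is a made-up helper name, and it assumes the column layout shown above (write disk_wait in the 11th column, leaf disks marked by `-` in the alloc/free columns):

```shell
# slow_disks [LIMIT_MS]: read `zpool iostat -vly` output on stdin and print
# every leaf disk whose write disk_wait is above LIMIT_MS (default 20).
slow_disks() {
    awk -v limit_ms="${1:-20}" '
        # Convert values such as 480ns, 601us, 36ms or 1s to milliseconds.
        function to_ms(v) {
            if (v ~ /ns$/) return substr(v, 1, length(v) - 2) / 1000000
            if (v ~ /us$/) return substr(v, 1, length(v) - 2) / 1000
            if (v ~ /ms$/) return substr(v, 1, length(v) - 2) + 0
            if (v ~ /s$/)  return substr(v, 1, length(v) - 1) * 1000
            return -1    # "-" or anything unexpected
        }
        # Leaf disks show "-" for alloc and free; pool and vdev rows do not.
        $2 == "-" && $3 == "-" && to_ms($11) > limit_ms { print $1, $11 }
    '
}

# Usage (on the NAS itself):
#   zpool iostat -vly 60 1 | slow_disks 20
```

Fed with the first 60 s sample from this thread and a 20 ms limit, this lists the WD40EFRX (36ms) and the ST4000VN006 (26ms), which matches what atop shows for sdd and sde.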

raulfg3 · 24. November 2023

Try googling ZFS tuning; some settings can increase speed. The same goes for SMB tuning.

Chris's Wiki :: blog/solaris/ZFSScrubsOurSpeedup

Performance Tuning for SMB File Servers

learn.microsoft.com

maltejahn · 24. November 2023

Hi,

I did some further tests:

copy from an NVMe to an SSD using either the onboard ports or the PCIe card: close to 500 MB/s

copy from the NVMe to an SMR disk (the one I replaced) -> 130 MB/s

-> no HW issue

-> yes, looks like wrong ZFS settings

I am running a scrub right now, after the replace + resilver

After a few minutes it says

Code

2.02T scanned at 4.46G/s, 120G issued at 265M/s, 11.4T total

265 M/s? Let's see later.

But for now

Ashift:

Code

# get the sector size reported by the disk
sudo smartctl -a /dev/sda | grep 'Sector Size'
# same output for all members sda/sdb/sdd/sde:
# 512 bytes logical, 4096 bytes physical
# -> 4096 = 2^12, so ashift should be 12
zpool get all | grep ashift
NAS-Daten  ashift                         12                             local

So the ashift setting for the pool seems to be OK.
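
The relation between sector size and ashift is just a base-2 logarithm. A small sketch to make that explicit; the function name `ashift_for` is made up for illustration:

```shell
# ashift_for BYTES: print log2 of a power-of-two sector size,
# which is the ashift value that matches that sector size.
ashift_for() {
    sector=$1
    shift_bits=0
    while [ "$sector" -gt 1 ]; do
        sector=$((sector / 2))
        shift_bits=$((shift_bits + 1))
    done
    echo "$shift_bits"
}

ashift_for 4096   # physical sector size reported by the disks in this pool
ashift_for 512    # logical sector size

# To compare against the pool (run on the NAS):
#   lsblk -o NAME,PHY-SEC,LOG-SEC
#   zpool get ashift NAS-Daten
```

The value that matters is the physical sector size: 4096 bytes gives 12, which is what the pool already uses.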

"Common" settings

Code

NAS-Daten  atime                 off                                   local
NAS-Daten  xattr                 sa                                    local


Settings user access,...
NAS-Daten  xattr                 sa                                    local
NAS-Daten  acltype               posix                                 local
NAS-Daten  aclinherit            passthrough                           local

"Compression"

Code

root@omv-neu:~# zfs get all NAS-Daten | grep compress
NAS-Daten  compressratio         1.00x                                 -
NAS-Daten  compression           off                                   default
NAS-Daten  refcompressratio      1.00x                                 -

I set it to off. With compression, data transfer should be better, but I am first trying to get closer to what 1 Gbit/s can carry (about 100 MB/s). As long as I am this far away from that, I don't want to copy all the data again just so that it gets compressed. Also, my data are mostly pictures and videos, so I don't expect a huge effect from compression.

Cache/Log: Not right now

Kernel for OMV

- I am using Debian GNU/Linux with Linux 6.1.15-1-pve

- I once wanted to use the server for VMs as well, but I do that on another PC now

Memory: the more the better. But does it help in this case?

-> ARC min/max untouched

Code

/sys/module/zfs/parameters/zfs_arc_max -> 0
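
A value of 0 only means that ZFS uses its built-in default limit, so it does not say how much memory the ARC actually uses. On OpenZFS on Linux the live numbers are exposed in `/proc/spl/kstat/zfs/arcstats`. A minimal sketch for reading them; the helper name `arc_gib` is made up, and it assumes the usual three-column `name type data` layout of that file:

```shell
# arc_gib FIELD: read arcstats on stdin and print FIELD converted to GiB.
arc_gib() {
    awk -v key="$1" '$1 == key { printf "%.1f\n", $3 / 1073741824 }'
}

# Usage (on the NAS):
#   arc_gib size  < /proc/spl/kstat/zfs/arcstats   # current ARC size
#   arc_gib c_max < /proc/spl/kstat/zfs/arcstats   # effective upper limit
```

If `size` sits close to `c_max` during the copy, the ARC is at least being used up to its limit; whether a bigger limit would help with sustained writes is a separate question.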

ZFS sync: often mentioned as something that can slow down a pool (though it is not always the cause); untouched

Code

NAS-Daten  sync                  standard                              default

maltejahn · 6. December 2023

Hi,

after a few days I wanted to report the solution to my performance problem. I tried what most people suggested: avoid onboard SATA. The problems were, at least, caused by mixing onboard ports with ports on the additional SATA controller.

After I connected all of the disks to the ASM1166 6-port expander, I got really good write values. One issue was a "write" error on one disk when writing a lot of data, which left the pool degraded. It seems it was the cable (as also suggested by a Google search).
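
One way to back up the cable suspicion is the SMART interface CRC counter. On many SATA disks this is attribute 199 (often named UDMA_CRC_Error_Count), though not every vendor exposes it, and no such output was posted in this thread. A hedged sketch; `crc_errors` is a made-up helper name and it assumes the usual ten-column `smartctl -A` attribute table:

```shell
# crc_errors: read `smartctl -A` output on stdin and print the raw value
# of attribute 199 (interface CRC errors), if the disk reports it.
crc_errors() {
    awk '$1 == 199 { print $10 }'
}

# Usage (on the NAS, once per disk):
#   sudo smartctl -A /dev/sda | crc_errors
#   zpool status -v NAS-Daten    # per-disk READ/WRITE/CKSUM error counters
```

A raw value that keeps growing under load usually points at the cable or connector rather than at the disk itself; a value that stays constant after swapping the cable suggests the swap helped.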

For now, with the 4 disks I get (only tested a few times, 8 mp4 files, 37 GB in total, no compression):

NVME -> ZFS -> 250MB/s

NVME-> ZFS with 1MB recordsize -> 370 MB/s

SSD 870 -> ZFS with 1MB recordsize + NVME log (for testing, removed it then) -> 390 MB/s

For the movie dataset I will probably set recordsize to 256 KB, as there are also a lot of small files for every movie, and with 1 MB I think there would be a lot of overhead/wasted space.

For pictures I will do the same; the rest stays the same.
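
To get a feeling for the wasted space, here is a rough estimate of the padding in the last record of a file. It assumes compression is off (as on this pool), that a file no larger than one record is stored in a single block of about its own size, and it ignores raidz parity and padding. `tail_waste_kib` and the dataset names in the comments are made up for illustration.

```shell
# tail_waste_kib FILE_KIB RECORD_KIB: estimated unused space (KiB) in the
# last record of a file, assuming no compression.
tail_waste_kib() {
    size=$1
    rec=$2
    # Files that fit into one record get a single, smaller block.
    if [ "$size" -le "$rec" ]; then
        echo 0
        return
    fi
    blocks=$(( (size + rec - 1) / rec ))   # round up to whole records
    echo $(( blocks * rec - size ))
}

tail_waste_kib 1331 1024   # 1.3 MiB file, recordsize=1M
tail_waste_kib 1331 256    # same file, recordsize=256K

# Setting it per dataset (hypothetical dataset names):
#   zfs set recordsize=256K NAS-Daten/movies
#   zfs set recordsize=256K NAS-Daten/pictures
# recordsize only affects blocks written after the change.
```

For a 1.3 MiB file this gives 717 KiB of padding at 1M versus 205 KiB at 256K, so the overhead is real for files just above the record size, but it is at most one record per file and therefore negligible for multi-GB movies.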


votdev 24. November 2023

maltejahn 29. November 2023
