SSD vs HDD performance - strange behavior when moving large files over 10G network

    • SSD vs HDD performance - strange behavior when moving large files over 10G network

      The SSD RAID array performs worse than the HDD array when moving large files over the network.

      Clients: Windows 10 VMs running inside Proxmox hosts; jumbo frames turned on and connected through 10G SFP+

      OMV: dual CPU, 2 x X5670 (2.9 GHz, 24 threads), 48 GB of RAM, connected over 10G SFP+
      - HDD setup: 12 x 4TB in a RAID 50 setup using mdadm, XFS file system

      Moving files from the Windows clients to the OMV HDD array over CIFS/SMB works great, with speeds of 800+ MB/s; even small random file transfers work pretty well.

      I also have 8 x 250GB Samsung 850 Pro SSDs, which I have been playing around with.

      However, the SSDs perform worse than the HDDs.

      I have set up single drives, RAID 0, RAID 5 and RAID 50, all with the XFS file system, and I keep seeing the following:
      - Moving files of around 6 - 9 GB works without issues: speeds are the same as on the HDD array
      - However, as files go above 9 GB in size the transfer rate drops like a rock. On a 4-disk RAID 0 I see speeds drop to 130 MB/s, and on a single disk it goes down to 30 MB/s
      - I also see IOWAIT and system load go up compared to the HDD array: system load = 8+ and IOWAIT 4% to 12% (moving files to the HDD array does not really change the system load or IOWAIT)

      I have even tried ZFS, and the performance is even worse.
      I also changed the SAS controller, but that did not make a difference. SAS controller = LSI 9211-8i HBA, fully updated and in IT mode

      Has anyone run into anything like this or know what else I should look at?
    • Reduce complexity.

      Your report reads as if the network is irrelevant: problems start to appear as soon as you write more than 9 GiB to such an SSD, and in a single-drive configuration sequential write speeds drop below 30 MB/s?

      True?

      Then test this in isolation first. There's fio and iozone (package iozone3). Write a 12 GB file with

          iozone -e -I -a -s 12000M -r 1024k -r 16384k -i 0 -i 1

      and watch what's happening with

          iostat 30

      in parallel ('apt install sysstat'). If write performance drops after ~9 GB you need to search for the reason why (since the Samsung 850 Pro with at least 256 GB stays at 500+ MB/s on a single SATA port all the time; only the smaller 128 GB variants get slower after the TurboWrite buffers are filled).
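
      To tell a late drop from constantly low throughput, the per-interval write rates reported by iostat can be checked mechanically. This is only a sketch: the helper name first_drop and the "below half of the first sample" threshold are assumptions made for illustration, not something from this thread.

```shell
#!/bin/sh
# first_drop: reads one write-throughput sample (MB/s) per line on stdin.
# Prints the 1-based number of the first sample that falls below half of
# the first sample, or "steady" if that never happens.
# The 50% threshold is an arbitrary choice for this sketch.
first_drop() {
    awk '
        NR == 1 { base = $1 }
        NR > 1 && $1 < base / 2 { print NR; found = 1; exit }
        END { if (!found) print "steady" }
    '
}

# Possible way to feed it (assumptions: the SSD is sdb, and MB_wrtn/s is
# the fourth column in the output of your sysstat version):
#   iostat -d -m -y sdb 30 20 | awk '$1 == "sdb" { print $4 }' | first_drop
```

      A result of "steady" with low numbers points at storage that is slow all the time; a sample number points at the moment where a cache or buffer ran out.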
    • vshaulsk wrote:

      Well I ran what you suggested

      Why do you post pictures showing some Word contents with strange formatting? And not text in code blocks?

      Anyway: the write numbers are horribly low. With the 1 MB test size it's just 100 MB/s with the HDDs and 200 MB/s with an SSD RAID-0. No idea why the numbers are that low, but now you know it's your storage setup.
    • tkaiser wrote:

      watch what's happening with iostat 30 in parallel
      This was there for a reason: to get a clue whether write performance is OK in the beginning and drops later, or whether performance is low all the time.

      You see nice performance numbers when network is involved just due to Linux buffering stuff in RAM and performance drops once real storage is involved.
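
      How much the page cache can absorb before writes are throttled to real disk speed can be estimated roughly. This is a sketch under assumptions: the helper name dirty_limit_mib is made up, and it assumes the kernel limit is about vm.dirty_ratio percent of available memory (commonly 20 by default, but distributions and admins change it, and vm.dirty_bytes overrides it when set).

```shell
#!/bin/sh
# dirty_limit_mib MEM_MIB RATIO_PERCENT
# Prints a rough estimate (in MiB) of how much dirty data may pile up in
# RAM before writers are slowed down to the speed of the real storage.
# Assumption: limit ~= RATIO_PERCENT of the given memory size.
dirty_limit_mib() {
    echo $(( $1 * $2 / 100 ))
}

# Example with the figures from this thread (48 GB RAM, assumed ratio 20):
#   dirty_limit_mib 49152 20    # prints 9830

# On a live Linux system the inputs could be read like this:
#   mem_kib=$(awk '/^MemAvailable:/ { print $2 }' /proc/meminfo)
#   ratio=$(cat /proc/sys/vm/dirty_ratio)
#   dirty_limit_mib $(( mem_kib / 1024 )) "$ratio"
```

      With 48 GB of RAM and a ratio of 20 this gives a bit under 10 GB, which is in the same range as the ~9 GB point where the transfers slow down. That is an observation, not a confirmed cause.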
    • I did run iostat 30 at the same time and the write speed was constant, with no big spike in the beginning.

      It just stayed around the 200 MB/s mark the whole time for the 1M test size


      I ran the same test on my R710, which has a raid 10 (6 x 1TB) and the output was 574 MB/s for the 1 MB test size and 541 MB/s for the 16 MB test size

      Any clue what I should be looking at next ?
    • vshaulsk wrote:

      Any clue what I should be looking at next ?

      Unfortunately not. I try to avoid dealing with the lowest hardware level (it's better if others get alerts of failing disks and such stuff ;) ). Currently setting up a 16 SATA HDD ZFS filer with all disks behind an LSI SAS3004 + SAS expander. Performance doesn't matter that much since it's archive storage, but as RAIDZ3 it still delivers 700+ MB/s sequential. So no... really no idea.
    • vshaulsk wrote:

      fully updated and in IT mode

      Has anyone run into anything like this or know what else I should look at?
      I kept an older firmware version on my 9211-8i because I read (can't remember where) that performance was much better on the older firmware versions when using Linux.
      omv 4.1.13 arrakis | 64 bit | 4.15 proxmox kernel | omvextrasorg 4.1.13
      omv-extras.org plugins source code and issue tracker - github

      Please read this before posting a question and this and this for docker questions.
      Please don't PM for support... Too many PMs!
    • vshaulsk wrote:

      Do you know which firmware version ?
      I was trying to look that up but my system is running ESXi and I don't see it. It is years old and for some reason P15 seems familiar (don't count on it).
    • So I tried an older firmware

      version 14 in IT mode, but I am still experiencing the same issue

      I have also tried pulling two of the three HBAs and leaving just one connected to the SSD array.
      - reset the BIOS to factory settings

      One last thing I was going to try is a fresh install of OMV on a new disk, just to see what the performance is.
      - no add-ons or modifications; perhaps I broke something while I was messing with this system
    • Going back to basics is my next step.

      1) fresh basic install of OMV 4 (maybe even try OMV 3 just for comparison)
      2) test SSD drive attached to SATA II port: if OK, move to step 3; if not OK, motherboard/BIOS problem?
      3) test SSD drive attached to the LSI 9211-8i HBA (6 Gb/s): if not working properly, move to step 4
      4) test SSD drive attached to the LSI HBA (3 Gb/s) from my previous build

      If 3 and 4 fail on all PCI Express slots, but the drive works directly on the SATA port, I would think that this is either a BIOS setting issue or a motherboard issue?

      I have 4 other systems on which I will run some tests for comparison: R710, 2 x R620, custom FreeNAS build
    • vshaulsk wrote:

      test SSD drive attached to sata II port
      This would be my first test (no need to reinstall anything).

      And to nail down problems with PCIe controllers, looking at lspci and lspci -vv is mandatory (maybe there's an x1 electrical slot exposed as x8 mechanical or something like that?). lspci -vv shows advertised and established link speeds as well as the number of usable lanes.

      And some drivers are somewhat verbose so watching dmesg output at boot when PCIe link training happens could be interesting too.
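
      The comparison of advertised versus established link can be scripted. A minimal sketch, assuming the usual lspci -vv layout where LnkCap is what the device advertises and LnkSta is what was negotiated; the function name check_pcie_links is made up for this example.

```shell
#!/bin/sh
# check_pcie_links: reads "lspci -vv" output on stdin and prints every
# device whose negotiated link (LnkSta) differs from what it advertises
# (LnkCap) in speed or width. Prints nothing if all links match.
check_pcie_links() {
    awk '
        # Return the token that follows label, cut at the next space or comma
        function field(line, label,    i, rest) {
            i = index(line, label)
            if (i == 0) return "?"
            rest = substr(line, i + length(label))
            sub(/[ ,].*/, "", rest)
            return rest
        }
        # Device header lines start in column 0 with the bus address
        /^[0-9a-f]/ { dev = $0 }
        /LnkCap:/ { cap_speed = field($0, "Speed "); cap_width = field($0, "Width ") }
        /LnkSta:/ {
            sta_speed = field($0, "Speed "); sta_width = field($0, "Width ")
            if (sta_speed != cap_speed || sta_width != cap_width)
                printf "%s\n  capable: %s %s, negotiated: %s %s\n", dev, cap_speed, cap_width, sta_speed, sta_width
        }
    '
}

# Usage (root is usually needed to see the link capabilities):
#   lspci -vv | check_pcie_links
```

      A narrower negotiated width on the HBA would point at the slot. A lower speed alone is less conclusive, since some devices reduce link speed when idle.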
    • Ok, I might have an idea here.

      You got 3D TLC based SSDs here and they have a certain amount of "SLC" cache. Once you exceed that cache, performance drops off a cliff.
      The graph below is of the 850 EVO, but the concept is the same.
      Source: anandtech.com/show/8747/samsung-ssd-850-evo-review/2

      This is a problem with most consumer SSDs today, although as you can see, it affects smaller drives worse than larger drives, as they have a smaller SLC cache.
      OMV 4.x, Gigabyte Z270N-WiFi, i7-6700K@3GHz, 16GB DDR4-3000, 4x 4TB Toshiba N300, 1x 60GB Corsair GT SSD (OS drive), 10Gbps Aquantia Ethernet
    • TheLostSwede wrote:

      This is a problem with most consumer SSDs today
      With some of them. But not with the Samsung 850 Pro with at least 250 GB (quoting myself ;) ). This drive remains above 500 MB/s regardless of how much you write to it.

      The crappiest SSDs I have (Intel 540) get as slow as 60 MB/s after writing some GB to them, but even then a RAID-0 of 8 such SSDs has to show 450+ MB/s (8 x 60 MB/s = 480 MB/s).
    • I just VPNed into my systems and ran a different test:

      transferring a large 100 GB .mkv file over the network while watching iostat 2

      I wanted to see what happens to the drives/arrays during this large network transfer.

      HDD 12x4TB RAID 50: I saw write speeds of around 760 MB/s for the array

      SSD 8x250GB RAID 0: I still only saw a maximum of around 230 - 250 MB/s
      - same if I transferred 10 GB of very small files: still 230 MB/s max

      Watching iostat confirms what I see when just looking at network transfers. Transfers to my HDD array go really fast no matter how large the file is. Transfers to the SSD array are quick for a few seconds until the cache is full and the data starts being flushed to disk, at which point they slow down to around 200 MB/s, as that is what the SSD array is actually writing at.

      Now I totally don't understand what's happening. The iozone test would say my storage as a whole is really slow, but iostat is telling me that the HDD array is capable of fast writes, while the SSD array is showing slow writes.
    • I originally ran RAID 5, RAID 50 and RAID 10 arrays as well as individual drives for testing.

      With my SSD array the performance has always seemed really strange (RAID 10 & RAID 5) since the beginning of the year, but only now have I started really looking into it.

      I would ultimately like to use the SSD array for centralized VM storage over NFS or iSCSI, so that I can play around with high availability on the Proxmox nodes.
    • An update regarding my SSD testing / problem

      I installed a pair of Intel DC S3610 400 GB units and they functioned as expected, whether tested individually or together in RAID 0.

      I then formatted and trimmed the Samsung 850 Pro SSDs using Windows, reinstalled a couple of them into OMV, and they now function properly.
      - It is as if they needed to be trimmed, which I thought was already being done by Linux.
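
      Whether TRIM actually reaches the drives can be checked from Linux. A minimal sketch, assuming a util-linux lsblk that offers the DISC-GRAN and DISC-MAX columns; the helper name no_discard_devices is hypothetical. A discard limit of 0 on a device means discard requests are not passed down to it, so fstrim or the discard mount option would have no effect there.

```shell
#!/bin/sh
# no_discard_devices: reads rows of "NAME DISC-GRAN DISC-MAX" on stdin and
# prints, sorted and without duplicates, the names of all block devices
# whose maximum discard size is zero (no TRIM/discard support visible).
# Accepts both raw byte values ("0") and human-readable ones ("0B").
no_discard_devices() {
    awk '$3 == "0" || $3 == "0B" { print $1 }' | sort -u
}

# Usage: list disks, partitions and md arrays that cannot be trimmed
#   lsblk -lnb -o NAME,DISC-GRAN,DISC-MAX | no_discard_devices
#
# If the mounted filesystem is not on such a device, a manual trim can be
# attempted (the mount point below is a placeholder):
#   fstrim -v /path/to/mountpoint
```

      If the SSDs themselves show a non-zero limit but the md array or the filesystem on top of it appears in the list, the discards are being dropped somewhere in between.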


      Maybe this is the difference between enterprise-grade SSDs and standard ones.

      - I was going to use the 850 Pros in OMV as central shared storage over iSCSI/NFS for a 3-node Proxmox cluster.
      - Now, instead of the 850 Pros, I think I am going to use my enterprise drives (Intel DC S3610 400 GB); they are probably better for that workload anyway.

      - Use the 850 Pros as local storage within one of the Dell R620s, with the disks over-provisioned
