Hi!
Here are the specs of my NAS build:
Motherboard: ASRock J3455B-ITX (Bios version P1.20)
RAM: 8GB DDR3 Synchronous 1600MHz (0.6 ns) (2 x 4GB)
CPU: Celeron J3455 (Quadcore, 1.5GHz)
System Drive: Samsung SSD 830 (64GB)
Expansion Slot: DeLock PCIe x2 SATA controller (10 SATA Ports) (https://www.delock.de/produkte…kmale.html?setLanguage=en)
- 2 x 2TB SSD (SanDisk SDSSDA-2): RAID 1
- 3 x 5TB HDD (TOSHIBA HDWE150): RAID 5
- 3 x 8TB HDD (TOSHIBA HDWG180): RAID 5
OS: Debian 9.13
OMV version: 4.1.36-1
See full system-report attached.
Currently both RAID 5 arrays are degraded and running with only 2 of their 3 disks:
root@nas:~# mdadm --detail /dev/md350
/dev/md350:
Version : 1.2
Creation Time : Tue Feb 20 08:55:48 2018
Raid Level : raid5
Array Size : 9767276544 (9314.80 GiB 10001.69 GB)
Used Dev Size : 4883638272 (4657.40 GiB 5000.85 GB)
Raid Devices : 3
Total Devices : 2
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Mon Dec 7 19:58:21 2020
State : clean, degraded
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Name : nas:media (local to host nas)
UUID : 2e70a9d5:6d4935f5:943f514f:21e3bc56
Events : 43285
Number Major Minor RaidDevice State
0 8 65 0 active sync /dev/sde1
1 8 97 1 active sync /dev/sdg1
- 0 0 2 removed
root@nas:~# mdadm --detail /dev/md380
/dev/md380:
Version : 1.2
Creation Time : Mon Dec 7 09:18:46 2020
Raid Level : raid5
Array Size : 15627788288 (14903.82 GiB 16002.86 GB)
Used Dev Size : 7813894144 (7451.91 GiB 8001.43 GB)
Raid Devices : 3
Total Devices : 2
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Mon Dec 7 10:08:39 2020
State : active, degraded
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Name : nas:media8 (local to host nas)
UUID : 9c2b03e7:6a6d2bf9:09a5d137:f91e1915
Events : 16
Number Major Minor RaidDevice State
0 8 17 0 active sync /dev/sdb1
1 8 81 1 active sync /dev/sdf1
- 0 0 2 removed
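For completeness, these are the additional checks I was going to run on the arrays, and how I understand the missing members would be re-added once the underlying problem is solved (/dev/sdX1 is only a placeholder, I have not run the re-add yet):

# Quick overview of both arrays ([2/3] and [UU_] for the degraded ones)
cat /proc/mdstat

# Compare event counters and update times of the member superblocks
mdadm --examine /dev/sd?1 | grep -E '^/dev/|Events|Update Time'

# Re-add a missing member once its disk is reachable again (placeholder!)
mdadm /dev/md350 --re-add /dev/sdX1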
When writing large amounts of data to one of the RAID 5 arrays I see a huge I/O-wait load; the load average goes up to somewhere between 5 and 10 (see the output of top below). Copying large files from one RAID 5 array to the other is very slow, at about 25 MB/s.
top - 19:56:04 up 10:08, 3 users, load average: 4.00, 4.74, 4.66
Tasks: 182 total, 1 running, 117 sleeping, 0 stopped, 4 zombie
%Cpu(s): 0.1 us, 0.4 sy, 0.0 ni, 50.0 id, 49.5 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 7820252 total, 144056 free, 519720 used, 7156476 buff/cache
KiB Swap: 7811068 total, 7811068 free, 0 used. 6961240 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
20036 root 20 0 38604 3560 2948 R 0.7 0.0 0:00.09 top
11706 root 20 0 965144 8404 3168 S 0.3 0.1 0:34.68 collectd
18063 root 20 0 38740 3644 2892 S 0.3 0.0 0:17.90 top
1 root 20 0 205324 6952 4556 S 0.0 0.1 0:05.61 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.02 kthreadd
3 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_gp
4 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_par_gp
6 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/0:0H-kb
9 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 mm_percpu_wq
10 root 20 0 0 0 0 S 0.0 0.0 0:00.45 ksoftirqd/0
...
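To see which device the wait time actually piles up on, I was going to watch the per-disk utilisation while such a copy is running (iostat comes from Debian's sysstat package, so this assumes that is installed):

# %util, await and queue size per member disk, refreshed every 5 seconds,
# while a large copy between the two RAID 5 arrays is in progress
iostat -xm 5

If one member disk sits near 100 % util while the others are mostly idle, I would read that as pointing at that disk (or its port) rather than at the CPU.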
dmesg shows repeating errors:
[37013.334123] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[37013.338796] ata3.00: failed command: WRITE DMA EXT
[37013.340925] ata3.00: cmd 35/00:40:80:d2:35/00:05:28:00:00/e0 tag 7 dma 688128 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[37013.345242] ata3.00: status: { DRDY }
[37013.346315] ata3: hard resetting link
[37013.661354] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[37013.664382] ata3.00: configured for UDMA/33
[37013.664392] ata3.00: device reported invalid CHS sector 0
[37013.664405] ata3: EH complete
I don't know whether these errors are caused by some bottleneck or whether they point to a hardware fault.
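To figure out whether the drive behind ata3 is actually failing, I was going to check its SMART data and compare the negotiated link speeds of all ports (this assumes smartmontools is installed; /dev/sdX stands for the affected drive):

# SMART attributes and error log of the drive behind ata3 (placeholder name)
smartctl -a /dev/sdX

# Negotiated link speed of every port; after the reset ata3 only comes back
# at 1.5 Gbps / UDMA/33, so comparing it with the other ports seems useful
dmesg | grep -i 'SATA link up'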
Can someone point me to where the bottleneck most likely is? Is it
- the CPU (computing the parity data)
- the SATA controller card
- the HDDs themselves
- something else?
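In case it helps, this is roughly how I was planning to test each of those candidates myself. It is only a sketch: the device names are examples taken from the mdadm output above, and I would only run the direct reads while the arrays are otherwise idle.

# CPU: the parity speed md measured when the raid modules were loaded
dmesg | grep -iE 'raid6|xor: using'

# Single HDD: raw sequential read of one member, bypassing the page cache
dd if=/dev/sde of=/dev/null bs=1M count=2048 iflag=direct

# SATA controller / PCIe x2 link: read several members in parallel and see
# whether the combined throughput hits a ceiling well below the sum of the
# single-disk results
for d in sdb sde sdf sdg; do
  dd if=/dev/$d of=/dev/null bs=1M count=2048 iflag=direct &
done
wait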
Thanks for your advice,
Mike