Hi!
Here are the specs of my NAS build:
Motherboard: ASRock J3455B-ITX (Bios version P1.20)
RAM: 8GB DDR3 Synchronous 1600MHz (0.6 ns) (2 x 4GB)
CPU: Celeron J3455 (Quadcore, 1.5GHz)
System Drive: Samsung SSD 830 (64GB)
Expansion Slot: DeLock PCIe x2 SATA controller (10 SATA Ports) (https://www.delock.de/produkte…kmale.html?setLanguage=en)
- 2 x 2TB SSD (SanDisk SDSSDA-2): RAID 1
- 3 x 5TB HDD (TOSHIBA HDWE150): RAID 5
- 3 x 8TB HDD (TOSHIBA HDWG180): RAID 5
OS: Debian 9.13
OMV version: 4.1.36-1
See full system-report attached.
Currently both RAID 5 arrays are degraded and running with only 2 of their 3 disks:
root@nas:~# mdadm --detail /dev/md350
/dev/md350:
Version : 1.2
Creation Time : Tue Feb 20 08:55:48 2018
Raid Level : raid5
Array Size : 9767276544 (9314.80 GiB 10001.69 GB)
Used Dev Size : 4883638272 (4657.40 GiB 5000.85 GB)
Raid Devices : 3
Total Devices : 2
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Mon Dec 7 19:58:21 2020
State : clean, degraded
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Name : nas:media (local to host nas)
UUID : 2e70a9d5:6d4935f5:943f514f:21e3bc56
Events : 43285
Number Major Minor RaidDevice State
0 8 65 0 active sync /dev/sde1
1 8 97 1 active sync /dev/sdg1
- 0 0 2 removed
root@nas:~# mdadm --detail /dev/md380
/dev/md380:
Version : 1.2
Creation Time : Mon Dec 7 09:18:46 2020
Raid Level : raid5
Array Size : 15627788288 (14903.82 GiB 16002.86 GB)
Used Dev Size : 7813894144 (7451.91 GiB 8001.43 GB)
Raid Devices : 3
Total Devices : 2
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Mon Dec 7 10:08:39 2020
State : active, degraded
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Name : nas:media8 (local to host nas)
UUID : 9c2b03e7:6a6d2bf9:09a5d137:f91e1915
Events : 16
Number Major Minor RaidDevice State
0 8 17 0 active sync /dev/sdb1
1 8 81 1 active sync /dev/sdf1
- 0 0 2 removed
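For completeness, these are the additional checks I was going to run on the arrays, and how I understand the missing members would be re-added once the underlying problem is solved (/dev/sdX1 is only a placeholder, I have not run the re-add yet):

# Quick overview of both arrays ([2/3] and [UU_] for the degraded ones)
cat /proc/mdstat

# Compare event counters and update times of the member superblocks
mdadm --examine /dev/sd?1 | grep -E '^/dev/|Events|Update Time'

# Re-add a missing member once its disk is reachable again (placeholder!)
mdadm /dev/md350 --re-add /dev/sdX1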
When writing large amounts of data to one of the RAID 5 arrays I see a huge I/O-wait load; the load average goes up to somewhere between 5 and 10 (see the output of top below). Copying large files from one RAID 5 array to the other is very slow, at about 25 MB/s.
top - 19:56:04 up 10:08, 3 users, load average: 4.00, 4.74, 4.66
Tasks: 182 total, 1 running, 117 sleeping, 0 stopped, 4 zombie
%Cpu(s): 0.1 us, 0.4 sy, 0.0 ni, 50.0 id, 49.5 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 7820252 total, 144056 free, 519720 used, 7156476 buff/cache
KiB Swap: 7811068 total, 7811068 free, 0 used. 6961240 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
20036 root 20 0 38604 3560 2948 R 0.7 0.0 0:00.09 top
11706 root 20 0 965144 8404 3168 S 0.3 0.1 0:34.68 collectd
18063 root 20 0 38740 3644 2892 S 0.3 0.0 0:17.90 top
1 root 20 0 205324 6952 4556 S 0.0 0.1 0:05.61 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.02 kthreadd
3 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_gp
4 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_par_gp
6 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/0:0H-kb
9 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 mm_percpu_wq
10 root 20 0 0 0 0 S 0.0 0.0 0:00.45 ksoftirqd/0
...
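To see which device the wait time actually piles up on, I was going to watch the per-disk utilisation while such a copy is running (iostat comes from Debian's sysstat package, so this assumes that is installed):

# %util, await and queue size per member disk, refreshed every 5 seconds,
# while a large copy between the two RAID 5 arrays is in progress
iostat -xm 5

If one member disk sits near 100 % util while the others are mostly idle, I would read that as pointing at that disk (or its port) rather than at the CPU.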
dmesg shows repeating errors:
[37013.334123] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[37013.338796] ata3.00: failed command: WRITE DMA EXT
[37013.340925] ata3.00: cmd 35/00:40:80:d2:35/00:05:28:00:00/e0 tag 7 dma 688128 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[37013.345242] ata3.00: status: { DRDY }
[37013.346315] ata3: hard resetting link
[37013.661354] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[37013.664382] ata3.00: configured for UDMA/33
[37013.664392] ata3.00: device reported invalid CHS sector 0
[37013.664405] ata3: EH complete
I don't know whether these errors are caused by some bottleneck or whether they point to a hardware fault.
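To figure out whether the drive behind ata3 is actually failing, I was going to check its SMART data and compare the negotiated link speeds of all ports (this assumes smartmontools is installed; /dev/sdX stands for the affected drive):

# SMART attributes and error log of the drive behind ata3 (placeholder name)
smartctl -a /dev/sdX

# Negotiated link speed of every port; after the reset ata3 only comes back
# at 1.5 Gbps / UDMA/33, so comparing it with the other ports seems useful
dmesg | grep -i 'SATA link up'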
Can someone point me to where the bottleneck most likely is? Is it
- the CPU (computing the parity data)
- the SATA controller card
- the HDDs themselves
- something else?
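In case it helps, this is roughly how I was planning to test each of those candidates myself. It is only a sketch: the device names are examples taken from the mdadm output above, and I would only run the direct reads while the arrays are otherwise idle.

# CPU: the parity speed md measured when the raid modules were loaded
dmesg | grep -iE 'raid6|xor: using'

# Single HDD: raw sequential read of one member, bypassing the page cache
dd if=/dev/sde of=/dev/null bs=1M count=2048 iflag=direct

# SATA controller / PCIe x2 link: read several members in parallel and see
# whether the combined throughput hits a ceiling well below the sum of the
# single-disk results
for d in sdb sde sdf sdg; do
  dd if=/dev/$d of=/dev/null bs=1M count=2048 iflag=direct &
done
wait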
Thanks for your advice,
Mike