Hello community
Since there was no reply to my German post, I thought it might help to post my issue in English as well (I'm really struggling).
I have a problem with my RAID5 and really can't get any further, which is why I am contacting you.
I searched this forum and others for posts that solve the problem, but I couldn't draw a clear conclusion from what I found, or rather I wasn't sure whether it was the right solution for my problem.
By the way, for other problems and topics I have searched this forum several times and could always find a solution (until now).
Information:
OMV Version: 5.4.7-1 (Usul)
Note: The system, including the RAID, had always been stable until now, i.e. no dropouts, errors, etc.
Problem:
Using the web UI, I wanted to expand my RAID5 (4x4TB) with another 4TB HDD to 5x4TB.
Everything went well until the reshape, which got stuck at 71.3%. From then on I only got the "blk_update_request: I/O error" error.
root@NAS:~# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid10]
md1 : active raid1 sdh[1] sdg[0]
976630464 blocks super 1.2 [2/2] [UU]
bitmap: 0/8 pages [0KB], 65536KB chunk
md0 : active raid5 sdb[4] sde[2] sdf[6] sdc[3] sdd[5]
11720658432 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
[==============>......] reshape = 71.3% (2788598272/3906886144) finish=1140420652.8min speed=0K/sec
bitmap: 2/8 pages [8KB], 262144KB chunk
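To put a number on how far the reshape got, here is a minimal Python sketch that parses the progress line from the /proc/mdstat output above. The function name parse_reshape is my own, and I am assuming the two figures in parentheses are the per-device position and per-device size in the units mdstat reports (they appear to be 1 KiB blocks).

```python
import re

# Reshape progress line copied from the /proc/mdstat output above.
line = ("[==============>......] reshape = 71.3% (2788598272/3906886144) "
        "finish=1140420652.8min speed=0K/sec")

def parse_reshape(text):
    """Return (done, total, percent) from an mdstat progress line, or None.

    parse_reshape is a hypothetical helper, not part of mdadm or the kernel.
    """
    m = re.search(r"(reshape|resync|recovery)\s*=\s*[\d.]+%\s*\((\d+)/(\d+)\)", text)
    if m is None:
        return None
    done, total = int(m.group(2)), int(m.group(3))
    return done, total, 100.0 * done / total

done, total, percent = parse_reshape(line)
remaining = total - done
print(f"{percent:.2f}% done, {remaining} blocks per device still to reshape")
```

So roughly 29% of each member device had not been reshaped yet when the process stalled.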
Short excerpt:
root@NAS:~# dmesg | grep sdc
[ 649.262140] sd 1:0:0:0: [sdc] tag#0 Add. Sense: Unrecovered read error - auto reallocate failed
[ 649.262144] sd 1:0:0:0: [sdc] tag#0 CDB: Read(16) 88 00 00 00 00 01 bb 40 67 30 00 00 00 08 00 00
[ 649.262148] blk_update_request: I/O error, dev sdc, sector 7436527408 op 0x0:(READ) flags 0x4000 phys_seg 1 prio class 0
[ 652.616293] sd 1:0:0:0: [sdc] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=3s
[ 652.616298] sd 1:0:0:0: [sdc] tag#0 Sense Key : Medium Error [current]
[ 652.616300] sd 1:0:0:0: [sdc] tag#0 Add. Sense: Unrecovered read error - auto reallocate failed
[ 652.616304] sd 1:0:0:0: [sdc] tag#0 CDB: Read(16) 88 00 00 00 00 01 bb 40 67 38 00 00 00 08 00 00
[ 652.616308] blk_update_request: I/O error, dev sdc, sector 7436527416 op 0x0:(READ) flags 0x4000 phys_seg 1 prio class 0
[ 655.974769] sd 1:0:0:0: [sdc] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=3s
[ 655.974773] sd 1:0:0:0: [sdc] tag#0 Sense Key : Medium Error [current]
[ 655.974776] sd 1:0:0:0: [sdc] tag#0 Add. Sense: Unrecovered read error - auto reallocate failed
[ 655.974779] sd 1:0:0:0: [sdc] tag#0 CDB: Read(16) 88 00 00 00 00 01 bb 40 67 40 00 00 00 08 00 00
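As a cross-check of the log above, the Read(16) CDB bytes can be decoded by hand: in a READ(16) command, bytes 2 to 9 hold the big-endian LBA and bytes 10 to 13 the transfer length in logical blocks. A minimal Python sketch (decode_read16 is just a name I made up) applied to the three CDBs from the excerpt:

```python
def decode_read16(cdb_hex):
    """Decode (lba, block_count) from a SCSI READ(16) CDB given as hex bytes.

    decode_read16 is a hypothetical helper for illustration only.
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x88:  # 0x88 is the READ(16) opcode
        raise ValueError("not a 16-byte READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")      # 8-byte logical block address
    count = int.from_bytes(cdb[10:14], "big")   # 4-byte transfer length
    return lba, count

# CDBs copied from the dmesg excerpt above.
cdbs = [
    "88 00 00 00 00 01 bb 40 67 30 00 00 00 08 00 00",
    "88 00 00 00 00 01 bb 40 67 38 00 00 00 08 00 00",
    "88 00 00 00 00 01 bb 40 67 40 00 00 00 08 00 00",
]
decoded = [decode_read16(c) for c in cdbs]
for lba, count in decoded:
    print(f"LBA {lba}, {count} logical blocks")
```

The first two LBAs match the sector numbers in the blk_update_request lines (7436527408 and 7436527416). Each failed read covers 8 logical blocks of 512 bytes, which is one 4096-byte physical sector on this drive, and the failing reads step through consecutive physical sectors.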
What I've already checked:
- All connection cables, including replacing them with new ones - no success, so this wasn't the issue
- Whether the disk is spinning (checked for vibrations) - it is spinning, so that's fine
In the RAID management of the web UI, the RAID is no longer displayed - a pop-up appears with the error message "communication failure".
The strange thing, though, is that not even the RAID1 (md1) is displayed in the web UI RAID management...
The SMART report for /dev/sdc gave the following result:
root@NAS:~# smartctl -a /dev/sdc
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-5.5.0-0.bpo.2-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red
Device Model: WDC WD40EFRX-68WT0N0
Serial Number: WD-WCC4E4ZRU9P1
LU WWN Device Id: 5 0014ee 2627f957d
Firmware Version: 82.00A82
User Capacity: 4'000'787'030'016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Mon Jul 27 18:03:10 2020 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x80) Offline data collection activity
was never started.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (53520) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 535) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x703d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 195 195 051 Pre-fail Always - 7632
3 Spin_Up_Time 0x0027 191 179 021 Pre-fail Always - 7433
4 Start_Stop_Count 0x0032 086 086 000 Old_age Always - 14027
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 063 063 000 Old_age Always - 27580
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 236
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 197
193 Load_Cycle_Count 0x0032 173 173 000 Old_age Always - 83996
194 Temperature_Celsius 0x0022 120 110 000 Old_age Always - 32
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 198 198 000 Old_age Always - 1791
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 100 253 000 Old_age Offline - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed: read failure 90% 27448 757565049
# 2 Short offline Completed: read failure 90% 27448 757532490
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
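To pick out the values that stand out in the attribute table above, here is a small Python sketch that filters a few rows copied from the smartctl output. The set of watched attribute IDs (5, 196, 197, 198, 199) is my own assumption about which counters usually matter for failing sectors and cabling, and nonzero_watched is a made-up helper name.

```python
# Rows copied verbatim from the smartctl -a output above (subset only).
SMART_ROWS = """\
1 Raw_Read_Error_Rate 0x002f 195 195 051 Pre-fail Always - 7632
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 198 198 000 Old_age Always - 1791
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
"""

# Assumption: these IDs are the ones worth watching here; not an official list.
WATCHED = {5, 196, 197, 198, 199}

def nonzero_watched(rows):
    """Return {attribute_name: raw_value} for watched attributes with raw > 0."""
    found = {}
    for row in rows.splitlines():
        fields = row.split()
        # A table row has 10 columns: ID, name, flag, value, worst, thresh,
        # type, updated, when_failed, raw value.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id, name, raw = int(fields[0]), fields[1], fields[9]
        if attr_id in WATCHED and raw.isdigit() and int(raw) > 0:
            found[name] = int(raw)
    return found

flagged = nonzero_watched(SMART_ROWS)
print(flagged)
```

On these rows only Current_Pending_Sector is flagged, with a raw value of 1791, while UDMA_CRC_Error_Count is 0, which fits with the new cables not making any difference.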
Now the question is whether /dev/sdc is broken, or whether something can still be done to let the reshape finish.
If it is broken: since /dev/sdc is "in the middle" of the reshape process, can the HDD be replaced by another disk (I already have a brand new one next to me) and the reshape continued from the same point (71.3%)?
What other information would you need from me in order to help me out?
I thank you very much in advance for your help!