Raid 5, 3 HDD's, clean degraded

geaves · 29 December 2021

That shows the array as unmounted. If you select it and then click Mount on the menu, does it mount?

Darcu · 29 December 2021

No,

if I select it I can't mount it.

But I didn't reboot. Should I?

Edit: If I try:

Code

root@MEDIASERVER:~# mount /dev/md127
mount: /dev/md127: can't find in /etc/fstab.

geaves · 29 December 2021

Quote from Darcu

but I didn't reboot. Should I

TBH I'm at a loss as to what is going on, so a reboot might resolve it. There are options that can be run to correct fstab and the entry in mdadm.conf.

Darcu · 29 December 2021

After reboot:

the good news: the RAID is shown again (it was mounted automatically).

the bad news: when I try to recover /dev/sdd via the web GUI, it is the same error as in post #1 --> Failed to write metadata

geaves · 29 December 2021

Quote from Darcu

the bad news: when I try to recover /dev/sdd via the web GUI, it is the same error as in post #1 --> Failed to write metadata

What's the output of cat /proc/mdstat?

Darcu · 29 December 2021

Code

root@MEDIASERVER:~# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
md127 : active raid5 sda[0] sdb[1]
      27344500736 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]
      bitmap: 28/102 pages [112KB], 65536KB chunk

unused devices: <none>

geaves · 29 December 2021

That's active, which should allow you to add the other drive. But does it show in blkid?
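
For readers following along: in the mdstat output above, [3/2] means the array expects 3 members and has 2 active, and the underscore in [UU_] marks the missing slot. A minimal sketch of reading that automatically, assuming output shaped like the paste above (the function name degraded_arrays is made up for illustration):

```shell
# List md arrays whose member map (e.g. [UU_]) shows a missing slot.
# Reads /proc/mdstat by default; pass a file name to check saved output.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }            # remember the array name
        name != "" && /\[[U_]+\]/ {            # status line carrying the member map
            map = $NF                          # last field, e.g. [UU_]
            if (index(map, "_") > 0) print name, map
            name = ""
        }
    ' "${1:-/proc/mdstat}"
}
```

Calling degraded_arrays with no argument reads the live /proc/mdstat. An array with all members present prints nothing.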

Darcu · 29 December 2021

No.

Code

root@MEDIASERVER:~# blkid
/dev/sdc1: UUID="4A9E-C1A4" TYPE="vfat" PARTUUID="edd760ec-6ae2-49a6-9057-4bac06cfb8fc"
/dev/sdc2: UUID="7951e886-abd2-4a07-b708-e3bab6e33a8f" TYPE="ext4" PARTUUID="0984dc8b-69b0-4b80-aace-860f4418e579"
/dev/sdc3: UUID="b988c23d-7d64-42d5-9eee-7b63be103b2e" TYPE="swap" PARTUUID="b224a268-4fb6-4511-b89c-556bdf4c8d29"
/dev/sdb: UUID="d1b4a55e-a6b8-12ae-41fe-b7af1e41a9ca" UUID_SUB="1dd37406-fd44-9c3e-b179-316da379a6ca" LABEL="MediaServer:RAID" TYPE="linux_raid_member"
/dev/md127: LABEL="RAID5" UUID="6eb513cb-8151-4ec9-8f71-b3e518b31255" TYPE="ext4"
/dev/sda: UUID="d1b4a55e-a6b8-12ae-41fe-b7af1e41a9ca" UUID_SUB="fc59dfda-4bd3-4f3d-4b60-9fe2cf1572dc" LABEL="MediaServer:RAID" TYPE="linux_raid_member"
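
A disk that blkid does not list has no signature it can read (or could not be probed at all). A small sketch for spotting that from saved blkid output, assuming one "device: KEY=value ..." line per device as in the paste above (the function name missing_from_blkid is hypothetical):

```shell
# Print each given whole-disk device that has no entry (disk or partition)
# in saved blkid output. $1 = file with blkid output, rest = devices to check.
missing_from_blkid() {
    blkid_file="$1"
    shift
    for dev in "$@"; do
        # Match the disk itself or any of its partitions (e.g. /dev/sdc1:).
        grep -q "^${dev}[0-9]*:" "$blkid_file" || printf '%s\n' "$dev"
    done
}
```

For example, after saving the output with blkid > /tmp/blkid.txt, calling missing_from_blkid /tmp/blkid.txt /dev/sda /dev/sdb /dev/sdc /dev/sdd prints only the devices without any entry.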

Should I try changing the SATA cable? Or switching SATA ports on my motherboard? (I have only 4 SATA ports for my four drives.)

geaves · 29 December 2021

Quote from Darcu

Should I try to change SATA cable? Or switch SATA-ports on my Motherboard

It's either the cable or the port. I take it the drive doesn't show in the GUI under Storage -> Disks?

Darcu · 29 December 2021

Under Storage -> Disks I see all disks.

In S.M.A.R.T.:

I have to go now. Tomorrow I will change the SATA cable.

Thanks for the help so far.

Darcu · 30 December 2021

I changed all three SATA cables for the HDDs, but it made no difference. Still the same error when I try to recover /dev/sdd in the web GUI.

Maybe I should try adding the disk via SSH. Is it:

Code

mdadm --add /dev/md127 /dev/sdd

?

The next step would be to change SATA ports on the mainboard, but now I am struggling:

For example

sda --> SATA port 1 (currently works --> OS)

sdb --> SATA port 2 (currently works)

sdc --> SATA port 3 (currently works)

sdd --> SATA port 4 (does not work)

E.g. I would move sdb to SATA port 4 and sdd to SATA port 2:

Maybe SATA port 4 is broken, so OMV would no longer recognize sdb but would recognize sdd on SATA port 2. Then sdc would be the only disk left in the RAID, and OMV would try to connect sdd to the RAID automatically, even though the data on the RAID has changed over the last weeks while only sdb and sdc were in use.

I hope you know what I mean?

Is it dangerous? My RAID is not a backup. If I lose the RAID, I will lose a lot of data.

Another option: buy a new motherboard?

Regards

geaves · 30 December 2021

Quote from Darcu

Is it dangerous. My RAID is not a backup. If I lose the RAID, I will lose a lot of data

Yes you will

OK, the issue could be related to one of the SATA ports, so you need to test them. But why are you booting from a hard drive when a USB flash drive is good enough? That is what I use.

Disconnect the data drives (your 3 RAID drives); at least then the data should be OK. Use the OMV OS drive to test each port. It will take a little time because OMV will try to locate the RAID, but it should still boot. If it is SATA port 4, then you have 3 options: a) a new motherboard, b) run OMV from a USB flash drive (that would require 2 or 3, one working and one as a backup; I use 3 in rotation), c) a PCIe SATA card, running the RAID drives off that.

Option b) is the cheapest, but it means your board would only have 3 workable SATA ports. Option c) would not be too expensive, but you need to be careful what you buy; we've had users run into trouble with cheap Chinese cards.

Darcu · 30 December 2021

Strange.

I tested all ports using only the OMV OS drive. Every port works. I reached the web GUI and ran apt-get update and apt-get upgrade via SSH. It works fine with every port.

Before I tested the ports, I ran a S.M.A.R.T. test on sdd:

Code

smartctl 6.6 2017-11-05 r4594 [x86_64-linux-5.10.0-0.bpo.9-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD140EFFX-68VBXN0
Serial Number:    9RHZARBC
LU WWN Device Id: 5 000cca 264dbe2e2
Firmware Version: 81.00A81
User Capacity:    14,000,519,643,136 bytes [14.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Dec 30 11:23:39 2021 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM level is:     127 (intermediate level with standby)
Rd look-ahead is: Enabled
Write cache is:   Disabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Write SCT (Get) Feature Control Command failed: scsi error medium or hardware error (serious)
Wt Cache Reorder: Unknown (SCT Feature Control command failed)

Read SMART Thresholds failed: scsi error medium or hardware error (serious)

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80)    Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:         (  101) seconds.
Offline data collection
capabilities:              (0x5b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:      (   2) minutes.
Extended self-test routine
recommended polling time:      (1436) minutes.
SCT capabilities:            (0x003d)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     PO-R--   100   100   ---    -    0
  2 Throughput_Performance  --S---   135   135   ---    -    108
  3 Spin_Up_Time            POS---   092   092   ---    -    107 (Average 220)
  4 Start_Stop_Count        -O--C-   099   099   ---    -    403
  5 Reallocated_Sector_Ct   PO--CK   100   100   ---    -    0
  7 Seek_Error_Rate         -O-R--   100   100   ---    -    0
  8 Seek_Time_Performance   --S---   133   133   ---    -    18
  9 Power_On_Hours          -O--C-   100   100   ---    -    5070
 10 Spin_Retry_Count        -O--C-   100   100   ---    -    0
 12 Power_Cycle_Count       -O--CK   095   095   ---    -    360
 22 Unknown_Attribute       PO---K   100   100   ---    -    100
192 Power-Off_Retract_Count -O--CK   097   097   ---    -    21770
193 Load_Cycle_Count        -O--C-   097   097   ---    -    21770
194 Temperature_Celsius     -O----   044   044   ---    -    37 (Min/Max 18/46)
196 Reallocated_Event_Count -O--CK   100   100   ---    -    0
197 Current_Pending_Sector  -O---K   100   100   ---    -    0
198 Offline_Uncorrectable   ---R--   100   100   ---    -    0
199 UDMA_CRC_Error_Count    -O-R--   100   100   ---    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

Read SMART Log Directory failed: scsi error medium or hardware error (serious)

General Purpose Log Directory Version 1
Address    Access  R/W   Size  Description
0x00       GPL     R/O      1  Log Directory
0x03       GPL     R/O      1  Ext. Comprehensive SMART error log
0x04       GPL     R/O    256  Device Statistics log
0x07       GPL     R/O      1  Extended self-test log
0x08       GPL     R/O      2  Power Conditions log
0x0c       GPL     R/O   5501  Pending Defects log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x12       GPL     R/O      1  SATA NCQ Non-Data log
0x13       GPL     R/O      1  SATA NCQ Send and Receive log
0x15       GPL     R/W      1  Rebuild Assist log
0x21       GPL     R/O      1  Write stream error log
0x22       GPL     R/O      1  Read stream error log
0x24       GPL     R/O    256  Current Device Internal Status Data log
0x25       GPL     R/O    256  Saved Device Internal Status Data log
0x2f       GPL     -        1  Set Sector Configuration
0x30       GPL     R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL     R/W     16  Host vendor specific log
0xe0       GPL     R/W      1  SCT Command/Status
0xe1       GPL     R/W      1  SCT Data Transfer

ATA_READ_LOG_EXT (addr=0x03:0x00, page=0, n=1) failed: scsi error medium or hardware error (serious)
Read SMART Extended Comprehensive Error Log failed

Read SMART Error Log failed: scsi error medium or hardware error (serious)

ATA_READ_LOG_EXT (addr=0x07:0x00, page=0, n=1) failed: scsi error medium or hardware error (serious)
Read SMART Extended Self-test Log failed

Read SMART Self-test Log failed: scsi error medium or hardware error (serious)

Read SMART Selective Self-test Log failed: scsi error medium or hardware error (serious)

SCT Status Version:                  3
SCT Version (vendor specific):       256 (0x0100)
SCT Support Level:                   0
Device State:                        Active (0)
Current Temperature:                    37 Celsius
Power Cycle Min/Max Temperature:     27/37 Celsius
Lifetime    Min/Max Temperature:     18/46 Celsius
Under/Over Temperature Limit Count:   0/0
SMART Status:                        0xc24f (PASSED)
Minimum supported ERC Time Limit:    70 (7.0 seconds)

Write SCT Data Table failed: scsi error medium or hardware error (serious)
Read SCT Temperature History failed

Write SCT (Get) Error Recovery Control Command failed: scsi error medium or hardware error (serious)
SCT (Get) Error Recovery Control command failed

Device Statistics (GP/SMART Log 0x04) not supported

Pending Defects log (GP Log 0x0c) supported [please try: '-l defects']

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2            5  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2            6  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x000d  2            0  Non-CRC errors within host-to-device FIS

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Command "Execute SMART Short self-test routine immediately in off-line mode" failed: scsi error medium or hardware error (serious)


If the ports are okay, it has to be the disk?

Darcu · 30 December 2021

I enclose the system log from rebooting the system

20211230 omv-log.txt

Edit:

I could remove all four disks and put them in my desktop PC. It would be a lot of work, but maybe it would help to move forward.

geaves · 30 December 2021

Quote from Darcu

If the Ports are okay, it has to be the disk, Works fine with every PORT

The drive appears to be OK: attributes 5, 10, 196, 197 and 198 are not returning any error counts.
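
The attribute check just described can be sketched as a small filter over saved smartctl output. This assumes the brief table layout shown in the paste above (ID, name, flags, value, worst, thresh, fail, raw), where the raw value is the 8th column; other smartctl table formats use a different column. The function name smart_key_attrs is made up for illustration:

```shell
# Print the raw value of SMART attributes 5, 10, 196, 197 and 198 from a file
# holding smartctl output in the brief 8-column table format, and flag any
# attribute whose raw value is not zero.
smart_key_attrs() {
    awk '
        $1 ~ /^(5|10|196|197|198)$/ && NF >= 8 {
            status = ($8 == 0) ? "ok" : "CHECK"
            print $1, $2, $8, status
        }
    ' "$1"
}
```

It only looks at these five counters. It does not look at the failed log reads or the failed self-test in the same output.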

So if the ports are OK and the drive /dev/sdd is seen in Storage -> Disks but not in blkid, my only suggestion would be to wipe that drive and then add it to the array. BUT be VERY VERY CAREFUL: ensure you are selecting the right drive to wipe. If you wipe one of the others, your data is toast!!!

Look at your post 31 and the screenshots of Storage -> Disks; this will show the serial number for each drive. If you have to, write down the serial number and the port it's connected to. Check and double check, satisfy yourself that what you have written is correct, then check again when it's booted up.
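
The same serial-to-device check can be done from the shell. A minimal sketch, assuming the drives appear under /dev/disk/by-id with names of the form ata-MODEL_SERIAL, as is typical for SATA disks on Linux with udev (the function name disks_by_id is hypothetical):

```shell
# Print "kernel-name by-id-name" for every whole-disk ata-* entry, so the
# serial embedded in the by-id name can be matched to sda/sdb/sdc/sdd.
# $1 = directory to scan (defaults to /dev/disk/by-id).
disks_by_id() {
    dir="${1:-/dev/disk/by-id}"
    for link in "$dir"/ata-*; do
        [ -L "$link" ] || continue                        # glob matched nothing
        case "$link" in *-part[0-9]*) continue ;; esac    # skip partitions
        printf '%s %s\n' "$(basename "$(readlink "$link")")" "$(basename "$link")"
    done
}
```

A line such as "sdd ata-WDC_WD140EFFX-68VBXN0_9RHZARBC" would tie the serial number from the S.M.A.R.T. output above to the kernel name before anything is wiped.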

I can't help you recover a toasted array

Darcu · 30 December 2021

Hey,

I am not sure if I understand you right:

I have written down the serial numbers from the web GUI. Do I look for the serial number of my faulty disk sdd on the physical drives and disconnect that drive from the SATA port and power?

What do I do next? I am not sure what you mean by wiping the drive. Formatting? Formatting it with my desktop PC and then putting it back in the NAS and adding it to the array?

geaves · 30 December 2021

No, you wipe the drive in OMV. If you're satisfied you have the correct one, go to Storage -> Disks, select the drive and click Wipe on the menu; you can select short. Do nothing until the wipe has finished, then add the drive to the array as you attempted before, using Recover.

Darcu · 30 December 2021

Okay, I think I was overcomplicating things with the serial number.

So it is:

because:

correct?

I am not sure why the serial number is important for your suggested step. Sorry, I am a bit nervous.

geaves · 30 December 2021

Quote from Darcu

I am not sure why the SERIAL-No is important for your suggested step. Sorry, I am a bit nervous

I assumed you had the drives disconnected.

But going back to your post 32, you could try that command from the CLI (SSH) and post any errors. If there are no errors, then run cat /proc/mdstat; it just might show the RAID rebuilding.

Darcu · 30 December 2021

Code

root@MEDIASERVER:~# mdadm --add /dev/md127 /dev/sdd
mdadm: Failed to write metadata to /dev/sdd

Quote from geaves

I assumed you had the drives disconnected

No, all four drives are connected again.

So should I go on with wiping /dev/sdd as posted in #38?
