Posts by LukeR1886

    Here is the exact sequence of commands and outputs that put the array back "online" in degraded mode with 3 of 4 drives. Please skim this thread first to confirm your failure is the same corrupt-superblock type of failure that I had before proceeding, and use these commands at your own risk. I am not an expert; this worked for me, and I am not responsible for any data loss that may occur because of the improper (or proper) use of these commands.


    Stopped the array:

    Code
    mdadm --stop /dev/md127


    Then created the array anew. It is my understanding that the --assume-clean flag is critical to keeping your data intact. And you'll need to type your drives in the exact sequence that the mdadm --detail /dev/md127 output showed, including where the "missing" drive is supposed to be in the array.
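    As a sketch (the full command is further down this post), the create step looked like the following. The device list and the slot where "missing" goes are from MY array, not yours; the command is only echoed for review here, because running mdadm --create against the wrong devices destroys data.

```shell
# Sketch of the re-create step, NOT a ready-to-run command: the device
# order and the position of "missing" must match YOUR `mdadm --detail`
# output. Echoed for review instead of executed, because `mdadm --create`
# on the wrong devices destroys data.
CREATE_CMD='mdadm --create /dev/md0 --assume-clean --level=5 --verbose --raid-devices=4 missing /dev/sdb /dev/sdd /dev/sde'
echo "REVIEW BEFORE RUNNING: $CREATE_CMD"
```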



    Then we run the fsck -v /dev/md0 command. You could add a -y after the verbose flag (-v) to automatically answer yes to every repair, but I was paranoid at this point :D so I hit enter for each repair instead.


    Then we mount the file system with mount -a.
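    Put together, the check-then-mount steps look like this sketch. It assumes the re-created array came back as /dev/md0 and that its filesystem has an /etc/fstab entry, and it is guarded so it does nothing on a machine without that device.

```shell
# Sketch of the check-then-mount sequence. Assumes the re-created array is
# /dev/md0 and its filesystem is listed in /etc/fstab; guarded so it does
# nothing on a machine without that block device.
MD_DEV=/dev/md0
if [ -b "$MD_DEV" ]; then
    fsck -v "$MD_DEV"   # add -y to auto-answer "yes" to every repair prompt
    mount -a            # mount everything listed in /etc/fstab
else
    echo "$MD_DEV not present -- adjust the device name first"
fi
```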


    Thanks very much to geaves, Krisbee and votdev! Hope this helps someone in the future!

    CORRECTION :


    Obviously the physical correction to the root of the problem was a higher-output PSU. This solved every single R/W error that was happening to the disks. Terrifying. ^^


    I am not a RAID or mdadm expert, but I think this is how the commands I used repaired the array. The array was stuck in a catch-22: it had a failed/removed drive, yet it was also unable to accept more than the originally specified number of drives. I believe that means the superblock journal data was corrupt. So instead of trying to reassemble the existing array with the existing corrupt superblock journal data, I rebuilt the array with (very carefully selected) commands that told the new superblock journal entry exactly what was in the "new" array.

    It consisted of the drives that were still "up to date", entered in the exact order they were previously listed in the mdadm --detail /dev/md127 output, plus the "missing" drive in the correct position. The command I used was mdadm --create /dev/md0 --assume-clean --level=5 --verbose --raid-devices=4 missing /dev/sdb /dev/sdd /dev/sde ; however, you'll need to assess your own array and edit the drive list and "missing" position as they apply to your scenario.

    My understanding is that this command re-created the array's superblock journal data and used the existing filesystem information it found on the drives. The IMPORTANT part of the command is the --assume-clean flag, which stops the degraded array from trying to sync when it is created (because if it tries to sync, it can potentially overwrite the existing data).

    This process completed successfully. Then I ran fsck -v /dev/md0, and the Check Filesystem process completed successfully. Finally I ran mount -a, and the volume mounted in degraded mode with 3 of 4 drives. All the data was accessible again.


    I copied all the data that wasn't in my previous backup to my other Windows Server NTFS array. Then I successfully rebuilt the array with the original 4 drives. For good measure I created a new array with some eBay drives and made a clone copy of the successfully rebuilt array to my new EXT4 array.


    All RAID volumes are now happy, and I have a backup array that is unplugged which contains a January 2024 backup.

    CAUSE :


    It took me a while to figure out the true cause, and it was not a disk failure (directly). I recently upgraded my motherboard from a SuperMicro unit to a nearly identical SuperMicro unit that has 6x onboard 10GbE ports. The existing power supply worked just fine with the old motherboard for 2 years with no issues, but the 6x onboard 10GbE ports (3x 2-port chipsets) took the power supply over its maximum capacity.

    TL;DR


    The PSU was maxed out, and the drives were intermittently powering down and then powering back up.

    So I just wanted to follow this up with the complete three C's (Concern, Cause, Correction) for anyone that runs into this nightmare.

    CONCERN :


    The concern was that a fully functional RAID 5 array had a disk that went "out of date". When I physically probed the machine while it was running, it seemed that moving the cables made the drives spin up, but I now realize this was false. Thinking the cables were the cause, I shut down the machine and replaced the cables, and additionally the SAS HBA (with a known-good unit) for good measure. Something was still never right, and when OMV booted up and I ran the mdadm --assemble --force --verbose /dev/md127 /dev/sdb /dev/sdc /dev/sdd /dev/sde command, the array just never correctly reassembled.


    You can read through the thread to see everything I tried, but what I eventually determined is that the superblock journal must have been corrupt. I kept getting error messages that contradicted the state of the array. The mdadm --detail /dev/md127 command showed that the "out of date" drive was successfully removed, but the mdadm --add /dev/md127 /dev/sdc command resulted in the error mdadm: add new device failed for /dev/sdc as 4: Invalid argument. So the device was already removed, but the replacement device couldn't be added: a catch-22.

    The short answer is that the link to https://www.linuxquestions.org…ing-a-failed-disk-909577/ had the solution that got the array running.

    geaves  Krisbee  votdev Thank you all so much for bearing with an amateur!


    HA! I was able to mount the drive and see all of my data!!!! I am not a file system or RAID expert by any means, so I do not fully understand the commands I used to make it happen. But the short answer is that I followed the directions in the LinuxQuestions.org link I posted, then ran fsck (which completed successfully), then simply typed mount -a like Krisbee said.

    I have a replacement array coming from eBay, so at this point I am not going to try to rebuild the array until my replacement array is here, and I'm going to use this opportunity to pull the most important files off my degraded array.

    I will post the exact commands I used to get this array back online, so if someone else runs into the issue of an array with a "missing" drive that can be neither added nor removed, they will know what to try to get it mounted and functional again!

    I will also need to cruise this forum to learn how to set up an rsync job to keep data automatically backed up between this array and my newly arriving array.
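    For anyone searching later, here is a minimal sketch of the kind of rsync job I mean. The paths are placeholders I made up, not real mount points, and --dry-run keeps the command harmless until it has been reviewed.

```shell
# Hypothetical rsync backup job between two arrays. SRC and DST are
# placeholder paths -- substitute your own mount points. --dry-run makes
# rsync only report what it would copy; drop it once the paths are right.
SRC=/srv/main-array/
DST=/srv/backup-array/
RSYNC_CMD="rsync -a --delete --dry-run $SRC $DST"
echo "REVIEW BEFORE RUNNING: $RSYNC_CMD"
```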

    Thanks again for having patience and putting up with my ignorance!

    Krisbee and here is the dmesg error log I've generated:


    Run the command as fsck -yv /dev/md127 and let fsck do its thing. Hopefully not all superblocks are damaged.


    Krisbee So I've been busy with work, but here is part of the output from when I ran fsck. I'm no file system expert, but it doesn't look very successful. Wondering if I should run it again...? I think the entire output is TOO long to post (more than 10,000 characters), but I can always upload *.txt files with the entire dump. The important parts are the beginning and end. So...


    Beginning:



    Some different repairs in the middle:




    Ending:


    Krisbee Do I go ahead and hit yes to all of THESE?

    and the other two





    Krisbee Awesome. I did order some eBay 6TB drives, to decide whether I'm going to copy each drive and try the repair on the copies, or just bite the bullet and run fsck on /dev/md127. I'll report back with my decision and findings.


    2 more questions if you'd be so kind, but no pressure to answer. I appreciate all of the assistance you provided so far VERY much!


    1. Should I run the fsck from the original VM... will the OS container that originally created the array and file system have any data that would improve the mounting/repair chances? Or should I NOT power the array on/off anymore and just run from the current OS I've been operating on?
    2. This might be a big ask, but from what I can tell, these smartctl outputs suggest that I have 4 healthy (high-hour) drives, eh? (Which should hopefully survive the stress of an fsck and array repair?)


    EDIT - The message is too long... sorry I'll split it up over several.




    Krisbee Thank you for taking the time to reply. The SAS HBA is in full passthrough mode. So the VM container can access the HBA and drives directly. ;) I can always switch back to the original VM in less than 2 minutes flat if you think there may be critical data regarding the array saved in that OS container.


    I have posted this problem in another forum and gotten a few ideas, but thus far I have only followed the guidance received in this thread. An amazing suggestion I received was to raw-copy each drive to another drive and use the copies to attempt to repair and remount the file system. The original drives are very high mileage, so I have some worries about raw-copying the data without the original drives failing. They've been healthy for a very long time, but they're past their expected lifetime hours by A LOT! I'd love to hear some input on this method.

    And thank you for the digitalocean and loggly links, I'll check them out when I'm home from work!

    Krisbee I'll do my best to answer as concisely as possible, to avoid this thread getting long and cumbersome.



    1. This is my home NAS, and luckily not super-sensitive data. Unfortunately I don't have an automated RSYNC to another server, so the last time this volume was backed up was about one year ago, and a few small backups have been made. I'm not missing a TON of data since the last backup, but it would be a bummer to lose it. Plus it will be a great learning experience if I can recover it or even if not.
    2. We'll skip to your #3 because your #2 and #4 have related info. I've attempted to mount the filesystem in the WebUI. I cannot find the kernel log in the System Log page you mention, and may need help locating exactly what you're looking for in the CLI.
    3. Response to your #2 and #4. The MD RAID appears in the WebUI as clean, degraded. The filesystem takes a bit of explanation. It is not listed at all in the WebUI, because I transferred the array to a new OMV VM. This was for a few reasons. I had to shut down the bare-metal machine to replace the SAS card and cabling. I suspected the root failure was the cabling, because when I touched the cables while it was powered up, I heard the drives turn on... :/ ... They then powered off shortly after, and I knew I couldn't let them keep doing that. So I powered it off, replaced the SAS card and cable, and powered it up. The array did not reassemble on its own, and there were LOTS of error messages clouding everything, many just related to mounts that were unavailable. So I spun up a new VM on the same version of OMV and have been attempting the recovery in a fresh install. The original VM container still exists; I just haven't used it since the first try after the SAS card and cable replacement. Long story long, that VM's WebUI says the array is missing.
    4. Response to your #5. I used the command mdadm --stop /dev/md127 before trying to reassemble, first with mdadm --assemble --verbose /dev/md127 /dev/sdb /dev/sdc /dev/sdd /dev/sde then with mdadm --assemble --force --verbose /dev/md127 /dev/sdb /dev/sdc /dev/sdd /dev/sde .

      Sorry for the long response, but just trying to answer your questions fully.
      Here is the current fstab info:

    Krisbee Thank you for the reassurance. We'll see how this goes. Do you think it would be wise to run fsck at this time, so it can run while I'm at work? Or shall we just wait for further approval before doing anything of the sort?

    votdev I don't think the cat /proc/mdstat or mdadm --detail /dev/md127 output has changed much (if at all) after the few reassemble commands since the initial post, but here is a fresh run of each command this morning:

    Code
    root@openmediavault:~# cat /proc/mdstat
    Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
    md127 : active raid5 sdb[1] sde[3] sdd[2]
          17581171200 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [_UUU]
          bitmap: 1/44 pages [4KB], 65536KB chunk
    
    unused devices: <none>
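    For anyone reading that mdstat output, the [4/3] [_UUU] markers are the quick way to see the damage: 4 configured slots, 3 active members, and the underscore marks the missing one. A small runnable sketch against that sample line:

```shell
# Pick the degraded-array markers out of a sample /proc/mdstat line.
# [4/3] = 4 configured slots, 3 active; the "_" in [_UUU] is the missing member.
line='17581171200 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [_UUU]'
slots=$(echo "$line" | grep -Eo '\[[0-9]+/[0-9]+\]')
members=$(echo "$line" | grep -Eo '\[_?U+\]')
echo "$slots $members"   # prints: [4/3] [_UUU]
```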

    UPDATE 1/30
    The /dev/sdc (original) and the /dev/sdf (a spare I have from a different machine) have both finished their "Secure Wipe" routines. When I try to add /dev/sdc to the array as a replacement using the WebUI, it returns this error message:


    Does anyone have any further ideas? I found a thread on another forum where someone battled a very similar issue using only mdadm, since whatever system he was using was NOT OMV. I'll add a screenshot of his solution and the URL to that forum below the code box. Is his solution safe for me at all?


    Code
    OMV\ExecException: Failed to execute command 'export PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin; export LANG=C.UTF-8; export LANGUAGE=; mdadm --manage '/dev/md127' --add /dev/sdc 2>&1' with exit code '1': mdadm: add new device failed for /dev/sdc as 4: Invalid argument in /usr/share/php/openmediavault/system/process.inc:197
    Stack trace:
    #0 /usr/share/openmediavault/engined/rpc/raidmgmt.inc(419): OMV\System\Process->execute()
    #1 [internal function]: Engined\Rpc\RaidMgmt->add(Array, Array)
    #2 /usr/share/php/openmediavault/rpc/serviceabstract.inc(123): call_user_func_array(Array, Array)
    #3 /usr/share/php/openmediavault/rpc/rpc.inc(86): OMV\Rpc\ServiceAbstract->callMethod('add', Array, Array)
    #4 /usr/sbin/omv-engined(537): OMV\Rpc\Rpc::call('RaidMgmt', 'add', Array, Array, 1)
    #5 {main}

    The solution to another RAID5 disaster (like this one), and the URL

    Basically it looks like he carefully and manually rebuilt the array and superblock data with a few (precise) commands.


    https://www.linuxquestions.org/questions/linux-server-73/mdadm-error-replacing-a-failed-disk-909577/


    Krisbee and geaves thank you both, kindly, for the input you've provided thus far.

    I'm preparing to rebuild the array, but I have a few questions, as I'm not new to tech but this is my first time repairing a completely missing drive.

    Can I mount the filesystem before attempting a RAID rebuild, to recover some of the stuff that I know isn't backed up anywhere else? This is just a precaution in case the array completely dies during recovery. It's probably 10% or 15% of the entire array's volume that I'd be focused on retrieving. When I tried to mount it in the WebUI, I got an error that I'll post at the bottom...

    Here is my preparation to start the recovery, anything else I should know/ do ?

    • I have the original /dev/sdc (a Seagate ST6000NM0014) reaching the end of its secure wipe routine.
    • I installed a "spare" drive as /dev/sdf (a Seagate ST6000NM0034) currently being secure-wiped too, just in case the original drive won't work.
    • Both are pulled out of the actual chassis and arranged so that extra fans can be placed on them, which dropped the temperature from 38C to 30C. I'm sure rebuilding the array will warm things up a little and... well, the drives have been healthy for a long time, but they're high-mileage units.


    "clean, degraded" array mount error: Looks like the superblock is unreadable. Is that normal or repairable?

    UPDATE:

    I haven't heard anything back yet, but I've made some progress and figured out some of this headache on my own. I could still REALLY use some help.

    I learned that when the array says active (auto-read-only), the host OS will eventually work itself out of read-only mode without any further input.
    So now:

    1. I've allowed the array to finish syncing and the  active (auto-read-only)  property has disappeared.
    2. I've run the "wipe drive" tool in secure mode on /dev/sdc (the drive that was out of sync) until it reached 25%.
    3. The commands mdadm /dev/md127 --fail /dev/sdc and mdadm /dev/md127 --remove /dev/sdc had already been run previously, but I ran them again to ensure the drive was removed.
    4. When I run the command mdadm --add /dev/md127 /dev/sdc I cannot add the drive with the following error:


    Code
    root@openmediavault:~# mdadm --add /dev/md127 /dev/sdc
    mdadm: add new device failed for /dev/sdc as 4: Invalid argument


    The mdadm --detail /dev/md127 output is as follows, and it shows that the superblock data knows there are 4 devices but only 3 are active. I cannot figure out the command to remove the "removed" device from the superblock data, so I'm stuck unable to add a replacement device, with the error message mdadm: add new device failed for /dev/sdc as 4: Invalid argument
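    In hindsight, one way to look at this mismatch directly would have been to dump each remaining member's superblock and compare the event counts and device roles. A hypothetical diagnostic sketch (it needs root and the real member devices, so it's guarded to do nothing elsewhere):

```shell
# Hypothetical diagnostic: dump each member's md superblock so the event
# counts and device roles can be compared across drives. Needs root and
# the real member devices; the -b guard makes it a no-op elsewhere.
MEMBERS='/dev/sdb /dev/sdd /dev/sde'
for d in $MEMBERS; do
    if [ -b "$d" ]; then
        echo "== $d =="
        mdadm --examine "$d" | grep -E 'Events|Device Role|Array State'
    fi
done
```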


    If anyone can chime in on the process of removing the "removed" drive and adding a new one, I would greatly appreciate any help! Thanks in advance to anyone who can assist!

    Krisbee  geaves

    Post the output of cat /proc/mdstat in a code box please; this symbol </> on the forum bar makes it easier to read


    The output from #4 above shows the array as (auto-read-only). Also, to re-add /dev/sdc after the 'Possibly out of date' error, the drive will have to be securely wiped; this can usually be run to 25% and then stopped, then try re-adding the drive to the array. Do not add the drive until the rebuild has finished and the (auto-read-only) state is corrected.


    geaves So, I've already attempted to add the drive back, got errors, and performed a --fail /dev/sdc and --remove /dev/sdc to remove it. Now it's back to the previous, non-functional state. I hope nothing was damaged. :/

    How do I get the machine to rebuild the array with only 3 drives... to get the array out of auto-read-only?

    Here's the cat /proc/mdstat


    Code
    root@openmediavault:~# cat /proc/mdstat
    Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
    md127 : active (auto-read-only) raid5 sdb[1] sde[3] sdd[2]
          17581171200 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [_UUU]
          bitmap: 0/44 pages [0KB], 65536KB chunk
    
    unused devices: <none>

    OK, and is this to be run with the array stopped?
    e.g. run this first: mdadm --stop /dev/md127