RAID 5 after extension: clean, degraded

  • Hi all,


I'm on 6.0.28-3 (Shaitan). I extended my RAID 5 with a new 4 TB HDD, having 3x 4 TB already in the system. During the extension, the system shut down (for some unknown reason, maybe overheating), but when I started it again, it continued with the extension. When it finished (seemingly successfully), I was in a hurry and just quickly extended the file system, which worked.


    Now having a closer look at the RAID, it tells me it's in the state "clean, degraded":

    Code
Version : 1.2
Creation Time : Sun Sep 2 04:05:15 2018
Raid Level : raid5
Array Size : 11720661504 (11177.69 GiB 12001.96 GB)
Used Dev Size : 3906887168 (3725.90 GiB 4000.65 GB)
Raid Devices : 4
Total Devices : 3
Persistence : Superblock is persistent

Intent Bitmap : Internal

Update Time : Fri Jan 27 07:00:56 2023
State : clean, degraded
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0

Layout : left-symmetric
Chunk Size : 512K

Consistency Policy : bitmap

Name : openmediavault:RAIDAR (local to host openmediavault)
UUID : e98b7abd:4f328c81:40a102c3:1824afcf
Events : 79896

Number Major Minor RaidDevice State
   0      8     16       0     active sync /dev/sdb
   1      8     32       1     active sync /dev/sdc
   2      8     48       2     active sync /dev/sdd
   -      0      0       3     removed

Googling suggested the mdadm --add command to add the missing drive back to the array. However, I would have expected the "Recover" option in the GUI to do the same, but I cannot select a device there:



    Does anyone have experience with this? Can I safely execute the mdadm --add command or do I need to do something else?


Here is some detailed information:

    cat /proc/mdstat

    Code
    Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
    md127 : active raid5 sdc[1] sdb[0] sdd[2]
    11720661504 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UUU_]
    bitmap: 20/30 pages [80KB], 65536KB chunk

    blkid

    Code
    /dev/sda1: UUID="64ae1488-3bd9-4236-8742-9ea44db6f56c" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="76aa5ac0-01"
    /dev/sda5: UUID="c2b0cb47-aeec-4b5a-8285-857b1c56da54" TYPE="swap" PARTUUID="76aa5ac0-05"
    /dev/sdb: UUID="e98b7abd-4f32-8c81-40a1-02c31824afcf" UUID_SUB="a36eadb0-2348-fb83-ec76-65c9fa5df48b" LABEL="openmediavault:RAIDAR" TYPE="linux_raid_member"
    /dev/sdc: UUID="e98b7abd-4f32-8c81-40a1-02c31824afcf" UUID_SUB="94cf7512-43e5-3957-7060-0e6cc0cdd526" LABEL="openmediavault:RAIDAR" TYPE="linux_raid_member"
    /dev/sdd: UUID="e98b7abd-4f32-8c81-40a1-02c31824afcf" UUID_SUB="f1ae8b96-55da-2541-bc00-7be870687109" LABEL="openmediavault:RAIDAR" TYPE="linux_raid_member"
    /dev/md127: LABEL="Raidar" UUID="5d21dac9-d7ba-4831-9d29-e6d9d8de5b3b" BLOCK_SIZE="4096" TYPE="ext4"
    /dev/sde: UUID="e98b7abd-4f32-8c81-40a1-02c31824afcf" UUID_SUB="e80184c3-5dc3-17b4-1f73-a6f95f5fb718" LABEL="openmediavault:RAIDAR" TYPE="linux_raid_member"
    /dev/sdf1: UUID="b533ba9f-52ff-9d49-8092-a954a53881e4" BLOCK_SIZE="4096" TYPE="ext4" PTUUID="d433308c" PTTYPE="dos" PARTUUID="d433308c-01"

    fdisk -l | grep "Disk "

    cat /etc/mdadm/mdadm.conf

    mdadm --detail --scan --verbose

    Code
    ARRAY /dev/md/openmediavault:RAIDAR level=raid5 num-devices=4 metadata=1.2 name=openmediavault:RAIDAR UUID=e98b7abd:4f328c81:40a102c3:1824afcf
    devices=/dev/sdb,/dev/sdc,/dev/sdd


    Any help is appreciated, thank you.

• Official Post

    However, I would have expected the "recover" option in the GUI to do the same, but I cannot select a device there:

    That is because the drive /dev/sde has a raid signature on it according to blkid


    I assume that /dev/sde is the drive you added to grow the array, if that's the case then mdadm --add /dev/md127 /dev/sde should add the drive back to the array
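From the shell that would look roughly like this (a sketch; it's worth confirming with mdadm --examine first that /dev/sde really carries the array's superblock before adding it):

Code
# the Array UUID shown here should match e98b7abd:4f328c81:40a102c3:1824afcf
mdadm --examine /dev/sde
# add the disk back to the array and then watch the rebuild
mdadm --add /dev/md127 /dev/sde
cat /proc/mdstat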

  • That is because the drive /dev/sde has a raid signature on it according to blkid


    I assume that /dev/sde is the drive you added to grow the array, if that's the case then mdadm --add /dev/md127 /dev/sde should add the drive back to the array

    Thank you for the quick response. OK, I did that. Unfortunately it's still "clean, degraded":


    cat /proc/mdstat

    Code
    Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
    md127 : active raid5 sde[3](F) sdc[1] sdb[0] sdd[2]
    11720661504 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UUU_]
    bitmap: 11/30 pages [44KB], 65536KB chunk

    Looks like the drive is faulty :(

    Do you know if this must be a hardware error or do I have any (software-wise) recovery options from here on?

• Official Post

    Looks like the drive is faulty

Is the drive new or repurposed? Have you run a long SMART test on it? It might not be the drive; it could be the motherboard connection, the SATA cable, or intermittent power.

    Do you know if this must be a hardware error or do I have any (software-wise)

    This is hardware related

    recovery options from here on

A backup :) The current array is still accessible, but I always advise restricting access to it as much as possible.
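If you want to see why md kicked the drive before deciding anything, the kernel log is the first place to look; roughly along these lines (a sketch, assuming the messages from the failed rebuild are still in the ring buffer):

Code
# look for ATA/SATA errors or md kicking the disk out
dmesg | grep -i -E 'sde|md127'
# show the current state of the array and its members
mdadm --detail /dev/md127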

• The drive is brand-new. I can only imagine that the shutdown during the extension broke something :(

    I would like to run a long SMART test, but again no device is listed:



    Am I doing something wrong?


    EDIT: Solved it by

    sudo smartctl -t long /dev/sde


    It will be running for 10 hours...
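I'll keep an eye on it from the shell with something like this (progress shows up under "Self-test execution status", and the result lands in the self-test log once it's done):

Code
# check progress of the running test
sudo smartctl -c /dev/sde
# once finished, read the self-test log
sudo smartctl -l selftest /dev/sde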

• Official Post

    The drive is brand-new

Then you would not expect mdadm to mark it as failed, but then again nothing is guaranteed. If the SMART output shows nothing relevant, then I would suggest a wipe of the drive (Storage -> Disks -> Wipe) and run a secure wipe.
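For reference, the shell equivalent of clearing the old RAID signature before re-adding looks roughly like this (a sketch, and destructive, so be absolutely sure you target the right disk; it does not replace a full secure wipe):

Code
# remove the failed disk from the array first
mdadm --manage /dev/md127 --remove /dev/sde
# clear the old md superblock and any remaining signatures (destructive!)
mdadm --zero-superblock /dev/sde
wipefs -a /dev/sde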


If the drive again fails to be added to the array, then you're looking at hardware; alternatively, if you're able to connect it to a Windows machine, run the manufacturer's diagnostic tools.


To get an RMA on the drive, you need to ensure that it is the drive that is at fault and not something else.

• The test is still running, but out of curiosity I looked at sudo smartctl -a /dev/sde and found some errors in the log:



    It doesn't seem like these error codes (800-804) are the official ones from WD, but maybe someone knows how to interpret them...
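For anyone who wants to dig into them, the raw logs can also be pulled out on their own (the error log is included in the -a output I quoted above), roughly like this:

Code
# just the ATA error log
sudo smartctl -l error /dev/sde
# everything smartctl can report, including vendor-specific sections
sudo smartctl -x /dev/sde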

  • Skullchuck

Added the label "Solved".
