RAID 5 not available, 2 of 3 HDDs missing

  • Hi,


    maybe you could give me some advice to bring my storage back to life?


    I use OMV 2.2.14, which has been running since 2016 without problems.


    The RAID5 consists of 3x 3TB WD RED drives.


    Drive configuration:
    /dev/sda: system and swap
    /dev/sdb - /dev/sdd: RAID5 array


    I didn't do anything special when the array suddenly stopped working.


    As I don't need the storage permanently, I start the OMV via WOL on demand and use Autoshutdown.


    Checking the OMV GUI, SMART reports for /dev/sdc:
    'Device has a few bad sectors'
    All other devices are 'Good'
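
    For reference, the detailed SMART attributes can also be checked from the CLI (assuming smartmontools is installed, as it usually is on OMV); the reallocated and pending sector counts are the interesting values here:
    smartctl -a /dev/sdc | grep -i -E 'reallocated|pending'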


    Here is some information that may help to understand the problem:


    root@OpenMediaVault:~# cat /proc/mdstat
    Personalities : [raid6] [raid5] [raid4]
    md0 : inactive sdd[2]
      2930135512 blocks super 1.2



    unused devices: <none>
    root@OpenMediaVault:~# blkid
    /dev/sda1: UUID="439440a1-b67d-44a9-9569-5d05be5eb1d4" TYPE="ext4"
    /dev/sda5: UUID="59834cdd-708d-4ba9-9a59-1f1d6a5ee2cf" TYPE="swap"
    /dev/sdd: UUID="7c016f86-f8df-c2fe-7be1-d809b79b6f1e" UUID_SUB="8d28819a-df0b-1da1-f7c1-0143ff15a021" LABEL="OpenMediaVault:RAID5" TYPE="linux_raid_member"
    /dev/sdb: UUID="7c016f86-f8df-c2fe-7be1-d809b79b6f1e" UUID_SUB="d92491d5-94b1-0b92-a7a3-c28e53dcec4b" LABEL="OpenMediaVault:RAID5" TYPE="linux_raid_member"
    /dev/sdc: UUID="7c016f86-f8df-c2fe-7be1-d809b79b6f1e" UUID_SUB="750ce038-2cce-d625-4ae3-4bffa812f4d4" LABEL="OpenMediaVault:RAID5" TYPE="linux_raid_member"
    root@OpenMediaVault:~# fdisk -l | grep "Disk "
    Disk /dev/sdc doesn't contain a valid partition table
    Disk /dev/sdb doesn't contain a valid partition table
    Disk /dev/sdd doesn't contain a valid partition table
    Disk /dev/sda: 250.1 GB, 250059350016 bytes
    Disk identifier: 0x000307a0
    Disk /dev/sdc: 3000.6 GB, 3000592982016 bytes
    Disk identifier: 0x00000000
    Disk /dev/sdb: 3000.6 GB, 3000592982016 bytes
    Disk identifier: 0x00000000
    Disk /dev/sdd: 3000.6 GB, 3000592982016 bytes
    Disk identifier: 0x00000000
    root@OpenMediaVault:~# cat /etc/mdadm/mdadm.conf
    # mdadm.conf
    #
    # Please refer to mdadm.conf(5) for information about this file.
    #



    # by default, scan all partitions (/proc/partitions) for MD superblocks.
    # alternatively, specify devices to scan, using wildcards if desired.
    # Note, if no DEVICE line is present, then "DEVICE partitions" is assumed.
    # To avoid the auto-assembly of RAID devices a pattern that CAN'T match is
    # used if no RAID devices are configured.
    DEVICE partitions



    # auto-create devices with Debian standard permissions
    CREATE owner=root group=disk mode=0660 auto=yes



    # automatically tag new arrays as belonging to the local system
    HOMEHOST <system>



    # definitions of existing MD arrays
    ARRAY /dev/md0 metadata=1.2 name=OpenMediaVault:RAID5 UUID=7c016f86:f8dfc2fe:7be1d809:b79b6f1e


    My plan to solve the problem is to replace /dev/sdc (the drive with bad sectors) with a new disk (WD RED 4TB) and start a reassembly with:
    mdadm --stop /dev/md0


    mdadm --assemble --force --verbose /dev/md0 /dev/sd[dcb]



    Do you agree that this is the best plan?



    Thanks in advance,


    Markus

    • Official Post

    Do you agree that this is the best plan?

    No, you can't just replace a drive without first removing the defective one from the RAID; not physically, but from mdadm.
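
    For reference, on an array that is still running, that sequence would normally look something like this (the device names here are just placeholders for your setup):
    mdadm --manage /dev/md0 --fail /dev/sdc
    mdadm --manage /dev/md0 --remove /dev/sdc
    # after physically swapping the disk:
    mdadm --manage /dev/md0 --add /dev/sdX
    That only applies to an active array, though; yours is currently inactive.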


    But the RAID could be completely dead and all data lost: cat /proc/mdstat shows the array as inactive but displays only one drive, which in itself would suggest the RAID has gone; blkid shows no assembled RAID device, and fdisk can't find a partition table on any of the drives within the RAID.


    What you could try, without removing any drives, is to run your two suggested commands and see what happens; if there are any errors, the RAID is dead and your data gone.
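
    Before forcing anything, you could also look at what the superblocks on each member report, e.g. whether the event counters still roughly agree; a rough check (the exact output varies) would be:
    mdadm --examine /dev/sdb /dev/sdc /dev/sdd | grep -E 'Events|State'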

  • OK, this is something that I don't want to hear.


    I assume that as long as I don't try to reassemble the RAID, the data is still there, so maybe I could try something different before I play my last card?


    What I haven't understood up to now is what happens if one of the drives gets damaged. Does the RAID stop working, and do I get a message?

    • Official Post

    OK, this is something that I don't want to hear.

    Sorry, a RAID5 can lose one drive without losing data, but as I said, your cat /proc/mdstat is seeing only one :( out of the three.


    As I said, you could try mdadm --stop, then mdadm --assemble as you suggested, but on the three drives currently installed. If there is an error then post it here, but prepare for the worst; when it comes to RAID I'm a pessimist :)

  • Hi geaves,


    after fighting with myself for some time, I decided to give it a try:


    root@OpenMediaVault:~# mdadm --stop /dev/md0
    mdadm: stopped /dev/md0


    root@OpenMediaVault:~# mdadm --assemble --force --verbose /dev/md0 /dev/sd[dcb]
    mdadm: looking for devices for /dev/md0
    mdadm: /dev/sdb is identified as a member of /dev/md0, slot 0.
    mdadm: /dev/sdc is identified as a member of /dev/md0, slot 1.
    mdadm: /dev/sdd is identified as a member of /dev/md0, slot 2.
    mdadm: forcing event count in /dev/sdb(0) from 406 upto 408
    mdadm: forcing event count in /dev/sdc(1) from 406 upto 408
    mdadm: added /dev/sdc to /dev/md0 as 1
    mdadm: added /dev/sdd to /dev/md0 as 2
    mdadm: added /dev/sdb to /dev/md0 as 0
    mdadm: /dev/md0 has been started with 3 drives.


    root@OpenMediaVault:~# cat /proc/mdstat
    Personalities : [raid6] [raid5] [raid4]
    md0 : active (auto-read-only) raid5 sdb[0] sdd[2] sdc[1]
    5860270080 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
    resync=PENDING
    unused devices: <none>


    Does 'resync=PENDING' mean that the resync is now running in the background, or do I have to do something else to start the resync?


    Regards,
    Markus

    • Official Post

    Does 'resync=PENDING' mean that the resync is now running in the background, or do I have to do something else to start the resync?

    :D I could say it's going into nuclear meltdown, but that's me being sarcastic :D


    Do me one favour: when you execute a command and want to post the output, use the spoiler option on the menu bar (second from right) and paste each command's output in a new spoiler window; it makes it easier to read :)


    But to answer your question: no, it's doing nothing because it's in an auto-read-only state. Try the following from the CLI:


    mdadm --readwrite /dev/md0


    If that errors, you may have to stop the array first, then run it again.
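
    Once the resync actually starts you can follow its progress and the member states from the CLI, for example with:
    watch -n 60 cat /proc/mdstat
    mdadm --detail /dev/md0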

  • Hi geaves,


    the nuclear meltdown didn't happen; instead, after about 10 hours of resyncing, all data seems to be back again!


    Thank you very much for your help :thumbup:


    Now, to avoid that stress in the future, I think I should improve my backup strategy.


    Kind Regards,
    Markus
