Newb: Power cut borked my RAID

  • Hi,

    Always grateful for your help here... I was doing so well... :)

    We had a power cut the other night and my UPS didn't last long enough to catch things...

    Now my RAID seems to be in trouble. I'm getting the following by email (Zulu is the name of my RAID...):

    Code
    The system monitoring needs your attention.
    
    Host:        \openmediavault.local
    Date:        Wed, 07 Apr 2021 08:52:04
    Service:     filesystem_srv_dev-disk-by-label-zulu
    Event:       Does not exist
    Description: unable to read filesystem '/srv/dev-disk-by-label-zulu' state

    and


    Code
    The system monitoring needs your attention.
    
    Host:        \openmediavault.local
    Date:        Wed, 07 Apr 2021 08:52:35
    Service:     mountpoint_srv_dev-disk-by-label-zulu
    Event:       Status failed
    Description: status failed (1) -- /srv/dev-disk-by-label-zulu is not a mountpoint
    
    This triggered the monitoring system to: alert

    Other info:


    Code
    root@openmediavault:~# cat /proc/mdstat
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
    md0 : inactive sda[4](S) sdc[2](S) sdb[1](S)
          5860147464 blocks super 1.2
           
    unused devices: <none>

    Code
    root@openmediavault:~# blkid
    /dev/sdb: UUID="e722afd9-5803-5460-ce00-e63017883000" UUID_SUB="5667c0d5-4cec-a644-36a3-e641ec176a46" LABEL="openmediavault.local:zulu" TYPE="linux_raid_member"
    /dev/sda: UUID="e722afd9-5803-5460-ce00-e63017883000" UUID_SUB="acaec5ef-d304-a129-3754-9605267dcbdf" LABEL="openmediavault.local:zulu" TYPE="linux_raid_member"
    /dev/sde1: UUID="173d1141-65e9-4ee1-ae31-b73d34f7b2cf" TYPE="ext4" PARTUUID="9d8e1096-01"
    /dev/sde5: UUID="6947d5ca-f259-4fe7-be54-5b945620213c" TYPE="swap" PARTUUID="9d8e1096-05"
    /dev/sdc: UUID="e722afd9-5803-5460-ce00-e63017883000" UUID_SUB="d49ed9a5-6400-f405-ea4d-0601f2e60642" LABEL="openmediavault.local:zulu" TYPE="linux_raid_member"
    /dev/sdf1: LABEL="Backup" UUID="83ed8d9d-e2f7-4e64-bfc8-8fe26f404112" TYPE="ext4" PARTUUID="4f80a638-fd4a-44e9-851a-3c9575507f12"

    Code
    root@openmediavault:~# mdadm --detail --scan --verbose
    INACTIVE-ARRAY /dev/md0 num-devices=3 metadata=1.2 name=openmediavault.local:zulu UUID=e722afd9:58035460:ce00e630:17883000
       devices=/dev/sda,/dev/sdb,/dev/sdc

    Code
    root@openmediavault:~# mdadm --stop /dev/md0
    mdadm: stopped /dev/md0
    root@openmediavault:~# mdadm --assemble --force --verbose /dev/md0 /dev/sd[bcd]
    mdadm: looking for devices for /dev/md0
    mdadm: Cannot read superblock on /dev/sdd
    mdadm: no RAID superblock on /dev/sdd
    mdadm: /dev/sdd has no superblock - assembly aborted

    The commands above are how I rebuilt things last time, but they're not working in this case. One disk seems to be struggling, as it has a red marker next to it in the SMART section of OMV, but surely the RAID should still function, albeit in a degraded state? Do I have several issues at once, not just the failing HD?
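
    Would something like this be a sensible way to double-check which of those drives actually still carry a RAID superblock before retrying the assemble? (A sketch, using the device names from the output above; adjust them to whatever the system currently reports.)

    Code
    # Sketch: check each candidate drive for an md superblock
    for d in /dev/sda /dev/sdb /dev/sdc /dev/sdd; do
        echo "== $d =="
        mdadm --examine "$d" 2>&1 | grep -E 'Array UUID|Raid Level|Device Role|No md superblock'
    done
    # Genuine members report an Array UUID and Raid Level;
    # a drive reporting "No md superblock detected" should not be passed to --assemble.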


    How can I get things back on a solid footing? :)


    S

  • Just trying to make sense of all the Terminal stuff...

    The disk which is failing the SMART test is /dev/sde. The disk that mdadm complains about is /dev/sdd...


    Code
     mdadm: /dev/sdd has no superblock - assembly aborted
  • Look at cat /proc/mdstat, mdadm --detail --scan and blkid


    mdadm --detail /dev/md0 will also confirm the above :)

    I'm way out of my depth here... I get:

    Should my command be...?

    Code
    root@openmediavault:~# mdadm --assemble --force --verbose /dev/md0 /dev/sd[abd]

    I don't understand where the 4th drive is... Ok... so the 4th drive has failed and only sda/d/b remain...?


    S

    • Official Post

    I don't understand where the 4th drive is... Ok... so the 4th drive has failed and only sda/d/b remain

    This is beginning to make very little sense. Going back to your first post:


    cat /proc/mdstat = /dev/sd[abc]

    blkid = /dev/sd[abc]

    mdadm --detail --scan = /dev/sd[abc]


    Nowhere in the above does /dev/sdd show up, other than in fdisk -l; that tells you the system 'sees' the drive, but it doesn't know where it is or what it's doing, hence the 'no superblock' error when trying to run mdadm --assemble.
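
    As a quick sketch, something like this lists every drive the system detects together with what it thinks is on each one, which makes a 'missing' member easy to spot:

    Code
    # Sketch: list all block devices the kernel sees, with size and any recognised signature
    lsblk -o NAME,SIZE,TYPE,FSTYPE,LABEL
    # Array members show FSTYPE "linux_raid_member"; a drive with an empty FSTYPE
    # is visible to the system but carries no recognisable superblock.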


    Post #6 throws a totally new curveball: it lists the RAID as raid0, and on top of that it's listing the drives as /dev/sd[abd].


    Have you rebooted at all after your first post?

  • Yes, I rebooted twice in the hope things might untangle themselves...

  • Thought I'd run the commands again as a triple check:

    • Official Post

    Yes, I rebooted twice in the hope things might untangle themselves

    If I could have £1 every time someone said that ^^^^ That explains why the drive references have changed.


    So confirm the array has stopped, then run the mdadm --assemble but with /dev/sd[abd]; hopefully that will work and rebuild the array in a degraded state.


    DO NOT REBOOT, DO NOT PASS GO :D


    Then run mdadm --examine /dev/sdc, but only after the raid has been rebuilt.
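
    Pulling that together, the rough sequence would be something like this (a sketch; double-check the device letters against blkid immediately beforehand, as they've already shifted once after a reboot):

    Code
    # 1. Make sure the inactive array is stopped
    mdadm --stop /dev/md0
    cat /proc/mdstat                                    # md0 should no longer be listed

    # 2. Force-assemble from the three drives that currently hold superblocks
    mdadm --assemble --force --verbose /dev/md0 /dev/sd[abd]

    # 3. Keep an eye on the array while it comes back / resyncs (Ctrl+C to exit)
    watch cat /proc/mdstat

    # 4. Only once the array is active again, look at the remaining drive
    mdadm --examine /dev/sdc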

  • <If I could have £1 every time someone said that ^^^^ That explains why the drive references have changed.>


    Sorry! :) Desperation! :)


    <So confirm the array has stopped>


    With

    Code
    root@openmediavault:~# mdadm --stop /dev/md0

    ?

    <run the mdadm --assemble but with /dev/sd[abd]; hopefully that will work and rebuild the array in a degraded state.>


    It's ok to do all this from outside the OMV GUI? I read somewhere to do as much as possible from the GUI so as not to add extra entries to the DB...?


    Will the Terminal give progress/tell me when the array is rebuilt?


    Is there any mileage in replacing the dodgy disk and *then* rebuilding things? I have a new one...


    Thanks for your time as always! :)


    s

    • Official Post

    It's ok to do all this from outside the OMV GUI? I read somewhere to do as much as possible from the GUI so as not to add extra entries to the DB

    You have to run --assemble from the CLI; you've already blown your brownie points by rebooting :)

    Will the Terminal give progress/tell me when the array is rebuilt

    It can do if you run cat /proc/mdstat once the rebuild has started, but it should display in the GUI anyway

    Is there any mileage in replacing the dodgy disk and *then* rebuilding things? I have a new one

    Where's the 'dodgy' disk? You don't know yet if the array will rebuild; let's do one step at a time. But to answer your question, you can't slap in a new disk because the array is currently inactive.
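
    On the progress point, a couple of ways to keep an eye on a rebuild from the terminal (a sketch):

    Code
    # Refresh the kernel's RAID status every few seconds (Ctrl+C to exit)
    watch -n 5 cat /proc/mdstat
    # Or ask mdadm directly; during a resync it reports a rebuild/resync percentage
    mdadm --detail /dev/md0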

  • You have to run --assemble from the CLI; you've already blown your brownie points by rebooting :)

    It can do if you run cat /proc/mdstat once the rebuild has started, but it should display in the GUI anyway

    Where's the 'dodgy' disk? You don't know yet if the array will rebuild; let's do one step at a time. But to answer your question, you can't slap in a new disk because the array is currently inactive.

    Thank you Sir!


    I'm going to have a go at this over the weekend. I'm having to drag myself away for work overnight.


    Really appreciate the assistance! Keep fingers crossed for me! :)


    S

  • You have to run --assemble from the CLI; you've already blown your brownie points by rebooting :)

    It can do if you run cat /proc/mdstat once the rebuild has started, but it should display in the GUI anyway

    Where's the 'dodgy' disk? You don't know yet if the array will rebuild; let's do one step at a time. But to answer your question, you can't slap in a new disk because the array is currently inactive.

    My hands are a bit shaky and clammy but things seem to be rebuilding happily now.
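
    Once the resync finishes I'll run a few checks along these lines (a sketch, using the label from the original alerts) to make sure everything is back on a solid footing:

    Code
    cat /proc/mdstat                  # the status line (e.g. [UUU]) shows any missing members
    mdadm --detail /dev/md0           # State should read clean (or clean, degraded)
    blkid /dev/md0                    # the zulu filesystem should be visible on the array again
    mount | grep zulu                 # confirm /srv/dev-disk-by-label-zulu is mounted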


    Thanks again :)


    S
