Raid1 Failure

  • I've had a RAID1 array failure; it looks like it happened overnight during a resync. Normally I would try to sort it out on my own, but with SMART disabled on both drives for some reason, I'm less confident. Please let me know what info you need or what steps I should take to get my array functioning again.


    As per ryecoaaron's sticky, "Degraded or missing raid array questions", here is my info:





    Code
    root@RoachHotel:~# blkid
    /dev/sda1: UUID="5a75f3af-251a-43c2-b1c6-d8bfb62b53ce" TYPE="ext4"
    /dev/sde: UUID="e371445a-e72e-2298-0b4d-f961a7f3b799" UUID_SUB="0c838662-4d40-287a-b137-28a93821195e" LABEL="RoachHotel:Volume2" TYPE="linux_raid_member"
    /dev/sdd: UUID="e371445a-e72e-2298-0b4d-f961a7f3b799" UUID_SUB="5b137bcd-d7fd-0a2f-1aab-3dd3c712d87b" LABEL="RoachHotel:Volume2" TYPE="linux_raid_member"
    /dev/sdb: UUID="81aa4e20-7495-6f6b-39df-90c3236e8a53" UUID_SUB="e3fd4806-9ec2-f1cf-5c67-0539feec47f5" LABEL="RoachHotel:Volume3" TYPE="linux_raid_member"
    /dev/md1: LABEL="Volume2" UUID="c806cd00-887e-4f26-a629-250a06182018" TYPE="ext4"
    /dev/md2: LABEL="Volume3" UUID="cc3b5002-20d9-44bf-a583-e76fcbeb8bdd" TYPE="ext4"
    /dev/sdc: UUID="81aa4e20-7495-6f6b-39df-90c3236e8a53" UUID_SUB="de60ab9e-0ce8-e732-9008-eada17f76806" LABEL="RoachHotel:Volume3" TYPE="linux_raid_member"
    Code
    root@RoachHotel:~# mdadm --detail --scan --verbose
    ARRAY /dev/md0 level=raid1 num-devices=2 metadata=1.2
       devices=/dev/sdf,/dev/sdg
    ARRAY /dev/md1 level=raid1 num-devices=2 metadata=1.2 name=RoachHotel:Volume2 UUID=e371445a:e72e2298:0b4df961:a7f3b799
       devices=/dev/sdd,/dev/sde
    ARRAY /dev/md2 level=raid1 num-devices=2 metadata=1.2 name=RoachHotel:Volume3 UUID=81aa4e20:74956f6b:39df90c3:236e8a53
       devices=/dev/sdc,/dev/sdb
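For anyone following along, here is a minimal sketch of how the degraded array could be picked out from /proc/mdstat, where a status field such as [U_] marks a missing mirror half. The function name degraded_arrays and its optional path argument are made up for illustration; they are not part of mdadm or OMV.

```shell
# Sketch: list md arrays that /proc/mdstat reports as degraded.
# A status field such as [U_] or [_U] means one mirror half is missing.
# degraded_arrays is a hypothetical helper, not an mdadm or OMV command.
degraded_arrays() {
    mdstat=${1:-/proc/mdstat}   # optional path, handy for testing on a copy
    awk '
        /^md[^ ]* :/      { name = $1 }    # remember the array this block describes
        /\[[U_]*_[U_]*\]/ { print name }   # status field contains an underscore
    ' "$mdstat"
}
```

Running `degraded_arrays` with no argument reads the live /proc/mdstat; `mdadm --detail /dev/mdX` on each name it prints then shows which member is missing.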
  • It seems like just a single disk failure, so I've ordered a new disk and will work on a rebuild. I'm sure this is obvious to most people, but I was just hoping for a confirmation from more knowledgeable folks here before I started spending money on the process.

  • Okay, I added a disk back in and successfully rebuilt the array. Just making a note of this for anyone looking for the same info in the future.


    The main thing to note: OMV's default behavior seems to be to drop both drives of a failed array, including their SMART diagnostics.
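The exact commands used for the rebuild are not given in this thread. As a rough sketch, adding a replacement disk to a degraded mirror with mdadm usually looks like the following. The device names /dev/md0 and /dev/sdg are assumptions taken from the scan output above, and the run and rebuild_mirror helpers are hypothetical; they only print the commands unless DRY_RUN=0 is set.

```shell
# Sketch only: device names are assumptions, check yours before running.
# run and rebuild_mirror are hypothetical helpers, not mdadm commands.
# DRY_RUN defaults to 1, so commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

rebuild_mirror() {
    array=$1   # degraded array, e.g. /dev/md0 (assumption)
    disk=$2    # replacement disk, e.g. /dev/sdg (assumption)
    run mdadm --manage "$array" --add "$disk"   # add the new member, resync starts
    run mdadm --detail "$array"                 # show array state and rebuild status
}
```

Calling `rebuild_mirror /dev/md0 /dev/sdg` prints the two commands for review; rebuild progress can afterwards be followed with `cat /proc/mdstat`.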

  • It seems like just a single disk failure


    Maybe, maybe not. You did not provide dmesg or /var/log/syslog output, so I find it pretty hard to guess what might have happened. In the fdisk output above, /dev/sdg is missing entirely; this can be the result of a completely dead drive or of $whatever. I've seen drives kicked out of arrays for various reasons already (cable/contact issues, underpowering, vibrations, sometimes even a disk problem or even a 'dead disk').


    @ryecoaaron and others: Wouldn't it be a nice feature to allow OMV sending such 'debug logs' as Armbian already does (example)? At least dmesg output, SMART health data (and attribute 199!) and HBA info should be collected and then submitted to an online pasteboard service as above.
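Attribute 199 is the UDMA CRC error count, which tends to climb with cable or contact problems rather than with a failing platter. A small sketch of how it could be pulled out of `smartctl -A` output follows; crc_error_count is a hypothetical helper name, and the device path in the comments is a placeholder.

```shell
# Sketch: print the raw value of SMART attribute 199 (UDMA_CRC_Error_Count).
# Reads `smartctl -A` output on stdin; crc_error_count is a hypothetical helper.
# Intended usage (placeholder device name, needs root and smartmontools):
#   smartctl -s on /dev/sdX                  # re-enable SMART if it is off
#   smartctl -H /dev/sdX                     # overall health verdict
#   smartctl -A /dev/sdX | crc_error_count   # raw CRC error count
crc_error_count() {
    # In the attribute table the ID is column 1 and RAW_VALUE is column 10.
    awk '$1 == 199 { print $10 }'
}
```

A non-zero count that keeps growing would point at the connection rather than the disk itself.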

    • Official Post

    Wouldn't it be a nice feature to allow OMV sending such 'debug logs' as Armbian already does

    omv-extras used to have a way to submit a report that was stored on the omv-extras server, but not many people used it. People can also download a system information report from the Diagnostics -> System Information -> Report tab. But, I admit that armbianmonitor works pretty well. What would you suggest - fork armbianmonitor, just use it as is on more systems, other ideas?


  • tkaiser, thanks for the response.


    I would be happy to give you more information if it might help me determine what has gone wrong. I posted all the info that ryecoaaron had in his sticky.


    Just let me know what else you need; I'm not well versed in a Debian environment.

  • I admit that armbianmonitor works pretty well. What would you suggest - fork armbianmonitor, just use it as is on more systems, other ideas?

    Well, I think having a tool like omv-diag ready that can be used directly via SSH (and maybe later in some way through the web UI) to

    • upload a brief system overview (containing basic system info and at least the last 200 dmesg lines)
    • collect a more complete report somewhere to be manually pasted to an online pasteboard service

    would already be nice for supporting issues like this one. After all, why does a disk disappear from the bus? If the reason is a faulty cable or contact issues, then sure, a replacement disk will help, since swapping it fixes the real issue ('connection loss') by accident too.
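To make the first bullet concrete, here is a very rough sketch of what the "brief overview" part of such a tool might print. omv-diag does not exist yet, and brief_report is a name invented here; the kernel log is passed in on stdin so the function stays free of root-only calls.

```shell
# Sketch of a brief report: basic system info plus the last 200 kernel log lines.
# brief_report is a hypothetical helper; the kernel log is expected on stdin.
brief_report() {
    echo "=== system ==="
    uname -a                               # kernel version and architecture
    echo "=== last 200 kernel log lines ==="
    tail -n 200                            # keep only the tail of the log from stdin
}
```

It could then be fed and uploaded in one go, for example with `dmesg | brief_report | curl -F 'sprunge=<-' http://sprunge.us`.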


    And armbianmonitor is not the right tool to fork (since in Armbian the main logging happens at every startup and goes to /var/log/armhwinfo.log, where it is collected later when armbianmonitor uploads stuff), so maybe start with another fork that already took care of this? E.g. https://github.com/ayufan-rock…bin/rock64_diagnostics.sh



    Just let me know what else you need


    In case you haven't rebooted yet, the output from the following command would be useful:

    Code
    dmesg | curl -F 'sprunge=<-' http://sprunge.us
