RAID 6 Array (6x8TB) Device Disappeared

  • OMV 2.1


    had 6 x 4TB drives.


    Adaptec controller configured disks as JBOD


    Raid running fine.


Then, one by one, I replaced the disks with 8TB drives - never rebooting the NAS in between.


Grew the RAID, LVM, etc. until I had to grow the filesystem. Found out that OMV 2.1 (Wheezy) ships an older resize2fs that doesn't support growing a filesystem larger than 16TB.


I put a Debian 9 live image with a newer resize2fs on USB to resize my volume. I restarted the NAS, missed the boot option, and it went back into OMV. However, the RAID is missing now: /dev/sd[b-g] are no longer in the device list.


I rebooted again, hit Ctrl-A to enter the controller BIOS, and no JBOD is showing up any longer. It looks like Adaptec actually does something to the disks to make them JBOD.


So I bought a pure 8-port SATA card and replaced the Adaptec with this Supermicro card; it only does JBOD. Now when I boot OMV, I see the drives /dev/sd[b-g], however syslog still says:


    Sep 19 20:08:34 CHOMEOMV anacron[2496]: Anacron 2.3 started on 2017-09-19
    Sep 19 20:08:35 CHOMEOMV mdadm[2533]: DeviceDisappeared event detected on md device /dev/md127



    What do I do now? Can I force assemble the raid array?


    Any help would be appreciated.

• Official post

    Can I force assemble the raid array?

    Yes. mdadm --assemble --force --verbose /dev/md127 /dev/sd[bcdefg]

    omv 7.0.4-2 sandworm | 64 bit | 6.5 proxmox kernel

    plugins :: omvextrasorg 7.0 | kvm 7.0.10 | compose 7.1.2 | k8s 7.0-6 | cputemp 7.0 | mergerfs 7.0.3


    omv-extras.org plugins source code and issue tracker - github


    Please try ctrl-shift-R and read this before posting a question.

    Please put your OMV system details in your signature.
    Please don't PM for support... Too many PMs!

  • Here is what I got:


    Code
    root@CHOMEOMV:/var/log# mdadm --assemble --force --verbose /dev/md127 /dev/sd[bcdefg]
    mdadm: looking for devices for /dev/md127
    mdadm: Cannot assemble mbr metadata on /dev/sdb
    mdadm: /dev/sdb has no superblock - assembly aborted
• Official post

And this is what starts the process that doesn't seem to go right very often. Normally I would say zero the superblock on the offending drive and try to re-assemble, but I am really tired of that not working sometimes.


  • I am not afraid of taking my time. I can have patience for a long process if the outcome has a higher probability of success. But from the vibe I get from you it would seem that it is my only option and that there is not a good success rate. And likely that action is not recoverable either. Fair assessment?

  • What about assembling the array without sdb?
This should lead to a degraded array. After that, you can do what you want with sdb (check it, zero the superblock, wipe it completely ...)
    At last, add sdb to the array.
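The sequence above might look like the following dry-run sketch. The device names and /dev/md127 are taken from this thread; with DRYRUN=1 the commands are only printed, so nothing touches the disks until you have double-checked with `mdadm --examine` and flipped the switch yourself:

```shell
# Dry-run sketch of the plan above: assemble the array degraded without sdb,
# then clear sdb, then re-add it. With DRYRUN=1, commands are only printed.
DRYRUN=1
run() { if [ "$DRYRUN" = "1" ]; then echo "would run: $*"; else "$@"; fi; }

# 1. Force-assemble degraded without sdb (RAID6 tolerates two missing drives)
run mdadm --assemble --force --verbose /dev/md127 /dev/sd[cdefg]

# 2. Only once the degraded array is up and the data is readable,
#    clear the stale superblock on sdb
run mdadm --zero-superblock /dev/sdb

# 3. Re-add sdb; mdadm rebuilds it from the remaining drives
run mdadm --manage /dev/md127 --add /dev/sdb
```

Zeroing the superblock is not reversible, which is why the sketch insists on verifying the degraded array first.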


    ryecoaaron?

    --
    Get a Rose Tattoo...


    HP t5740 with Expansion and USB3, Inateck Case w/ 3TB WD-Green
    OMV 5.5.23-1 Usul i386|4.19.0-9-686-pae

I'm a bit surprised that no one asked for logs or at least a quick check for cabling/connectivity problems. The first thing I do when dealing with consumer-grade NAS setups is always something like this:


    Code
    du -sk /var/log/syslog* | sort -n -r
    for disk in /dev/sd? ; do smartctl -a $disk | grep '^199'; done

If there's a huge log, it is most probably the one flooded with 'ATA bus error', 'hard resetting link' and the other fun stuff pointing at cable/connector issues. And if there are disks with high SMART attribute 199 (UDMA_CRC_Error_Count) values, those are the affected ones.


This stuff (the physical interconnection of components) should be considered the lowest layer that has to work for everything layered above it to work properly, or at all. Compare it to a house: building it on quicksand will never work, regardless of how much steel and concrete you use.


Wrt RAID problems I can't help (RAID-Z user, not able to use anachronistic RAID any more)

  • Re,


Maybe you ran into some very special problems regarding DCBs. Once you used the Adaptec card, it surely wrote its DCB (even in JBOD mode) onto all disks ... normally the first 4k of each disk. Then you created the mdadm superblock (the so-called DCB of mdadm) on top - and after that magic failing upgrade you changed the controller card ... I assume mdadm cannot find its superblock anymore, and it seems that Linux cannot find any partition(-related) info either.
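If you want to check this theory, there are read-only ways to look at what is sitting in those first sectors before writing anything. A sketch, demonstrated on a scratch file so nothing real is touched (point DEV at /dev/sdb as root to inspect the actual disk; `wipefs -n` only lists signatures, it does not erase):

```shell
# Non-destructive look at the first 4 KiB of a device, where a RAID
# controller may have left its config block. Shown on a scratch file.
DEV=$(mktemp)
printf 'FAKE-DCB' > "$DEV"      # stand-in for leftover controller metadata

# Dump the first 4 KiB and look for anything recognizable
hexdump -C -n 4096 "$DEV" | head -n 4

# List (without erasing!) any filesystem/RAID signatures wipefs recognizes
wipefs -n "$DEV" 2>/dev/null || true

# On a real disk, 'mdadm --examine "$DEV"' would additionally show
# whether an md superblock survived
rm -f "$DEV"
```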


So I'll go with @tkaiser: in your case it seems to be a cabling issue ... including all connectors and all cables (even power on both counts).


Can you go into more detail on the hardware used (controllers and drives)?


    Sc0rp

  • So i'll go with tkaiser: in your case it seems to be a cabling issue ...

I only recommended always checking this first, since the symptoms of 'failing disk' and 'faulty cabling/connectors' are pretty similar. And since we're not dealing with a proprietary RAID box but a normal Linux host, checking the logs is always a good idea (what happened with sdb, for example?).


Since you mentioned power: yeah, that's always worth a check too. I remember 'funny' stories from a colleague upgrading a 24-bay RAID box a few years ago. The backplane had a somewhat stupid power design leading to voltage drops, especially on the 5V rail. He replaced disk after disk, and after ~20 disks the array always kicked 3 disks out at the same time --> whole array gone.


It took him a week to realize the problem: the new disks needed slightly more juice on the 5V rail while being less tolerant of voltage drops (the old models were fine with just 4.4V, while the newer ones already started to fail at 4.6V).


  • New Sata Controller card:
    SUPERMICRO AOC-SAS2LP-MV8 PCI-Express 2.0 x8 SATA / SAS 8-Port Controller Card



    Old Controller card:
    Adaptec RAID 6805E 2271800-R 6Gb/s SATA/SAS 8 Internal Ports w/ 128MB Cache Memory Controller Card, Kit
(originally configured with 6 x 4TB as JBOD)


    All drives are/were WD REDs.


    First 6 x 4TB WD reds


    now


    6 x 8TB reds



    Here is smartctl from one of the drives




SAS cables (2x 4-SATA breakout "hydra" cables) to a Silverstone 8-bay hot-swap PC case with a backplane.

  • Also took an excerpt from the initial Syslog from the first startup after changing the controller:


I am not exactly sure how to read through it, nor do I see any obvious issue.


    What other information can I provide?


    Thanks all for helping me.

  • What other information can I provide?

Log file excerpts aren't useful, since they're filtered. And full SMART info for just one drive is also somewhat useless when the point is checking all disks :)


I already posted two commands above, and the one below, for example, would put the SMART info of all of your drives on an online pasteboard service:


    Code
    for disk in /dev/sd? ; do smartctl -q noserial -a $disk ; done | curl -F 'sprunge=<-' http://sprunge.us
• Official post

    I'm a bit surprised that no one asked for logs or at least a quick check for cabling/connectiviy problems.

    So, add it to the ever-growing list of things people should provide when their array is missing/degraded - Degraded or missing raid array questions. While those seem like reasonable questions to ask, most people lose their arrays after reboot. Cabling doesn't generally seem like it should be the issue. I guess it could be in this case.


    What about assembling the array without sdb?

    Sure, that might work. Since it is a raid6 array, it will still have one parity drive.

    may be you ran into some very special problems regarding DCB's. Once you used the Adaptec-Card, it has surely written it's DCB (even it was JBOD) on all disks ... normally first 4k on each disk. Then you made the Superblock (so called DCB by mdadm) on top - and after that magic failing upgrade you change the controllercard ... i assume mdadm cannot find his superblock anymore, and it seems that Linux can not find any partition (related) info too.

I would think that if the RAID card was over-writing the superblock on one drive, it would over-write it on all of them. If that is the case, there are going to be major issues. As for partition issues, OMV doesn't use partitions with mdadm (if the array was created with OMV). So it wouldn't care about a partition table, even if one still existed from previous use.


  • @Dropkick Murphy


I was thinking about this too, but I can't imagine that sdb is in any different state than all the other drives. It wouldn't make sense to me, so I suspect that even if I excluded sdb, I'd then get the same error on sdc when trying to reassemble without that drive.



    Log file excerpts aren't useful since filtered. And full SMART info for one drive is also somewhat useless when it's about checking all disks :)
    I already posted two commands above and the below for example would put all SMART info of all of your drives to an online pasteboard service:


    Code
    for disk in /dev/sd? ; do smartctl -q noserial -a $disk ; done | curl -F 'sprunge=<-' http://sprunge.us


Here is the information asked for in the raid degraded help thread, consolidated:



    Here is a repost of my response to what you asked for before:




Here is the link to the full SMART info dump (thanks for letting me know about the site):


    http://sprunge.us/EdgM

All fine except your boot disk, which is IMO the next failure candidate (check SMART attribute 193, Load_Cycle_Count; be aware that the specs talk about 600,000, and do a web search for 'WD LCC problem').


    I have 2 of these boot drives and they were both refurbs from NewEgg. I assume that would be factored into whether that parameter is of concern or not? I don't know what the process is when they perform a refurb and whether or not they would reset those values.


    I clone one drive to another and bring the clone offline when I make any significant changes in case of failure, but I appreciate you pointing it out to me. I really should pay more attention to these values.


    So, having said all this, any ideas on where I go from here?

  • I don't know what the process is when they perform a refurb and whether or not they would reset those values.


    Well, resetting this parameter would almost be evil since this can be considered some sort of a health indicator on those WD desktop/mobile drives. If I were you I would try to adjust the behaviour: http://idle3-tools.sourceforge.net
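For what it's worth, that idle3-tools package ships an `idle3ctl` binary for exactly this. A sketch of typical usage (the device name is an assumption from this thread, and the commands need root plus a real WD drive, so they're only echoed here rather than executed):

```shell
DISK=/dev/sda   # the boot disk discussed in this thread (assumption)

# idle3ctl -g reads the current idle3 (head-parking) timer,
# idle3ctl -d disables it, and idle3ctl -s <raw value> changes it.
# Echoed rather than run, since they need root and a WD drive:
echo "idle3ctl -g $DISK"
echo "idle3ctl -d $DISK"
```

Disabling or raising the timer is what stops the LCC value from climbing further; the counter itself keeps whatever value it already reached.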


And usually I'm not that concerned if the LCC value is somewhat high, but above 2 million is IMO alarming.


    Unfortunately no ideas how to proceed with your RAID...

• Official post

    So, having said all this, any ideas on where I go from here?

    Is there any array assembled in the output of cat /proc/mdstat?


    If not, then try assembling it without sdb:
    mdadm --assemble --force --verbose /dev/md127 /dev/sd[cdefg]


  • Is there any array assembled in the output of cat /proc/mdstat?
    If not, then try assembling it without sdb:
    mdadm --assemble --force --verbose /dev/md127 /dev/sd[cdefg]

• Official post

Not good. I'm betting it says there is no superblock on the other drives as well. Try:


    mdadm --examine /dev/sd[bcdefg]

