RAID 6 Array (6x8TB) Device Disappeared

  • Once you used the Adaptec card, it has surely written its DCB on all disks (even in JBOD mode), normally in the first 4k of each disk. Then you created the superblock (mdadm's equivalent of a DCB) on top - and after that magic failing upgrade you changed the controller card ...

    Is it possible that the Adaptec changed the disk geometry (possibly hiding sectors at the beginning of the disk, since they are used 'internally')? Just asking since I'm currently fighting a stupid USB-to-SATA bridge which does the same, but from the disk's end (so it doesn't hurt as much; only the backup GPT gets corrupted).


    @calvin940 in case your syslogs date back to before the controller change, it might help to provide the output of

    Code
    zgrep 'LBA48 NCQ ' /var/log/syslog* | curl -F 'sprunge=<-' http://sprunge.us
  • Not good. I'm betting it says there is no superblock on the other drives as well. Try:


    Code
    mdadm --examine /dev/sd[bcdefg]


    • Official post

    That isn't good. This is what it should look like:

    Sorry, I really don't have any ideas now. Maybe connect the drives back to the old raid card??

    omv 7.0.4-2 sandworm | 64 bit | 6.5 proxmox kernel

    plugins :: omvextrasorg 7.0 | kvm 7.0.10 | compose 7.1.2 | k8s 7.0-6 | cputemp 7.0 | mergerfs 7.0.3


    omv-extras.org plugins source code and issue tracker - github


    Please try ctrl-shift-R and read this before posting a question.

    Please put your OMV system details in your signature.
    Please don't PM for support... Too many PMs!

  • Sorry, I really don't have any ideas now. Maybe connect the drives back to the old raid card??

    The problem I had originally was that after these shenanigans, when I rebooted, the drives were not showing up at all in Debian. Apparently the Adaptec doesn't simply present a new disk as a drive; you need to configure drives either in a RAID or as JBOD before it presents them to the OS. Hot-swapping the drives seems to have caused the issue: when I went into the Adaptec controller config (CTRL-A), it saw the raw devices, but when I went to manage JBODs, it said no JBODs found. Creating JBODs initializes the disks, which would mean wiping the drives (also not productive).


    So... uhm, what about a really far out idea?


    Given that I replaced the drives as I stated (one by one) and rebuilt each new drive, what about using the original drives? I have 5 of the 6 original 4TB drives that I swapped out. Could I put the Adaptec back in, slide those 5 in, and try to re-assemble the RAID based on those, get a 6th one in there to bring it back to my original 6x4TB, and then start the whole process over again? There have been changes to the filesystem since (i.e. disparity between event counts on each drive for sure), but what is the likelihood I could get data back?
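That re-assembly attempt might look roughly like the sketch below. This is an untested outline, not a recipe: the device names (sdb..sdf) and mount point are assumptions, so verify with lsblk first, and note that --force can make things worse when event counts diverge badly.

```shell
# Hypothetical sketch - device names are assumptions, verify with lsblk first.
# 1) Inspect the old superblocks and compare event counts across the drives:
mdadm --examine /dev/sd[bcdef] | grep -E 'Array UUID|Events'

# 2) Try a plain assemble; RAID 6 can run degraded with 5 of 6 members:
mdadm --assemble /dev/md0 /dev/sd[bcdef]

# 3) Only if the plain assemble refuses due to diverging event counts (risky):
# mdadm --assemble --force /dev/md0 /dev/sd[bcdef]

# 4) Mount read-only so nothing gets written while salvaging:
mount -o ro /dev/md0 /mnt/recovery
```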

    • Official post

    I honestly have no idea what state the removed drives will be in; I've never replaced working drives in an array. If you can get the old array working, I would create a new array with the 8TB disks on the new controller and rsync the files from old to new.


  • Is it possible that the Adaptec changed the disk geometry (possibly hiding sectors at the beginning of the disk, since they are used 'internally')? Just asking since I'm currently fighting a stupid USB-to-SATA bridge which does the same, but from the disk's end (so it doesn't hurt as much; only the backup GPT gets corrupted).
    @calvin940 in case your syslogs date back to before the controller change, it might help to provide the output of

    Code
    zgrep 'LBA48 NCQ ' /var/log/syslog* | curl -F 'sprunge=<-' http://sprunge.us


    http://sprunge.us/gOjj


    Not a lot there, unfortunately. I don't reboot my NAS much.

  • sprunge.us/gOjj -- Not a lot there unfortunately

    Yep, but all disks show up as 15628053168 sectors in size. So in case you attach them back to the Adaptec, please check the log and update the thread (or even better, record the output of 'gdisk -l /dev/sdb' with the Marvell controller now and the Adaptec then).
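As a quick sanity check on that number: 15628053168 sectors of 512 bytes each adds up to the full capacity of an "8 TB" drive, so on the Marvell controller nothing is being hidden:

```shell
# 15628053168 LBA sectors x 512 bytes should equal a full "8 TB" drive.
sectors=15628053168
bytes=$((sectors * 512))
echo "$bytes"    # prints 8001563222016, i.e. ~8.0 TB (decimal)
```

If the Adaptec reported fewer sectors for the same disk, that would confirm it reserves space for its own metadata.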

    Yep, but all disks show up as 15628053168 sectors in size. So in case you attach them back to the Adaptec, please check the log and update the thread (or even better, record the output of 'gdisk -l /dev/sdb' with the Marvell controller now and the Adaptec then).

    Attaching the 8TB drives back to the Adaptec will not give me anything, as the drives aren't being exposed to the Debian OS. When I rebooted my NAS they showed up in the Adaptec POST messages but were not passed through to the OS (Debian only saw my boot drive and 2 USB drives). It did not see the 8TB disks because they would need to be initialized as JBOD and thus get erased.


    Damn Adaptec and its bullshit, changing the drives to present them to an OS rather than simply passing them through as generic discrete disks like a normal controller. The whole point in moving from a QNAP to a DIY NAS/RAID system was to try to remove proprietary crap for safety and recovery. And I ordered the Adaptec only because it had a JBOD option and I *THOUGHT* that it was straightforward. Once again, Adaptec has screwed me over (it did so a number of years ago, and against my better judgement I chose it again). *sigh*.


    I think my only option to try to recover my RAID is to power down, take the 8TB drives out, put the Adaptec back in, put 5 of the 6 original 4TB drives back in, and try to recover my RAID that way. If I can, I will back up the contents to 2 of the 8TB drives, then power down, replace the Adaptec with the new Supermicro, and build my OMV box anew from scratch (and the RAID) with 3.x.


    I think I'll start that tonight or tomorrow unless I hear any other Hail Mary ideas ...

  • So, add it to the ever-growing list of things people should provide when their array is missing/degraded - Degraded or missing raid array questions.

    Hehe, I totally forgot that I can edit posts :)


    Anyway: I would like to follow another path we recently talked about: more logging, and a tool to ease submitting such debug info (in a fashion somewhat similar to what armbianmonitor combined with armhwinfo already does on the ARM devices).


    In Armbian the armhwinfo service gets called at startup and shutdown, adjusts this and that (mostly performance related), but also does some logging (startup logging and shutdown logging). When problems occur, we ask users to provide the output of armbianmonitor -u, which essentially just uploads the latest armhwinfo.log (log rotation prevents this from becoming too large) with some additional information added.


    Since you said most lost arrays occur after a reboot, I believe that if we transferred such an approach to OMV it would be useful to also split this logging:

    • A service similar to armhwinfo, called let's say omvhwinfo, gets called at startup and shutdown and adds information to /var/log/omvhwinfo.log
    • The same service also knows a 'diskinfo' call that gets executed nightly from /etc/cron.daily/, but it will always check the last modification date of /var/log/omvhwinfo.log and only append to the log if 7 days have passed (idea: weekly logging even if users shut their OMV box down in between; maybe cron.weekly can already deal with that?)
    • Similar to 'armbianmonitor -u', a simple script called e.g. omv-diag will be provided that tries to upload /var/log/omvhwinfo.log to sprunge.us with some additional info
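The "only log if 7 days have passed" check from the second point could be sketched like this (omvhwinfo is just the name proposed above; the demo uses a path under /tmp and fakes the file age so it can run anywhere):

```shell
# Sketch of the proposed weekly check: append to the log only if the
# existing log is at least 7 days old. Demo path/age are placeholders.
LOG=/tmp/omvhwinfo-demo.log          # real proposal: /var/log/omvhwinfo.log
touch -d '8 days ago' "$LOG"         # simulate an old log for this demo

log_is_stale() {
    # find prints the path only if it was modified less than 7 days ago,
    # so an empty result means the log is missing or old enough to append to
    [ -z "$(find "$LOG" -mtime -7 2>/dev/null)" ]
}

if log_is_stale; then
    echo "stale"    # would append a fresh diskinfo snapshot here
else
    echo "fresh"    # skip: last snapshot is less than a week old
fi
```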


    The idea is to collect relevant information on every boot (disk and disk controller information, mount information, disk usage, dmesg) and on shutdown (free and iostat output, but most importantly the last 200 dmesg lines, to catch everything that might have gone wrong prior to the reboot/shutdown). When asked to execute omv-diag, users will submit this log + everything you ask for in 'Degraded or missing raid array questions' + the last 200 dmesg lines.


    While this won't help users recover an array that has gone missing, it might allow us to at least get an idea of what happened beforehand and act accordingly (e.g. spread warnings about exchanging RAID controllers for normal multi-port SATA/SAS controllers). I also think it's important to ease submitting such info as much as we can. And collecting some relevant disk information over time might help anyway.


    Regarding the latter: I re-used stuff I already had on my disk and came up with this disk query routine:

    It uses GUIDs to identify disks, cares only about block devices, and does not wake sleeping drives (so no worries that running this as a cronjob will scare users wondering why their disks wake up at night :) ), but it tries to collect as much 'health info' as possible (all the SMART attributes in the 190-199 range are interesting). Example output from my RAID-Z running on an Armada 385 (Clearfog Pro):
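The "don't wake sleeping drives" part can be done with smartctl's standby check; the loop below is an illustrative sketch (device glob and attribute filter are assumptions), not the actual routine from the post:

```shell
# Illustrative sketch: query SMART attributes without waking spun-down disks.
# --nocheck=standby makes smartctl exit instead of spinning a sleeping disk up.
for dev in /dev/sd?; do
    [ -b "$dev" ] || continue                 # block devices only
    smartctl --nocheck=standby -A "$dev" \
        | awk '$1 >= 190 && $1 <= 199'        # the interesting 190-199 range
done
```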



    What do you think? IMO it would be necessary to add this to OMV, or as a mandatory extra, since otherwise all the people who need these capabilities won't have it installed for sure ;)

    • Official post

    What do you think?

    I could add it to omv-extras but that doesn't help with people who don't have omv-extras installed. I think it would be worth asking Volker @votdev to include it in core OMV. If he says no, then I will add it to omv-extras.


    • Official post

    I could add it to omv-extras but that doesn't help with people who don't have omv-extras installed. I think it would be worth asking Volker @votdev to include it in core OMV. If he says no, then I will add it to omv-extras.

    Please summarize the issue in a feature request or here. It's difficult to follow the chat and get all the information.
    @tkaiser By the way, why not use Python instead of shell scripts? :)

  • Please summarize the issue in a feature request or here.

    I'll meditate on this for two more days, then send a feature request. Bash and not Python, since I realized a decade ago that I should only do prototyping/scripting instead of real coding: in every customer project so far the coders asked at least one or two questions that I would never have thought of, being already 'too much into it'. :)


    One absurd side effect: if you ever need someone experienced with both AppleScript and Unix (and bridging both worlds without breaking things ;) ) ...


    Just kidding. I'll prepare a feature request soon and focus on the support side of things, since this was really something of an eye-opener with Armbian: having a simple tool that allows users to provide information untainted by interpretation :)

  • This does sound like an excellent plan. I would love to be able to provide something valuable and consistent when I get into these types of situations, instead of fumbling around with piecemeal ad-hoc queries/commands.


    I don't know where I will end up in my current situation, but if I have to rebuild my RAID from scratch I will be implementing this next time around for sure.


    I'll let you know how plan B goes (though really, I think the probability is not on my side). If it doesn't work, I'll just wipe everything and begin anew with OMV 3.x, I guess (OR ... SHOULD I GO 4.x??)

  • Re,

    Is it possible that the Adaptec changed disk geometry (possibly hiding sectors from the beginning of the disk since used 'internally')?

    Yeah. HW RAID controllers normally work on pure LBA, which makes it easy to "hide" sectors (at the beginning and/or end, and with multiple partition layouts, in between too) :(


    Just asking since I'm currently fighting against a stupid USB-to-SATA bridge which does the same but from the disk's end (so it doesn't hurt that much, only backup GPT corrupted)?

    Yeah, same thing ...


    Sc0rp

  • So, I did what I said I was going to do:


    • Powered down the NAS
    • Yanked all 6 x 8TB drives
    • Pulled the Supermicro 8-port SATA controller and put the Adaptec back
    • Put 5 of the 6 original 4TB drives back into the bays
    • Booted into the OS; RAID/volume not available
    • Assembled the RAID with 5 of 6 disks
    • Rebooted to mount the volume
    • Got access to the filesystem, but there were errors: fsck did not complete on boot-up and asked me to go into maintenance mode to check
    • Decided to unmount and remount as read-only
    • I am now copying as much data as I can from the RAID (there are filesystem errors in places, and certain files aren't recoverable)


    So, was I right not to try to correct the filesystem using fsck? I was worried that writing to the array could muck everything up, so I am just trying to salvage as much as possible in a read-only first pass.
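For what it's worth, there is a middle ground: fsck can be run in report-only mode before deciding whether to let it repair anything. The device and filesystem type below are assumptions (an ext4 filesystem on /dev/md0):

```shell
# Keep the salvage source read-only so the copy pass writes nothing:
mount -o remount,ro /dev/md0

# Report-only check: -n opens the filesystem read-only and answers 'no' to
# every repair prompt, so it lists the damage without changing anything.
# (-f forces a full check even if the filesystem looks clean.)
fsck.ext4 -n -f /dev/md0
```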


    When reassembling, there were a bunch of event-count differences between the drives, but it seemed to assemble, at least in this state.


    So I am doing some targeted rsyncing to external USB drives to recover what I can.


    Then I will format everything, install 4.0.1, and build a full 6 x 8TB RAID 6 from scratch.


    Should I be doing anything else?


    Thanks a lot for all of your collective help. It was/is very much appreciated. I am hopeful at least for partial recovery where I had felt before all hope was lost.

    • Official post

    Was I right not to try to correct the filesystem using fsck? I was worried that writing to the array could muck everything up, so I am just trying to salvage as much as possible in a read-only first pass.

    I would probably do the same. The read-only first pass seems safe. I would definitely try an fsck after that.


    When reassembling there were a bunch of event differences in the drives but it seemed to assemble at least in this state.

    That isn't surprising, since each drive was removed from the previous array at a different time. I'm actually surprised it worked.


    Then I will format everything and install 4.0.1 and build a full 6 x 8TB raid 6 from scratch.


    Should I be doing anything else?

    I would consider not using software RAID :)
    I would really think about whether you need RAID at all.
    I would also try to figure out a way to back up your data.


  • The fsck after the first pass is definitely the plan. I am shocked it worked as well. I'll take what I can get and then go from there.


    I moved to software RAID specifically so that I wouldn't be tied to one vendor's proprietary hardware. I moved completely away from a QNAP box, which is mostly proprietary, to a Linux system with software RAID, so that basically all the equipment can be replaced with newer, non-specific hardware to rebuild a Linux box and get access to my RAID again. I feel I got screwed over again by Adaptec's proprietary nature with that geometry change. Had I used a straight 8-port SATA controller (like the Supermicro I just bought), I would have been laughing: everything I had done was proper and worked perfectly. I would not have been in this situation if it weren't for that Adaptec RAID controller card not giving me a JBOD without putting a bunch of shit in there to make it its own.


    I like software RAID. I like md. It's hardware agnostic, and that is really important to me. It has always been my friend. All the problems I have experienced over time have been due not to software RAID, but to the hardware.


    With this final change to the Supermicro card, I should now have effectively removed the last piece of the proprietary puzzle.

  • Uhm ... where do I get v4.0.x (we're talking about OMV?).

    AFAIK currently only by adding it manually to a Stretch installation. The differences from the OMV 3 installation procedure are fortunately minimal (replace 'erasmus' with 'arrakis' when creating the apt source list, then grab/install https://github.com/OpenMediaVa…extrasorg_latest_all4.deb). Even all the customization stuff for OMV 3 currently works the same with 4 :)
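The source-list step described above would look roughly like this. The repository line is an assumption from memory of that era, so verify it against the official docs; the .deb URL in the post is truncated and deliberately not reproduced here:

```shell
# Assumed OMV 4 (arrakis) repository entry on Debian Stretch - verify before use.
echo 'deb http://packages.openmediavault.org/public arrakis main' \
    | sudo tee /etc/apt/sources.list.d/openmediavault.list
sudo apt-get update
```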
