Posts by calvin940

    I am now back up and running. I am using OMV 4.0.5 and remade my RAID 6 with md on the 6x8TB drives, using the pure JBOD 8-port SATA card with no shenanigans.


    Things are running pretty smoothly. I must say I am enjoying OMV on top of Stretch, and re-installing all my stuff was a breeze.


    Thanks again everyone for all your input/help. Very much appreciated.

    So, after recovery efforts completed, I managed to lose only about 1TB worth of data from a pretty much full-capacity 16TB array. Any recovery at all was miraculous, but getting all of that back probably means I wasn't a douchebag in my previous life :)


    So, here I am on OMV 4.0.5, wanting to create my MD RAID 6 array out of the 6x8TB drives.


    Can someone please point me to resources on how I should configure this before I start? Specifically, I am looking for advice on any controller settings I should be changing/tweaking, or drive settings I should be applying (write-through, write-back, etc.). When I created my previous raid I am not sure I paid attention to any of that stuff, so this time I want to understand more about the process and settings. Also, can I make those changes after the raid is built, or must I do these things beforehand?
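
    If it helps frame the question: I assume that for plain SATA drives behind a dumb HBA the write-back vs. write-through choice mostly comes down to the drives' own write cache, which I could check and toggle with something like this (device name is just an example):

    Code
    # Show whether the drive's volatile write cache is currently enabled
    hdparm -W /dev/sdb
    # -W1 turns write-back caching on (faster); -W0 turns it off (safer on sudden power loss)
    hdparm -W0 /dev/sdb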


    Also, last time I used the OMV interface to create the raid and then did the volume configuration in that same interface. Does it make sense to continue with that approach this time around?


    Thanks again to all of you for all the help you have provided thus far. It was invaluable and I am back on track. In the end, I don't mind this "refresh" as I get to use new versions of software.

    The plan is definitely to do the fsck after the first pass. I am shocked it worked as well as it did. I'll take what I can get and then go from there.


    I moved to software raid specifically so that I wouldn't be tied to one vendor's proprietary hardware. I moved completely away from a QNAP box, which is mostly proprietary, to a Linux system and software raid, so basically all the equipment can be replaced with newer/non-specific hardware to rebuild a Linux box and get access to my raid again. I feel I got screwed over again by Adaptec's proprietary nature with that geometry change. Had I used a straight 8-port SATA controller (like the Supermicro I just bought) I would have been laughing. Everything I had done was proper and worked perfectly. I would not have been in this situation if it wasn't for that Adaptec raid controller card not giving me a JBOD without putting a bunch of shit in there to make it their own.


    I like software raid. I like MD. It's hardware agnostic. That is really important to me. It has always been my friend. All the problems I have experienced over time have not been due to software raid, but rather the hardware.


    With this final change to the supermicro card, I should have now effectively removed the last piece of the proprietary puzzle.

    So, I did what I said I was going to do:


    • Powered down NAS
    • Yanked all 6 x 8TB drives
    • Pulled the Supermicro 8 port SATA controller and put back the Adaptec
    • Put 5 of the 6 original 4TB drives back into the bays
    • Booted into the OS. Raid / Volume not available
    • Assembled the raid with 5 of 6 disks
    • Rebooted to mount the volume
    • Got access to the file system, but there were errors; fsck did not complete on boot-up and asked me to go into maintenance mode to check
    • Decided to unmount and remount as read-only (roughly the commands sketched below)
    • Am now copying as much data as I can from the raid (there are occasional file system errors in places, and certain files aren't recoverable)
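
    For reference, the assemble and read-only remount were roughly along these lines (device list and mount point are placeholders for my setup):

    Code
    # Force-assemble the degraded array from the five remaining members
    mdadm --assemble --force --verbose /dev/md127 /dev/sd[bcdef]
    # Remount the filesystem read-only so the recovery copies can't write to the damaged volume
    mount -o remount,ro /srv/dev-disk-by-label-data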


    So, was I right not to try to correct the file system using fsck? I was worried that writing to the array could potentially muck everything up, so I am just trying to salvage as much as possible in read-only mode on the first pass.


    When reassembling there were a bunch of event-count differences between the drives, but it seemed to assemble, at least in this state.
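
    In case it helps anyone else, the event counters can be compared with something like this (device names are from my setup):

    Code
    # Show the event counter recorded in each member's superblock; a large spread means more divergence
    mdadm --examine /dev/sd[bcdef] | egrep 'Events|/dev/sd'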


    So I am doing some targeted rsyncing to external USB drives to recover what I can.
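
    Nothing fancy; roughly this per directory (paths are just examples from my layout):

    Code
    # Archive mode preserves permissions/timestamps/hard links; rsync reports but skips files it cannot read
    rsync -aHv --progress /srv/raid/photos/ /media/usb-backup/photos/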


    Then I will format everything and install 4.0.1 and build a full 6 x 8TB raid 6 from scratch.


    Should I be doing anything else?


    Thanks a lot for all of your collective help. It was/is very much appreciated. I am hopeful at least for partial recovery where I had felt before all hope was lost.

    This does sound like an excellent plan. I would love to be able to provide something valuable and consistent when I get into these types of situations, instead of fumbling around with piecemeal, ad hoc queries/commands.


    I don't know where I will get in my current situation but if I have to rebuild my raid from scratch I will be implementing this the next time around for sure.


    I'll let you know how plan B goes (but really, I think the probability is not on my side). If it doesn't work, I'll just wipe everything and begin anew with OMV 3.x I guess (OR... SHOULD I GO 4.x??)

    Yep, but all disks show up as 15628053168 sectors in size. So in case you attach them back to the Adaptec, please check the log and update the thread (or, even better, record the output of 'gdisk -l /dev/sdb' with the Marvell controller now and again with the Adaptec later).

    Attaching the 8TB drives back to the Adaptec will not give me anything, as the drives aren't being exposed to the Debian OS. When I rebooted my NAS they showed up in the Adaptec POST messages but were not passed through to the OS (Debian only saw my boot drive and 2 USB drives). It did not see the 8TB disks because they would need to be initialized as JBOD and thus get erased.


    Damn Adaptec and its bullshit of changing the drives before presenting them to an OS rather than simply passing them through as generic discrete disks like a normal controller. The whole point of moving from a QNAP to a DIY NAS/RAID system was to try to remove proprietary crap for the sake of safety and recovery. And I ordered Adaptec only because they had a JBOD option and I *THOUGHT* that it was straightforward. Once again, Adaptec has screwed me over (it did so a number of years ago and, against my better judgement, I chose it again). *sigh*.


    I think my only option to try to recover my raid is to power down, take the 8TB drives out, put the Adaptec back in, put 5 of the 6 original 4TB drives back in, and try to recover my raid that way. If I can, then I will back up the contents to 2 of the 8TB drives, then power down, replace the Adaptec with the new Supermicro, and build my OMV box (and raid) anew from scratch with 3.x.


    I think I'll start that tonight or tomorrow unless I hear any other hail-Mary ideas...

    Is it possible that the Adaptec changed the disk geometry (possibly hiding sectors at the beginning of the disk since they're used 'internally')? Just asking since I'm currently fighting a stupid USB-to-SATA bridge which does the same but from the disk's end (so it doesn't hurt as much; only the backup GPT is corrupted).
    @calvin940 In case your syslogs date back to before the controller change, it might help to provide the output from

    Code
    zgrep 'LBA48 NCQ ' /var/log/syslog* | curl -F 'sprunge=<-' http://sprunge.us


    http://sprunge.us/gOjj


    Not a lot there unfortunately. I don't reboot my NAS much.

    Sorry, I really don't have any ideas now. Maybe connect the drives back to the old raid card??

    The problem I had originally was that after these shenanigans, when I rebooted, the drives were not showing up at all in Debian. Apparently the Adaptec doesn't simply present a new disk as a drive; you need to configure the disks either into a raid or as JBOD, and only then does it present them to the OS. Hot-swapping the drives seems to have caused the issue. When I went into the Adaptec controller config (CTRL-A), it saw the raw devices, but when I went to manage JBODs, it said no JBODs were found. Creating JBODs initializes the disks, so that would mean wiping the drives (also not productive).


    So... uhm, what about a really far out idea?


    Given that I replaced the drives as I stated (one by one) and rebuilt onto each new drive, what about using the original drives? I have 5 of the 6 original 4TB drives that I swapped out. Could I put the Adaptec back, slide those 5 in, try to re-assemble the raid based on those, get a 6th one in there to bring it back to my original 6x4TB, and then start the whole process over again? There have been changes to the filesystem since (i.e. a disparity between the event counts on each drive for sure), but what is the likelihood I could get data back?

    Not good. I'm betting it says there is no superblock on the other drives as well. Try:


    mdadm --examine /dev/sd[bcdefg]


    Is there any array assembled in the output of cat /proc/mdstat?
    If not, then try assembling it without sdb:
    mdadm --assemble --force --verbose /dev/md127 /dev/sd[cdefg]

    All fine except your boot disk, which is IMO the next failure candidate (check SMART attribute 193; be aware that the specs talk about 600,000, and do a web search for 'wdd lcc problem').


    I have 2 of these boot drives and they were both refurbs from NewEgg. I assume that would be factored into whether that parameter is of concern or not? I don't know what the process is when they perform a refurb and whether or not they would reset those values.


    I clone one drive to the other and take the clone offline whenever I make any significant changes, in case of failure; but I appreciate you pointing it out to me. I really should pay more attention to these values.
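
    Note to self: checking that attribute should just be something like this (assuming /dev/sda is the boot disk):

    Code
    # Attribute 193 is Load_Cycle_Count; compare the raw value against the ~600,000 spec
    smartctl -A /dev/sda | grep -i load_cycle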


    So, having said all this, any ideas on where I go from here?

    @Dropkick Murphy


    I was thinking about this too, but I can't imagine that sdb is any different from all the other drives in terms of state. It wouldn't make sense to me, so I suspect that even if I excluded sdb, I'd then get the same error on sdc when trying to reassemble without that drive.



    Log file excerpts aren't useful since they're filtered. And full SMART info for one drive is also somewhat useless when it's about checking all disks :)
    I already posted two commands above, and the one below, for example, would put the SMART info of all of your drives on an online pasteboard service:


    Code
    for disk in /dev/sd? ; do smartctl -q noserial -a $disk ; done | curl -F 'sprunge=<-' http://sprunge.us


    Here is the information asked for in the raid degraded help thread, consolidated:



    Here is a repost of my response to what you asked for before:




    Here is the link to the full SMART info dump (thanks for letting me know about the site):


    http://sprunge.us/EdgM

    Also took an excerpt from the initial Syslog from the first startup after changing the controller:


    I am not exactly sure how to read through it, nor do I see any obvious issue (at least to my eye).


    What other information can I provide?


    Thanks all for helping me.


    New SATA Controller card:
    SUPERMICRO AOC-SAS2LP-MV8 PCI-Express 2.0 x8 SATA / SAS 8-Port Controller Card



    Old Controller card:
    Adaptec RAID 6805E 2271800-R 6Gb/s SATA/SAS 8 Internal Ports w/ 128MB Cache Memory Controller Card, Kit
    (originally configured with 6 x 4TB drives as JBOD)


    All drives are/were WD REDs.


    First 6 x 4TB WD Reds; now 6 x 8TB WD Reds.



    Here is the smartctl output from one of the drives:




    SAS cables (2x 4-SATA hydra) in a Silverstone 8-bay hot-swap PC case with a backplane.

    I am not afraid of taking my time. I can have patience for a long process if the outcome has a higher probability of success. But from the vibe I get from you, it would seem that this is my only option and that the success rate is not good. And that action likely isn't recoverable from either. Fair assessment?

    OMV 2.1


    Had 6 x 4TB drives.


    Adaptec controller configured disks as JBOD


    Raid running fine.


    Then one by one, replaced disks with 8TB - never rebooting NAS in between.


    Grew the raid, LVM, etc. until I had to grow the filesystem. Found out that OMV 2.1 (Wheezy) ships an older resize2fs that doesn't support growing the filesystem beyond 16TB.
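
    For context, the grow sequence was roughly along these lines (md device, VG and LV names here are placeholders, not my exact ones):

    Code
    # Let the array use the full size of the larger replacement disks
    mdadm --grow /dev/md0 --size=max
    # Grow the LVM physical volume sitting on the array
    pvresize /dev/md0
    # Grow the logical volume into the new free space
    lvextend -l +100%FREE /dev/vg0/lv0
    # Grow the ext4 filesystem; this is the step the Wheezy-era resize2fs couldn't do past 16TB
    resize2fs /dev/vg0/lv0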


    I downloaded a Debian 9 live image onto USB, with a newer core, to use its resize2fs on my volume. I restarted the NAS, missed the boot option, and it went back into OMV. However, the raid is missing now; /dev/sd[b-g] are no longer in the dev list.


    I rebooted again and hit Ctrl-A, and no JBODs are showing up any longer. It looks like Adaptec actually does something to the disks to make them JBOD.


    So I bought a pure 8-port SATA card and replaced the Adaptec with this Supermicro card. It only does JBOD. Now when I boot up OMV I see the drives /dev/sd[b-g]; however, syslog still says:


    Sep 19 20:08:34 CHOMEOMV anacron[2496]: Anacron 2.3 started on 2017-09-19
    Sep 19 20:08:35 CHOMEOMV mdadm[2533]: DeviceDisappeared event detected on md device /dev/md127



    What do I do now? Can I force assemble the raid array?


    Any help would be appreciated.

    So I tried the command yesterday (the --level=5 option was not supported on the assemble command, btw, so I skipped that one).


    It did not work (devices were all busy).
    .
    .
    .

    I had this initially as well. This happened because my removed drives somehow got assigned to a different raid (my original raid was /dev/md0, but after the failures /dev/sdf, /dev/sdg, and /dev/sdh were all assigned to some phantom raid /dev/md127 - I could see this with cat /proc/mdstat).


    So first I stopped /dev/md127 to free the drives up. Then I executed the assemble command, and that worked (roughly as sketched below).
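
    In concrete terms it was roughly this (array and device names are from my case; yours will differ):

    Code
    # Stop the phantom array so its member disks are no longer busy
    mdadm --stop /dev/md127
    # Then force-assemble the real array from those members
    mdadm --assemble --force --verbose /dev/md0 /dev/sd[fgh]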


    Adding /dev/sda the way you did is, I believe, what caused the issue you are seeing.


    I hate to make a recommendation here without having as much knowledge as the other good folks, but I think you should:


    cat /proc/mdstat


    and stop all the raids you see there (md127, md128, md0, or whatever).


    Then I think you should assemble your raid using the disks that were reported as UP previously. That looks like this:


    mdadm --assemble /dev/md127 /dev/sd[bcd] --verbose --force


    I am excluding sda because I think it is a problem, so the plan would be to assemble based on the drives that appeared to be fine. I'd focus on getting the critical data off first, before you attempt a full array recovery.
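
    Once it assembles, something like this would let you copy data off without risking further writes (assuming the filesystem sits directly on the md device; the mount point is just an example):

    Code
    # Mount the assembled array read-only somewhere temporary and copy the important data off first
    mkdir -p /mnt/recovery
    mount -o ro /dev/md127 /mnt/recovery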


    I would caution you that I don't have as much experience and knowledge as others so you might want to wait for one of those folks to help.