Help with system recovery

  • Hi all,


    Two questions:


    I just want some confirmation that I am doing the right thing. My OMV 6 install is on an SSD, and then I have two RAID 1 (edit: I originally wrote RAID 0) arrays, 'media' and 'cctv'. The cctv disks have both died, and I'm replacing them now with new ones.


    The machine now won't boot: the disks only show up in the BIOS when it's in AHCI mode, but the list is empty in RAID mode (which is what it was set to before all of this). So no boot and no fixing, since in RAID mode it just drops to the EFI Shell, where I can't seem to resolve anything.


    I think the RAID controller on the mobo is dead. The rest of the machine seems fine (not that I can test much). The EFI Shell it boots to lists all the disks and shows the right number of partitions on each (I think).


    Is there anything I can do to double-check that it's just the RAID controller that's broken?


    Plan:

    Will get a RAID card, a cheapo 4-port one.

    Use the SSD on the mobo in AHCI mode.

    Plug the 4 disks (2 pairs) into the card.

    Reinstall the OS (since it was a RAID-partition setup on a single disk before).

    Recover (resync, whatever the steps are) the 'media' array, the MOST IMPORTANT thing (rough sketch after this list).

    Make a new array for the new pair of disks.

    Configure it all like before and have a nice cuppa tea.
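
    (For the recovery step, a rough sketch of what I understand it to involve on the fresh install, assuming the 'media' array turns out to be plain Linux software RAID (mdadm) rather than something tied to the controller; /dev/md0 is just a placeholder name:)

      # Let mdadm find and assemble any existing arrays from their on-disk superblocks
      sudo mdadm --assemble --scan

      # Check the state of the assembled mirror (should show two active devices)
      cat /proc/mdstat
      sudo mdadm --detail /dev/md0

      # Record the array so it is assembled automatically on every boot
      sudo mdadm --detail --scan | sudo tee -a /etc/mdadm/mdadm.conf
      sudo update-initramfs -u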


    Assuming it's just the RAID controller broken, is my plan correct?


    Just want some confirmation before I pull the trigger.


    Thanks in advance,

    Noki


    I've tested removing all disks except the SSD and tried a bunch of combinations, but it's always the same: RAID mode shows no disks in the SATA list, while AHCI mode shows whatever is connected, as I would expect.

    Edited once, last by Noki: error in post, clearly marked.

  • Some possibilities you should consider:


    1. It's possible that your RAID data sets are firmware-married to the specific controller on the motherboard that you suspect has failed. If that is the case, recovering the data is unlikely unless you can replace the controller with an identical one. I have no suggestions as to how to determine whether this is actually the case.


    2. Your plan begins with: "Will get a RAID card, a cheapo 4-port one." I don't have anything to say about this other than: don't do it.


    3. RAID failed you. I suggest staying away from it altogether, even if you can recover your data.


    Good luck.

    --
    Google is your friend and Bob's your uncle!


    A backup strategy is worthless unless you have a restore strategy that has been verified to work by testing.


    OMV AMD64 7.x on headless Chenbro NR12000 1U Intel Xeon CPU E3-1230 V2 @ 3.30GHz 32GB ECC RAM.


  • OK, a replacement mobo is less than £20, so worth a shot over a RAID card. I plan to test the disk(s) with GParted on another computer before I pull the trigger, to see if they are tied to the board. But I don't think they are. Fingers crossed. Just need the time to play more; thanks for the input.

  • You misunderstand.


    The potential problem is not that the disks are tied to the board. The problem can be that the data on the disks is tied to the controller.

    --
    Google is your friend and Bob's your uncle!


    A backup strategy is worthless unless you have a restore strategy that has been verified to work by testing.


    OMV AMD64 7.x on headless Chenbro NR12000 1U Intel Xeon CPU E3-1230 V2 @ 3.30GHz 32GB ECC RAM.


  • Yes, I follow. But from what I am reading online, if one of the disks can be mounted on another PC then it will prove that the data is recoverable. So I will test that when I have the time.

  • if one of the disks can be mounted on another PC then it will prove that the data is recoverable

    On a RAID0???

    It's either both disks or nothing.


    Consider that when going with RAID0 next time.

    • Official Post

    Sorry, typo in my first post, RAID 1. Mirrored.

    OK, but your first post is somewhat confusing, so let me explain:


    I class m'board RAID as fake RAID: basically it can 'group together' drives within the BIOS, depending on the chipset, to present a single drive to the OS, or the chipset is presented to the OS as a RAID controller and requires drivers for it to function within the OS.


    Your description in #1 suggests that you created a 'RAID' at the BIOS level. This writes a signature to the drives within that array; no signature, and no drives will show up in the RAID section. This is simply a BAD IDEA. Why? Because you have no way of knowing if any drive within an array is failing, hence your comment -> 'The cctv disks both died'. What probably occurred was that one drive failed, leaving the RAID1 in a clean/degraded state, then the other drive failed.


    In #1, the fact that no drives show in the RAID section suggests the controller/chipset cannot detect any signature on the drives. The norm for BIOS-based arrays is to offer a second menu option once in the RAID section, where a user would add drives and create an array at the selected RAID level.


    AFAIK the BIOS-created arrays would have been presented to OMV as single drives.


    At this moment, without information about the m'board, it should be used in AHCI and NOT RAID. This would present each drive to OMV as a single drive, and you can then create a software RAID using OMV's Raid Management. This becomes far more manageable within OMV, and with SMART set up as well you would be notified of any failing or degrading drives.
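
    (For reference, OMV's Raid Management is essentially a front end to mdadm; creating a mirror from two blank drives boils down to something like the following. /dev/sdb and /dev/sdc are placeholders, and this destroys anything already on those drives:)

      # Create a two-disk RAID1 (mirror) across two empty drives
      sudo mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc

      # Watch the initial sync progress
      cat /proc/mdstat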


    My server is in my sig; it has a SATA backplane with a single connection to the m'board. Its default setting is to use RAID, in essence a BIOS RAID, but I have mine set to AHCI so that the drives are presented to OMV as single drives. That way I can set up OMV however I want to use it (single drives, RAID, ZFS), I have full control over the drives, and I can set up notifications to ensure it doesn't fall off a cliff :)
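
    (Notifications aside, you can also check a drive's health manually from the command line whenever you like; a quick sketch, assuming smartmontools is installed and with /dev/sda as a placeholder device:)

      # Overall health verdict
      sudo smartctl -H /dev/sda

      # Full SMART attributes; watch reallocated/pending sector counts and power-on hours
      sudo smartctl -a /dev/sda

      # Run a short self-test; the result appears in the -a output once it completes
      sudo smartctl -t short /dev/sda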

    I also boot from a USB flash drive; the use of an SSD for boot purposes is overkill, IMHO. An SSD is best served for use with docker and docker configs.

  • Thanks for the detailed clarification, appreciate the help with this.


    To clarify, the SSD used for the OS is an old 80GB Intel drive from back in the day; it's power efficient and does the job.


    I may be wrong about running in RAID mode, but that's what the BIOS was set to when I opened it once this problem occurred. The CMOS battery seems fine, so that is probably what I set it to. Based on the above, that was a mistake I made 2 years ago when I built this machine; I should have used AHCI mode.


    cctv disks: both died separately, one several months ago, the other last week. I was too lazy to fix it, so I left that RAID array running on one very aged disk. I figured if it dies, all we lose is unimportant cctv footage; I could live without it, so I waited for that disk to die before being forced to buy new ones.

    On inspection it's possible one or both of these old disks are OK; I had some detection problems, obviously, and so re-seated the SATA cables. So maybe one jiggled loose, "breaking" one or both disks. Either way, they are 150k hours into their life, well past a reasonable lifespan in HDD terms, so they needed replacing.


    I still plan on attaching one disk (from the media array) to my desktop PC to see if it can be mounted. It seems like the simplest test of what's what before I figure out what to do long term. Unless there is something else you would suggest I try?
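
    (My plan for that test, assuming the RAID member partition shows up as /dev/sdb1 on the desktop, is roughly:)

      # Check how the disk and its partitions are identified
      lsblk -f

      # Look for an mdadm superblock and the array UUID on the member partition
      sudo mdadm --examine /dev/sdb1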


    Finally, when booting in AHCI mode it gives a message that no bootable device/OS can be found. Normal, I guess, given the current state of the machine. The EFI Shell it boots to in RAID mode does show a list of disks and partitions, so I think there is hope! (possibly, maybe)

    Google Lens helped me capture this:

    EFI Shell version 2.31 [4.653]

    Current running mode 1.1.2

    Device mapping table

      fs0  :Removable HardDisk - Alias hd13a0b blk0
             PciRoot(0x0)/Pci(0x11,0x0)/Scsi(0x0,0x0)/HD(1,GPT,8a90fd04-b851-404b-9fa7-4587aeec4da1,0x800,0x100000)
      blk0 :Removable HardDisk - Alias hd13a0b fs0
             PciRoot(0x0)/Pci(0x11,0x0)/Scsi(0x0,0x0)/HD(1,GPT,8a90fd04-b851-404b-9fa7-4587aeec4da1,0x800,0x100000)
      blk1 :Removable HardDisk - Alias (null)
             PciRoot(0x0)/Pci(0x11,0x0)/Scsi(0x0,0x0)/HD(2,GPT,948417a9-f97b-457c-b72e-8051c171bb75,0x100800,0x8559000)
      blk2 :Removable HardDisk - Alias (null)
             PciRoot(0x0)/Pci(0x11,0x0)/Scsi(0x0,0x0)/HD(3,GPT,ae29293b-6eba-4faf-bdbb-21ed142077b0,0x8659800,0xeb6000)
      blk3 :Removable BlockDevice - Alias (null)
             PciRoot(0x0)/Pci(0x11,0x0)/Scsi(0x0,0x0)
      blk4 :Removable BlockDevice - Alias (null)
             PciRoot(0x0)/Pci(0x11,0x0)/Scsi(0x1,0x0)
      blk5 :Removable BlockDevice - Alias (null)
             PciRoot(0x0)/Pci(0x11,0x0)/Scsi(0x2,0x0)

    Press ESC in 2 seconds to skip startup.nsh, any other key to continue.

    Shell>


    Again thanks for the interest and input.

    • Official Post

    Finally, when booting in AHCI mode it gives a message that no bootable device/OS can be found.

    That in itself suggests that somehow you created an array that included the SSD.

    I still plan on attaching one disk (from the media array) to my desktop PC to see if it can be mounted

    At the moment you have nothing to lose and I would suggest doing that before attempting anything else, but this is pointing to a reinstall, because I can't see any way of moving forward.


    What's the m'board?

  • OK, so here is my current result:


    BIOS in RAID mode, booted a Linux Mint USB key, installed mdadm and checked things.


    I can mount the boot SSD, so I have the root filesystem. In it I found the mdadm.conf file, which lists the two RAID arrays by UUID, and I can see the one disk currently plugged in under 'Disks' with the same UUID. There it lists the partition as 'Linux RAID Member (version 1.2)'. Good start.


    mdadm can't do anything with it; it says no RAID partition/filesystem is present. Rebooting in AHCI mode, I can still mount the SSD and see the one RAID HDD under 'Disks', with the same 'Linux RAID Member' label in the disk utility. Attempting `sudo mdadm --assemble /dev/sdb` in both cases gives the message `device exists but is not an md array`.
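
    (From what I can tell, that error usually means --assemble was given the raw disk as the array name rather than an md device plus its member partition, so the next thing I'll try is something like this, with /dev/sdb1 as a placeholder for the RAID member partition:)

      # Confirm there is an mdadm superblock on the member partition
      sudo mdadm --examine /dev/sdb1

      # Assemble the mirror from the single surviving member and start it degraded
      sudo mdadm --assemble --run /dev/md0 /dev/sdb1

      # Mount read-only to check the data without touching it
      sudo mkdir -p /mnt/media
      sudo mount -o ro /dev/md0 /mnt/media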


    Since I have a conf file from the original system, I know it's using mdadm, and the OMV docs confirm that mdadm is what is used when making an array the way I did originally.


    On to TestDisk: for both disks (individually), same result. Pointing it at the disk shows a RAID array, so I [Proceed], then at the next step it is unable to find the partition type and auto-selects 'None'. Should I select the 'EFI GPT' option? It sounds the most reasonable from the list it offers other than 'None'; the others include Intel, Mac, XBox and some I don't recognise.


    Do I need both disks when using TestDisk, since it's a mirror array? I am pausing for now, but making progress. I am tempted to try and repair the boot setup on the existing OS drive to see if it just comes to life when I do that, then try TestDisk if/when that fails.


    Feeling less confident; any advice on what I can try next would be appreciated.


    Thanks,

    Noki

  • Never mind, I got it to mount with some effort. Thank god. Now I just need to reinstall the OS, I guess; no actual problem.


    I'll probably try some kind of boot repair first, though. Thanks for the help.
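
    (The boot repair I have in mind from the Mint live session is roughly the usual chroot-and-reinstall-GRUB routine, with /dev/sda2 as the root partition and /dev/sda1 as the EFI partition as placeholders:)

      # Mount the installed system and chroot into it
      sudo mount /dev/sda2 /mnt
      sudo mount /dev/sda1 /mnt/boot/efi
      for d in /dev /proc /sys; do sudo mount --bind $d /mnt$d; done
      sudo chroot /mnt

      # Inside the chroot: reinstall GRUB for UEFI and regenerate its config
      grub-install --target=x86_64-efi --efi-directory=/boot/efi
      update-grub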


    Will mark as solved if all goes well.

  • Feeling less confident; any advice on what I can try next would be appreciated.

    About your use of RAID... are you confident with this? Is it really a necessity? Or, in a single word... why?

    --
    Google is your friend and Bob's your uncle!


    A backup strategy is worthless unless you have a restore strategy that has been verified to work by testing.


    OMV AMD64 7.x on headless Chenbro NR12000 1U Intel Xeon CPU E3-1230 V2 @ 3.30GHz 32GB ECC RAM.


  • Noki added the label "resolved".
