Motherboard failure - one of 2 RAID mirrors not initializing after new motherboard

  • Hi there


    I recently had a complete motherboard failure. I don't believe that the main disk was corrupted. Now, with a new motherboard (using the same OS disk, i.e. not a re-install of OMV), I am struggling to get the second array to show up in File Systems, or anywhere really.
    My setup included 2 x mirror arrays, each of 2 disks. All four disks are visible in the "Storage - Physical Disks" tab. There are errors to this effect when OMV starts up, i.e. "A start job is waiting ...", but the job times out after 1m30s. It's worth noting that I even struggled to get the system to boot when I plugged in the first set of mirror drives, but when I switched the SATA cables around it then seemed to start up OK. Not sure if that's related, or whether some kind of SATA port mapping has gone wonky.
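

    In case it helps with diagnosis, I believe the unit behind that start job can be checked with something like the following (a sketch only; I haven't captured the exact output, and unit names will differ):


    Code
    # Pending jobs while the 1m30s timer is still counting down:
    systemctl list-jobs
    # After boot, search the journal for the job that timed out:
    journalctl -b | grep -iE "timed out|start job"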


    If anyone is able to advise me on how I can get the array re-initialized (without any loss of data on those disks), that would be great, thank you.


    Hopefully I've captured all the log entries as outlined in the "Thread/8631-Degraded-or-missing-raid-array-questions" topic. They are:


    root@asgard:~# cat /proc/mdstat
    Personalities : [raid1]
    md0 : active raid1 sde[0] sdd[1]
    3906887488 blocks super 1.2 [2/2] [UU]
    bitmap: 0/30 pages [0KB], 65536KB chunk


    unused devices: <none>


    root@asgard:~# blkid
    /dev/sda1: UUID="cfe21927-c043-4f06-98e7-641af3bfd0e3" TYPE="ext4" PARTUUID="57752150-01"
    /dev/sda5: UUID="1416f894-572d-42b2-8835-7a3f78dc2010" TYPE="swap" PARTUUID="57752150-05"
    /dev/sdd: UUID="30bd86d8-3fc5-daba-458e-9d36b4834fb3" UUID_SUB="49a6adf2-7bde-9429-2702-a14c545682c0" LABEL="asgard:MediaMirror" TYPE="linux_raid_member"
    /dev/md0: LABEL="Mirror" UUID="74a256f1-c671-4b01-b7b4-0684a674a1fe" TYPE="ext4"
    /dev/sde: UUID="30bd86d8-3fc5-daba-458e-9d36b4834fb3" UUID_SUB="dec7141b-d383-9245-c91a-979ee14b2117" LABEL="asgard:MediaMirror" TYPE="linux_raid_member"
    /dev/sdb1: PARTUUID="d1752d5c-82c2-49b9-814e-fe18e8e92feb"
    /dev/sdc1: PARTUUID="a412027d-950c-4bd0-8909-b52905e44eee"


    root@asgard:~# fdisk -l | grep "Disk "
    Disk /dev/sda: 111.8 GiB, 120034123776 bytes, 234441648 sectors
    Disk identifier: 0x57752150
    Disk /dev/sdb: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
    Disk identifier: 75C252BC-CBCA-48B7-927A-8B6A866A98B0
    Disk /dev/sdc: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
    Disk identifier: 390BFEE5-8B82-441C-BFB8-7A044FCEE9BE
    Disk /dev/sdd: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors
    Disk /dev/sde: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors
    Disk /dev/md0: 3.7 TiB, 4000652787712 bytes, 7813774976 sectors


    root@asgard:~# cat /etc/mdadm/mdadm.conf
    # mdadm.conf
    #
    # Please refer to mdadm.conf(5) for information about this file.
    #


    # by default, scan all partitions (/proc/partitions) for MD superblocks.
    # alternatively, specify devices to scan, using wildcards if desired.
    # Note, if no DEVICE line is present, then "DEVICE partitions" is assumed.
    # To avoid the auto-assembly of RAID devices a pattern that CAN'T match is
    # used if no RAID devices are configured.
    DEVICE partitions


    # auto-create devices with Debian standard permissions
    CREATE owner=root group=disk mode=0660 auto=yes


    # automatically tag new arrays as belonging to the local system
    HOMEHOST <system>


    # definitions of existing MD arrays
    ARRAY /dev/md0 metadata=1.2 name=asgard:MediaMirror UUID=30bd86d8:3fc5daba:458e9d36:b4834fb3
    ARRAY /dev/md1 metadata=1.2 name=asgard:Mirror2 UUID=f9a793d6:75212b31:b8d3b243:6bdcb505


    root@asgard:~# mdadm --detail --scan --verbose
    ARRAY /dev/md0 level=raid1 num-devices=2 metadata=1.2 name=asgard:MediaMirror UUID=30bd86d8:3fc5daba:458e9d36:b4834fb3
    devices=/dev/sdd,/dev/sde
    root@asgard:~#


    I have seen this recent post (Server died trying to recover), but I thought that situation might be slightly different, so I didn't follow that resolution.


    Any advice appreciated.
    Regards
    Phil

    • Official Post

    I'll try and explain as best I can what I know and what I understand, but you may need someone with better knowledge.


    Your mdadm.conf file shows 2 arrays, /dev/md0 and /dev/md1; /dev/md0 is running with 2 drives, /dev/sdd and /dev/sde.


    The job that times out is the system trying to locate the second RAID, /dev/md1, which it can't. The reason is that the 2 drives associated with that array (/dev/sdb and /dev/sdc) have been renamed, hence this reading from blkid:

    Code
    /dev/sdb1: PARTUUID="d1752d5c-82c2-49b9-814e-fe18e8e92feb"
    /dev/sdc1: PARTUUID="a412027d-950c-4bd0-8909-b52905e44eee"

    My understanding of this is that the system 'sees' these drives (hence the PARTUUID), but the boot process sees them in a different order and has therefore assigned them /dev/sdb1 and /dev/sdc1.
    This is the part I am not sure of, so I'll give a shot in the dark: change over their connections on the m'board. My other understanding is that onboard RAID controllers will initialise drives in a certain order (might be wrong), so this could be the reason why the drives have been renamed during the boot process.
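

    Before swapping connections it might also be worth checking whether mdadm can still see its superblocks on those two drives at all (just a sketch, I can't be sure what it will report on your system):


    Code
    # Check both the whole disks and the partitions for md superblocks:
    mdadm --examine /dev/sdb /dev/sdc /dev/sdb1 /dev/sdc1
    # If one of them reports "Array UUID : f9a793d6:..." (the md1 UUID from your
    # mdadm.conf) the data is still there and the array can probably be assembled;
    # "No md superblock detected" would mean it's more than a naming problem.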


    I could be suggesting a load of rubbish, but it's the only place I can think of to start.

  • Hi geaves,


    Thanks very much for your reply; what you're saying makes sense. I tried multiple times (shutdown, shuffle SATA cables, start up), but sadly I think I've now made it angrier/worse. The mirror that was working is now showing as "clean/degraded", and the second mirror is still nowhere to be seen.


    Any other thoughts you've got I'd appreciate to hear them.


    Thanks
    Phil

    • Official Post

    Any other thoughts you've got I'd appreciate to hear them.

    This seems more like a needle in a haystack... to recover any RAID it has to be unmounted and then stopped. However, you can't easily do this from within OMV: to unmount a RAID you have to remove all references to it, i.e. all shares (deleting the share but not the content), and you don't want to do that. So in omv-extras, under the Kernel tab, there is an option to install SystemRescueCD; that is what you may have to use.
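

    From the rescue environment the rough sequence would be something like this (a sketch only; the sdX letters are taken from your earlier blkid output and may well have changed again):


    Code
    cat /proc/mdstat                                        # see what auto-assembled, if anything
    mdadm --stop /dev/md1                                   # stop the array if it came up partially
    mdadm --assemble --verbose /dev/md1 /dev/sdb /dev/sdc   # try to assemble from both members
    cat /proc/mdstat                                        # check the result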


    Do you have all your data backed up from before the m'board change? If you have, the simplest process would be to start again; a PIA, but the lesser of two evils. Then write a hundred lines: "I must back up my data" :)


    There are 2 things you will need to start with. Take a screenshot of Storage>>>Disks; this will give you the device ID, model and serial number. Then reboot and watch the boot process: at the point the drives are being loaded, hit Pause on the keyboard to halt it, take a picture, then hit Pause again to continue. This lists each drive in the order it is loaded (there is also a quicker way from a shell, sketched below).
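

    The quicker way, if you can get to a shell, is to list the persistent device names, which carry the model and serial next to the current sdX letter (a sketch; your entries will obviously look different):


    Code
    ls -l /dev/disk/by-id/
    # e.g. ata-<MODEL>_<SERIAL> -> ../../sdd   (model/serial on the left, current name on the right)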


    Using the above you should be able to work out what is connected where; consult the m'board manual regarding the SATA ports and the boot order.


    Sorry, I understand what you've posted, but I think OMV is simply screwed because it can't locate the drives in the order they were in when your RAIDs were set up.

  • Hi geaves


    Thanks for your reply, apologies for my late response but it's been a busy week :)


    I will do as you suggest, i.e. try to work out the device IDs and boot in that order. To be honest it still doesn't feel like one should have to do this: in my opinion devices should be able to be plugged into whichever SATA port and it should not matter. This is the second time something has gone wrong for me with OMV since I migrated from FreeNAS (which was an effort in itself!), and my confidence in it is falling rapidly. It's a great product until something goes wrong, then it seems like one is always just screwed, as you put it :)
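

    (For what it's worth, my reading of the mdadm docs is that it is supposed to find array members by the UUID in the superblock rather than by port, which is why this confuses me. Something like the following should, as I understand it, work regardless of cabling, provided the superblocks are intact:)


    Code
    mdadm --examine --scan              # list arrays found from on-disk superblocks
    mdadm --assemble --scan --verbose   # assemble everything in mdadm.conf by UUID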


    I will try and post an update with what happens next.


    Thanks again for taking the time to help
    Phil
