Motherboard failure - one of 2 RAID mirrors not initializing after new MOB

    • OMV 3.x

      Hi there

      I recently had a complete motherboard failure. I don't believe the main disk was corrupted. Now, with a new motherboard (using the same OS disk, i.e. not a re-install of OMV), I am struggling to get the second array to show up in File Systems, or anywhere really.
      My setup included 2 mirror arrays of 2 disks each. All four disks are visible in the "Storage - Physical Disks" tab. There are errors to this effect when OMV starts up, i.e. "A start job is waiting ...", but the job times out after 1m30s. It's worth noting that I even struggled to get the system to boot when I plugged in the first set of mirror drives, but when I switched the SATA cables around it seemed to start up OK. Not sure if it's related or if there's some kind of SATA port mapping gone wonky.

      If anyone is able to advise me how I can get the array re-initialized (without any loss of data on those disks), that would be great, thank you.

      Hopefully I've captured all the log entries as outlined in the "Thread/8631-Degraded-or-missing-raid-array-questions" topic. They are:

      cat /proc/mdstat
      Personalities : [raid1]
      md0 : active raid1 sde[0] sdd[1]
      3906887488 blocks super 1.2 [2/2] [UU]
      bitmap: 0/30 pages [0KB], 65536KB chunk

      unused devices: <none>

      root@asgard:~# blkid
      /dev/sda1: UUID="cfe21927-c043-4f06-98e7-641af3bfd0e3" TYPE="ext4" PARTUUID="57752150-01"
      /dev/sda5: UUID="1416f894-572d-42b2-8835-7a3f78dc2010" TYPE="swap" PARTUUID="57752150-05"
      /dev/sdd: UUID="30bd86d8-3fc5-daba-458e-9d36b4834fb3" UUID_SUB="49a6adf2-7bde-9429-2702-a14c545682c0" LABEL="asgard:MediaMirror" TYPE="linux_raid_member"
      /dev/md0: LABEL="Mirror" UUID="74a256f1-c671-4b01-b7b4-0684a674a1fe" TYPE="ext4"
      /dev/sde: UUID="30bd86d8-3fc5-daba-458e-9d36b4834fb3" UUID_SUB="dec7141b-d383-9245-c91a-979ee14b2117" LABEL="asgard:MediaMirror" TYPE="linux_raid_member"
      /dev/sdb1: PARTUUID="d1752d5c-82c2-49b9-814e-fe18e8e92feb"
      /dev/sdc1: PARTUUID="a412027d-950c-4bd0-8909-b52905e44eee"

      root@asgard:~# fdisk -l | grep "Disk "
      Disk /dev/sda: 111.8 GiB, 120034123776 bytes, 234441648 sectors
      Disk identifier: 0x57752150
      Disk /dev/sdb: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
      Disk identifier: 75C252BC-CBCA-48B7-927A-8B6A866A98B0
      Disk /dev/sdc: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
      Disk identifier: 390BFEE5-8B82-441C-BFB8-7A044FCEE9BE
      Disk /dev/sdd: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors
      Disk /dev/sde: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors
      Disk /dev/md0: 3.7 TiB, 4000652787712 bytes, 7813774976 sectors

      root@asgard:~# cat /etc/mdadm/mdadm.conf
      # mdadm.conf
      #
      # Please refer to mdadm.conf(5) for information about this file.
      #

      # by default, scan all partitions (/proc/partitions) for MD superblocks.
      # alternatively, specify devices to scan, using wildcards if desired.
      # Note, if no DEVICE line is present, then "DEVICE partitions" is assumed.
      # To avoid the auto-assembly of RAID devices a pattern that CAN'T match is
      # used if no RAID devices are configured.
      DEVICE partitions

      # auto-create devices with Debian standard permissions
      CREATE owner=root group=disk mode=0660 auto=yes

      # automatically tag new arrays as belonging to the local system
      HOMEHOST <system>

      # definitions of existing MD arrays
      ARRAY /dev/md0 metadata=1.2 name=asgard:MediaMirror UUID=30bd86d8:3fc5daba:458e9d36:b4834fb3
      ARRAY /dev/md1 metadata=1.2 name=asgard:Mirror2 UUID=f9a793d6:75212b31:b8d3b243:6bdcb505

      root@asgard:~# mdadm --detail --scan --verbose
      ARRAY /dev/md0 level=raid1 num-devices=2 metadata=1.2 name=asgard:MediaMirror UUID=30bd86d8:3fc5daba:458e9d36:b4834fb3
      devices=/dev/sdd,/dev/sde
      root@asgard:~#

      I have seen this recent post (Server died trying to recover), but I thought that situation may be slightly different, so I didn't follow that resolution.

      Any advice appreciated.
      Regards
      Phil
    • I'll try to explain as best I can what I know and what I understand, but you may need someone with better knowledge.

      Your mdadm.conf file shows 2 arrays, /dev/md0 and /dev/md1. /dev/md0 is running with 2 drives, /dev/sdd and /dev/sde.
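      The comparison above (arrays defined in mdadm.conf versus arrays actually running) can be checked mechanically. A minimal POSIX sh sketch, assuming both inputs use the `UUID=` form shown in the logs in this thread; `missing_arrays` is a hypothetical helper, not an OMV or mdadm command:

```shell
# Hypothetical helper (not part of OMV or mdadm).
# Prints every ARRAY line from an mdadm.conf whose UUID does not appear in
# the output of `mdadm --detail --scan`, i.e. arrays that were not assembled.
missing_arrays() {
    conf=$1      # e.g. /etc/mdadm/mdadm.conf
    running=$2   # file holding the output of: mdadm --detail --scan
    grep '^ARRAY' "$conf" | while read -r line; do
        # Pull the UUID=xxxxxxxx:xxxxxxxx:... value out of the ARRAY line
        uuid=$(printf '%s\n' "$line" | sed -n 's/.*UUID=\([0-9a-fA-F:]*\).*/\1/p')
        # Report the line if no running array carries the same UUID
        if [ -n "$uuid" ] && ! grep -q "UUID=$uuid" "$running"; then
            printf '%s\n' "$line"
        fi
    done
}
```

      With the outputs posted above, running `mdadm --detail --scan > /tmp/running.txt` as root and then `missing_arrays /etc/mdadm/mdadm.conf /tmp/running.txt` would be expected to print only the /dev/md1 (asgard:Mirror2) line.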

      The job that times out is the system trying to locate the second RAID, /dev/md1, which it can't find. The reason is that the 2 drives associated with that array (/dev/sdb and /dev/sdc) have been renamed, hence this output from blkid:

      /dev/sdb1: PARTUUID="d1752d5c-82c2-49b9-814e-fe18e8e92feb"
      /dev/sdc1: PARTUUID="a412027d-950c-4bd0-8909-b52905e44eee"

      My understanding of this is that the system 'sees' these drives (hence the PARTUUID), but the boot process sees them in a different order and has therefore assigned them /dev/sdb1 and /dev/sdc1.
      This is the part I am not sure of, so here's a shot in the dark: change over their connections on the m'board. My other understanding is that onboard RAID controllers will initialise drives in a certain order (I might be wrong), so this could be why the drives have been renamed during the boot process.

      I could be suggesting a load of rubbish, but it's the only place I can think of to start.
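      The renaming theory can be tested before moving any cables: mdadm reads the array UUID from the superblock stored on each member device, and `mdadm --examine` prints it. A small sketch that condenses that output to one line per device, assuming 1.x metadata (which prints an `Array UUID :` line; the conf entries here use metadata=1.2); `examine_summary` is a hypothetical helper:

```shell
# Hypothetical helper (not part of OMV or mdadm).
# Reads `mdadm --examine ... 2>&1` output on stdin and prints one
# "<device> <array-uuid>" line per device, or "<device> none" when mdadm
# reports no superblock. Assumes 1.x metadata ("Array UUID :" lines).
examine_summary() {
    awk '
        # A line such as "/dev/sdd:" starts the block for that device
        /^\/dev\/[^ ]*:$/ { dev = substr($0, 1, length($0) - 1); next }
        # 1.x metadata reports the array identity as "Array UUID : <uuid>"
        /Array UUID :/ { print dev, $NF }
        # mdadm prints this (to stderr) when a device carries no md superblock
        /No md superblock detected on/ { d = $NF; sub(/\.$/, "", d); print d, "none" }
    '
}
```

      For example, `mdadm --examine /dev/sdb /dev/sdb1 /dev/sdc /dev/sdc1 2>&1 | examine_summary` checks both the whole disks and their partitions (the device names come from the blkid output above and may change after recabling). A Mirror2 member would show f9a793d6:75212b31:b8d3b243:6bdcb505; `none` means mdadm found no superblock on that device.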
      Raid is not a backup! Would you go skydiving without a parachute?
    • Hi geaves,

      Thanks very much for your reply. What you're saying makes sense. I tried multiple times to shut down, shuffle the SATA cables and start up; however, sadly I think I've now made it more angry/worse. My mirror that was working is now showing as "clean/degraded", and the second mirror is still nowhere to be seen.
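      A "clean/degraded" state can be confirmed from /proc/mdstat: the status line of each active array contains a counter `[n/m]`, where n is the number of member slots and m the number currently working. The healthy md0 in the first post shows `[2/2] [UU]`; a mirror that has lost a member shows something like `[2/1] [U_]`. A sketch that lists degraded and inactive arrays; `degraded_arrays` is a hypothetical helper and the parsing assumes the mdstat layout shown in this thread:

```shell
# Hypothetical helper (not part of OMV or mdadm).
# Prints "<array> <working>/<slots>" for degraded arrays and
# "<array> inactive" for arrays that are listed but not running.
degraded_arrays() {
    awk '
        # An inactive array has no [n/m] counter at all; report it directly
        /^md[0-9]+ : inactive/ { print $1, "inactive"; name = ""; next }
        # Remember which array the following status line belongs to
        /^md[0-9]+ :/ { name = $1; next }
        # Status line, e.g. "3906887488 blocks super 1.2 [2/2] [UU]":
        # n[1] = member slots, n[2] = members currently working
        name != "" && match($0, /\[[0-9]+\/[0-9]+\]/) {
            split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
            if (n[2] + 0 < n[1] + 0) print name, n[2] "/" n[1]
            name = ""
        }
    ' "${1:-/proc/mdstat}"
}
```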

      Any other thoughts you've got I'd appreciate to hear them.

      Thanks
      Phil
    • Phil wrote:

      Any other thoughts you've got I'd appreciate to hear them.
      This seems more like a needle in a haystack. To recover any RAID it has to be unmounted and then stopped; however, you can't do this from an SSH session on OMV. To unmount a RAID you have to remove all references to it, i.e. all shares (deleting the share but not the content), and you don't want to do this. In omv-extras, under the Kernel tab, there is an option to install SystemRescueCD; that is what you may have to use.
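      The unmount, stop and re-assemble order described above can be sketched as follows. This is a hedged outline, not an OMV procedure: it assumes it is run as root from a rescue environment such as SystemRescueCD, and `run` and `reassemble` are hypothetical helper names. By default it only prints the commands. `mdadm --assemble` starts an array from the superblocks already on the disks; `mdadm --create` would write new superblocks over them and is not what you want on disks holding data.

```shell
# Hypothetical helpers (not part of OMV or mdadm).
# DRY_RUN=1 (the default) only prints each command; set DRY_RUN=0 to execute.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reassemble() {
    md=$1                                  # e.g. /dev/md1, taken from mdadm.conf
    run umount "$md"                       # fails harmlessly if it is not mounted
    run mdadm --stop "$md"                 # release any partially assembled array
    run mdadm --assemble --scan --verbose  # re-assemble from existing superblocks
}
```

      In a rescue environment the array may come up under a different /dev/mdN name than on the OMV install, so check /proc/mdstat afterwards.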

      Do you have all your data backed up from before the m'board change? If you have, the simple process would be to start again: a PIA, but the lesser of two evils. Then write out a hundred lines of "I must back up my data" :)

      There are 2 things you will need to start with. First, take a screenshot of Storage >> Disks; this will give you the device ID, model and serial number of each drive. Second, reboot and watch the boot process: at the point where the drives are being loaded, hit Pause on the keyboard to halt the boot, take a picture, then press Pause again to continue. This will show each drive in the order it is loaded.

      Using the above you should be able to work out what is connected where; consult the m'board manual regarding the SATA ports and the boot order.
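      As an alternative to photographing the boot screen, udev maintains symlinks in /dev/disk/by-id whose names include each drive's model and serial number (for SATA disks typically `ata-<model>_<serial>`), each pointing at the drive's current sdX name. A sketch that prints that mapping so it can be matched against the serial numbers in the Storage >> Disks screenshot; `map_disks` is a hypothetical helper, and drives on some controllers may only appear with other prefixes (such as `scsi-` or `wwn-`), which it ignores:

```shell
# Hypothetical helper (not part of OMV or udev).
# Prints "<sdX> <by-id name>" for each whole SATA disk (ata-* links only).
map_disks() {
    dir=${1:-/dev/disk/by-id}
    for link in "$dir"/ata-*; do
        [ -L "$link" ] || continue                  # no match, or not a symlink
        case $link in *-part[0-9]*) continue ;; esac  # skip partition links
        printf '%s %s\n' "$(basename "$(readlink "$link")")" "${link##*/}"
    done
}
```

      Because the by-id names are derived from the drive itself, they stay the same whichever SATA port the drive is plugged into, while the sdX name on the left may change.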

      Sorry, I understand what you've posted, but I think OMV is simply screwed because it can't locate the drives in the order they were in when your RAIDs were set up.
    • Hi geaves

      Thanks for your reply. Apologies for my late response, but it's been a busy week :)

      I will do as you suggest, i.e. work out the device IDs and boot in that order. To be honest, it still doesn't feel like one should have to do this: in my opinion, devices should be able to be plugged into any SATA port without it mattering. This is the second time something has gone wrong for me with OMV since I migrated from FreeNAS (which was an effort in itself!), and my confidence in it is falling rapidly. It's a great product until something goes wrong; then it seems like one is always just screwed, as you put it :)

      I will try and post an update with what happens next.

      Thanks again for taking the time to help
      Phil