RAID10 fails with “not enough operational mirrors” with 10 of 12 drives available

    • RAID10 fails with “not enough operational mirrors” with 10 of 12 drives available.

      Issue Description:
      We are not sure exactly why this happened, but we suspect a power outage or brown-out. It took down one of our OMV RAID10 arrays; since there were no HDD failures reported we tried a reboot, but the dmesg log then showed one of our iSCSI arrays failing to start, and attempts to re-assemble it failed with the errors shown below...

      Our Environment:

      We have a Supermicro high-density storage server with two 1 TB SATA SSDs for the OS and 72 4 TB HDDs in 36 dual hot-swap bays. There are 24 drives per LSI Logic SAS controller, configured as six RAID10 arrays.

      We are running OpenMediaVault 1.19 (Kralizec).

      The arrays are configured as follows....

      The two SSDs are set up as stand-alone ext4 partitions:

      /dev/sda1 - For the OS
      /dev/sdb1 - For OS storage.

      The RAID arrays and their associated LUNs and types are as follows...

      /dev/md0 - LUN1 - ext4 - NFS Share
      /dev/md1 - LUN2 - ext4 - NFS Share
      /dev/md2 - LUN3 - ext4 - SMB Share
      /dev/md3 - LUN4 - ext4 - SMB Share
      /dev/md4 - LUN5 - ext4 - iSCSI Share (This is my problem Array)
      /dev/md5 - LUN6 - ext4 - iSCSI Share

      ----------------
      Logs and pertinent information.

      1.) Boot-up Dmesg log:

      [ 16.954740] md: md4 stopped.
      ....
      [ 16.960695] md/raid10:md4: not enough operational mirrors.
      [ 16.960775] md: pers->run() failed ...
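
      (If useful, a read-only sketch to pull the full set of md messages for this array out of the kernel log; the exact message wording can vary by kernel version:)

      dmesg | grep -E 'md4|md/raid10'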

      2.) mdadm.conf

      :~# cat /etc/mdadm/mdadm.conf
      # mdadm.conf
      .......
      # definitions of existing MD arrays
      ARRAY /dev/md0 metadata=1.2 name=hydromediavault:vol1 UUID=1cfbe551:59608320:d05a6c0b:36472514
      ARRAY /dev/md1 metadata=1.2 name=hydromediavault:vol2 UUID=92786b08:2998971f:e43b629f:9fae9d5c
      ARRAY /dev/md2 metadata=1.2 name=hydromediavault:vol3 UUID=2e027586:e0836061:8c51d19a:25e1de4e
      ARRAY /dev/md3 metadata=1.2 name=hydromediavault:vol4 UUID=7142528a:142b2fcf:1864bd51:ab53d7c1
      ARRAY /dev/md4 metadata=1.2 name=hydromediavault:vol5 UUID=f8964aaf:801f5634:e358c097:1d146306
      ARRAY /dev/md5 metadata=1.2 name=hydromediavault:vol6 UUID=a42c055f:a4c7b1c4:ab83d732:80b2b780

      3.) Stopped the array, then checked disk status...

      :~# smartctl -d scsi -a /dev/sdaw | grep "Status"
      SMART Health Status: OK

      :~# smartctl -d scsi -a /dev/sdax | grep "Status"
      SMART Health Status: OK
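
      (For completeness, a quick way to run the same health check across all twelve members of this array; just a sketch, using the device names from this thread, which may differ on another system:)

      # Check SMART health for every md4 member:
      for d in /dev/sda[mnopqrstuvwx]; do
          printf '%s: ' "$d"
          smartctl -d scsi -H "$d" | grep -i 'health status'
      done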

      4.) Ran the following to start the array


      :~# mdadm --assemble -v --scan --force --run --uuid=f8964aaf:801f5634:e358c097:1d146306

      ...key results...

      mdadm: failed to RUN_ARRAY /dev/md4: Input/output error
      mdadm: Not enough devices to start the array.
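
      (One additional read-only check that can help at this stage, sketched with the member names from this thread: compare the event counts and update times recorded in the surviving superblocks, since --force relies on those values when deciding what it can safely bring back in.)

      # Compare superblock metadata across all members that still have one:
      mdadm --examine /dev/sda[mnopqrstuv] | grep -E '^/dev/|Events|Update Time|Device Role|Array State'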

      5.) Results of mdstat...

      :~# cat /proc/mdstat   (md4 entry only)

      Personalities : [raid10]
      md4 : inactive sdav[2] sdam[11] sdan[10] sdao[9] sdap[8] sdaq[7] sdar[6] sdas[5] sdat[4] sdau[3]
      39068875120 blocks super 1.2


      6.) Results of mdadm --examine showing the problem with devices /dev/sdaw & /dev/sdax...

      Drives /dev/sda[mnopqrstuv] - State OK.

      mdadm: No md superblock detected on /dev/sdaw.

      mdadm: No md superblock detected on /dev/sdax.
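
      (As a read-only sanity check, not part of the original troubleshooting, one could look directly at the area where the version 1.2 superblock is supposed to live, 4 KiB from the start of the member device; an intact superblock begins with the magic bytes fc 4e 2b a9:)

      # Dump the region where the metadata 1.2 superblock should sit (read-only):
      for d in /dev/sdaw /dev/sdax; do
          echo "== $d =="
          dd if="$d" bs=4096 skip=1 count=1 2>/dev/null | hexdump -C | head -n 4
      done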


      7.) Tried to fail, remove, and then re-add both drives with the following...

      mdadm /dev/md4 --fail /dev/sdaw
      mdadm: set device faulty failed for /dev/sdaw: No such device

      mdadm /dev/md4 --remove /dev/sdaw
      mdadm: hot remove failed for /dev/sdaw: No such device or address

      mdadm /dev/md4 --fail /dev/sdax
      mdadm: set device faulty failed for /dev/sdax: No such device

      mdadm /dev/md4 --remove /dev/sdax
      mdadm: hot remove failed for /dev/sdax: No such device or address

      mdadm --add /dev/md4 /dev/sdaw
      mdadm: /dev/md4 has failed so using --add cannot work and might destroy
      mdadm: data on /dev/sdaw. You should stop the array and re-assemble it.

      mdadm --add /dev/md4 /dev/sdax
      mdadm: /dev/md4 has failed so using --add cannot work and might destroy
      mdadm: data on /dev/sdax. You should stop the array and re-assemble it.

      8.) Second assemble attempt....

      mdadm --assemble --force /dev/md4 /dev/sda[utsrqponmvwx]
      ...
      mdadm: no recogniseable superblock on /dev/sdaw
      mdadm: /dev/sdaw has no superblock - assembly aborted

      9.) Drives show as removed from the array....

      mdadm -D /dev/md4
      /dev/md4:
      Version : 1.2
      Creation Time : Tue Mar 31 13:02:06 2015
      Raid Level : raid10
      Used Dev Size : -1
      Raid Devices : 12
      Total Devices : 10
      Persistence : Superblock is persistent

      Update Time : Thu Jun 4 18:24:15 2015
      State : active, FAILED, Not Started
      Active Devices : 10
      Working Devices : 10
      Failed Devices : 0
      Spare Devices : 0

      Layout : near=2
      Chunk Size : 512K

      Name : hydromediavault:vol5 (local to host hydromediavault)
      UUID : f8964aaf:801f5634:e358c097:1d146306
      Events : 239307

      Number Major Minor RaidDevice State
      0 0 0 0 removed
      1 0 0 1 removed
      2 66 240 2 active sync /dev/sdav
      3 66 224 3 active sync /dev/sdau
      4 66 208 4 active sync /dev/sdat
      5 66 192 5 active sync /dev/sdas
      6 66 176 6 active sync /dev/sdar
      7 66 160 7 active sync /dev/sdaq
      8 66 144 8 active sync /dev/sdap
      9 66 128 9 active sync /dev/sdao
      10 66 112 10 active sync /dev/sdan
      11 66 96 11 active sync /dev/sdam

      THE QUESTIONS:

      #1: If a RAID10 array can lose up to 2 drives and still be operational, why will a 12-disk RAID10 array not start with 10 good drives? Is there a way to force this? (See also the sketch after question #2.)

      #2: With the 2 drives that are missing their superblocks showing as OK and not failed at the hardware level, why can I not fail and remove them and then re-add them to the array for re-assembly?
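
      For reference on #1: with the default near=2 layout shown in mdadm -D above, consecutive role numbers form mirror pairs, i.e. (0,1), (2,3), ... (10,11), so the array only survives as long as at least one member of every pair remains, and here the two missing slots, 0 and 1, are both halves of the same pair. That also bears on #2: the kernel's md4 was assembled only from the 10 devices that still had superblocks, so /dev/sdaw and /dev/sdax are not members it can fail or remove. A read-only sketch to confirm which roles are still present (member names as used above):

      # List which device roles still exist among the surviving members;
      # roles 0 and 1 (one complete mirror pair) are the ones that are gone:
      mdadm --examine /dev/sda[mnopqrstuv] | grep 'Device Role'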

      Any insight would be very helpful. I have key data on this array that is otherwise unrecoverable. Please Help!
    • Did you ever stop md4 before assembling (mdadm --stop /dev/md4)?
      Do sdaw and sdax show up in fdisk -l and blkid?
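
      Something like the following, just as a sketch (device names taken from your post):

      mdadm --stop /dev/md4            # stop the partially assembled array first
      fdisk -l /dev/sdaw /dev/sdax     # does the kernel still see both disks?
      blkid /dev/sdaw /dev/sdax        # do they still carry a linux_raid_member signature?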

      You have a system this large with this many drives and no backup??
    • Hello 'ryecoaaron'

      Thank you for your reply. To answer your questions....

      1. Did I run mdadm --stop /dev/md4? Yes, but the re-assemble would not show any sign of starting unless I used the --scan switch to restart the array, as shown in the following command results...

      :~# mdadm --assemble -v --scan --force --run --uuid=f8964aaf:801f5634:e358c097:1d146306

      ...key results...

      mdadm: /dev/sdav is identified as a member of /dev/md4, slot 2.
      mdadm: /dev/sdau is identified as a member of /dev/md4, slot 3.
      mdadm: /dev/sdat is identified as a member of /dev/md4, slot 4.
      mdadm: /dev/sdas is identified as a member of /dev/md4, slot 5.
      mdadm: /dev/sdar is identified as a member of /dev/md4, slot 6.
      mdadm: /dev/sdaq is identified as a member of /dev/md4, slot 7.
      mdadm: /dev/sdap is identified as a member of /dev/md4, slot 8.
      mdadm: /dev/sdao is identified as a member of /dev/md4, slot 9.
      mdadm: /dev/sdan is identified as a member of /dev/md4, slot 10.
      mdadm: /dev/sdam is identified as a member of /dev/md4, slot 11.
      mdadm: no uptodate device for slot 0 of /dev/md4
      mdadm: no uptodate device for slot 1 of /dev/md4
      mdadm: added /dev/sdau to /dev/md4 as 3
      mdadm: added /dev/sdat to /dev/md4 as 4
      mdadm: added /dev/sdas to /dev/md4 as 5
      mdadm: added /dev/sdar to /dev/md4 as 6
      mdadm: added /dev/sdaq to /dev/md4 as 7
      mdadm: added /dev/sdap to /dev/md4 as 8
      mdadm: added /dev/sdao to /dev/md4 as 9
      mdadm: added /dev/sdan to /dev/md4 as 10
      mdadm: added /dev/sdam to /dev/md4 as 11
      mdadm: added /dev/sdav to /dev/md4 as 2
      mdadm: failed to RUN_ARRAY /dev/md4: Input/output error
      mdadm: Not enough devices to start the array.


      2. Yes, the drives show up under fdisk -l...

      :~# fdisk -l

      ...Key results...

      Disk /dev/sdaw: 4000.8 GB, 4000787030016 bytes
      255 heads, 63 sectors/track, 486401 cylinders, total 7814037168 sectors
      Units = sectors of 1 * 512 = 512 bytes
      Sector size (logical/physical): 512 bytes / 512 bytes
      I/O size (minimum/optimal): 512 bytes / 512 bytes
      Disk identifier: 0xb23c4c98

      Disk /dev/sdaw doesn't contain a valid partition table

      Disk /dev/sdax: 4000.8 GB, 4000787030016 bytes
      255 heads, 63 sectors/track, 486401 cylinders, total 7814037168 sectors
      Units = sectors of 1 * 512 = 512 bytes
      Sector size (logical/physical): 512 bytes / 512 bytes
      I/O size (minimum/optimal): 512 bytes / 512 bytes
      Disk identifier: 0xc99768eb

      Disk /dev/sdax doesn't contain a valid partition table


      ...But not under the blkid command.

      :~# blkid

      ....these are the entries immediately before and after where "sdaw" & "sdax" should appear....

      /dev/sdav: UUID="f8964aaf-801f-5634-e358-c0971d146306" UUID_SUB="2566b614-2e8b-d9b7-325f-1ce452f7a7f0" LABEL="hydromediavault:vol5" TYPE="linux_raid_member"
      /dev/sday: UUID="7142528a-142b-2fcf-1864-bd51ab53d7c1" UUID_SUB="90a249cd-942b-fc3b-1cbc-b06e10977137" LABEL="hydromediavault:vol4" TYPE="linux_raid_member"


      3. As far as a backup solution goes, I understand the question; I work a lot with EMC's Avamar product.
      This system, however, is not being backed up yet: it is a relatively new install and the client has not purchased a backup solution. I warned them this could happen, since the storage server is a single point of failure, so this is damage control in a FUBAR situation.

      I do have one interesting command result that looks promising, however. (Once again, understand that I am used to hardware RAID, so I am relatively new to mdadm and Linux software RAID.) It is the following...

      :~# mdadm --monitor /dev/md4
      mdadm: Warning: One autorebuild process already running.


      ...Could this mean the Array is rebuilding before it comes back online? 8)
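
      (A sketch of how to check that directly: mdadm --monitor only watches arrays and mails alerts, it does not rebuild an array itself, so the real indicators are /proc/mdstat and sysfs.)

      cat /proc/mdstat                    # a rebuilding array shows a "recovery =" or "resync =" progress line
      cat /sys/block/md4/md/array_state   # "inactive" means the array is not running, let alone rebuilding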

      Please any insight would be enormously helpful. Thanks in advance for the assistance here. :thumbup:
    • I never use scan. Try:

      mdadm --stop /dev/md4
      mdadm --assemble /dev/md4 /dev/sda[mnopqrstuvwx] --verbose --force
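
      If the verbose output gets long, one option (just a sketch, the log path is arbitrary) is to capture it to a file so the full result can be posted:

      mdadm --assemble /dev/md4 /dev/sda[mnopqrstuvwx] --verbose --force 2>&1 | tee /root/md4-assemble.log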
    • Hello "Dropkick Murphy"

      The command you mentioned shows the following....

      mdadm -D /dev/md4
      /dev/md4:
      Version : 1.2
      Creation Time : Tue Mar 31 13:02:06 2015
      Raid Level : raid10
      Used Dev Size : -1
      Raid Devices : 12
      Total Devices : 10
      Persistence : Superblock is persistent

      Update Time : Thu Jun 4 18:24:15 2015
      State : active, FAILED, Not Started
      Active Devices : 10
      Working Devices : 10
      Failed Devices : 0
      Spare Devices : 0

      Layout : near=2
      Chunk Size : 512K

      Name : hydromediavault:vol5 (local to host hydromediavault)
      UUID : f8964aaf:801f5634:e358c097:1d146306
      Events : 239307

      Number Major Minor RaidDevice State
      0 0 0 0 removed
      1 0 0 1 removed
      2 66 240 2 active sync /dev/sdav
      3 66 224 3 active sync /dev/sdau
      4 66 208 4 active sync /dev/sdat
      5 66 192 5 active sync /dev/sdas
      6 66 176 6 active sync /dev/sdar
      7 66 160 7 active sync /dev/sdaq
      8 66 144 8 active sync /dev/sdap
      9 66 128 9 active sync /dev/sdao
      10 66 112 10 active sync /dev/sdan
      11 66 96 11 active sync /dev/sdam
    • Hello 'ryecoaaron',

      I've actually tried this very approach; I tried multiple approaches before I finally caved and posted to this forum. It's as if the system is not behaving as expected. I am logged in as the 'root' user, but I even tried these commands with 'sudo' and got the same results.

      I can stop the array just fine with the '--stop' command, but when I try to re-assemble with the array stopped, using the following command...

      :~# mdadm --assemble /dev/md4 /dev/sda[mnopqrstuvwx] --verbose --force

      ....it returns an error to the effect that the array is currently stopped.

      But when I add --scan, as in the following.....

      :~# mdadm --assemble -v --scan --force --run --uuid=f8964aaf:801f5634:e358c097:1d146306

      .....it seems to start things up, but again returns the result I showed in my original post...

      mdadm: failed to RUN_ARRAY /dev/md4: Input/output error
      mdadm: Not enough devices to start the array.


      However, I wonder if I'm being impatient. I see the following when I run the '--monitor' switch....

      :~# mdadm --monitor /dev/md4
      mdadm: Warning: One autorebuild process already running.


      ...Could this mean the Array is rebuilding before it comes back online?
    • Hello All,

      In the interest of following the instructions given in the "Degraded raid array questions" sticky for the RAID forum, here are the results of the commands requested for down-array issues...
      NOTE: Due to post character limitations, I cannot include the entire results of some of these commands, so only the key results are given...

      1. First - cat /proc/mdstat

      root@hostname:~# cat /proc/mdstat
      Personalities : [raid10]
      md4 : inactive sdav[2] sdam[11] sdan[10] sdao[9] sdap[8] sdaq[7] sdar[6] sdas[5] sdat[4] sdau[3]
      39068875120 blocks super 1.2
      md5 : active (auto-read-only) raid10 sdal[0] sdaa[11] sdab[10] sdac[9] sdad[8] sdae[7] sdaf[6] sdag[5] sdah[4] sdai[3] sdaj[2] sdak[1]
      23441323008 blocks super 1.2 512K chunks 2 near-copies [12/12] [UUUUUUUUUUUU]
      md3 : active raid10 sdbj[0] sday[11] sdaz[10] sdba[9] sdbb[8] sdbc[7] sdbd[6] sdbe[5] sdbf[4] sdbg[3] sdbh[2] sdbi[1]
      23441323008 blocks super 1.2 512K chunks 2 near-copies [12/12] [UUUUUUUUUUUU]
      md2 : active raid10 sdbq[0] sdbk[11] sdbl[10] sdbm[9] sdbn[8] sdbo[7] sdbp[6] sdbr[5] sdbs[4] sdbt[3] sdbu[2] sdbv[1]
      23441323008 blocks super 1.2 512K chunks 2 near-copies [12/12] [UUUUUUUUUUUU]
      md1 : active raid10 sdn[0] sdc[11] sdd[10] sde[9] sdf[8] sdg[7] sdh[6] sdi[5] sdj[4] sdk[3] sdl[2] sdm[1]
      23441323008 blocks super 1.2 512K chunks 2 near-copies [12/12] [UUUUUUUUUUUU]
      md0 : active raid10 sdz[0] sdo[11] sdp[10] sdq[9] sdr[8] sds[7] sdt[6] sdu[5] sdv[4] sdw[3] sdx[2] sdy[1]
      23441323008 blocks super 1.2 512K chunks 2 near-copies [12/12] [UUUUUUUUUUUU]


      2. Second - fdisk -l (key results only: the OS disks and the 2 affected disks.)

      root@hostname:~# fdisk -l

      Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
      255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
      Units = sectors of 1 * 512 = 512 bytes
      Sector size (logical/physical): 512 bytes / 512 bytes
      I/O size (minimum/optimal): 512 bytes / 512 bytes
      Disk identifier: 0x0003ceba
      Device Boot Start End Blocks Id System
      /dev/sda1 * 2048 1874483199 937240576 83 Linux
      /dev/sda2 1874485246 1953523711 39519233 5 Extended
      /dev/sda5 1874485248 1953523711 39519232 82 Linux swap / Solaris

      WARNING: GPT (GUID Partition Table) detected on '/dev/sdb'! The util fdisk doesn't support GPT. Use GNU Parted.

      Disk /dev/sdaw: 4000.8 GB, 4000787030016 bytes
      255 heads, 63 sectors/track, 486401 cylinders, total 7814037168 sectors
      Units = sectors of 1 * 512 = 512 bytes
      Sector size (logical/physical): 512 bytes / 512 bytes
      I/O size (minimum/optimal): 512 bytes / 512 bytes
      Disk identifier: 0xb23c4c98
      Disk /dev/sdaw doesn't contain a valid partition table

      Disk /dev/sdax: 4000.8 GB, 4000787030016 bytes
      255 heads, 63 sectors/track, 486401 cylinders, total 7814037168 sectors
      Units = sectors of 1 * 512 = 512 bytes
      Sector size (logical/physical): 512 bytes / 512 bytes
      I/O size (minimum/optimal): 512 bytes / 512 bytes
      Disk identifier: 0xc99768eb
      Disk /dev/sdax doesn't contain a valid partition table


      3. Third - mdadm -D /dev/md4 (To demonstrate the issue at hand)

      root@hostname:~# mdadm -D /dev/md4
      /dev/md4:
      Version : 1.2
      Creation Time : Tue Mar 31 13:02:06 2015
      Raid Level : raid10
      Used Dev Size : -1
      Raid Devices : 12
      Total Devices : 10
      Persistence : Superblock is persistent
      Update Time : Thu Jun 4 18:24:15 2015
      State : active, FAILED, Not Started
      Active Devices : 10
      Working Devices : 10
      Failed Devices : 0
      Spare Devices : 0
      Layout : near=2
      Chunk Size : 512K
      Name : hydromediavault:vol5 (local to host hydromediavault)
      UUID : f8964aaf:801f5634:e358c097:1d146306
      Events : 239307
      Number Major Minor RaidDevice State
      0 0 0 0 removed
      1 0 0 1 removed
      2 66 240 2 active sync /dev/sdav
      3 66 224 3 active sync /dev/sdau
      4 66 208 4 active sync /dev/sdat
      5 66 192 5 active sync /dev/sdas
      6 66 176 6 active sync /dev/sdar
      7 66 160 7 active sync /dev/sdaq
      8 66 144 8 active sync /dev/sdap
      9 66 128 9 active sync /dev/sdao
      10 66 112 10 active sync /dev/sdan
      11 66 96 11 active sync /dev/sdam


      4. Fourth - blkid (showing that the 2 affected drives do not appear in the output)

      root@hydromediavault:~# blkid
      /dev/sdb1: UUID="0cf5d1c8-aca3-4dfd-8f20-da117f20ad26" TYPE="ext4"
      /dev/sda1: UUID="27c09dc8-81de-44eb-ae06-da3cd733a157" TYPE="ext4"
      /dev/sda5: UUID="329a497a-0808-4837-8552-2c9dc083b85d" TYPE="swap"
      /dev/sdav: UUID="f8964aaf-801f-5634-e358-c0971d146306" UUID_SUB="2566b614-2e8b-d9b7-325f-1ce452f7a7f0" LABEL="hydromediavault:vol5" TYPE="linux_raid_member"
      (Both /dev/sdaw & /dev/sdax should be listed here but are not.)
      /dev/sday: UUID="7142528a-142b-2fcf-1864-bd51ab53d7c1" UUID_SUB="90a249cd-942b-fc3b-1cbc-b06e10977137" LABEL="hydromediavault:vol4" TYPE="linux_raid_member"
      /dev/md0: UUID="RvyJVq-yO2n-U1c7-LoJe-xibi-m3sc-WJlRgi" TYPE="LVM2_member"
      /dev/md1: UUID="MQMkME-6gER-2k6y-HSev-NRaN-2Xwn-F4aJ0E" TYPE="LVM2_member"
      /dev/md2: UUID="9guLDw-evIw-45XV-omZv-P3K1-o6bd-eQ2CWP" TYPE="LVM2_member"
      /dev/md3: UUID="xrE9yG-FpWn-ONY8-cQii-UFK2-pYRQ-Mv1Ttd" TYPE="LVM2_member"
      /dev/md5: UUID="QCneDY-SvD0-twcj-RwQa-V6Tu-Ra59-keMnk9" TYPE="LVM2_member"
      /dev/mapper/Lun6-Lun6: UUID="9rSdgi-RCze-dS9f-6CTi-hsYO-lz7X-Vb3inU" TYPE="LVM2_member"
      /dev/mapper/Lun4-Lun4: LABEL="LUN4" UUID="b5ffa606-a54f-46f3-bfe2-1127406e3c7e" TYPE="ext4"
      /dev/mapper/Lun3-Lun3: LABEL="LUN3" UUID="1af8667f-6ac9-41b6-8bdb-5ec97ba897a3" TYPE="ext4"
      /dev/mapper/Lun2-Lun2: LABEL="LUN2" UUID="b2d7e2d7-8052-4acb-a2af-e94c51ffcff0" TYPE="ext4"
      /dev/mapper/Lun1-Lun1: LABEL="LUN1" UUID="19fd5749-23b3-4b1f-bd33-1f933282bbbc" TYPE="ext4"


      5.) As noted in a previous post in this thread, the 2 HDDs in question show a SMART health status of OK.

      6.) For the past few days now, mdadm --monitor has shown the following results...

      root@hostname:~# mdadm --monitor /dev/md4
      mdadm: Warning: One autorebuild process already running.


      Could the array be auto-rebuilding?

      I know it took 3 to 4 days for the array to initialize when I created it, because it's 12 x 4 TB in RAID10, roughly 24 TB of usable space.
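
      (A quick sanity check on that figure, from the block count mdstat reports for the healthy arrays, which is in 1 KiB blocks:)

      # 23441323008 blocks of 1 KiB each, expressed in TiB and in TB:
      echo "$((23441323008 / 1024 / 1024 / 1024)) TiB"     # about 21 TiB
      echo "$((23441323008 * 1024 / 1000000000000)) TB"    # about 24 TB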

      Thanks again in advance for any help. (I am not clear on which code box I should be using. The one I chose gave line breaks, where the other did not.)

    • Thanks for the reply. I tried mdadm --monitor against one of my good RAID arrays and it showed the same result, as shown below.... :(

      root@hydromediavault:~# mdadm --monitor /dev/md5
      mdadm: Warning: One autorebuild process already running.


      ....Any thoughts on what I can do to bring the RAID array back up, even in a crippled state?

      It's RAID10 for an iSCSI array, if that is helpful, but I find it hard to understand why 10 out of 12 drives would not be enough to at least bring the array back up.

      Help Please.
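
      (Before trying anything invasive on those two disks, such as a forced re-create, a common precaution is to work on copy-on-write overlays so that no command can write to the real drives. A rough sketch only, with arbitrary file paths and mapper names, assuming enough free space on the root filesystem to absorb any writes:)

      # Build throw-away copy-on-write overlays for the two suspect disks;
      # experiments then target /dev/mapper/sdaw_ovl and /dev/mapper/sdax_ovl
      # instead of the real drives.
      mkdir -p /root/overlays
      for d in sdaw sdax; do
          truncate -s 4T /root/overlays/$d.ovl                   # sparse file that holds copied-on-write blocks
          loop=$(losetup --find --show /root/overlays/$d.ovl)    # attach it as a loop device
          size=$(blockdev --getsz /dev/$d)                       # disk size in 512-byte sectors
          dmsetup create ${d}_ovl --table "0 $size snapshot /dev/$d $loop P 8"
      done

      Any assemble or re-create experiment can then be pointed at the /dev/mapper/... overlay devices, and the overlays torn down afterwards with dmsetup remove and losetup -d.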