OMV 2.0 - Raid 6 (6 x 4TB) FAILED state - Need some serious help

  • So this afternoon I got emails from OMV (which I did not see until half an hour ago) stating that a drive had failed, then another email for another drive, then another, until I had emails identifying 6 drive failures. The email just before these was from OMV telling me that my resource limit had been exceeded.


    Anyhow, fast forward to this evening. I see the emails and log in via the OMV web interface and ssh. OMV shows RAID md0 as clean, FAILED, and the volume no longer shows up in the volume management area.


    Over ssh, /dev/md0 is still mounted and I can see some dirs and files. Seems fine, but I don't really want anything writing to the filesystem at this point, so I reboot, expecting things to clean themselves up, because there is a very low likelihood that all the drives failed at once (and /dev/md0 was accessible no problem). I don't have any hot or cold spares.
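
    (For anyone else in this spot: a minimal sketch of stopping writes without rebooting, assuming the array is /dev/md0 — the mount point below is just a placeholder, take the real one from mount or df.)

    mount | grep /dev/md0                 # find where the array is mounted
    mount -o remount,ro /media/<uuid>     # placeholder path; stops new writes
    umount /media/<uuid>                  # or unmount it entirely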


    After the reboot the RAID isn't starting (not surprised), but I am now left with this:




    No drives are reporting any kind of hardware errors and they all show up fine in the Intel RAID BIOS screen.



    Here is the other information:





    Question is, where do I go from here? How can I re-add the drives in a way that tells the RAID they are the original drives and all good? I can't add them in a way that triggers a rebuild, because I don't have enough drives left for a rebuild.
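
    (For reference, the usual way to confirm the members really are the originals and still in sync is to compare their superblocks — device names sda through sdf assumed from the mdstat output quoted further down. If the Events counts match on all six, a forced assemble should bring the array back with no rebuild at all.)

    mdadm --examine /dev/sd[abcdef] | grep -E '^/dev|Update Time|Events|Array State'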


    Help is appreciated.


    Cheers


    Update #1


    I also did this based on an article somewhere, and I think it is also a good sign that I could assemble the RAID again. I just have no experience with this type of recovery, so I need some advice before proceeding:



    • Official post

    wow, nobody has any ideas on how to recover this?

    I seem to be the only one who answers these posts and I am tired of trying to figure out why mdadm raid arrays don't start...


    mdadm --stop /dev/md0
    mdadm --assemble --verbose --force /dev/md0 /dev/sd[cdefgh]
    update-initramfs -u
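
    (Roughly: --stop releases the half-assembled array, --force lets mdadm assemble it even though the members are still flagged as failed, and update-initramfs -u refreshes the array configuration baked into the initramfs, presumably so the next boot sees the same picture. To confirm the result afterwards:)

    cat /proc/mdstat
    mdadm --detail /dev/md0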

    omv 7.0-32 sandworm | 64 bit | 6.5 proxmox kernel

    plugins :: omvextrasorg 7.0 | kvm 7.0.9 | compose 7.0.9 | cputemp 7.0 | mergerfs 7.0.3


    omv-extras.org plugins source code and issue tracker - github


    Please try ctrl-shift-R and read this before posting a question.

    Please put your OMV system details in your signature.
    Please don't PM for support... Too many PMs!

  • Well then, I must say I appreciate your response, and doubly so given that mdadm is a source of frustration for you. I really thought software RAID was a better choice for me than relying on a proprietary hardware RAID solution in the event of a failure. I sure hope these things don't happen often enough to be a concern (or that they are easily recovered).


    I will be trying this as soon as I get home from work and will post the results.


    Thanks Again.

    • Official post

    I used mdadm for years on multiple systems and never had an issue. BUT, I never turn my systems off, spin down the drives, or put the system to sleep. I also connected all of my drives via SATA. Not sure if you do any of that.


    As for hardware RAID, as long as you use a common RAID card and never cheap motherboard RAID, it is better than software RAID.


  • I made a DIY NAS: a Silverstone DS 8-bay enclosure with a high-end ASUS motherboard, an Intel x8 SATA card, and 6x 4TB WD Reds. It's my home media machine, never shut down, and connected to a dedicated UPS. As for spin-down, I thought I had set that somewhere when it was set up a year ago, but now I cannot be sure, nor do I recall how you do it. My NAS is pretty active constantly, but that is not to say it has no dead time, so spin-down is possible I guess. The NAS never sleeps either.
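
    (A quick way to check whether spin-down/APM is actually active on the members — drive letters sda through sdf assumed; the per-disk power options in the OMV web interface should be somewhere under the Storage section.)

    for d in /dev/sd[abcdef]; do
        echo "== $d =="
        hdparm -B "$d"    # APM level; 255 means APM is disabled
        hdparm -C "$d"    # current power state: active/idle vs standby
    done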


    Now, it is suspicious that a minute before the drives registered as failed, my box reported high resource utilization.




    Resource limit matched Service localhost


    Date: Sat, 29 Oct 2016 14:41:39
    Action: alert
    Host: CHOMEOMV.local
    Description: loadavg(5min) of 4.5 matches resource limit [loadavg(5min)>4.0]




    Then there was an email for each of the 6 drives in the array, all with essentially the same time stamp:



    Date: Sat, 29 Oct 2016 14:42:48 -0300 (ADT)


    This is an automatically generated mail message from mdadm
    running on CHOMEOMV


    A FailSpare event had been detected on md device /dev/md0.


    It could be related to component device /dev/sda.


    Faithfully yours, etc.


    P.S. The /proc/mdstat file currently contains the following:


    Personalities : [raid6] [raid5] [raid4]
    md0 : active raid6 sda[6](F) sdf[5](F) sde[4](F) sdd[3](F) sdc[2](F) sdb[1](F)
          15623215104 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/0] [______]
    unused devices: <none>

  • I seem to be the only one who answers these posts and I am tired of trying to figure out why mdadm raid arrays don't start...
    mdadm --stop /dev/md0
    mdadm --assemble --verbose --force /dev/md0 /dev/sd[cdefgh]
    update-initramfs -u


    Well, I wish I could buy you a <insert your beverage of choice>.


    Thanks for taking the time to help me out. I really appreciate it.



    • Official post

    Looks like it is working. cat /proc/mdstat will tell you for sure.
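
    (One optional follow-up after a forced assemble like this, purely as a precaution: kick off a consistency check through the standard md sysfs interface and let it run.)

    echo check > /sys/block/md0/md/sync_action    # start a read-only consistency check
    cat /proc/mdstat                              # progress shows up as "check"
    cat /sys/block/md0/md/mismatch_cnt            # ideally 0 when it finishes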


  • Looks like it is working. cat /proc/mdstat will tell you for sure.


    Gotcha. This looks good.


    Code
    root@CHOMEOMV:/tmp# cat /proc/mdstat
    Personalities : [raid6] [raid5] [raid4]
    md0 : active raid6 sda[6] sdf[5] sde[4] sdd[3] sdc[2] sdb[1]
          15623215104 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/6] [UUUUUU]
    
    
    unused devices: <none>
    • Official post

    Is there any FAQ or anything else on mdadm usage? I haven't found one (at least not one that is user-friendly enough).

    https://raid.wiki.kernel.org/index.php/RAID_setup
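
    (A handful of commands that cover most day-to-day mdadm status checks, using the names from this thread.)

    cat /proc/mdstat              # quick overview of every array and its members
    mdadm --detail /dev/md0       # state, events and member list of one array
    mdadm --examine /dev/sda      # superblock of a single member disk
    mdadm --detail --scan         # one-line array definitions, as used in mdadm.conf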

