RAID 6 gone, physical drives visible

ahab666 · 22. August 2015

running spin-rite on each disk right now - disks 0 to 8 are okay ....
slow process though ... like 8~9 hours per disc

i'll report back as soon as i am finished - probably on sunday evening

cheers and thx for your help - ahab666

datadigger · 22. August 2015

Take care that nothing fiddles around with the superblocks while testing. As long as they are untouched the raid informations (And your data) written on the disks by mdadm when the raid was build can be discovered.

Wabun · 22. August 2015

Yeah, hence I suggested only to check the disk which was giving problems.

The point is you need to get the raid back and make asap backup of your most important data!
Well, that's how I would do it. I think there is an option to only scan in Spin-rite, but not sure.

datadigger · 22. August 2015

I always prefer testing programs made by the manufacturer of the disks at the first line, they are tailored for their products.
After reading all the postings I believe that only one disk may have problems.

./edit: This thread should be moved to the /Storage/Raid subforum.

ahab666 · 23. August 2015

@ datadigger :

Code

root@OMV:~# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md127 : inactive sdb[0] sdl[11] sdk[10] sdj[9] sdi[12] sdh[6] sdg[5] sdf[4] sde[3] sdd[2] sdc[1]
      32231490632 blocks super 1.2

how can i reactivate the inactive hdds ? and how can i add sd?[7] to the reactivated raid ???

okay- i followed your advices and just tested the probably defective one sd?[7] - and no nothing found with the wd-tool as well as with spinrite as well as with HDDU ....

so any other telnet or gui commands that can help me find and reuse my data again ???

cheers - alex

Wabun · 23. August 2015

@ahab666

Hi Alex,

try this:
logout of your gui.
open a ssh and enter these two commands:

Code

mdadm --assemble /dev/md127 /dev/sd[bcdefghijklm] --verbose --force 
update-initramfs -u

After that provide output of:

Code

cat /proc/mdstat 
cat /etc/fstab

Then log back in gui and tell me the status of your raid.

Alex, regards disk 7, I think you really need to zero-wipe that disk with DBAN as I recommended previously, with a SMART check before and verify after each sector write then when done and no errors turn up your lucky and disk is fine, why? If there is something wrong with the raid information on that disk [Super-block or whatsoever] the disk will be ignored it doesn't matter how often you take it out and put back, it is marked bad! A zero-wipe will let the raid believe it is a new disk and there is nothing in the way to rebuild the raid again.

Assuming DBAN found no errors then at least you know that the disk is fine and your Super-block was damaged.

So that are 2 different things, I really hope you understand this?

- Bad disk with damaged sectors or cluster.
- Bad or damaged Super-block and the raid will not accept that disk!

ahab666 · 23. August 2015

@Wabun ...

Code

root@OMV:~# mdadm --assemble /dev/md127 /dev/sd[bcdefghijklm] --verbose --force
mdadm: looking for devices for /dev/md127
mdadm: /dev/sdb is busy - skipping
mdadm: /dev/sdc is busy - skipping
mdadm: /dev/sdd is busy - skipping
mdadm: /dev/sde is busy - skipping
mdadm: /dev/sdf is busy - skipping
mdadm: /dev/sdg is busy - skipping
mdadm: /dev/sdh is busy - skipping
mdadm: /dev/sdi is busy - skipping
mdadm: /dev/sdj is busy - skipping
mdadm: /dev/sdk is busy - skipping
mdadm: /dev/sdl is busy - skipping
mdadm: /dev/sdm is busy - skipping
root@OMV:~#

Alles anzeigen

and

Code

root@OMV:~# update-initramfs -u
update-initramfs: Generating /boot/initrd.img-3.2.0-4-amd64
mdadm: cannot open /dev/md/OMV: No such file or directory
mdadm: cannot open /dev/md/OMV: No such file or directory

and

Code

root@OMV:~# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md126 : inactive sdm[13](S)
      2930135512 blocks super 1.2


md127 : inactive sdb[0] sdl[11] sdk[10] sdj[9] sdi[12] sdh[6] sdg[5] sdf[4] sde[3] sdd[2] sdc[1]
      32231490632 blocks super 1.2


unused devices: <none>

Alles anzeigen

and

Code

root@OMV:~# cat /etc/fstab
# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
# / was on /dev/sda1 during installation
UUID=6ac66484-42b3-48ad-8430-072852de03ab /               ext4    errors=remount-ro 0       1
# swap was on /dev/sda5 during installation
UUID=6edd6d3a-11b9-4188-848c-0c2f2c9a73fa none            swap    sw              0       0
/dev/sdb1       /media/usb0     auto    rw,user,noauto  0       0
# >>> [openmediavault]
# <<< [openmediavault]

Alles anzeigen

md 126 looks wiered

looks like harddisks number 7 and 8 are missing in the "old" md127 array , hdd seven was inactive and hdd8 became a member of the md126 array - one i never built ...

cheers - alex

Wabun · 23. August 2015

@ahab666

Where is the output of update-initramfs -u

Anyway make sure you do that command and reboot the server.

Then login GUI and tell me the status of the raid

datadigger · 23. August 2015

No need to start initramfs before the raid is complete. When mdadm sees all the disks and the raid is complete, it automatically starts a rebuild. The initramfs command adds it to the boot image.

@ahab: Stop that raid 126 with mdadm --stop /dev/md126, that kills the raid126 and frees the disk 8. Then add it manually to the raid 127 with mdadm --manage /dev/md127 --add /dev/sdi (Correct /dev/sd<letter> is important!).
Then do the same with disk No. 7: mdadm --manage /dev/md127 --add /dev/sdj - when I read your posts right this shoud be sdj.
If mdadm can read the disk correctly and it is ok then mdadm will start rebuilding, have a look at the web-ui.
Then you can run the initramfs command as wabun suggested. If initramfs finds an error it will tell that.

ahab666 · 23. August 2015

added the update-initrramfs -u in the previous message - sorry oversaw that....

rebooted, the server with shutdown -r now and opened the gui - :-\ the raid window is still empty

@datadigger :

Code

root@OMV:~# mdadm --stop /dev/md126
mdadm: stopped /dev/md126
root@OMV:~# mdadm --manage /dev/md127 --add /dev/sdj
mdadm: Cannot open /dev/sdj: Device or resource busy
root@OMV:~# mdadm --manage /dev/md127 --add /dev/sdi
mdadm: Cannot open /dev/sdi: Device or resource busy

well ....

Wabun · 23. August 2015

@datadigger

You think he might have swapped disk 7 and 8 from their physical location on raid?
Just wonder why both are not recognised and been assigned to the /dev/md126

ahab666 · 23. August 2015

@Wabun et al :

i did not touch the HDDS . i took out my controller card and always connected only one HDD to the mainboard to do the testing and the reconnect it zo the controller (NO hardware raid !!!)
the hdds are still connected as they were since i use OMV .....

Wabun · 23. August 2015

@ahab666
Try a reboot and see if you then can assign both disk as datadigger told you.

datadigger · 24. August 2015

Until these two disks are not a part of the raid you can restart the box as long as you want, that won't bring it back.
These two disks have "lost the race" while the raid was assembled, udev can avoid the completion. I would start over building the raid from scratch.
At first check if these two disks responds:

smartctl -a /dev/sdi and smartctl -a /dev/sdj

to make sure that they are well-connected.

Then start over:
mdadm --stop /dev/md127 (This raid definition should now be removed from mdadm.conf)
udevadm control --stop-exec-queue
mdadm --assemble /dev/md127 /dev/sd[bcdefghijklm] --verbose --force

(If these two disks are still missing try to add them manually as stated above.
mdadm --manage /dev/md127 --add /dev/sdi
mdadm --manage /dev/md127 --add /dev/sdj)

If the raid is complete start udev:
udevadm control --start-exec-queue

Now check if the raid was build correctly:
cat /proc/mdstat
mdadm --detail --scan

If mdadm starts to rebuild, run initramfs and look for errors. If the raid was named correctly in mdadm.conf then it shouldn't spit out any error.

Just fought the same battle last weekend when I moved a raid from an old machine to a new installation, udevadm did the trick.

ahab666 · 24. August 2015

@Wabun :

Code

root@OMV:~# mdadm --manage /dev/md127 --add /dev/sdi
mdadm: Cannot open /dev/sdi: Device or resource busy
root@OMV:~# mdadm --manage /dev/md127 --add /dev/sdj
mdadm: Cannot open /dev/sdj: Device or resource busy
root@OMV:~#

sorry - same as before ;-/

i will reinstall OMV agaian an check if there is any difference ...

Wabun · 24. August 2015

@datadigger

Yeah to get the drives out of their status from busy you need to stop the service.

Code

udevadm control --stop-exec-queue
....
udevadm control --start-exec-queue

http://dev.bizo.com/2012/07/md…ice-or-resource-busy.html

datadigger · 24. August 2015

Zitat von ahab666
@Wabun :
Code
root@OMV:~# mdadm --manage /dev/md127 --add /dev/sdi
mdadm: Cannot open /dev/sdi: Device or resource busy
root@OMV:~# mdadm --manage /dev/md127 --add /dev/sdj
mdadm: Cannot open /dev/sdj: Device or resource busy
root@OMV:~#
sorry - same as before ;-/
i will reinstall OMV agaian an check if there is any difference ...

That may lead to the same situation. Now we have to check why these two disks cannot be added to the raid.
Give the result of blkid. After all these actions to get the raid back they possibly belong to another raid definition (Like disk 8 to md126...). blkid will tell if this is the case.

Wabun · 24. August 2015

@datadigger

He needs to stop the service and assign the drives back, mdadm assigns just a random number starting with 127 downwards. so the drives don't belong to anything yet. In the worse case scenario the Superblock is damaged, I think he really should try to stop the service and try to assign the drives, what you think?

ahab666 · 24. August 2015

@datadigger et al

Code

root@OMV:~# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md126 : inactive sdm[13](S)
      2930135512 blocks super 1.2


md127 : inactive sdb[0] sdl[11] sdk[10] sdj[9] sdi[12] sdh[6] sdg[5] sdf[4] sde[3] sdd[2] sdc[1]
      32231490632 blocks super 1.2


unused devices: <none>
root@OMV:~# mdadm --stop /dev/md127
mdadm: stopped /dev/md127
root@OMV:~# mdadm --stop /dev/md126
mdadm: stopped /dev/md126
root@OMV:~# udevadm control --stop-exec-queue
root@OMV:~# mdadm --assemble /dev/md127 /dev/sd[bcdefghijklm] --verbose --force
mdadm: looking for devices for /dev/md127
mdadm: /dev/sdb is identified as a member of /dev/md127, slot 0.
mdadm: /dev/sdc is identified as a member of /dev/md127, slot 1.
mdadm: /dev/sdd is identified as a member of /dev/md127, slot 2.
mdadm: /dev/sde is identified as a member of /dev/md127, slot 3.
mdadm: /dev/sdf is identified as a member of /dev/md127, slot 4.
mdadm: /dev/sdg is identified as a member of /dev/md127, slot 5.
mdadm: /dev/sdh is identified as a member of /dev/md127, slot 6.
mdadm: /dev/sdi is identified as a member of /dev/md127, slot 8.
mdadm: /dev/sdj is identified as a member of /dev/md127, slot 9.
mdadm: /dev/sdk is identified as a member of /dev/md127, slot 10.
mdadm: /dev/sdl is identified as a member of /dev/md127, slot 11.
mdadm: /dev/sdm is identified as a member of /dev/md127, slot 7.
mdadm: Marking array /dev/md127 as 'clean'
mdadm: added /dev/sdc to /dev/md127 as 1
mdadm: added /dev/sdd to /dev/md127 as 2
mdadm: added /dev/sde to /dev/md127 as 3
mdadm: added /dev/sdf to /dev/md127 as 4
mdadm: added /dev/sdg to /dev/md127 as 5
mdadm: added /dev/sdh to /dev/md127 as 6
mdadm: added /dev/sdm to /dev/md127 as 7 (possibly out of date)
mdadm: added /dev/sdi to /dev/md127 as 8
mdadm: added /dev/sdj to /dev/md127 as 9
mdadm: added /dev/sdk to /dev/md127 as 10
mdadm: added /dev/sdl to /dev/md127 as 11
mdadm: added /dev/sdb to /dev/md127 as 0
mdadm: /dev/md127 has been started with 11 drives (out of 12).
root@OMV:~# mdadm --manage /dev/md127 --add /dev/sdm
mdadm: added /dev/sdm
root@OMV:~# udevadm control --start-exec-queue
root@OMV:~# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md127 : active raid6 sdm[13] sdb[0] sdl[11] sdk[10] sdj[9] sdi[12] sdh[6] sdg[5] sdf[4] sde[3] sdd[2] sdc[1]
      29301350400 blocks super 1.2 level 6, 512k chunk, algorithm 2 [12/11] [UUUUUUU_UUUU]
      [>....................]  recovery =  0.0% (2027164/2930135040) finish=747.5min speed=65278K/sec


unused devices: <none>
root@OMV:~# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md127 : active raid6 sdm[13] sdb[0] sdl[11] sdk[10] sdj[9] sdi[12] sdh[6] sdg[5] sdf[4] sde[3] sdd[2] sdc[1]
      29301350400 blocks super 1.2 level 6, 512k chunk, algorithm 2 [12/11] [UUUUUUU_UUUU]
      [>....................]  recovery =  0.1% (3801924/2930135040) finish=695.9min speed=70080K/sec


unused devices: <none>
root@OMV:~# mdadm --detail --scan
ARRAY /dev/md127 metadata=1.2 spares=1 name=OMV:OMV UUID=6230a09b:2bd2f0af:b6f72e19:46e3b8b3
root@OMV:~# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md127 : active raid6 sdm[13] sdb[0] sdl[11] sdk[10] sdj[9] sdi[12] sdh[6] sdg[5] sdf[4] sde[3] sdd[2] sdc[1]
      29301350400 blocks super 1.2 level 6, 512k chunk, algorithm 2 [12/11] [UUUUUUU_UUUU]
      [>....................]  recovery =  0.1% (3912888/2930135040) finish=26375.9min speed=1848K/sec


unused devices: <none>
root@OMV:~# initramfs
-bash: initramfs: Kommando nicht gefunden.
root@OMV:~# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md127 : active raid6 sdm[13] sdb[0] sdl[11] sdk[10] sdj[9] sdi[12] sdh[6] sdg[5] sdf[4] sde[3] sdd[2] sdc[1]
      29301350400 blocks super 1.2 level 6, 512k chunk, algorithm 2 [12/11] [UUUUUUU_UUUU]
      [>....................]  recovery =  0.1% (5077276/2930135040) finish=2324.7min speed=20970K/sec


unused devices: <none>

Alles anzeigen

and

from the GUI

Code

Raid Devices : 12
  Total Devices : 12
    Persistence : Superblock is persistent


    Update Time : Mon Aug 24 12:55:27 2015
          State : clean, degraded
 Active Devices : 11
Working Devices : 11
 Failed Devices : 1
  Spare Devices : 0


         Layout : left-symmetric
     Chunk Size : 512K


           Name : OMV:OMV  (local to host OMV)
           UUID : 6230a09b:2bd2f0af:b6f72e19:46e3b8b3
         Events : 41082


    Number   Major   Minor   RaidDevice State
       0       8       16        0      active sync   /dev/sdb
       1       8       32        1      active sync   /dev/sdc
       2       8       48        2      active sync   /dev/sdd
       3       8       64        3      active sync   /dev/sde
       4       8       80        4      active sync   /dev/sdf
       5       8       96        5      active sync   /dev/sdg
       6       8      112        6      active sync   /dev/sdh
       7       0        0        7      removed
      12       8      128        8      active sync   /dev/sdi
       9       8      144        9      active sync   /dev/sdj
      10       8      160       10      active sync   /dev/sdk
      11       8      176       11      active sync   /dev/sdl


      13       8      192        -      faulty spare   /dev/sdm

Alles anzeigen

will take some time - i guess - lets wait and see - cheers

Wabun · 24. August 2015

@ahab666

Alex the command was: update-initramfs -u

root@OMV:~# initramfs
-bash: initramfs: Kommando nicht gefunden.

Edit: Let the raid do the work, don't touch it
I noticed it is the same disk again which failed, I hope the rebuild will fix it.
When the raid is rebuild, but before you do a reboot, you have to run the command : update-initramfs -u

Jetzt mitmachen!