RAID 6 gone, physical drives visible

  • running SpinRite on each disk right now - disks 0 to 8 are okay ....
    slow process though ... about 8~9 hours per disk


    i'll report back as soon as i am finished - probably on sunday evening


    cheers and thx for your help - ahab666

  • Take care that nothing fiddles with the superblocks while testing. As long as they are untouched, the raid information (and your data) that mdadm wrote to the disks when the raid was built can still be discovered.

    Homebox: Bitfenix Prodigy Case, ASUS E45M1-I DELUXE ITX, 8GB RAM, 5x 4TB HGST Raid-5 Data, 1x 320GB 2,5" WD Bootdrive via eSATA from the backside
    Companybox 1: Standard Midi-Tower, Intel S3420 MoBo, Xeon 3450 CPU, 16GB RAM, 5x 2TB Seagate Data, 1x 80GB Samsung Bootdrive - testing for iSCSI to ESXi-Hosts
    Companybox 2: 19" Rackservercase 4HE, Intel S975XBX2 MoBo, C2D@2200MHz, 8GB RAM, HP P212 Raidcontroller, 4x 1TB WD Raid-0 Data, 80GB Samsung Bootdrive, Intel 1000Pro DualPort (Bonded in a VLAN) - Temp-NFS-storage for ESXi-Hosts

  • Yeah, that's why I suggested checking only the disk that was giving problems.


    The point is that you need to get the raid back and make a backup of your most important data asap!
    Well, that's how I would do it. I think SpinRite has a scan-only option, but I'm not sure.

  • I always prefer to start with the test programs made by the disk manufacturer; they are tailored to their products.
    After reading all the postings I believe that only one disk may have problems.


    ./edit: This thread should be moved to the /Storage/Raid subforum.


  • @ datadigger :


    Code
    root@OMV:~# cat /proc/mdstat
    Personalities : [raid6] [raid5] [raid4]
    md127 : inactive sdb[0] sdl[11] sdk[10] sdj[9] sdi[12] sdh[6] sdg[5] sdf[4] sde[3] sdd[2] sdc[1]
    32231490632 blocks super 1.2


    How can I reactivate the inactive HDDs? And how can I add sd?[7] to the reactivated raid?


    Okay, I followed your advice and tested only the probably defective one, sd?[7]. Nothing was found with the WD tool, with SpinRite, or with HDDU ....


    So are there any other telnet or GUI commands that can help me find and use my data again?


    cheers - alex
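
A small sketch for reading the mdstat line above: it lists which device numbers are absent from the member list. The function name `missing_md_indices` is made up for this example, and the bracketed numbers are only mdadm's device numbers, so a gap hints at a missing member but does not prove which physical disk it is.

```shell
#!/usr/bin/env bash
# List the device numbers that are absent from an md member line.
# Input on stdin: a line in /proc/mdstat format, e.g. "md127 : inactive sdb[0] ...".
# Hypothetical helper, not part of mdadm.
missing_md_indices() {
    # Keep only the numbers inside [..], then report the gaps between 0 and the maximum.
    grep -o '\[[0-9]\+\]' | tr -d '[]' | sort -n | awk '
        BEGIN { max = -1 }
        { seen[$1] = 1; if ($1 + 0 > max) max = $1 + 0 }
        END {
            out = ""
            for (i = 0; i <= max; i++)
                if (!(i in seen)) out = out (out == "" ? "" : " ") i
            print out
        }'
}

# The member line posted above.
line='md127 : inactive sdb[0] sdl[11] sdk[10] sdj[9] sdi[12] sdh[6] sdg[5] sdf[4] sde[3] sdd[2] sdc[1]'
printf '%s\n' "$line" | missing_md_indices
```

For the line posted above this prints `7 8`. On the live system the input would come from `grep '^md127' /proc/mdstat`.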

  • @ahab666


    Hi Alex,


    Try this:
    Log out of your GUI.
    Open an SSH session and enter these two commands:


    Code
    mdadm --assemble /dev/md127 /dev/sd[bcdefghijklm] --verbose --force
    update-initramfs -u


    After that provide output of:


    Code
    cat /proc/mdstat
    cat /etc/fstab


    Then log back into the GUI and tell me the status of your raid.


    Alex, regarding disk 7: I think you really need to zero-wipe that disk with DBAN as I recommended previously, with a SMART check before and a verify after each sector write. If it finishes and no errors turn up, you're lucky and the disk is fine. Why? If something is wrong with the raid information on that disk (superblock or whatever), the disk will be ignored no matter how often you take it out and put it back: it is marked bad! A zero-wipe makes the raid believe it is a new disk, and then nothing stands in the way of rebuilding the raid.


    If DBAN finds no errors, then at least you know that the disk is fine and it was the superblock that was damaged.


    So these are two different things, I really hope you understand this:


    - Bad disk with damaged sectors or clusters.
    - Bad or damaged superblock, so the raid will not accept that disk!
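
Before wiping anything, the superblock of each member can be inspected read-only with `mdadm --examine`. The sketch below condenses that output to the device role and the event counter; `examine_summary` is a hypothetical helper, and the sample text is a made-up excerpt in the layout that 1.2 superblocks normally print, not output from this array.

```shell
#!/usr/bin/env bash
# Condense `mdadm --examine` text (read from stdin) to "<device role> | <events>".
# Hypothetical helper for illustration only.
examine_summary() {
    awk -F' : ' '
        /Device Role/ { role = $2 }
        /^ *Events/   { events = $2 }
        END           { print role " | " events }'
}

# Made-up excerpt standing in for `mdadm --examine /dev/sdX` output.
sample='         Events : 4521
   Device Role : Active device 7
   Array State : AAAAAAA.AAA'

printf '%s\n' "$sample" | examine_summary
```

On the real box the call would be `mdadm --examine /dev/sdX | examine_summary` for each member. A disk whose event counter lags behind the others is stale, which is a different finding from a disk on which `--examine` reports no superblock at all.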

  • @Wabun ...



    and


    Code
    root@OMV:~# update-initramfs -u
    update-initramfs: Generating /boot/initrd.img-3.2.0-4-amd64
    mdadm: cannot open /dev/md/OMV: No such file or directory
    mdadm: cannot open /dev/md/OMV: No such file or directory


    and



    and



    md126 looks weird


    It looks like hard disks number 7 and 8 are missing from the "old" md127 array: HDD 7 was inactive and HDD 8 became a member of the md126 array, one I never built ...


    cheers - alex

  • No need to run update-initramfs before the raid is complete. When mdadm sees all the disks and the raid is complete, it automatically starts a rebuild. The update-initramfs command adds the array to the boot image.


    @ahab: Stop raid 126 with mdadm --stop /dev/md126; that kills raid 126 and frees disk 8. Then add it manually to raid 127 with mdadm --manage /dev/md127 --add /dev/sdi (the correct /dev/sd<letter> is important!).
    Then do the same with disk no. 7: mdadm --manage /dev/md127 --add /dev/sdj - if I read your posts right this should be sdj.
    If mdadm can read the disk correctly and it is OK, mdadm will start rebuilding; have a look at the web UI.
    Then you can run the update-initramfs command as Wabun suggested. If update-initramfs finds an error it will say so.


  • Added update-initramfs -u to the previous message - sorry, overlooked that ....


    Rebooted the server with shutdown -r now and opened the GUI - :-\ the raid window is still empty


    @datadigger :


    Code
    root@OMV:~# mdadm --stop /dev/md126
    mdadm: stopped /dev/md126
    root@OMV:~# mdadm --manage /dev/md127 --add /dev/sdj
    mdadm: Cannot open /dev/sdj: Device or resource busy
    root@OMV:~# mdadm --manage /dev/md127 --add /dev/sdi
    mdadm: Cannot open /dev/sdi: Device or resource busy


    well ....
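
One possible cause of "Device or resource busy" is that the disk is still claimed by an md array, even an inactive one. The sketch below looks a disk up in mdstat-formatted text; `md_holder` is a hypothetical helper, and the sample is the member line posted earlier in this thread, which may no longer match the live state.

```shell
#!/usr/bin/env bash
# Print the md array(s) that list a given disk as a member.
# stdin: text in /proc/mdstat format; $1: kernel name such as "sdi".
# Hypothetical helper for illustration only.
md_holder() {
    awk -v dev="$1" '
        /^md[0-9]+ :/ {
            for (i = 3; i <= NF; i++) {
                name = $i
                sub(/\[.*/, "", name)   # drop "[12]" and any trailing flag
                if (name == dev) print $1
            }
        }'
}

# Member line posted earlier in the thread.
sample='md127 : inactive sdb[0] sdl[11] sdk[10] sdj[9] sdi[12] sdh[6] sdg[5] sdf[4] sde[3] sdd[2] sdc[1]'

printf '%s\n' "$sample" | md_holder sdi
```

With that earlier line as input this prints `md127`, meaning sdi was already listed as a member of the inactive array. On the live system the check would be `md_holder sdi < /proc/mdstat`.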

  • @Wabun et al :


    I did not touch the HDDs. I took out my controller card and always connected only one HDD at a time to the mainboard for testing, then reconnected it to the controller (NO hardware raid !!!)
    The HDDs are still connected the way they have been since I started using OMV .....

  • As long as these two disks are not part of the raid you can restart the box as often as you want; that won't bring it back.
    These two disks have "lost the race" while the raid was assembled; udev can prevent the assembly from completing. I would start over and assemble the raid from scratch.
    First check whether these two disks respond:


    smartctl -a /dev/sdi and smartctl -a /dev/sdj

    to make sure that they are well-connected.

    Then start over:
    mdadm --stop /dev/md127 (This raid definition should now be removed from mdadm.conf)
    udevadm control --stop-exec-queue
    mdadm --assemble /dev/md127 /dev/sd[bcdefghijklm] --verbose --force


    (If these two disks are still missing try to add them manually as stated above.
    mdadm --manage /dev/md127 --add /dev/sdi
    mdadm --manage /dev/md127 --add /dev/sdj)


    If the raid is complete start udev:
    udevadm control --start-exec-queue


    Now check if the raid was built correctly:
    cat /proc/mdstat
    mdadm --detail --scan


    If mdadm starts to rebuild, run initramfs and look for errors. If the raid was named correctly in mdadm.conf then it shouldn't spit out any error.


    Just fought the same battle last weekend when I moved a raid from an old machine to a new installation, udevadm did the trick.
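
The sequence above can be sketched as a script that only prints the commands by default, so the order can be reviewed before anything touches the disks. The names `run`, `reassemble` and the `DRY_RUN` switch are assumptions made for this example; the mdadm and udevadm commands themselves are the ones from the post.

```shell
#!/usr/bin/env bash
# Print (default) or execute the reassembly sequence described above.
# DRY_RUN=0 really runs the commands; that needs root and is at your own risk.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# $1: md device, remaining arguments: member disks.
reassemble() {
    array="$1"; shift
    run mdadm --stop "$array"
    run udevadm control --stop-exec-queue
    run mdadm --assemble "$array" "$@" --verbose --force
    # The udev queue is restarted even if the assemble step failed.
    run udevadm control --start-exec-queue
    run cat /proc/mdstat
    run mdadm --detail --scan
}

# Example call (dry run): reassemble /dev/md127 /dev/sd[bcdefghijklm]
```

The manual `--add` step for disks that are still missing is left out on purpose, because whether it is needed depends on what the assemble step reports.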


  • @Wabun :


    Code
    root@OMV:~# mdadm --manage /dev/md127 --add /dev/sdi
    mdadm: Cannot open /dev/sdi: Device or resource busy
    root@OMV:~# mdadm --manage /dev/md127 --add /dev/sdj
    mdadm: Cannot open /dev/sdj: Device or resource busy
    root@OMV:~#


    sorry - same as before ;-/


    I will reinstall OMV again and check if there is any difference ...

  • @ahab666: Reinstalling OMV may lead to the same situation. Now we have to check why these two disks cannot be added to the raid.
    Post the output of blkid. After all these attempts to get the raid back, the disks possibly belong to another raid definition (like disk 8 to md126 ...). blkid will tell if this is the case.
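
To make that blkid check easier to read, the raid members can be grouped by array UUID so that a disk belonging to a different array stands out. `raid_members` is a hypothetical helper, and the sample lines with their UUIDs are invented for the example; it assumes the usual blkid layout where `UUID=` comes before any label.

```shell
#!/usr/bin/env bash
# From blkid output on stdin, print "<array UUID> <device>" for every
# linux_raid_member, sorted so that members of one array sit together.
# Hypothetical helper for illustration only.
raid_members() {
    awk '/TYPE="linux_raid_member"/ {
        dev = $1; sub(/:$/, "", dev)
        uuid = "?"
        for (i = 2; i <= NF; i++)
            if ($i ~ /^UUID="/) { uuid = $i; gsub(/^UUID="|"$/, "", uuid) }
        print uuid, dev
    }' | sort
}

# Invented sample; real input would be the output of `blkid`.
sample='/dev/sda1: UUID="1111-2222" TYPE="ext4"
/dev/sdi: UUID="bbbbbbbb-0000-0000-0000-000000000002" UUID_SUB="x" TYPE="linux_raid_member"
/dev/sdh: UUID="aaaaaaaa-0000-0000-0000-000000000001" UUID_SUB="y" TYPE="linux_raid_member"'

printf '%s\n' "$sample" | raid_members
```

In the invented sample, sdh and sdi carry different array UUIDs, which is the kind of mismatch the post is asking about. On the live system the call would be `blkid | raid_members`.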


  • @datadigger


    He needs to stop the service and assign the drives back. mdadm just assigns an arbitrary number, starting at 127 and counting down, so the drives don't belong to anything yet. In the worst case the superblock is damaged. I think he really should try to stop the service and assign the drives; what do you think?

  • @datadigger et al



    and


    from the GUI






    will take some time - I guess - let's wait and see - cheers

  • @ahab666


    Alex, the command was: update-initramfs -u


    root@OMV:~# initramfs
    -bash: initramfs: command not found


    Edit: Let the raid do the work, don't touch it ;-)
    I noticed it is the same disk again which failed; I hope the rebuild will fix it.
    When the raid is rebuilt, but before you do a reboot, you have to run the command: update-initramfs -u
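
A minimal sketch for telling whether the rebuild is still running before update-initramfs -u is run: it looks for a progress line in mdstat-formatted text. `rebuild_running` is a hypothetical helper and the sample snapshot is invented.

```shell
#!/usr/bin/env bash
# Exit status 0 if the /proc/mdstat text on stdin shows a rebuild in progress.
# Hypothetical helper for illustration only.
rebuild_running() {
    grep -Eq '(recovery|resync|reshape) *='
}

# Invented snapshot; on the real box use: rebuild_running < /proc/mdstat
sample='md127 : active raid6 sdb[0] sdc[1]
      [=>...................]  recovery =  8.4% (1/2) finish=600.0min'

if printf '%s\n' "$sample" | rebuild_running; then
    echo "rebuild still running, leave the array alone"
else
    echo "no rebuild in progress"
fi
```

Once the check reports no rebuild in progress and `cat /proc/mdstat` shows the array as active, update-initramfs -u can be run before the reboot, as described above.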
