8TB RAID 5 array is clean, FAILED. Please help before I do something to make things worse.

  • Hi everyone,
    A few days ago, the RAID in my OMV box disappeared. The details are in this post: RAID 5 array has vaporized! Advice needed please
    Last night the RAID was rebuilding and accessible, and when I went to bed the rebuild was up to 80% complete. All looked good in my world.


    Then I woke up this morning... and found that it was reporting the RAID as clean, FAILED. I checked the details and found this:


    blkid gave this:

    Code
    blkid
    /dev/sda1: UUID="d81eacfe-439c-4e12-bbb2-a933e69d4dfa" TYPE="ext4"
    /dev/sda5: UUID="6e724718-95ae-4e0c-9e17-a469c4a7627e" TYPE="swap"
    /dev/sdc: UUID="3e952187-f4e8-e08a-19b7-63a4cdc912c7" LABEL="OMV2:OMV" TYPE="linux_raid_member"
    /dev/sdd: UUID="3e952187-f4e8-e08a-19b7-63a4cdc912c7" LABEL="OMV2:OMV" TYPE="linux_raid_member"
    /dev/sde: UUID="3e952187-f4e8-e08a-19b7-63a4cdc912c7" LABEL="OMV2:OMV" TYPE="linux_raid_member"
    /dev/sdf: UUID="3e952187-f4e8-e08a-19b7-63a4cdc912c7" LABEL="OMV2:OMV" TYPE="linux_raid_member"
    /dev/md127: LABEL="OMV" UUID="13a7164c-7be5-49e9-ab63-d704f96f890e" TYPE="ext4"
    /dev/sdg: UUID="3e952187-f4e8-e08a-19b7-63a4cdc912c7" LABEL="OMV2:OMV" TYPE="linux_raid_member"
    /dev/sdb: UUID="3e952187-f4e8-e08a-19b7-63a4cdc912c7" LABEL="OMV2:OMV" TYPE="linux_raid_member"


    cat /proc/mdstat gave this:

    Code
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
    md127 : active raid5 sdb[7](S) sde[5] sdd[3](F) sdc[2] sdf[6]
          7813523456 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/3] [UUU__]
    
    
    unused devices: <none>


    Looking at each drive with mdadm --examine, I found this:


    So I figured that if I added another drive in as a spare, the RAID would start to rebuild, but it didn't.
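
    For reference, adding a spare to an md array is typically done along these lines; the device name /dev/sdh below is just a stand-in for whatever the new disk came up as, not taken from the outputs above:

    Code
    # Hypothetical new disk name; substitute the real device.
    mdadm --manage /dev/md127 --add /dev/sdh

    # With two of the five RAID 5 members gone, md will hold the new disk as a
    # spare but cannot start a rebuild, which matches what happened here.
    cat /proc/mdstat
    mdadm --detail /dev/md127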


    (Continued)

    OMV 4.1.0-1 Arrakis running on:
    IBM System x3400 server
    Dual Xeon 5110 1.6GHz CPUs
    4GB RAM
    40GB IDE System drive
    8 x 2TB Data HDDs

  • (Continued)
    Looking at the events for each drive, I find this:


    From my noob limited knowledge, it looks like there is a good chance that the RAID can be rebuilt with sdb, sdc, sde, and sdf. I am thinking that if I run mdadm --assemble --run --force /dev/md127 /dev/sd[bcef], will it work to rebuild the array? Or am I going to lose everything?
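
    For reference, the event counters this decision comes down to can be compared before forcing anything; something like the following, using the drive names from this thread (the closer the counts, the less data a forced assembly has to gloss over):

    Code
    # Show the update time and event counter for each candidate member.
    mdadm --examine /dev/sd[bcef] | egrep '/dev/sd|Update Time|Events'

    # If the counts are close, the forced assembly itself would look like this.
    mdadm --stop /dev/md127
    mdadm --assemble --run --force /dev/md127 /dev/sd[bcef]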


    I am on the verge of panicking, as I have over 1,200 movies and over 150 TV series that we have collected over the past 10 years. I am hoping that I'm not totally..... I don't want to even think about it.
    If there is anything that I can do to save this, I would be grateful for any help that anyone can provide.
    Thank you!


    • Official Post

    RAID 5 can only handle one drive failing. Any more than that and all data is gone :( RAID isn't backup...


    You have such a mess. How many drives are in the original array? Which drives are they?

    omv 7.4.8-1 sandworm | 64 bit | 6.8 proxmox kernel

    plugins :: omvextrasorg 7.0 | kvm 7.0.14 | compose 7.2.5 | k8s 7.3.1-1 | cputemp 7.0.2 | mergerfs 7.0.5 | scripts 7.0.9


    omv-extras.org plugins source code and issue tracker - github - changelogs


    Please try ctrl-shift-R and read this before posting a question.

    Please put your OMV system details in your signature.
    Please don't PM for support... Too many PMs!

  • Five drives in the original. They were: sdc, sdd, sde, sdf and sdg.


    It appears that sdg was the original one that dropped out of the array.


    sdb was added to replace sdg when the array tried to rebuild.


  • Looking through the syslog, I've come across this from when it was rebuilding:


    What that tells me is that sdd put its fingers in its ears and stopped talking, possibly due to a bad sector. I know that when I went to bed at midnight the array was at 80% rebuilt, and this happened at 3:20, so it was pretty close to finishing the rebuild. If I run SpinRite on sdd and am able to get the drive reading again in that area, would I be able to try the rebuild again?
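
    (Side note: SMART data is a quick way to see whether sdd is actually remapping sectors; a minimal check, assuming the smartmontools package is installed:)

    Code
    # Overall health, reallocated/pending sector counts, and the drive's error log.
    smartctl -a /dev/sdd

    # Optionally start a long surface self-test and read the result later.
    smartctl -t long /dev/sdd
    smartctl -l selftest /dev/sdd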


    I realize that RAID isn't a form of backup, which is why I have been eyeing SnapRAID. The problem is that this collection has been growing slowly over the years and has sort of grown out of control size-wise. If I could have found someone who was willing to loan me about 8TB of drives to move this stuff to while I switched to SnapRAID, that would have been great, but it never happened.... :(


    • Official Post

    If SpinRite fixes the drive, I would try rebuilding. That should be just to get the content off the RAID array, though.


    If you were close, I might be able to help :)


    Thanks, Rye. I'm going to run SpinRite. Unfortunately, it's not a quick process. (There is so much promise with the speed of scanning drives in the new version, when it comes out...)


    Thanks for the offer of help.... Let's see, drive time from Toronto to Wisconsin is about 12 hours.......... ;) Just kidding, I have driving duty, getting my wife back and forth to dialysis.


    Well, I decided to run SpinRite on all the drives and, sure enough, sdb and sdd had read issues. SpinRite seems to have corrected the issues on sdd, but sdb seems to be a little more of a challenge. Seeing that the array was originally sdc, sdd, sde, sdf and sdg, that sdg dropped out, and that the whole thing crapped out (when sdd started throwing errors) while it was trying to rebuild onto sdb as sdg's replacement, can I try to force assemble it with sdc-sdg? There were no changes to the content while all of this was going on with the array, so I hope that helps the odds. Or would it be better to wait and see if SpinRite can bring sdb back from the dead?
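
    A rough sketch of what that force assembly could look like, using only the five original members and assuming the drive letters still match the outputs above:

    Code
    # Stop whatever half-assembled state md is currently holding.
    mdadm --stop /dev/md127

    # Force-assemble from the original five members only; --force lets mdadm
    # accept members whose event counters have drifted apart.
    mdadm --assemble --run --force /dev/md127 /dev/sd[cdefg]

    # Inspect the result before mounting anything.
    cat /proc/mdstat
    mdadm --detail /dev/md127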


    • Official Post

    I would try to use the original drives if possible.


    OK, what a wild ride so far.... I ran SpinRite on all the drives and sdb is toast. So I focused on the five original disks. I did a

    Code
    mdadm --assemble --run --force /dev/md127 /dev/sd[b-f]

    which seemed to work. (Since I had booted up without the old sdb, all the drive letters got reassigned.) I tried mdadm --detail /dev/md127 and got this:


    I shut the box down to add another drive to replace the toasted sdb, and when it booted, it hung for a while at "Checking Quotas". When it finally finished that and booted up, I got this:


    So far, so good. Through the webGUI, I chose the Recover option and added the new sdb to the array. The array started the rebuild and I went to bed.
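
    (For the record, the rebuild can also be watched from the CLI with something like:)

    Code
    # Refresh the rebuild progress every 30 seconds.
    watch -n 30 cat /proc/mdstat

    # Or pull just the state and rebuild percentage from mdadm.
    mdadm --detail /dev/md127 | egrep 'State|Rebuild Status'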


    This morning, I woke to find the array listed as clean, FAILED, and I could not access any files on it. It looks like it ran up against more issues with the sdd drive. Here is the output of mdadm --detail /dev/md127:


    I then added another drive to the array, thinking that it would rebuild onto that, but no dice. At that point I realized that I was back at square one, so I rebooted the box and issued this:

    Code
    mdadm --assemble --run --force /dev/md127 /dev/sd[b-f]
    mdadm: forcing event count in /dev/sdd(3) from 1902723 upto 1903306
    mdadm: clearing FAULTY flag for device 2 in /dev/md127 for /dev/sdd
    mdadm: /dev/md127 has been started with 4 drives (out of 5) and 1 spare.


    I was able to mount the array and access the files. Obviously, there is something wrong with the sdd drive, as it seems to crap out during the rebuild process. I am looking for the best option to go with from here. I realize that I am on the edge of the cliff with my toes dangling over; if one more drive fails, I'm pooched. I am sitting here with an array that is listed as clean, degraded, recovering, but that craps out during the rebuild due to a drive that is cranky. I realize that there could be a very small area on sdd that is causing the problems. Here is what I've come up with for ideas:

    • Run SpinRite again on the sdd drive to see if it can access the trouble area. If I do this at Level 3 or 4, it was saying that it would take about 2 weeks to complete, if it worked. Then add the sdd drive back to the array and rebuild to a spare.
    • Take the sdd drive and use Clonezilla with the rescue option to copy it to a spare 2TB drive that I have. Then replace the old sdd drive with the cloned one and rebuild with a second spare drive (see the sketch after this list). What I don't know is whether the rebuild will handle the missing sectors differently than it did when it was getting an I/O error back from the old sdd drive during the prior rebuild attempts.
    • Say screw it, run the array on the four out of five drives so it stays clean/degraded, copy everything off the array, and go from there with either Greyhole or SnapRAID.
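
    On the cloning idea in the second option: an alternative to Clonezilla's rescue mode would be GNU ddrescue, which copies everything it can read and keeps a map of the bad spots. A minimal sketch, assuming the cranky drive is still /dev/sdd and the spare 2TB disk shows up as /dev/sdh (a stand-in name):

    Code
    # Debian/OMV package is gddrescue; the binary is ddrescue.
    apt-get install gddrescue

    # First pass: copy everything readable, skipping problem areas quickly.
    # The map file records progress so the run can be resumed.
    ddrescue -n /dev/sdd /dev/sdh /root/sdd.map

    # Second pass: retry the bad areas a few times.
    ddrescue -r3 /dev/sdd /dev/sdh /root/sdd.map

    One caveat: sectors that ddrescue cannot recover are simply left unwritten on the destination, so the clone returns whatever was already there instead of an I/O error. The rebuild should then run to completion, but any file crossing that stripe could be silently damaged, which bears on the question in the second option.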

    Opinions?


    BTW, if the best option is the last one, then I would like to copy the files off by connecting the destination drive to the OMV box and moving them locally. I could do it the painful way through the CLI, but is there a better way? I know that moving 8TB off through the NIC to hard drives in another box will take forever and a day.
    Thanks!
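
    For what it's worth, a local copy from the CLI does not have to be painful. A sketch, assuming the degraded array is mounted at /srv/dev-disk-by-label-OMV and the destination drive is mounted at /mnt/backup (both paths are placeholders; substitute the real mount points):

    Code
    # Archive mode preserves permissions, ownership, and timestamps; -H keeps
    # hard links; --progress shows per-file progress.
    rsync -aH --progress /srv/dev-disk-by-label-OMV/ /mnt/backup/

    # Re-running the same command later only copies what is missing or changed,
    # so the job can be stopped and resumed safely.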

