Help! Catastrophic failure

  • Thank you ryecoaaron! That certainly helped.


    Here's the output of those commands and then rerunning davidh2k's afterwards:


    https://db.tt/oMnMtrUt
    https://db.tt/iR9reZmp
    https://db.tt/tJ7Fbhc0
    https://db.tt/QX7zElCc


    7 of the drives were able to reassemble. One is still missing.


    What do you suggest from here?


    ----------------------------------------------


    On the other side of the equation, I've been thinking about the hardware on the NAS itself. I have four spare 1 terabyte drives doing nothing right now. They'd be good for testing, since a single backplane holds four drives. I was thinking of emptying the NAS of its drives and putting the four spares into the first plane. Other than checking whether all four drives appear, any recommendations for testing the backplanes?


    Thanks for the help. It's really appreciated.


    And you were right davidh2k. ryecoaaron is awesome! :)

    • Official Post

    Sounds like the missing drive is failing. I would run some SMART tests on it and possibly replace it.


    To test the backplanes, I would create a RAID 5 array and let it sync. That should tell you if the backplane is failing. If you don't want to wait that long, then write 20 gigs of zeros to each drive. Either way, I would do the following to each drive before the test (replace the X in sdX):


    mdadm --zero-superblock /dev/sdX
    dd if=/dev/zero of=/dev/sdX bs=512 count=10000
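
    For the shorter test mentioned above (writing zeros instead of waiting for a full sync), a minimal sketch could look like the following. The function name `write_zeros` and the `bs=1M` block size are assumptions for illustration, not something from this thread, and `conv=fsync` and `status=none` are GNU dd options. It overwrites whatever target it is given, so double-check the device name first.

```shell
# write_zeros TARGET MIB
# Hypothetical helper: writes MIB mebibytes of zeros to TARGET and flushes
# them to the device before returning. TARGET is overwritten!
write_zeros() {
    target=$1
    mib=$2
    dd if=/dev/zero of="$target" bs=1M count="$mib" conv=fsync status=none
}

# Example (destructive, roughly 20 GiB per drive; replace X first):
#   write_zeros /dev/sdX 20480
```

    If dd reports I/O errors or the drive drops off the bus partway through, that would point at the same kind of problem a failed RAID sync would.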

    omv 7.1.0-2 sandworm | 64 bit | 6.8 proxmox kernel

    plugins :: omvextrasorg 7.0 | kvm 7.0.13 | compose 7.2 | k8s 7.1.0-3 | cputemp 7.0.1 | mergerfs 7.0.5 | scripts 7.0.7


    omv-extras.org plugins source code and issue tracker - github - changelogs


    Please try ctrl-shift-R and read this before posting a question.

    Please put your OMV system details in your signature.
    Please don't PM for support... Too many PMs!

  • I'm familiar with smartctl (used it a lot when I ran Unraid prior to switching to OMV). The systemrescuecd doesn't appear to have it.


    The Zotac machine I have the drives plugged into now is a spare (doesn't even have a hard drive in it). I am wondering whether unplugging the hard drive enclosures and installing OMV on a USB key would be a good idea, so I'd at least have more tools available. Does that sound reasonable?


    It's definitely the planes, SAS expander, or LSI controller, since the system drive (a 100gig SSD -- overkill I know, but it was a spare) is plugged directly into the motherboard and works fine.


    Update... I put the four spare drives into each plane, and booted. Each plane saw all four drives both on the LSI screen during boot up (hauled out a TV to connect to the NAS) and in the OMV gui. Tested each plane to see if it saw all four drives several times each. Looks like they are all seen. After doing that I used your commands above on each drive and have started the first test RAID5 array. Waiting for it to sync now.

    • Official Post

    I'm familiar with smartctl (used it a lot when I ran Unraid prior to switching to OMV). The systemrescuecd doesn't appear to have it.


    Just booted 4.3.0 and it has smartctl. What version are you using?


  • And if you type it with /dev/sdX?


    Please also try df -h and /bin/df


    Greetings
    David

    "Well... lately this forum has become support for everything except omv" [...] "And is like someone is banning Google from their browsers"


    Only two things are infinite, the universe and human stupidity, and I'm not sure about the former.

    Upload Logfile via WebGUI/CLI
    #openmediavault on freenode IRC | German & English | GMT+1
    Absolutely no Support via PM!

  • Both Dropbox links give Error 404.


    Greetings
    David


  • Oops. The USB cable for the CD drive wiggled loose, that was the cause of the issue. After reseating and rebooting, it works.


    smartctl -a -A /dev/sda gives:


    smartctl open device: /dev/sda [USB Jmicron] failed: no device connected


    hdparm -I /dev/sda gives me the model and serial number though, will try reseating it.


    And it occurs to me that smartctl won't be helpful on this machine anyway. I'm pretty sure the Mediasonic enclosures don't pass through the SMART data.
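
    Some USB bridges do pass SMART data if smartctl is told which translation to use. A minimal sketch, assuming the enclosure's bridge answers to one of smartctl's `-d sat` or `-d usbjmicron` device types (whether these Mediasonic enclosures do is not known from this thread). The function name `try_smart_types` and the `SMARTCTL` override variable are made up for illustration, and using the exit status of `smartctl -i` as the success check is only a heuristic.

```shell
# try_smart_types DEVICE
# Hypothetical helper: tries a few smartctl device types in turn and prints
# the first one for which "smartctl -i" exits successfully.
# Returns 1 if none of them work.
# SMARTCTL can be set to use a different smartctl command.
try_smart_types() {
    dev=$1
    for dtype in auto sat usbjmicron; do
        if "${SMARTCTL:-smartctl}" -d "$dtype" -i "$dev" >/dev/null 2>&1; then
            echo "$dtype"
            return 0
        fi
    done
    return 1
}

# Example:
#   dtype=$(try_smart_types /dev/sda) && smartctl -d "$dtype" -a /dev/sda
```

    If none of the types work, moving the drive to a controller that exposes it directly is probably the only way to read its SMART data.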


    It's a nettop so direct to the motherboard is not an option.


    I'll try the reseating first and see what happens.


    Reseating didn't help. Same drive was missing. Even tried swapping the enclosure with another of the same model. Same result, one drive marked as removed from the array.


    The RAID5 test array finished building on the NAS on the first "test" plane. It worked, and formatted successfully. Will now remove it and try the same on the next plane. Now that one plane appears to be okay, perhaps I could put the problem drive from the RAID6 array that is currently on the nettop into a slot on the tested plane and run smartctl from there (since I have no way of doing that from the nettop)?

  • Looks like I created a new minor problem on the NAS while doing the RAID5 test. I formatted the new array (intending to create a test share and copy some stuff to it) and clicked to mount it. When I clicked Apply to apply the changes, it gave an error saying it can't mount the missing arrays (the three original ones). Now the "click to apply" message won't go away.


    Do I need to modify config.xml in /etc/openmediavault to rectify this? (I won't be trying to mount any of the 5 remaining test arrays.)


    Thanks. Sorry for all the trouble.

  • The RAID5 test arrays worked on each backplane. Since I was unable to pull the SMART data from the Mediasonic enclosures on the nettop, I decided to try on the NAS itself (since it would appear the backplanes, SAS expander and SAS controller are all fine). Put the reassembled (but with one drive missing) array of eight 4TB drives into the NAS. It booted fine and saw all eight drives. I restarted a few times just to be sure. Did a long SMART test on the drive that wouldn't go back into the array. Looked fine. No bad sectors, seek errors or UDMA errors. I wiped the drive and re-added it to the array. 10 hours later, looks fine. So I have my backups back.


    Put the healthy array of five 2TB drives back. Everything fine.


    When I put the 10 disk RAID6 array of 3TB drives back in, it started going wonky again: not recognizing drives, lots of errors during boot up, etc. After a lot of shutting down and powering on, taking out a different drive each time, I narrowed it down to a single 3TB drive which makes the NAS go crazy whenever it's plugged in. The 10 drive array reassembled with the remaining 9 drives and works pretty well. Just need to get a replacement drive and I should be nearly back to normal.


    Thanks for all the help guys. Definitely making a donation when I find employment.


    Just one last problem now that I can't seem to fix. I had the SMART settings turned on for all the drives. Now when I go into SMART > Devices it gives the error:


    Failed to get configuration (xpath=//services/smart/monitor/device[devicefile=''])


    I assume this is because a drive with SMART settings on it is missing. Likewise, I have a leftover filesystem in the Filesystems menu from my RAID5 tests. I've attached my config.xml. What should I edit to get rid of these two issues I've caused in trying to get back up and running?

    • Official Post

    What is the ID of the drive that failed? Or you can post the output of: ls /dev/disk/by-id/
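
    To make that listing easier to match against kernel device names, a small sketch like this prints each by-id entry together with the device it points to. The function name `list_by_id` and its optional directory argument are made up for illustration, and `readlink -f` is assumed to be the GNU coreutils version.

```shell
# list_by_id [DIR]
# Hypothetical helper: prints "<id name> -> <resolved device path>" for every
# symlink in DIR (defaults to /dev/disk/by-id).
list_by_id() {
    dir=${1:-/dev/disk/by-id}
    for link in "$dir"/*; do
        # Skip anything that is not a symlink (including an unmatched glob).
        [ -L "$link" ] || continue
        printf '%s -> %s\n' "${link##*/}" "$(readlink -f "$link")"
    done
}

list_by_id
```

    Any ID that appears in config.xml but not in this listing belongs to a drive the system can no longer see.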


  • It's not in the machine right now. If I put it in, errors abound and other drives get dropped. Plugged into another machine, it's barely visible: the model number reads, but the serial number is not present, SMART is not readable, and it cannot be formatted. I've started an RMA on it already (they're shipping a replacement first). The model and serial number of the failed unit are:


    Model: ST3000DM001
    SN: Z1F2RT76


    Output:



    Here are the device IDs in /media:


    Quote

    0fdd510c-2858-47ad-8ffa-e684faeb85c0 b1b9f409-857d-47a6-992e-a6aa54dec710
    392ceca1-fe0c-438c-9f38-afc6a412fa0c cdrom
    7d310a5b-a999-4922-91ba-0e1ce7e133c6 cdrom0


    392ceca1-fe0c-438c-9f38-afc6a412fa0c is the non-existent RAID5 test array.


    Thank you so much for the help.

    • Official Post

    Should be able to :)


    From what I can tell, you need to remove anything with 0x5000c500504a9b40 in it. Back up your config.xml and remove the following from /etc/openmediavault/config.xml:


    Code
    <hdparm>
      <uuid>99e22b78-7ffb-4c4f-9f66-c8c3b711c6b0</uuid>
      <devicefile>/dev/disk/by-id/wwn-0x5000c500504a9b40</devicefile>
      <apm>254</apm>
      <aam>254</aam>
      <spindowntime>0</spindowntime>
      <writecache>0</writecache>
    </hdparm>


    Code
    <device>
      <uuid>e7632583-f943-4d91-b8d8-0098e466d9ea</uuid>
      <devicefile>/dev/disk/by-id/wwn-0x5000c500504a9b40</devicefile>
      <enable>1</enable>
      <type></type>
    </device>
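
    Before editing by hand, it can help to list every line that still mentions the drive's WWN. The sketch below makes a timestamped backup and then prints the matching lines with their line numbers. The function name `find_wwn_refs` and the backup file naming are assumptions for illustration; the sketch only reports and does not change config.xml.

```shell
# find_wwn_refs CONFIG WWN
# Hypothetical helper: copies CONFIG to a timestamped backup next to it, then
# prints every line of CONFIG that contains WWN, prefixed with its line number.
find_wwn_refs() {
    config=$1
    wwn=$2
    cp -p "$config" "$config.bak.$(date +%Y%m%d%H%M%S)" || return 1
    grep -n -F -- "$wwn" "$config"
}

# Example (run as root on the OMV box):
#   find_wwn_refs /etc/openmediavault/config.xml 0x5000c500504a9b40
```

    Each reported line sits inside a block like the two shown above, and the whole enclosing block should be removed, not just the matching line.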




  • So is he going to be able to rebuild? What a frickin nightmare.


    Yeah, thankfully (thanks to the great people here), I'm mostly back already. The main array (the important one) is the one with the messed up drive (that caused all the other problems). It's back online, albeit degraded, with 9 out of the 10 drives (RAID6). The 5 drive array of 2TB drives survived the whole thing unscathed. The 8 drive array of 4TB drives successfully reassembled, and resilvered for the one drive that didn't reassemble. That last array is a straight up rsync backup of the first array.


    An RMA of the dead drive is on its way (I paid $10 to have it rushed), and should be here tomorrow, so I can resilver. Given how big of a failure this was, I came out very very lucky. Again, thanks to the great people here.


    I'll make the suggested changes to config.xml now, and edit this post later with the results. Thanks ryecoaaron.


    ----


    Update: Made the changes (after making a backup of config.xml) and rebooted. Still get the same error on the Devices page of the SMART section:


    Failed to get configuration (xpath=//services/smart/monitor/device[devicefile=''])


    Found the entry for mounting the Test RAID5 filesystem and manually took that out also. That worked at least, so just devices page error left.
