Unresponsive drive/likely failure, very anxious as to how to resolve

  • Hi All,
    My current setup is an R510 with 6x 4TB SAS/SATA HDDs and 2x 2TB SAS/SATA HDDs. OMV 4 is running from a 128GB SSD.
    2x 4TB HDDs are devoted to SnapRAID parity, with 2x 2TB and 4x 4TB as data drives. The total available space in the SnapRAID datastore is combined with UnionFS into a single datastore. 4x SMB shares live on the datastore.
    Tonight I noticed an odd sound, visually checked the server and noticed a single HDD light was flashing repeatedly when there should have been little activity. I went to the GUI and it would not open the Disks, SMART, or File Systems tabs; attempting this caused a communication failure.


    Syslog shows this give or take the first line, repeated over and over:



    As the system was beginning to get unresponsive, I opted to try to free the GUI. From what I read online, this may be a communication failure where a command had been sent before the drive possibly failed, and OMV timed out waiting for the response. I rebooted from the GUI.
    After the restart (and a minor coronary!) the GUI was working again, but I'm still unable to load SMART or Disks. I was able to load SnapRAID enough to check the names of the disks in each of the data and parity sections, and it appears that it's one of the two 4TB drives used for parity that's died/dying.
    As such, the USB backup that runs each time the disk is connected (so it started on reboot) doesn't worry me as much as I initially thought (I was super worried it would sync lots of corrupted changes).
    I've been stupid and added files over the last week and not synced in 8 days (that I know of; I don't know how to check if something is scheduled, as things seem to happen on the server that I can't find GUI schedules for!).
    My questions are: how do I confirm what's wrong with the unresponsive disk (it's a Dell Constellation 4TB SAS), as I can't run a SMART check? I assume it's completely toast?
    If the above is correct, should I shut down ASAP and remove the drive so the system doesn't thrash?
    My tiny brain is looking at http://www.snapraid.it/manual and isn't sure how to proceed with replacing the drive from the OMV GUI. Is that possible?


    Thank you for any assistance in advance, sorry for the long post.


    ** edit - can't access Disk or SMART tabs - Error:


  • You could take different approaches, but probably the best one is to swap the drive with a new one and rebuild the data using the procedures outlined in the SnapRAID manual.
    You could also shut the system down, pull the drive and see if it behaves the same in another system, where you could run a SMART scan on it.
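
For anyone wanting to try the SMART scan from a shell rather than the GUI, here is a minimal Python sketch. It assumes smartmontools is installed and that the drive answers at all; the device name passed to `check` is hypothetical, and the two health-line prefixes are only what smartctl typically prints for SAS and SATA drives, so treat them as assumptions rather than a complete list.

```python
import shutil
import subprocess


def smart_command(device):
    """Build the smartctl call that prints all SMART information for a device."""
    return ["smartctl", "-a", device]


def health_line(output):
    """Return the first line that looks like an overall health verdict, or None."""
    # Assumed prefixes: the first is typical for SAS drives, the second for SATA.
    markers = ("SMART Health Status:", "SMART overall-health self-assessment")
    for line in output.splitlines():
        if line.strip().startswith(markers):
            return line.strip()
    return None


def check(device):
    """Run smartctl if available; a dying drive may simply never answer."""
    if shutil.which("smartctl") is None:
        return "smartctl not installed"
    try:
        result = subprocess.run(
            smart_command(device), capture_output=True, text=True, timeout=60
        )
    except subprocess.TimeoutExpired:
        return "no response within 60 s"
    return health_line(result.stdout) or "no health line found"

# Usage (as root, with a hypothetical device name): print(check("/dev/sdX"))
```

The timeout is there because an unresponsive disk can hang the query, which is likely what the GUI tabs were running into.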


    Additionally, doesn't your SnapRAID script perform a SMART check of the drives and send you the output?

    OMV BUILD - MY NAS KILLER - OMV 6.x + omvextrasorg (updated automatically every week)

    NAS Specs: Core i3-8300 - ASRock H370M-ITX/ac - 16GB RAM - Sandisk Ultra Flair 32GB (OMV), 256GB NVME SSD (Docker Apps), 2x16TB HDDs w/ SnapRAID - Fractal Design Node 304 - Be quiet! Pure Power 11 350W


    My all-in-one SnapRAID script!

  • Hi,


    I have shut the system down and brought it back online; the drive remains unresponsive. I haven't really edited any scripts when setting up this OMV server, only used the GUI and a few plugins.
    This is my first foray into this type of system and I have no real Linux experience other than Mint and Ubuntu, so I haven't had to do much of anything I couldn't copy and paste from a forum.


    My SnapRAID setup was done entirely in the GUI and I've simply run a sync every week or so. I found conflicting info as to what the best schedule is, so I left it manual until I started to understand more.
    Sadly this drive failed very quickly, before I was prepared for it. Thankfully it seems to be a parity disk, and my USB backup ran when I rebooted the server, so I should have a clean backup of the shares as well.


    OMV was fine for an amateur like me to set up from a few forums and YouTube videos, but I'm a bit anxious about pulling the drive and mangling the whole datastore with shell commands I don't fully understand.
    I'd also like to make adjustments to the config so I have warning next time, plus a better SnapRAID and USB backup schedule etc.


    Hoping someone doesn't mind helping a first-timer out :) I'm not sure how much of the SnapRAID guide relates to how I would actually perform the tasks or set things up in OMV itself??

  • I have a spare 4TB drive and a new one on the way, and want to first replace the dead drive in my OMV server.
    But as the SnapRAID docs do not mention OMV and I only know how I initially set things up there, I am unaware of the correct way to proceed.
    As the drive is parity, if it has no content file (I'll check when I get home), can I just swap out the drive?
    I expect not; I expect that I need to tell SnapRAID that the drive itself is gone before adding a new one and reducing, but I am unsure.


    Surely someone has swapped a drive out while using SnapRAID in OMV?
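
On the content-file question: one way to check without hunting through the GUI is to read the SnapRAID config and see whether any `content` line points at the failed disk's mount point. A rough Python sketch follows; the sample config and every path in it are made up for illustration, and it assumes the usual `content PATH` line format (on OMV the real file is usually `/etc/snapraid.conf`, but confirm that on your own system).

```python
from pathlib import PurePosixPath

# Hypothetical snapraid.conf contents; none of these paths come from the thread.
SAMPLE_CONF = """\
# example only
parity /srv/parity1/snapraid.parity
2-parity /srv/parity2/snapraid.2-parity
content /srv/data1/snapraid.content
content /srv/data2/snapraid.content
data d1 /srv/data1/
data d2 /srv/data2/
"""


def content_files_under(conf_text, mount_point):
    """List the content files from the config that live below mount_point."""
    mount = PurePosixPath(mount_point)
    hits = []
    for raw in conf_text.splitlines():
        parts = raw.strip().split(None, 1)  # keyword, then the rest of the line
        if len(parts) == 2 and parts[0] == "content":
            path = PurePosixPath(parts[1].strip())
            if mount in path.parents:
                hits.append(str(path))
    return hits


# An empty list means no content file is configured on that disk.
print(content_files_under(SAMPLE_CONF, "/srv/parity2"))
print(content_files_under(SAMPLE_CONF, "/srv/data1"))
```

To use it on a real system, read the config file into a string and pass the mount point of the dead parity disk.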

  • All the directions you need are in the SnapRAID manual, available at https://www.snapraid.it/manual


    See section 4.4 Recovering.

    --
    Google is your friend and Bob's your uncle!


    OMV AMD64 7.x on headless Chenbro NR12000 1U 1x 8m Quad Core E3-1220 3.1GHz 32GB ECC RAM.

  • So I guess what I put in my original post didn't get noticed: the manual didn't make sense to me, as it is for SnapRAID with no mention of how this is handled in OMV.


    I have read that manual and that part, but sadly I have little experience managing things that way.


    So far I have done the following, just as it seems logical:


    - swapped the dead disk with a temporary replacement.
    - booted back into OMV and ran WIPE from the Disks tab.
    - ran CREATE from the File Systems tab, then mounted the new disk.
    - removed Parity_2 from the SnapRAID tab, as this is the entry that pointed to the dead drive.
    - added the replacement disk/filesystem as a new Parity_2 entry in SnapRAID.


    What I am uncertain of is whether my next step should be sync or fix.
    Based on the manual, I expect running a fix is the next step, as that's what comes next in the manual. But as the OMV GUI has the option to add logging to the Fix option etc., I wanted to see if anyone has done it like that before.
    I also wanted to leave a more detailed log of what has been done to (hopefully) get through this successfully, so that the next newbie who searches for a fix finds some instructions without having to ask.
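
For reference, the shell equivalent of a logged fix, as I read the Recovering section of the manual, has the shape `snapraid -d NAME -l fix.log fix`. The small Python sketch below only builds and prints that command line so it can be compared with what the GUI runs; it does not execute anything. The disk name `2-parity` and the log file name are assumptions here and should be checked against your own config and the manual's description of `-d` before running anything.

```python
import shlex


def parity_fix_command(parity_name="2-parity", log_file="fix.log"):
    """Build (not run) a SnapRAID fix limited to one disk, with a log file."""
    # Both defaults are assumptions for illustration, not values from the thread.
    return ["snapraid", "-d", parity_name, "-l", log_file, "fix"]


print(shlex.join(parity_fix_command()))
# prints: snapraid -d 2-parity -l fix.log fix
```

Keeping the log file is the useful part: it gives you something to post if the fix reports errors.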


    Thanks


    Edit: ooof, if this works it's going to take 23 hours to complete... I expect that may be right; it's like doing an initial sync on a 15TB datastore, I guess....
    My worry now is this: as I hadn't synced in over a week and added a load of files during that time, if a content drive had failed I would have lost data for sure. BUT in this case, with the failed drive being a parity drive, surely it's just reconstructing the parity data for the dead drive from the 6 data drives, right? Like a fresh sync? Not rolling back all 6 data drives to the state recorded in the 1 remaining parity drive? Because if it's doing that, I may as well wipe the array, start from scratch and copy from the USB backup. Anyone know?
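
To illustrate the general idea behind that question, here is a toy Python sketch of parity being computed from data blocks. It uses plain XOR single parity, which is a deliberate simplification: it is not SnapRAID's actual second-parity math and it ignores the content file and hashes entirely. All block values are made up. The point it shows is only that computing parity reads the data and never writes to it.

```python
from functools import reduce


def xor_parity(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three made-up "data drives", each holding one 4-byte block.
data = [b"old1", b"old2", b"old3"]
parity_at_last_sync = xor_parity(data)

# A file changes after the last sync; the old parity is now stale.
data[1] = b"new2"
snapshot = list(data)

# Rebuilding parity only reads the data blocks, so the data is untouched.
rebuilt_parity = xor_parity(data)
assert data == snapshot
assert rebuilt_parity != parity_at_last_sync

# The reverse direction (data from parity) is what a data-drive failure would need.
recovered = xor_parity([data[0], data[2], rebuilt_parity])
assert recovered == b"new2"
```

Whether SnapRAID's fix treats the unsynced files the same way as a fresh sync is not something this toy model can answer; that still needs confirming from the manual or someone who has done it.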
