SMART Failed Drive Replacement w/ MergerFS & SnapRAID

  • I'm looking for some validation or confirmation of my plan, given the setup and situation below.


    Current Setup:

    • SnapRAID pool

      • 3x 4TB Data Disks
      • 1x 4TB Parity Disk
    • MergerFS

      • /storage with multiple sub-folders and content
    • SMART Test

      • Runs every Wednesday night
      • A recent report flagged 3 bad sectors, which within the last 3 days or so bumped up to 7
      • Already filed RMA with WD and new drive is here

    Plan:

    • Install new physical disk alongside others
    • Reboot into Clonezilla
    • Clone current failing disk to new disk
    • Shutdown, and remove failing drive
    • Start up and give the new disk the same name/label as the failed disk
    • Add the new (replacement) disk to the SnapRAID/MergerFS pools under the same name


    Does this plan logically make sense, and will it actually achieve what I'm after? Cloning seems like the quickest/easiest way to get this done, and it avoids stupid snafus on my part at the command line with permissions and other potential problems. My main concerns are making sure SnapRAID carries on like nothing happened, that the MergerFS volume /storage isn't affected, and that cloning a drive with bad sectors doesn't copy some other kind of problem across.
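
    For clarity on the "same name" part: as I understand it, both SnapRAID and MergerFS only reference the mount point of each member disk, not the physical device, so if the replacement ends up mounted at the same path nothing in the configs should need to change. A rough sketch of what I mean (paths and labels here are purely illustrative, not copied from my actual setup):

    Code
    # /etc/snapraid.conf - data disks are referenced by mount point
    parity /srv/dev-disk-by-label-parity1/snapraid.parity
    content /srv/dev-disk-by-label-disk1/snapraid.content
    data d1 /srv/dev-disk-by-label-disk1
    data d2 /srv/dev-disk-by-label-disk2
    data d3 /srv/dev-disk-by-label-disk3

    # /etc/fstab - mergerfs pools those same mount points into /storage
    /srv/dev-disk-by-label-disk* /storage fuse.mergerfs defaults,allow_other 0 0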


    Honestly, if I lost the data in those 7 bad sectors I'd be OK with it. I'd rather salvage the rest of the data and then run a SnapRAID fix operation.

  • So I've started down this path with the non-destructive steps. Specifically, I've rebooted into Clonezilla. The bad sectors are being detected there as well: during an early portion of the clone process, I received an error from the bad sectors. As suggested, I've restarted in expert mode and enabled the -rescue option.
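
    (Side note: if the -rescue option keeps struggling, another tool I've seen suggested for cloning a disk with read errors is GNU ddrescue, which copies the readable areas first and keeps a map of the bad spots for later retries. I haven't run this myself; the device names below are placeholders only.)

    Code
    # first pass: grab everything readable, skip over bad areas quickly
    ddrescue -f -n /dev/sdX /dev/sdY rescue.map
    # second pass: go back and retry the unreadable areas a few times
    ddrescue -f -d -r3 /dev/sdX /dev/sdY rescue.map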


    I would assume at this point that I may be missing some data blocks, but the filesystem should be intact? If that's the case, then a fix from SnapRAID should restore the missing files, assuming I haven't overwritten anything or re-run a SnapRAID sync that could no longer read the blocks in the bad sectors.
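
    My rough understanding of what that repair step would look like (untested on my end; "disk3" is just the name the failing disk has in my config):

    Code
    # see which files on that disk have missing or damaged blocks
    snapraid -d disk3 -a check
    # rebuild only the broken files from parity and the other data disks
    snapraid -d disk3 -l fix.log fix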

  • Well, I was figuring that using Clonezilla to get an actual replica of my drive would be faster and more efficient, assuming the SMART-detected errors haven't fully developed (i.e. they just hit and I reacted right away) and I can get full copies of the data. Even if not, I'd assume it should skip through those bad sectors pretty quickly and leave me with an almost perfect replica. In my case I haven't actually lost the disk; I just got the SMART errors back and am proactively replacing the drive.


    If this seems foolish, perhaps I will go that route instead. At this point the disk seems to be copying at a slower and slower rate, declining since the start of the clone. It has only made it to 8.8% of data blocks and 2.11% of total blocks in 1hr15min, and the rate has dropped from 9.25GB/min to 886MB/min.

  • Nothing is going to go fast on drives of this size.


    The point of using SnapRAID to recover an entire disk is that the disk you are replacing isn't involved in the recovery process at all, so any corruption on it isn't going to get carried into the new disk.


    As you have seen, the cloning isn't going very fast. Who knows how long it will take to complete? And then you still have to use SnapRAID to "fix" it. That's going to be slow too if all the data has to be hashed and compared to the hashes previously calculated.
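
    Roughly, the whole-disk recovery goes like this (disk name and paths are examples, adjust to your own config): point the failed disk's entry in /etc/snapraid.conf at the new, empty disk's mount point, then let SnapRAID rebuild it.

    Code
    # rebuild the entire contents of the replaced disk from parity
    snapraid -d disk3 -l fix.log fix
    # verify the rebuilt files against the stored hashes
    snapraid -d disk3 -a check
    # resync so parity reflects the new disk
    snapraid sync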


    At least you are in a position to try whatever you want as many times as you want... if you have the time, that is.

    --
    Google is your friend and Bob's your uncle!


    OMV AMD64 7.x on headless Chenbro NR12000 1U 1x 8m Quad Core E3-1220 3.1GHz 32GB ECC RAM.

  • Interestingly, you make me think of a different point about restore speed. While it's a 4TB disk, it only has about 800GB of content. Perhaps for that reason alone a SnapRAID fix could produce the faster result, since it only has to reconstruct less than 25% of the disk, whereas a clone will be attempting to read/write the empty blocks as well (give or take; I know modern tools reduce it from a 100% copy of empty blocks, but the fix should still come out a bit faster overall). And, as you mentioned, it avoids having to read the bad-sector data at all.


    Since I haven't gotten very far, perhaps I'll stop the clone now and test that method instead.
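
    (For anyone weighing the same trade-off: the used space on the individual member disk is easy to check before deciding. The mount path below is only an example, not my real one.)

    Code
    # how much of the 4TB member disk is actually in use
    df -h /srv/dev-disk-by-label-disk3
    # or have SnapRAID report per-disk usage and array state
    snapraid status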

  • Well, now it seems I can't get the snapraid command to run locally on the system. This was my concern with running SnapRAID manually: when I recently tried to use it to restore a few pictures I had deleted by accident, I ended up with wacky permissions. Likely because I hit this same "permission denied" back then, just ran the command with sudo, and it went through without issue; the end result, though, was mangled permissions that wouldn't let me access the content.



    Code
    >> snapraid -d disk3 -l fix.log fix
    Self test...
    Error creating the lock file '/media/97974d61-46e4-43fa-a535-54a31b4faec2/snapraid.content.lock'. Permission denied.

    The media mount mentioned in the error is not MergerFS; it is the individual disk mount that is part of the pool.


    From doing a bit of reading, am I to understand that SnapRAID unfortunately will NOT restore the permissions for this content? If that's the case, I think I need to re-think my WHOLE strategy, as I do have staggered permissions on the content. Having to sort all of those permissions out afterwards may be a dealbreaker for relying on SnapRAID.


    Since the disk isn't dead yet, is there any easy alternative for moving the content onto the new disk? Perhaps just using rsync, then letting the SnapRAID index rebuild against the fresh content (assuming I keep using SnapRAID)?

  • You must run snapraid as root. Permissions and ownerships are not restored. In my use case I do not have differing permissions and ownerships on the data so this is not a problem for me. So long as you copy the content with a tool that can maintain attributes you shouldn't have any problems.
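
    For example, something along these lines with rsync, run as root, should carry permissions, ownership, ACLs, and extended attributes across. The source and destination mount points are placeholders; substitute your own:

    Code
    # -a recursive copy preserving perms/owners/times, -H hard links,
    # -A ACLs, -X extended attributes
    rsync -aHAX --progress /srv/old-disk3/ /srv/new-disk3/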


  • Yeah, so it looks like I'll need to search out a different solution. I'm glad I discovered this now, at least, with a disk that has less data on it and a working copy to move from. I'm in the middle of a long rsync right now; it was almost halfway through after about 3 hours, so I'd say it's in good shape to finish fairly quickly overall, considering.


    Thanks for helping me discover this, glad I know now.

  • I read the manual, and it assumes the command line is used.

    If us noobs were to do it in the GUI, would it be as "simple" as unticking data and content on the failed drive and selecting them on the new drive?

    And then run these in sequence?


    Code
    snapraid touch; snapraid sync
    snapraid -p 100 -o 1 scrub
    snapraid -e fix

    BTW, I do like @1activegeek's plan of cloning... at least with dd you get all the UUIDs etc. moved over, so you just have to unplug the failed drive and leave the new one connected.

    Would that still be an option if time is no issue?

  • Not sure exactly what it is you are trying to do and you are making a new post in a thread that is more than five years old.


    If the disk you want to replace is still good, then cloning it with dd (or a similar program) to a new disk that is as large or larger would be the easiest way. If the new disk is larger than the old one, and assuming there is only one partition, you will also have to expand the partition and then the filesystem on the new disk; dd will not do this for you, though similar programs might. I suggest using dd, then parted to grow the partition, then resize2fs to expand the filesystem. See this link if you're not familiar with the process: https://saputra.org/threads/ho…-and-resize2fs-solved.94/

    If you can run Gparted Live on the machine, then it can do all of this for you instead of using three separate programs.
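
    As a rough outline of the three-step route (device names and partition numbers are examples only, this assumes an ext4 filesystem, and parted may first offer to fix the GPT backup header since the clone came from a smaller disk):

    Code
    # clone the old disk onto the new one (wipes anything on the target)
    dd if=/dev/sdX of=/dev/sdY bs=64M status=progress
    # grow the single data partition to use the whole new disk
    parted /dev/sdY resizepart 1 100%
    # grow the ext4 filesystem to fill the resized partition
    resize2fs /dev/sdY1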


    I have been doing this lately because my server has only one available empty bay and when I get close to running out of disk space I replace an old 3TB drive with a new 14TB drive for a gain of 11TB. I grow my data by a bit more than 1TB/month and am about seven months away from needing more space.

