Disk in pre-fail state in RAID 5

  • Hello guys
    I have a RAID 5 with three 1 TB disks. The SMART service now tells me that one of the disks is in pre-fail. When will it die? When it does, will it be enough to replace it and let the RAID reconstruct itself, or do I have to rebuild the array? I hope not... ;(


    In the end, what is the correct way to proceed when the disk dies?


    Thanks for your attention.

  • Hi, I have run a number of tests in a virtual environment for this exact issue, as I couldn't find any definitive information either. I fixed the problem by doing some RAID maintenance from the console. In the five tests I did, not one failed, so I am fairly certain it will work for you as well.

    • Official post

    Now the smart service tells me that one of the disks is in prefail.

    In themselves, the flags "pre-fail" and "old-age" don't mean that drive failure is imminent; they label the type of each SMART attribute, not the current health of the drive.
    (I have two older laptops that indicate "pre-fail" and "old-age" but they keep on, keeping on.)


    The most important stats to look at, in attempting to predict hard drive failure, are:
    SMART 5 – Reallocated_Sector_Count.
    SMART 187 – Reported_Uncorrectable_Errors.
    SMART 188 – Command_Timeout.
    SMART 197 – Current_Pending_Sector_Count.
    SMART 198 – Offline_Uncorrectable.


    If you have zero counts in the above, there's nothing to worry about.
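
    For anyone who prefers the console, here is a minimal sketch for pulling just those five attributes out of `smartctl -A`. The function name `smart_watchlist` and the device `/dev/sda` are placeholders of mine, and the column positions assume the usual ATA attribute table (ID in the first column, raw value in the tenth). Some drives report differently, so treat it as a starting point.

```shell
#!/bin/sh
# Filter `smartctl -A` output (read from stdin) down to the five
# attributes listed above and print: ID, attribute name, raw value.
# Assumption: classic ATA attribute table, raw value in column 10.
smart_watchlist() {
    awk '$1 ~ /^(5|187|188|197|198)$/ && NF >= 10 { print $1, $2, $10 }'
}

# Usage (as root; /dev/sda is a placeholder device name):
#   smartctl -A /dev/sda | smart_watchlist
```

    Any raw value above zero in that output is worth watching. Not every drive reports all five attributes, so a missing line is not an error.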


    This information comes from www.backblaze.com. They're compiling the most exhaustive statistics available on SMART indicators versus actual drive failures, drawn from tens of thousands of hard drives of different brands, sizes, etc.


    When will it die, will it be enough to replace it

    Even with the above-mentioned SMART stats beginning to increment, there's no way to know for sure when a drive will fail. It could be tomorrow, on the next reboot, or next year.
    However, you should be hearing a klaxon blaring right now that says "BACKUP!" If that means going out and buying a 4TB external USB drive to copy your data, so be it.


    and the raid will be reconstructed on its own or do I have to rebuild it? I hope not...


    In the end what is the correct way to proceed when the disk will die?


    thanks for attention

    Your best bet would be to add another 1TB drive (it must be the same size or larger) to the array as a hot spare. Given the SMART stats to watch (above), if you're going to wait for a failure, it's far safer to have a hot spare already installed.
    _______________________________________


    To add a spare, physically install the drive, boot up and, under Physical Disks, wipe the new drive. In RAID Management, click on the "Grow" button and select the new drive. After it's added, under Detail, you'll see it listed as "spare". Done.
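
    From the console, the same thing can probably be done with mdadm directly. This is only a sketch and not necessarily what the web interface runs: the array name `/dev/md0` and the new disk `/dev/sdd` are assumptions (check yours in /proc/mdstat and with lsblk), and `run` and `add_hot_spare` are helper names I made up. Set DRY_RUN=1 first to see the commands without executing anything.

```shell
#!/bin/sh
# Helper: with DRY_RUN=1 print the command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Add a blank disk to a healthy array so that it becomes a hot spare.
# WARNING: wipefs -a erases signatures on the disk you name, so be
# certain it is the NEW disk.
add_hot_spare() {
    array=$1
    new_disk=$2
    run wipefs -a "$new_disk"             # clear old filesystem/RAID signatures
    run mdadm --add "$array" "$new_disk"  # on a healthy array the disk joins as a spare
    run mdadm --detail "$array"           # the new disk should be listed as "spare"
}

# Usage (as root; both device names are placeholders):
#   DRY_RUN=1; add_hot_spare /dev/md0 /dev/sdd
```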


    If and when the drive in the array fails, you'll see a State of "clean, degraded". At this point, the hot spare will automatically rebuild the array. To see this, click on Detail and scroll to the bottom.



    Be patient, this may take a l-o-n-g time. After the rebuild is finished, the state will change to clean. After recovery, in RAID Management, remove the failed drive from the array if it's still in the Detail view. Then shut down and physically remove the failed drive. (Make sure you pull the right drive.)
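
    Rebuild progress can also be followed from the console by reading /proc/mdstat. A small sketch, assuming the usual "recovery = NN% ... finish=NNmin" progress line that the md driver prints while rebuilding; `rebuild_progress` is a name I made up.

```shell
#!/bin/sh
# Summarise rebuild progress from /proc/mdstat content on stdin.
# Looks for the "recovery = ..." (or "resync = ...") progress line and
# prints the percentage done and the estimated time to finish.
rebuild_progress() {
    awk '/recovery =|resync =/ {
        pct = ""; eta = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /%$/)       pct = $i
            if ($i ~ /^finish=/) eta = substr($i, 8)
        }
        print "rebuild " pct " done, estimated finish in " eta
        found = 1
    }
    END { if (!found) print "no rebuild in progress" }'
}

# Usage:
#   rebuild_progress < /proc/mdstat
```

    Running `mdadm --detail` on the array shows the same state and, during a rebuild, a rebuild status line.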
    ________________________________________________________


    **Because mistakes are possible, the following carries risk.**


    If you don't want to wait for an actual failure and / or if you don't have enough room to house a 4th array drive in your box:
    In RAID Management, click "Remove" and select the drive that has failed or is showing bad SMART stats. (It MUST be the correct drive, or you're risking the loss of the entire array.) The array status will go to "clean, degraded".


    Shut down and remove the correct drive (the one with the bad stats). Add another 1TB+ drive and boot up. In Physical Disks, wipe the drive (again, make sure you're wiping the correct drive: the new one). Go to RAID Management, click on Recover and add the new drive to the array.


    **Again, recovery may take a long time.** Be patient and, if possible, try not to multitask the server until recovery is complete.
    (In any case, unless you have a hot CPU, you may find that your server is somewhat sluggish if you use it.)
    In RAID Management, you'll be looking for a state of "clean".
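
    For reference, a rough console equivalent of the swap using mdadm. Treat it as a sketch under assumptions: `/dev/md0` and `/dev/sdb` are placeholder names, the replacement disk may come up under a different letter after the reboot, and `run`, `retire_disk` and `add_replacement` are helper names of mine. Set DRY_RUN=1 to print the commands first.

```shell
#!/bin/sh
# Helper: with DRY_RUN=1 print the command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Phase 1, before shutting down: mark the suspect disk as failed and
# remove it from the array. The array then runs degraded.
retire_disk() {
    array=$1
    bad_disk=$2
    run mdadm --manage "$array" --fail "$bad_disk"
    run mdadm --manage "$array" --remove "$bad_disk"
}

# Phase 2, after the physical swap and reboot: clear the NEW disk and
# add it to the degraded array, which starts the rebuild.
add_replacement() {
    array=$1
    new_disk=$2
    run wipefs -a "$new_disk"
    run mdadm --manage "$array" --add "$new_disk"
}

# Usage (as root; device names are placeholders):
#   DRY_RUN=1; retire_disk /dev/md0 /dev/sdb
#   ...shut down, swap the drive, boot...
#   DRY_RUN=1; add_replacement /dev/md0 /dev/sdb
```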
    __________________________________________________________________


    I'm pretty sure the processes above are what you need. If at all possible, it would be best to get a hot spare online as soon as possible, even if that means the side of the box is open and a loose drive is sitting in the bottom of the case, attached by SATA and power cables. (Use common sense.) Having a hot spare online before any event is good insurance against potential problems.


    I can't stress enough how important it is to do a full backup while you can, preferably with a hot spare already in place. Your recovery may go well. On the other hand, it may not. It's your call.


    Good Luck.

  • Thanks very much for that information! Very, very clear and useful!!!


    I back up to a USB drive every time I add something to my NAS. So my data is OK!


    The disk in pre-fail has a Reallocated Sector Count value of 9. Unfortunately I don't have space for another disk, so I think I'll wait for the disk to fail and then replace it as you describe in the second process.

    • Official post


    The fact that you have your array backed up puts you way ahead of the game. That gives you a few options and some breathing room. I'm continually amazed by those who believe that RAID itself is backup. Unfortunately, some of them lose everything.
    ________________________


    If SMART 5 (Reallocated_Sector_Count) has a raw value other than 0, as yours does at 9, there's a high probability that the drive is headed toward failure. It's just a question of when. Watch to see if it increments upward. If it does.....
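
    One way to watch for increments without checking by hand is to save the last value and compare on each run. A minimal sketch, with assumptions: the state file path and `/dev/sda` are placeholders, `realloc_raw` and `track_realloc` are names I made up, and the raw value is taken from column 10 of the usual `smartctl -A` table.

```shell
#!/bin/sh
# Extract the raw value of attribute 5 from `smartctl -A` output on stdin.
realloc_raw() {
    awk '$1 == 5 && NF >= 10 { print $10 }'
}

# Compare the current count with the one saved in a state file, then
# save the current count for next time. The first run only records it.
# Assumes the count passed in is a plain number.
track_realloc() {
    state_file=$1
    current=$2
    previous=$(cat "$state_file" 2>/dev/null || echo "$current")
    echo "$current" > "$state_file"
    if [ "$current" -gt "$previous" ]; then
        echo "WARNING: reallocated sectors rose from $previous to $current"
    else
        echo "no increase: reallocated sectors at $current"
    fi
}

# Usage (as root, e.g. from a daily cron job; paths are placeholders):
#   track_realloc /var/tmp/sda.realloc "$(smartctl -A /dev/sda | realloc_raw)"
```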
    ________________________


    One last item that you shouldn't leave to chance is positively identifying the dying drive.
    (If you pull a good drive instead of the drive that has failed or is failing, the RAID hole gets real deep.)
    The following is courtesy of tkaiser.
    On the command line, run the following:


    apt-get install curl


    for disk in /dev/sd? ; do smartctl -x "$disk" ; done | curl -F 'sprunge=<-' http://sprunge.us


    As written, the ? is a shell wildcard, so the loop profiles every /dev/sdX drive. To profile only the failing drive, substitute its letter for the ?
    The output of the above will be a URL. Copy the URL into a web browser.


    You'll get a complete profile of the disk: SMART stats, serial number, model number, etc. With that, you can verify the drive against its label info.


    As an example, here's a profile of one of my backup drives. http://sprunge.us/SiWO
    (BTW, I don't know the time interval until this info is purged, so copy and save your info. )
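
    If you'd rather not paste drive data to a public service, the model and serial number can also be read locally. A small sketch, assuming the "Device Model:" and "Serial Number:" lines that `smartctl -i` prints for typical SATA drives (other drive types may label these fields differently); `disk_identity` is a name I made up.

```shell
#!/bin/sh
# Pick the identifying lines out of `smartctl -i` output on stdin and
# print them as key=value pairs.
disk_identity() {
    awk -F': *' '/^(Device Model|Model Family|Serial Number):/ { print $1 "=" $2 }'
}

# Usage (as root): list every /dev/sdX drive with its model and serial.
#   for disk in /dev/sd? ; do
#       echo "$disk"
#       smartctl -i "$disk" | disk_identity
#   done
```

    The serial number printed here should match the one on the drive's label. Listing /dev/disk/by-id/ with `ls -l` is another way to see which model and serial belongs to which sdX name.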

  • Yes... unfortunately the value of Reallocated Sector Count is 9, but it does not seem to be increasing. Anyway, I can easily identify the disk in pre-fail because it is a Western Digital and therefore has the WD marking, unlike the other two, which are Seagate and have the ST marking. So it's easy for me to locate the disk.
    PS: For backup I also try to handle geographic risk: every two months I make a second copy of the backup and leave it in a hidden place at my parents' home. So in case of fire, theft or accidental damage I always have a second copy of the backup... Is that excessive? :huh:

    • Official post

    So in case of fire, theft or accidental damage I always have a second copy of back up ...it's exaggerated??? :huh:


    Heck no! Any admin that is serious about backup always maintains an off site copy, for the reasons you mentioned and others. Water damage (leaking roof), lightning strikes, etc., etc.


    That's my only current weakness. After I move, I'm planning to set up one backup in a shed that's a bit over 100ft away from the house. (The shed has power and the wireless range is fine.)
