ZFS Degraded Pool

    • ZFS Degraded Pool

      Hi guys!
      I went through the lengthy process of backing up all my customer files and then buying 6 new 3TB drives, which I bought at different times. I created a ZFS pool with three mirrored pairs, which is an expensive way of doing things, but I have been nagged about security, so I listened!

      I added the disks by disk ID and checked that everything was OK, and then went about adding my customer files. I don't leave my computers on 24/7; I find it a waste of electricity and don't need access 24/7.

      I have had a couple of hiccups with a single disk going missing and then coming up with errors, etc.

      Thankfully disk by-id lets me see which culprit is causing the problem....

      The system is in a "scrub" state to repair the pool. I used up the last of my spare 3TB drives last week and have no more, but my supplier is sending a new one that will be here tomorrow. I have spent a lot of money with this company and they don't question anything: "we will send a new one to replace the damaged one under guarantee"... :)


      This is the state of play at the moment so that you can see the scrub in progress...

      Not asking for help because that isn't necessary just little info for those getting into ZFS :)

      Should have said...my supplier sells the hard drives by serial number...that way they keep track of guarantees on products...not all companies do that, and they lose money....

    • Thanks for posting and please post the final status. This is a real world example of protection from data corruption, when a drive goes south, with a few data points on what happened during the repair process.

      If you would, before you pull it, look at the SMART counts on the bad drive for the following:

      SMART 5 – Reallocated_Sector_Count.
      SMART 187 – Reported_Uncorrectable_Errors.
      SMART 188 – Command_Timeout.
      SMART 197 – Current_Pending_Sector_Count.
      SMART 198 – Offline_Uncorrectable.

      SMART 199 - UltraDMA CRC error (This one is usually hardware/cable related)
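
      For anyone who wants just those six counters, here is a minimal sketch that filters the attribute table printed by smartctl -A. It assumes smartmontools is installed and the usual ten-column table layout (raw value in column 10); the function name filter_smart_attrs and the /dev/sdX placeholder are only for illustration.

```shell
#!/bin/sh
# Print only the SMART attributes listed above (5, 187, 188, 197, 198, 199).
# Reads the `smartctl -A` attribute table on stdin and prints
# "ID NAME RAW_VALUE" for each matching row.
# Assumption: the raw value sits in column 10, as in the standard table.
filter_smart_attrs() {
    awk '$1 ~ /^(5|187|188|197|198|199)$/ { print $1, $2, $10 }'
}

# Usage (as root), with /dev/sdX standing in for the suspect drive:
#   smartctl -A /dev/sdX | filter_smart_attrs
```

      Non-zero raw values for 5, 197 or 198 usually point at the disk surface, while a growing 199 count points more at the cable or port.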

      Thanks!
      Good backup takes the "drama" out of computing
      ____________________________________
      OMV 3.0.99, ThinkServer TS140, 12GB ECC, 32GB USB boot, 4TB+4TB zmirror, 3TB client backup.
      Backup Server:
      OMV 4.1.8.2-1, Acer RC-111, 4GB, 32GB USB boot, 3TB+3TB+4TB Rsync'ed disks+SNAPRAID
      2nd Data Backup
      R-PI 2B, 16GB boot, 4TB WD MyPassport
    • Hi flmaxey!
      I collected the new drive yesterday and have attached it but not replaced it via terminal yet....

      Does anyone know if it is OK to remove the drive that is faulty because as soon as I ran zpool status I had the following:


      I would appreciate an answer to this one...don't want to run the zpool command to remove the problem drive before being sure....;)

      Here is the smart info for that drive:




      bookie56


    • ZFS Degraded Pool

      Did you run "zpool clear", or why have all the values been reset to 0?

      I didn't know that it's possible for the pool to resilver on its own, without command line input by an admin.

      I read that it's possible to reboot while resilvering is in progress. The process resumes where it stopped after the reboot. But what happens if you change the disk while resilvering, I don't know.

      I would wait until resilvering is done and after that I wouldn’t remove the defective disk. I would do the following:

      1. shutdown the server
      2. place the new disk on a free sata/sas port and don’t remove the old disk, if possible
      3. restart the server
      4. start the replacement, so the pool resilvers onto the new disk while the old disk is still attached
      5. when resilvering is done, shutdown the server
      6. remove the defective disk
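
      Step 4 could look roughly like this on the command line. This is only a sketch: the helper replace_cmd is made up for illustration and just prints the zpool replace command so it can be reviewed first, and the two by-id names are placeholders rather than real disk IDs from this thread.

```shell
#!/bin/sh
# Build (but do not run) the `zpool replace` command for one pool member.
# Arguments: pool name, old device, new device.
replace_cmd() {
    pool=$1; old=$2; new=$3
    printf 'zpool replace %s %s %s\n' "$pool" "$old" "$new"
}

# Example with placeholder device names; review the printed command,
# then run it by hand as root.
replace_cmd Rocky /dev/disk/by-id/OLD-DISK-ID /dev/disk/by-id/NEW-DISK-ID

# Afterwards the resilver can be watched with: zpool status Rocky
```

      Printing the command instead of running it is deliberate, because picking the wrong device here is hard to undo.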

      What do you and the other guys think about this procedure in this situation?

      Greetings Hoppel
      ---------------------------------------------------------------------------------------------------------------
      frontend software - android tv | libreelec | win10 | kodi krypton
      frontend hardware - nvidia shield tv | odroid c2 | yamaha rx-a1020 | quadral chromium style 5.1 | samsung le40-a789r2 | harmony smart control
      -------------------------------------------
      backend software - debian | kernel 4.4 lts | proxmox | openmediavault | zfs raid-z2 | docker | emby | vdr | vnsi | fhem
      backend hardware - supermicro x11ssh-ctf | xeon E3-1240L-v5 | 64gb ecc | 8x4tb wd red | digital devices max s8
      ---------------------------------------------------------------------------------------------------------------------------------------


    • ZFS Degraded Pool

      I am not a ZFS expert. I use it for my home server only. But in my opinion it doesn't look as it should.

      Maybe somebody else can tell us... :)

      Greetings Hoppel
    • ZFS Degraded Pool

      Ok, thank you! ;)

      Bye
    • I kind of understand where @hoppel118 might be puzzled by the "auto-replace" behavior. (Similarly, I'm not a ZFS expert either. I'm using it on my home server.)

      On the other hand:
      In most cases, the default behavior would be to wait for admin input before doing anything. In looking for the reason(s) why something like this might occur, I looked at my pool's properties and found the following:


      As noted above, the default setting for autoreplace (and autoexpand) is off.
      ________________________________________________________________

      While this is speculation (if you didn't use the OMV GUI to setup your pool):
      The tutorial links you provided didn't mention "autoreplace" in text so, I imagine, the videos must have included the autoreplace feature at some point.

      And while I could have set this property in the GUI:
      I tested this command, on the CLI, in a VM: zpool set autoreplace=on ZFS1
      That changed the property to on.


      I'll look at this a bit more, in a VM.
    • ((A quick note - I was wrong in the above when I said autoreplace can be set in the GUI. As it seems, it must be done on the CLI.))

      To test your array for autoreplace, run the following command line:
      zpool get autoreplace Rocky
      If it returns a value of "on", at some point (setup?) it was turned on.
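
      If you want that check in script form, here is a minimal sketch. The helper name autoreplace_state is made up for illustration; the -H (no header) and -o value (value column only) options of zpool get are what make the output easy to pass along.

```shell
#!/bin/sh
# Turn the autoreplace property value into a one-line explanation.
# Argument: the property value as printed by
#   zpool get -H -o value autoreplace <pool>
autoreplace_state() {
    case "$1" in
        on)  echo "autoreplace is on: a new disk in the same physical location is used automatically" ;;
        off) echo "autoreplace is off: replacements need an explicit zpool replace" ;;
        *)   echo "unexpected value: $1" ;;
    esac
}

# Usage (as root), with the pool name from this thread:
#   autoreplace_state "$(zpool get -H -o value autoreplace Rocky)"
```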

      ________________________________________________________________

      On the drive attributes you posted above - I didn't notice anything that would indicate that the drive is going to fail. If you're wondering why ZFS kicked the drive out of the pool, consider doing the following on the command line:

      apt-get install curl

      then

      Source Code

      for disk in /dev/sdX ; do smartctl -x "$disk" ; done | curl -F 'sprunge=<-' http://sprunge.us
      (Replace the "X" above, with the appropriate drive letter. The output of the above will be a URL. Please post the URL.)
      ___________________________________________________________________

      While it's easy to simulate a missing drive in a VM, to test autoreplace I needed to create a serious fault. So I used GParted and formatted 1 of the 2 drives in the mirror as EXT4. (That did the trick. :) )




      I didn't catch it resilvering (only 257MB - fast) but the drive was formatted and resilvered as if it was new, as expected with "autoreplace".

      In looking through the Oracle docs for pools, the following is what I found on autoreplace:

      Controls automatic device replacement. If set to off, device replacement must be initiated by using the zpool replace command. If set to on, any new device found in the same physical location as a device that previously belonged to the pool is automatically formatted and replaced. The property abbreviation is replace.


      The above suggests that the old drive would need to be unplugged and a new drive plugged in its place, that is, into the same SATA port. From your posts, I got the impression that you simply plugged in a drive (into a 7th SATA slot?) and the replacement kicked off. Now, you're waiting before disconnecting the old drive. Is that correct?
    • Hi
      I didn't see any problems with the drive readings either...
      After restarting the computer I had the same error for the replacement drive and a degraded state again...sorry, I didn't take any pics...

      I turned off the computer and replaced the SATA cable....

      When I restarted I had a checksum error of 5 so I ran:


      Source Code

      # zpool clear Rocky /dev/disk/by-id/ata-WDC_WD30EFRX-68N32N0_WD-WCC7K2TE38XH
      And then I had the following:



      I am going to scrub the system to see if anything else comes up....
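
      To see at a glance which members still show errors after the scrub, something like the sketch below could help. It assumes the usual zpool status device table (NAME STATE READ WRITE CKSUM columns); the function name devices_with_errors is just for illustration.

```shell
#!/bin/sh
# List pool members whose READ, WRITE or CKSUM counter is non-zero.
# Reads `zpool status` output on stdin.
devices_with_errors() {
    awk '
        # The header row marks the start of the device table.
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }
        # A blank line ends the device table.
        in_config && NF == 0 { in_config = 0 }
        # Report any row with at least one non-zero counter.
        in_config && NF >= 5 && ($3 != 0 || $4 != 0 || $5 != 0) {
            print $1, "read=" $3, "write=" $4, "cksum=" $5
        }
    '
}

# Usage (as root): zpool status Rocky | devices_with_errors
```

      No output means every counter in the table is zero.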


      bookie56
    • ZFS Degraded Pool

      I don't think that this has anything to do with ZFS. I have never seen such behavior before, but it's possible. It seems you had bad luck with your new disk.

      I also replaced a faulty disk (wd red 4tb) of my raidz2 last week. Resilvering worked as expected for me.

      Greetings Hoppel
    • ZFS Degraded Pool

      Ok, that’s weird... Sorry, can’t help you with this. No idea at the moment.
    • Well, every time I start the computer it goes into repair mode...

      zpool clear (pool) doesn't help; it just shows the drive as faulted....

      Just about had enough of this so called system that improves data security....

      I have a backup of the files and will erase everything and start over...

      If this continues - then ZFS is just a waste of time...

      bookie56
    • flmaxey wrote:

      [...] If you're wondering why ZFS kicked the drive out of the pool, consider doing the following on the command line:

      for disk in /dev/sdX ; do smartctl -x $disk ; done | curl -F 'sprunge=<-' http://sprunge.us

      (Replace the "X" above, with the appropriate drive letter. The output of the above will be a URL. Please post the URL.)

      HTML Source Code

      root@rocky:~# for disk in /dev/sdg ; do smartctl -x $disk ; done | curl -F 'sprunge=<-' http://sprunge.us
      <html>
      <head>
      <title>500 Internal Server Error</title>
      </head>
      <body>
      <h1>500 Internal Server Error</h1>
      The server has either erred or is incapable of performing the requested operation.<br /><br />
      Comes up as above?

