ZFS: zpool offline fails to change state of faulty drive

  • Shortly after setting up my zpool, one Seagate drive started showing bad sectors ... so it is being returned for a replacement.


    The procedure for replacing the drive seems to be:


    • zpool offline <pool> <bad drive>
    • zpool replace <pool> <bad drive> <new drive>
    • wait for resilver


    Unfortunately, zpool offline does not have the expected result of changing the state to 'OFFLINE' -- the drive still shows up as 'FAULTED'.


    I have tried all the variations of 'zpool offline' I could think of:

    • zpool offline data ata-ST12000NM0007-2A1101_ZJV2LDGN
    • zpool offline data /dev/disk/by-id/ata-ST12000NM0007-2A1101_ZJV2LDGN
    • zpool offline data 3630560290011746901 (GUID)
    • zpool offline -f data 3630560290011746901 (GUID)

    There are no errors reported from the 'zpool offline' commands, but also no change in the state of the drive.
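
    For anyone in the same spot, a quick way to double-check the drive's current state and GUID before retrying the offline (a sketch; the status line below is a made-up sample, not output from this pool):

```shell
# On the real system, the inputs would come from:
#   zpool status data      (device name and state columns)
#   zpool status -g data   (per-vdev GUIDs, usable with zpool offline)
# Here we just parse a sample status line to pull out the state column.
sample='ata-ST12000NM0007-2A1101_ZJV2LDGN  FAULTED      0     0     0  too many errors'
state=$(echo "$sample" | awk '{print $2}')
echo "$state"
```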


    Will this cause me trouble when the time comes to 'zpool replace' this drive with a new drive or should I just chill? :-)


    Thanks for any insights or shared experiences!

  • New drive arrived. 'zpool replace' worked without a hitch, even though the old drive was showing as FAULTED instead of OFFLINE.


    For reference, this is what worked for me:

    Code
    /etc/init.d/zfs-zed stop
    zpool replace data ata-ST12000NM0007-2A1101_ZJV2LDGN /dev/disk/by-id/ata-ST12000NM0007-2A1101_ZJV1T4YF
    /etc/init.d/zfs-zed start
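
    While waiting for the replace to finish, a small helper like this can watch for the completed scan line in 'zpool status' output (a sketch; the sample lines are modeled on typical status output, not captured from this pool):

```shell
# resilver_done: succeeds once `zpool status <pool>` reports a finished resilver.
# Feed it the status output, e.g.: resilver_done "$(zpool status data)"
resilver_done() {
    echo "$1" | grep -q 'scan: resilvered'
}

# Illustrative samples of the two phases:
in_progress='scan: resilver in progress since Fri Jul 20 13:52:01 2014'
finished='scan: resilvered 11.6G in 0h5m with 0 errors on Fri Jul 20 13:57:25 2014'

resilver_done "$in_progress" && echo "done" || echo "still resilvering"
resilver_done "$finished"    && echo "done" || echo "still resilvering"
```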


    In summary: Just chill, everything will work out fine. :-)

  • In mirror vdevs you can detach one drive per vdev while the pool stays online. I also had to send one drive in for warranty and came back with the new one.


    This was on Proxmox, so:


    zpool detach poolname /dev/disk/by-id/ata-ST12000NM0007-2A1101_ZJV2LDGN


    zpool attach poolname /dev/disk/by-id/ata-ST12000NM0007-2A1101_ZJV24C09 /dev/disk/by-id/ata-ST12000NM0007-2A1101_ZJV1T4YF


    Then it starts resilvering.


    BTW, those IronWolf drives seem to last only about 2-2.5 years. I just returned two in the last 4 months that were purchased a week apart. In this case they were returned because SMART started showing an increasing Offline Uncorrectable count.
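
    For spotting that trend early, the raw count can be pulled out of smartctl's attribute table (a sketch; the attribute line below is a made-up sample, and the usual command is smartctl -A on the device):

```shell
# On the real drive:
#   smartctl -A /dev/sdX | grep -i Offline_Uncorrectable
# The raw value is the last column of the attribute line; a sample is parsed here.
sample='198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       24'
raw=$(echo "$sample" | awk '{print $NF}')
echo "$raw"
```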

  • I found this

    https://docs.oracle.com/cd/E53394_01/html/E54801/ghzvz.html


    • Physically connect the replacement disk.
    • Attach the new disk to the root pool: # zpool attach root-pool current-disk new-disk. Here current-disk becomes old-disk, to be detached at the end of this procedure. The correct disk labeling and the boot blocks are applied automatically. Note: if the disks have SMI (VTOC) labels, make sure to include the slice when specifying the disk, such as c2t0d0s0.
    • View the root pool status to confirm that resilvering is complete. If resilvering has completed, the output includes a message similar to: scan: resilvered 11.6G in 0h5m with 0 errors on Fri Jul 20 13:57:25 2014
    • Verify that you can boot successfully from the new disk.
    • After a successful boot, detach the old disk: # zpool detach root-pool old-disk. Here old-disk is the current-disk of Step 2. Note: if the disks have SMI (VTOC) labels, make sure to include the slice when specifying the disk, such as c2t0d0s0.
    • If the attached disk is larger than the existing disk, enable the ZFS autoexpand property: # zpool set autoexpand=on root-pool
