Degraded ZFS pool: no notifications


    • I'm testing OMV with ZFS on one of my virtual machines.
      I set up reporting in OMV and checked the ZFS ZED box.

      Running OMV: 4.1.5-1
      openmediavault-zfs 4.0.2-1

      Testing the scenario when one drive fails.

      Start situation:

      Source Code

      root@openmediavault-testing:/etc# zpool status
        pool: TANK
       state: ONLINE
        scan: none requested
      config:

      NAME STATE READ WRITE CKSUM
      TANK ONLINE 0 0 0
        mirror-0 ONLINE 0 0 0
          ata-VMware_Virtual_SATA_Hard_Drive_01000000000000000001 ONLINE 0 0 0
          ata-VMware_Virtual_SATA_Hard_Drive_02000000000000000001 ONLINE 0 0 0
        mirror-1 ONLINE 0 0 0
          ata-VMware_Virtual_SATA_Hard_Drive_03000000000000000001 ONLINE 0 0 0
          ata-VMware_Virtual_SATA_Hard_Drive_04000000000000000001 ONLINE 0 0 0

      errors: No known data errors
      Now I set one drive offline:

      Source Code

      root@openmediavault-testing:/etc# zpool offline TANK ata-VMware_Virtual_SATA_Hard_Drive_01000000000000000001

      This results in a degraded Zpool:

      Source Code

      root@openmediavault-testing:/etc# zpool status
        pool: TANK
       state: DEGRADED
      status: One or more devices has been taken offline by the administrator.
              Sufficient replicas exist for the pool to continue functioning in a
              degraded state.
      action: Online the device using 'zpool online' or replace the device with
              'zpool replace'.
        scan: none requested
      config:

      NAME STATE READ WRITE CKSUM
      TANK DEGRADED 0 0 0
        mirror-0 DEGRADED 0 0 0
          ata-VMware_Virtual_SATA_Hard_Drive_01000000000000000001 OFFLINE 0 0 0
          ata-VMware_Virtual_SATA_Hard_Drive_02000000000000000001 ONLINE 0 0 0
        mirror-1 ONLINE 0 0 0
          ata-VMware_Virtual_SATA_Hard_Drive_03000000000000000001 ONLINE 0 0 0
          ata-VMware_Virtual_SATA_Hard_Drive_04000000000000000001 ONLINE 0 0 0

      errors: No known data errors

      Now I don't get any notification, either by email or in the GUI, that anything is wrong with my device.

      This does not look like expected behaviour to me.
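
      For reference, this is roughly how I check whether the ZFS event daemon is running and actually seeing events (a quick sketch; it assumes the standard zfs-zed service and the zpool events command from the ZFS on Linux packages):

      Source Code

      # Is the ZFS event daemon running?
      systemctl status zfs-zed

      # List the events ZFS has generated since boot;
      # the offline action above should show up here.
      zpool events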
    • The ZED email notification must be enabled manually. I wrote a small HowTo here in the forum to start a script depending on some ZED events. It also explains how to activate the email notification. Please use the forum search function.
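
      In short it boils down to something like this (a sketch; the file path and service name are those used by the Debian/ZFS on Linux packages, and older ZED versions call the variable ZED_EMAIL instead of ZED_EMAIL_ADDR):

      Source Code

      # Edit the ZED configuration and set the recipient address,
      # e.g. ZED_EMAIL_ADDR="root"
      nano /etc/zfs/zed.d/zed.rc

      # Restart the ZFS event daemon so it picks up the change
      systemctl restart zfs-zed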
      OMV 3.0.90 (Gray style)
      ASRock Rack C2550D4I - 16GB ECC - 6x WD RED 3TB (ZFS 2x3 Striped RaidZ1)- Fractal Design Node 304
    • cabrio_leo wrote:

      The ZED email notification must be enabled manually.
      I enabled the ZED email notification in the notifications window of OMV.

      cabrio_leo wrote:

      I wrote a small HowTo here in the forum to start a script depending on some ZED events. There is also explained how to activate the email notification.
      I've been going through some of your posts (damn, there are a lot :P), but I can't really find it. I checked the /etc/zfs/zed.d/zed.rc file, which is supposed to be the "only thing you need to do", for my email address. That was already there. Also, I would guess the MTA settings are covered by OMV, right? What am I missing?

      I would expect it to work just by enabling it, since this version:
      [HOWTO] Instal ZFS-Plugin & use ZFS on OMV
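
      Regarding the MTA: this is how I checked that OMV's mail setup delivers anything at all (a sketch, assuming a mail/mailx command is installed and OMV's notification settings are configured):

      Source Code

      # Send a test mail through the locally configured MTA;
      # if this never arrives, ZED's mails won't arrive either.
      echo "ZED mail path test" | mail -s "test from openmediavault" root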
    • @jammiejammie This is the thread I have meant:

      (HowTo) avoid Autoshutdown while a ZFS scrub is running. I have done this in OMV3, but not tested it with OMV4.

      After a zpool status you should get an email in your case.
      OMV 3.0.90 (Gray style)
      ASRock Rack C2550D4I - 16GB ECC - 6x WD RED 3TB (ZFS 2x3 Striped RaidZ1)- Fractal Design Node 304


    • I wrote a small script, zpool_online_check.sh, which is executed once an hour to do a regular check of the pool. I found a template somewhere and modified it.

      Shell-Script

      #!/bin/bash
      # zpool_online_check.sh
      # Check the short pool health summary; if anything is not healthy,
      # log a warning and print the full verbose status.
      result=$(sudo zpool status -x)
      result2=$(sudo zpool status -v)
      if [[ $result != 'all pools are healthy' ]]; then
          echo "Zpool online check: Something is wrong - CAUTION!"
          echo "Zpool online check: Something is wrong - CAUTION!" | logger -i -t zpool_online_check.sh
          # Do something here, such as sending an email via HTTP...
          #/usr/bin/wget "http://example.com/send_email.php?subject=Alert&body=File%20System%20Has%20Problem" -O /dev/null > /dev/null
          echo " "
          echo "$result2"
          exit 1
      fi
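
      It is executed once an hour from a root cron entry along these lines (the path is just where I happened to put the script, adjust as needed):

      Source Code

      # /etc/cron.d/zpool_online_check
      # m h dom mon dow user  command
      0 * * * * root /usr/local/bin/zpool_online_check.sh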
      OMV 3.0.90 (Gray style)
      ASRock Rack C2550D4I - 16GB ECC - 6x WD RED 3TB (ZFS 2x3 Striped RaidZ1)- Fractal Design Node 304
    • @jammiejammie notifications are enabled for all the zed scripts (zedlets) that ship with the zfs package, not with the plugin.

      Look at the directory /etc/zfs/zed.d.

      If you want more notifications take a look at the zed manual and the examples in the directory mentioned above.

      manpages.ubuntu.com/manpages/xenial/man8/zed.8.html

      A basic test iirc would be to run a scrub and check if you receive the notification.

      If you create a zedlet that notifies on a degraded pool, feel free to submit a PR so we can ship it with the plugin.
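
      Something along these lines would be a starting point (an untested sketch only; it relies on the ZEVENT_* environment variables and the zed_notify helper from zed-functions.sh that the bundled zedlets already use):

      Source Code

      #!/bin/sh
      # /etc/zfs/zed.d/statechange-degraded-notify.sh (example name)
      # Runs on every vdev state change and sends a notification when a
      # vdev goes DEGRADED or FAULTED.

      [ "${ZEVENT_VDEV_STATE_STR}" = "DEGRADED" ] \
          || [ "${ZEVENT_VDEV_STATE_STR}" = "FAULTED" ] || exit 0

      # Pull in ZED_EMAIL_ADDR etc. and the notification helpers
      . "${ZED_ZEDLET_DIR}/zed.rc"
      . "${ZED_ZEDLET_DIR}/zed-functions.sh"

      zed_notify "vdev ${ZEVENT_VDEV_PATH:-?} is ${ZEVENT_VDEV_STATE_STR} in pool ${ZEVENT_POOL}"
      exit 0

      Make it executable and restart zfs-zed so zed picks it up.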
      New wiki
      chat support at #openmediavault@freenode IRC | Spanish & English | GMT+10
      telegram.me/openmediavault broadcast channel
      openmediavault discord server
    • Hi,


      I am also testing a little bit with ZFS on openmediavault.
      However, there are still a few things that confuse me. I set up a test pool on my freshly installed OMV 4.


      Then I removed one HDD (unplugged the SATA cable) to see what happens. In the system log I can see that the kernel reports everything correctly:

      Source Code

      Aug 7 15:13:43 openmediavault kernel: [ 1161.901550] ata4: exception Emask 0x50 SAct 0x0 SErr 0x4090800 action 0xe frozen
      Aug 7 15:13:43 openmediavault kernel: [ 1161.901639] ata4: irq_stat 0x00400040, connection status changed
      Aug 7 15:13:43 openmediavault kernel: [ 1161.901700] ata4: SError: { HostInt PHYRdyChg 10B8B DevExch }
      Aug 7 15:13:43 openmediavault kernel: [ 1161.901761] ata4: hard resetting link
      Aug 7 15:13:44 openmediavault kernel: [ 1162.614716] ata4: SATA link down (SStatus 0 SControl 300)
      Aug 7 15:13:49 openmediavault kernel: [ 1167.842722] ata4: hard resetting link
      Aug 7 15:13:49 openmediavault kernel: [ 1168.156493] ata4: SATA link down (SStatus 0 SControl 300)
      Aug 7 15:13:54 openmediavault kernel: [ 1173.218707] ata4: hard resetting link
      Aug 7 15:13:55 openmediavault kernel: [ 1173.532250] ata4: SATA link down (SStatus 0 SControl 300)
      Aug 7 15:13:55 openmediavault kernel: [ 1173.532267] ata4.00: disabled
      Aug 7 15:13:55 openmediavault kernel: [ 1173.532291] ata4: EH complete
      Aug 7 15:13:55 openmediavault kernel: [ 1173.532308] ata4.00: detaching (SCSI 3:0:0:0)
      Aug 7 15:13:55 openmediavault kernel: [ 1173.537898] sd 3:0:0:0: [sdc] Stopping disk
      Aug 7 15:13:55 openmediavault kernel: [ 1173.537913] sd 3:0:0:0: [sdc] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK


      However, even 45 minutes later I still got a healthy pool when querying with:


      Source Code

      root@openmediavault:~# zpool status -v LPS_Test
        pool: LPS_Test
       state: ONLINE
        scan: scrub repaired 0B in 0h0m with 0 errors on Tue Aug 7 14:34:53 2018
      config:

      NAME STATE READ WRITE CKSUM
      LPS_Test ONLINE 0 0 0
        mirror-0 ONLINE 0 0 0
          ata-WDC_WD40EFRX-68N32N0_WD-WCC7K6ETCNP1 ONLINE 0 0 0
          ata-WDC_WD40EFRX-68N32N0_WD-WCC7K7FP7R1T ONLINE 0 0 0

      errors: No known data errors


      I could do a scrub, which would certainly end in a degraded pool, but I cannot scrub every day as this would permanently reduce performance. Is there a basic quick check that would do it?
      I did not find anything in the ZFS manuals.
      And of course as a result no notification is sent.
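
      What I have in mind is something like this (just a sketch of the idea: force a little bit of I/O so ZFS actually has to touch the disks, then ask for the short health summary; the mountpoint is of course specific to my pool):

      Source Code

      # Write and remove a small file on the pool, forcing real I/O
      dd if=/dev/urandom of=/LPS_Test/.health_probe bs=1M count=1 conv=fsync
      rm -f /LPS_Test/.health_probe

      # Short summary; prints "all pools are healthy" if nothing is wrong
      zpool status -x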

      Any idea?


      Thanks
    • Hi again,

      so, as expected, after a scrub the disk is marked as offline:

      Source Code

      root@openmediavault:/etc/zfs/zed.d# zpool status -v LPS_Test
        pool: LPS_Test
       state: DEGRADED
      status: One or more devices could not be used because the label is missing or
              invalid. Sufficient replicas exist for the pool to continue
              functioning in a degraded state.
      action: Replace the device using 'zpool replace'.
         see: http://zfsonlinux.org/msg/ZFS-8000-4J
        scan: scrub canceled on Tue Aug 7 16:10:33 2018
      config:

      NAME STATE READ WRITE CKSUM
      LPS_Test DEGRADED 0 0 0
        mirror-0 DEGRADED 0 0 0
          ata-WDC_WD40EFRX-68N32N0_WD-WCC7K6ETCNP1 ONLINE 0 0 0
          ata-WDC_WD40EFRX-68N32N0_WD-WCC7K7FP7R1T UNAVAIL 0 0 0 corrupted data

      errors: No known data errors
      root@openmediavault:/etc/zfs/zed.d#


      I expected a notification now. I have switched on ZFS ZED under Notifications in the web GUI. I have also modified zed.rc:


      Source Code

      ZED_EMAIL_ADDR=al**************
      ZED_NOTIFY_INTERVAL_SECS=3600
      ZED_NOTIFY_VERBOSE=1

      I also found the script statechange-notify.sh in /etc/zfs/zed.d, which should do the notification (as far as I understood from the manual).
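
      To see whether ZED reacts to events at all, I will also try running it in the foreground for a while (a sketch based on the zed man page; -F keeps it in the foreground, -v makes it verbose):

      Source Code

      # Stop the background service, then run the daemon interactively
      systemctl stop zfs-zed
      zed -Fv

      # In a second shell, trigger an event, e.g. a scrub:
      zpool scrub LPS_Test

      The scrub_finish event should then show up in the zed output and, if the mail path works, also as an email.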

      So to summarize:

      - No automatic update of the pool state; only a manually triggered scrub changed it.
      - No notification after the degradation of the pool, even though everything should be set up.

      I am pretty convinced this is not how it should be :)

      Thanks

      S