General protection fault, due to ZFS?

    • General protection fault, due to ZFS?

      For the past few nights my OMV has been hanging overnight. I believe it's related to ZFS, but I don't really know how to troubleshoot this specific issue.

      Here's the log:

      Apr 21 11:57:24 HMS kernel: [12356.061733] general protection fault: 0000 [#1] SMP PTI
      Apr 21 11:57:24 HMS kernel: [12356.061749] Modules linked in: softdog xt_nat xt_tcpudp veth ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack nf_nat nf_conntrack aufs overlay intel_powerclamp coretemp rtl8xxxu kvm_intel arc4 zfs(PO) kvm irqbypass zunicode(PO) zavl(PO) icp(PO) snd_hda_codec_hdmi rtl8192cu nouveau rtl_usb rtl8192c_common rtlwifi mxm_wmi video intel_cstate ttm mac80211 snd_hda_codec_realtek drm_kms_helper snd_hda_codec_generic pcspkr serio_raw drm fb_sys_fops snd_hda_intel syscopyarea input_leds sysfillrect joydev sysimgblt snd_hda_codec cfg80211 nvidiafb snd_hda_core vgastate snd_hwdep fb_ddc snd_pcm i2c_algo_bit snd_timer lpc_ich snd soundcore i7core_edac shpchp asus_atk0110 i5500_temp
      Apr 21 11:57:24 HMS kernel: [12356.061859] wmi mac_hid zcommon(PO) znvpair(PO) spl(O) sunrpc ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear usbkbd hid_generic usbmouse usbhid hid psmouse i2c_i801 firewire_ohci firewire_core crc_itu_t sky2 ahci libahci
      Apr 21 11:57:24 HMS kernel: [12356.061911] CPU: 2 PID: 840 Comm: z_wr_int_7 Tainted: P IO 4.15.18-12-pve #1
      Apr 21 11:57:24 HMS kernel: [12356.061920] Hardware name: System manufacturer System Product Name/P6X58D PREMIUM, BIOS 1501 05/10/2011
      Apr 21 11:57:24 HMS kernel: [12356.061998] RIP: 0010:zio_remove_child+0x74/0x140 [zfs]
      Apr 21 11:57:24 HMS kernel: [12356.062005] RSP: 0018:ffffbac6c4a0fcf8 EFLAGS: 00010282
      Apr 21 11:57:24 HMS kernel: [12356.062013] RAX: ffff8fb2ff9ab110 RBX: ffff8fb3f4796d80 RCX: ffff8fb2ff9ab050
      Apr 21 11:57:24 HMS kernel: [12356.062021] RDX: ff7f8fb2ff9abbf0 RSI: ffff8fb3f4796d80 RDI: ffff8fb3f4797590
      Apr 21 11:57:24 HMS kernel: [12356.062030] RBP: ffffbac6c4a0fd28 R08: 0000000000200000 R09: ffffbac6c4a0fd68
      Apr 21 11:57:24 HMS kernel: [12356.062038] R10: ffffbac6c4a0fcc0 R11: 0000000000000200 R12: ffff8fb3f4797210
      Apr 21 11:57:24 HMS kernel: [12356.062049] R13: ffff8fb2ff9ab0f0 R14: ffff8fb3f4797100 R15: ffff8fb3f4797590
      Apr 21 11:57:24 HMS kernel: [12356.062059] FS: 0000000000000000(0000) GS:ffff8fb4a9280000(0000) knlGS:0000000000000000
      Apr 21 11:57:24 HMS kernel: [12356.062069] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      Apr 21 11:57:24 HMS kernel: [12356.062077] CR2: 000000c42016c000 CR3: 00000002d640a000 CR4: 00000000000006e0
      Apr 21 11:57:24 HMS kernel: [12356.062086] Call Trace:
      Apr 21 11:57:24 HMS kernel: [12356.062152] zio_done+0x3ae/0xe60 [zfs]
      Apr 21 11:57:24 HMS kernel: [12356.062219] zio_execute+0x95/0xf0 [zfs]
      Apr 21 11:57:24 HMS kernel: [12356.062234] taskq_thread+0x2ae/0x4d0 [spl]
      Apr 21 11:57:24 HMS kernel: [12356.062246] ? wake_up_q+0x80/0x80
      Apr 21 11:57:24 HMS kernel: [12356.062311] ? zio_reexecute+0x390/0x390 [zfs]
      Apr 21 11:57:24 HMS kernel: [12356.062321] kthread+0x105/0x140
      Apr 21 11:57:24 HMS kernel: [12356.062333] ? taskq_thread_should_stop+0x70/0x70 [spl]
      Apr 21 11:57:24 HMS kernel: [12356.062343] ? kthread_create_worker_on_cpu+0x70/0x70
      Apr 21 11:57:24 HMS kernel: [12356.062353] ret_from_fork+0x35/0x40
      Apr 21 11:57:24 HMS kernel: [12356.062360] Code: 48 89 45 d0 e8 5e 59 c0 e8 48 8b 45 d0 49 89 84 24 a8 03 00 00 4c 89 e8 49 03 84 24 18 01 00 00 48 8b 08 48 8b 50 08 48 89 51 08 <48> 89 0a 48 b9 00 01 00 00 00 00 ad de 48 ba 00 02 00 00 00 00
      Apr 21 11:57:24 HMS kernel: [12356.062474] RIP: zio_remove_child+0x74/0x140 [zfs] RSP: ffffbac6c4a0fcf8
      Apr 21 11:57:24 HMS kernel: [12356.062511] ---[ end trace 65fa30fb7d3fded2 ]---
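One way to get more out of the register dump in this trace is to check each value for canonical form. On x86-64 with 48-bit virtual addresses, bits 63 to 47 of a valid pointer must all be equal, and dereferencing a pointer that breaks this rule raises a general protection fault instead of a page fault. The following is a minimal Python sketch of that check, assuming 4-level paging (48-bit addresses) on this machine; the function names are made up for illustration and the register values are copied from the log.

```python
import re

# General-purpose registers copied from the trace in the first post.
REGISTER_DUMP = """
RAX: ffff8fb2ff9ab110 RBX: ffff8fb3f4796d80 RCX: ffff8fb2ff9ab050
RDX: ff7f8fb2ff9abbf0 RSI: ffff8fb3f4796d80 RDI: ffff8fb3f4797590
RBP: ffffbac6c4a0fd28 R08: 0000000000200000 R09: ffffbac6c4a0fd68
R10: ffffbac6c4a0fcc0 R11: 0000000000000200 R12: ffff8fb3f4797210
R13: ffff8fb2ff9ab0f0 R14: ffff8fb3f4797100 R15: ffff8fb3f4797590
"""


def parse_registers(dump):
    """Return {register name: integer value} for every 'Rxx: <16 hex digits>' pair."""
    pairs = re.findall(r"\b(R[A-Z0-9]{2}):\s*([0-9a-f]{16})\b", dump)
    return {name: int(value, 16) for name, value in pairs}


def is_canonical(addr):
    """True if bits 63..47 are all equal (assumes 48-bit virtual addressing)."""
    top = addr >> 47
    return top == 0 or top == 0x1FFFF


def bits_from_canonical(addr):
    """Count the bits in 63..48 that disagree with bit 47."""
    expected = 0xFFFF if (addr >> 47) & 1 else 0x0000
    return bin((addr >> 48) ^ expected).count("1")


if __name__ == "__main__":
    for name, value in parse_registers(REGISTER_DUMP).items():
        if not is_canonical(value):
            print(f"{name} = {value:016x}: non-canonical, "
                  f"{bits_from_canonical(value)} bit(s) off")
```

For this trace the sketch flags only RDX, whose upper 16 bits read ff7f where ffff would be expected, a difference of one bit. The marked bytes in the Code: line (48 89 0a) appear to decode to a store through RDX, which would fit a fault at that instruction. A single flipped bit in a kernel pointer is consistent with faulty or unstable memory, but it does not prove it; a software bug can corrupt a pointer as well.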
    • getName() wrote:

      It looks like the ZFS kernel module that spawns the workers does something bad when removing them.
      This may be a serious bug, or maybe just a faulty memory stick. Do you run ECC? Are you running the latest version?

      It could definitely be related to the memory, as I recently installed 2 new sticks. I checked the BIOS and the frequency was off since it was using the XMP settings, and I don't really need to OC the RAM on my home server - that's what I get for making my server out of my old gaming parts.

      The board doesn't support ECC so I'm not able to use that. I'll try to adjust the timings and see what's going on with that. My OMV is up to date.

      I'm running memtest right now to make sure that's all fine.
    • getName() wrote:

      Don't use the XMP profiles; just use the JEDEC specifications that are also provided by the memory, at least for servers and 24/7 machines. You can run some memtests, although I would expect other failures too if the memory is the cause.
      Edit: Ah, I see, you are already running memtest.

      Yeah, the board was using XMP by default since I never cleared it - this was my old gaming PC. Currently Memtest shows no errors at 90% complete. I'll know more tomorrow, once the server has been running overnight, whether I still get the error.

      Are there any other logs I could look at, if it fails again, that might give more information?
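If it does fail again, one low-effort step is to filter the kernel log for lines that usually accompany memory or hardware trouble. Below is a rough Python sketch of such a filter. The default path /var/log/kern.log and the pattern list are assumptions (typical for a Debian-based install with syslog-style kernel logging, and far from exhaustive), and `suspicious_lines` is a hypothetical helper name.

```python
import re
import sys

# Illustrative, case-sensitive patterns; adjust to taste. Lower-case module
# names such as i7core_edac in a "Modules linked in" line do not match EDAC.
PATTERNS = re.compile(
    r"general protection fault|mce:|\bEDAC\b|Hardware Error|BUG:|Oops"
)


def suspicious_lines(lines):
    """Return the lines (without trailing newline) that match any pattern."""
    return [line.rstrip("\n") for line in lines if PATTERNS.search(line)]


if __name__ == "__main__":
    # Assumed default location; pass another file as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/kern.log"
    with open(path, errors="replace") as fh:
        for hit in suspicious_lines(fh):
            print(hit)
```

Running it against the log from before and after a hang would show whether the ZFS fault is the only event or whether other hardware-related messages appear around the same time.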