Ryzen 7 Build - OMV 4 + SnapRAID + Union FS + Docker ...it's not happy...

  • Hey,


    So I've just built a system, specs as follows:


    - AMD Ryzen 7 2700X
    - Gigabyte AB350M-D3H
    - 16GB DDR4 3600MHz (2x8GB)
    - Samsung 250GB 970EVO NVMe
    - Samsung 250GB 860EVO SSD
    - 3x Seagate IronWolf 10TB HDDs
    - LSI 9300-8i HBA (with 2x SAS-To-SATA 4 Port Cables)


    Initially I tried using the pre-built OMV ISO, and while it did work, it didn't seem to be able to see the NVMe beyond the installation stage.


    After some googling, I decided to go with a minimal Debian 9 install with OMV on top. That worked, but then came problem number 2: the NIC was causing errors during the install. I discovered I needed to install the firmware-realtek package for the board's RTL8168 NIC. On attempt 3, I installed that directly *after* completing the Debian install, then did the OMV install, which was much more successful.


    Fast-forward a couple of hours: I got everything set up, installed omv-extras, then the Docker GUI plugin, the SnapRAID plugin and the Union Filesystem plugin.


    Got that set up and seemingly working, with an SMB share to the UnionFS drive etc. Able to read/write OK, all seemed well... I noticed a message in syslog complaining that KVM didn't have virtualisation enabled in the BIOS, but didn't think much of it as I wasn't using or planning to use virtualisation.


    Skip forward to the next day: I decided I might as well try to sort out that error and enabled the SVM & IOMMU settings in the BIOS. After initially not being able to boot, I rebuilt the initrd and was up and running again.


    I noticed some new errors in the logs (segfaults), and it eventually ended with even more serious issues where the CPU was causing soft/hard hangs and kernel issues. At that point I decided it was probably nicer to just have the KVM error than that kind of activity.


    So I reverted the settings and disabled SVM & IOMMU.
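    Whether virtualisation is actually available after toggling SVM in the BIOS can be checked with a small script. This is a minimal sketch, assuming a Linux host: it reads the `svm`/`vmx` flags from `/proc/cpuinfo` and checks whether `/dev/kvm` exists. The CPU flag only says the processor supports AMD-V (or Intel VT-x), and depending on the kernel version it may still be listed when SVM is switched off in the BIOS, so the presence of `/dev/kvm` (normally created only when `kvm_amd`/`kvm_intel` loads successfully) is treated here as the more direct sign. The function names are hypothetical.

    ```python
    from __future__ import annotations

    import os


    def cpu_flags(cpuinfo_text: str) -> set[str]:
        """Collect every feature flag listed on 'flags' lines of /proc/cpuinfo-style text."""
        flags: set[str] = set()
        for line in cpuinfo_text.splitlines():
            key, sep, value = line.partition(":")
            if sep and key.strip() == "flags":
                flags.update(value.split())
        return flags


    def virtualisation_status(cpuinfo_text: str, kvm_device: str = "/dev/kvm") -> dict[str, bool]:
        """Report CPU virtualisation capability and whether a KVM device node exists."""
        flags = cpu_flags(cpuinfo_text)
        return {
            "cpu_supports_svm": "svm" in flags,  # AMD-V capability bit
            "cpu_supports_vmx": "vmx" in flags,  # Intel VT-x capability bit
            # /dev/kvm normally only appears once kvm_amd/kvm_intel has loaded successfully
            "kvm_device_present": os.path.exists(kvm_device),
        }


    if __name__ == "__main__":
        with open("/proc/cpuinfo") as f:
            print(virtualisation_status(f.read()))
    ```

    If `cpu_supports_svm` is true but `kvm_device_present` is false, that fits the "virtualisation disabled in the BIOS" message quoted above.
    
    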


    Some of the errors in syslog I've seen are:


    Jun 3 10:50:46 omv kernel: [ 1346.507220] omv-engined[7186]: segfault at 7b0bd8ef82a9 ip 000055a85ae045b5 sp 00007ffe8ba0efe0 error 4 in php7.0[55a85ab97000+3aa000]
    Jun 3 10:50:52 omv kernel: [ 1352.505027] omv-engined[7209]: segfault at 7b0bd8ef82a9 ip 000055a85ae045b5 sp 00007ffe8ba0efe0 error 4 in php7.0[55a85ab97000+3aa000]
    Jun 3 10:51:03 omv kernel: [ 1363.496237] omv-engined[7231]: segfault at 7b0bd8ef82a9 ip 000055a85ae045b5 sp 00007ffe8ba0efe0 error 4 in php7.0[55a85ab97000+3aa000]
    Jun 3 10:51:08 omv kernel: [ 1368.495003] omv-engined[7257]: segfault at 7b0bd8ef82a9 ip 000055a85ae045b5 sp 00007ffe8ba0efe0 error 4 in php7.0[55a85ab97000+3aa000]
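    When a log fills with lines like these, it helps to group them to see whether it is the same fault repeating. The sketch below (hypothetical function names, regex written for the line format quoted above) counts segfaults per process, fault address, instruction pointer, error code and object, and decodes the low bits of the x86 page-fault error code. In the four lines above every field except the PID is identical, so they collapse into one group with `error 4`, i.e. a user-mode read of a page that was not present. That points at the same code path in `omv-engined`/`php7.0` failing each time; on its own it does not say whether the root cause is hardware or software.

    ```python
    import re
    from collections import Counter

    # Shape of the kernel segfault lines quoted above, e.g.
    # "omv-engined[7186]: segfault at 7b0bd8ef82a9 ip 000055a85ae045b5 sp ... error 4 in php7.0[...]"
    SEGFAULT_RE = re.compile(
        r"(?P<comm>[\w.-]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
        r"ip (?P<ip>[0-9a-f]+) sp [0-9a-f]+ error (?P<err>\d+) in (?P<obj>[^\[\s]+)"
    )


    def group_segfaults(lines):
        """Count segfaults per (process, fault address, instruction pointer, error code, object)."""
        counts = Counter()
        for line in lines:
            m = SEGFAULT_RE.search(line)
            if m:
                counts[(m["comm"], m["addr"], m["ip"], m["err"], m["obj"])] += 1
        return counts


    def decode_error(code):
        """Decode the low bits of the x86 page-fault error code printed after 'error'."""
        code = int(code)
        return {
            "protection_violation": bool(code & 1),  # 0 means the page was not present
            "write": bool(code & 2),                 # 0 means a read access
            "user_mode": bool(code & 4),             # 1 means the fault came from user space
        }


    if __name__ == "__main__":
        with open("/var/log/syslog", errors="replace") as f:
            for key, n in group_segfaults(f).most_common():
                print(n, key, decode_error(key[3]))
    ```
    
    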


    and:


    Jun 3 10:42:19 omv kernel: [ 839.541420] BUG: unable to handle kernel paging request at 00000001000000fc
    Jun 3 10:42:19 omv kernel: [ 839.541439] IP: userfaultfd_unmap_prep+0x3a/0x110
    Jun 3 10:42:19 omv kernel: [ 839.541445] PGD 0 P4D 0
    Jun 3 10:42:19 omv kernel: [ 839.541452] Oops: 0000 [#1] SMP NOPTI
    Jun 3 10:42:19 omv kernel: [ 839.541457] Modules linked in: uas usb_storage softdog xt_nat xt_tcpudp veth ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack nf_nat nf_conntrack br_netfilter bridge stp llc overlay snd_hda_codec_hdmi cpufreq_powersave cpufreq_conservative cpufreq_userspace edac_mce_amd snd_hda_codec_realtek nls_ascii snd_hda_codec_generic nls_cp437 kvm_amd nouveau vfat mxm_wmi fat snd_hda_intel video kvm snd_hda_codec irqbypass snd_hda_core ttm snd_hwdep efi_pstore crct10dif_pclmul crc32_pclmul drm_kms_helper snd_pcm wmi_bmof ppdev ghash_clmulni_intel efivars pcspkr evdev joydev snd_timer k10temp drm sp5100_tco snd i2c_algo_bit soundcore ccp rng_core sg fuse shpchp wmi parport_pc
    Jun 3 10:42:19 omv kernel: [ 839.541530] parport button acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb btrfs zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod hid_generic sd_mod usbhid hid crc32c_intel aesni_intel aes_x86_64 ahci xhci_pci crypto_simd libahci mpt3sas cryptd glue_helper xhci_hcd raid_class libata scsi_transport_sas i2c_piix4 nvme r8169 usbcore mii scsi_mod usb_common nvme_core gpio_amdpt gpio_generic
    Jun 3 10:42:19 omv kernel: [ 839.541586] CPU: 0 PID: 4637 Comm: omv-engined Not tainted 4.16.0-0.bpo.1-amd64 #1 Debian 4.16.5-1~bpo9+1
    Jun 3 10:42:19 omv kernel: [ 839.541593] Hardware name: Gigabyte Technology Co., Ltd. AB350M-D3H/AB350M-D3H-CF, BIOS F23d 04/17/2018
    Jun 3 10:42:19 omv kernel: [ 839.541601] RIP: 0010:userfaultfd_unmap_prep+0x3a/0x110
    Jun 3 10:42:19 omv kernel: [ 839.541606] RSP: 0018:ffffc0994212fe50 EFLAGS: 00010206
    Jun 3 10:42:19 omv kernel: [ 839.541611] RAX: ffff9d293cf55040 RBX: 0000000100000090 RCX: ffffc0994212fee0
    Jun 3 10:42:19 omv kernel: [ 839.541617] RDX: 00007f0bd6661000 RSI: 00007f0bd642e000 RDI: ffff9d293cf55ba0
    Jun 3 10:42:19 omv kernel: [ 839.541622] RBP: ffffc0994212fee0 R08: ffff9d293b16e740 R09: 0000000000000000
    Jun 3 10:42:19 omv kernel: [ 839.541628] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9d29328a16c0
    Jun 3 10:42:19 omv kernel: [ 839.541633] R13: 00007f0bd6661000 R14: 00007f0bd642e000 R15: 00007f0bd642e000
    Jun 3 10:42:19 omv kernel: [ 839.542234] FS: 00007f0bdd9d9b80(0000) GS:ffff9d299ec00000(0000) knlGS:0000000000000000
    Jun 3 10:42:19 omv kernel: [ 839.543271] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Jun 3 10:42:19 omv kernel: [ 839.544304] CR2: 00000001000000fc CR3: 00000003b935c000 CR4: 00000000003406f0
    Jun 3 10:42:19 omv kernel: [ 839.545344] Call Trace:
    Jun 3 10:42:19 omv kernel: [ 839.546373] ? kmem_cache_free+0x19c/0x1d0
    Jun 3 10:42:19 omv kernel: [ 839.547400] do_munmap+0x42e/0x460
    Jun 3 10:42:19 omv kernel: [ 839.548426] vm_munmap+0x66/0xa0
    Jun 3 10:42:19 omv kernel: [ 839.549454] SyS_munmap+0x1d/0x30
    Jun 3 10:42:19 omv kernel: [ 839.550468] do_syscall_64+0x6c/0x130
    Jun 3 10:42:19 omv kernel: [ 839.551485] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
    Jun 3 10:42:19 omv kernel: [ 839.552497] RIP: 0033:0x7f0bdd7e1447
    Jun 3 10:42:19 omv kernel: [ 839.553500] RSP: 002b:00007ffe8ba11518 EFLAGS: 00000202 ORIG_RAX: 000000000000000b
    Jun 3 10:42:19 omv kernel: [ 839.554512] RAX: ffffffffffffffda RBX: 000055a85beae240 RCX: 00007f0bdd7e1447
    Jun 3 10:42:19 omv kernel: [ 839.555540] RDX: 0000000000000002 RSI: 00000000002321f8 RDI: 00007f0bd642e000
    Jun 3 10:42:19 omv kernel: [ 839.556563] RBP: 00007ffe8ba11870 R08: 000055a85beb23c0 R09: 0000000000000000
    Jun 3 10:42:19 omv kernel: [ 839.557585] R10: 0000000000000004 R11: 0000000000000202 R12: 00007f0bdd9ec040
    Jun 3 10:42:19 omv kernel: [ 839.558607] R13: 00007ffe8ba11794 R14: 00007ffe8ba11660 R15: 000055a85beae240
    Jun 3 10:42:19 omv kernel: [ 839.559638] Code: 57 41 56 49 89 f6 41 55 41 54 49 89 d5 55 53 49 89 fc 48 89 cd 48 83 ec 08 48 3b 17 76 50 49 8b 9c 24 c8 00 00 00 48 85 db 74 33 <f6> 43 6c 40 74 2d 48 8b 55 00 48 39 d5 48 8d 42 e8 75 0f eb 3f
    Jun 3 10:42:19 omv kernel: [ 839.560729] RIP: userfaultfd_unmap_prep+0x3a/0x110 RSP: ffffc0994212fe50
    Jun 3 10:42:19 omv kernel: [ 839.561821] CR2: 00000001000000fc
    Jun 3 10:42:19 omv kernel: [ 839.562888] ---[ end trace 259f38065b33bbb7 ]---
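    Most of an oops like this is register dumps; for a forum post or bug report the useful parts are usually the `BUG:` line, the faulting function from `RIP:` and the call trace. A minimal sketch that extracts them, assuming the 4.16-era layout shown above (other kernel versions format the trace differently, and the helper name is hypothetical). Frames prefixed with `?` are ones the kernel's unwinder could not confirm, so the sketch drops them. Applied to the log above it yields the paging-request BUG, `userfaultfd_unmap_prep+0x3a`, and a trace running from `entry_SYSCALL_64_after_hwframe` down through `SyS_munmap` and `do_munmap`, i.e. the fault hit while `omv-engined` was unmapping memory.

    ```python
    import re

    # "RIP: 0010:userfaultfd_unmap_prep+0x3a/0x110" (segment prefix optional)
    RIP_RE = re.compile(r"RIP: (?:[0-9a-f]{4}:)?(?P<sym>[\w.]+)\+(?P<off>0x[0-9a-f]+)/0x[0-9a-f]+")
    # Call-trace frames such as "] do_munmap+0x42e/0x460" or "] ? kmem_cache_free+0x19c/0x1d0"
    FRAME_RE = re.compile(r"\]\s+(?P<guess>\?\s+)?(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+\s*$")


    def summarise_oops(lines):
        """Pull the BUG message, faulting function and reliable call-trace frames out of an oops."""
        bug, rip, trace, in_trace = None, None, [], False
        for line in lines:
            if bug is None and "BUG:" in line:
                bug = line.split("BUG:", 1)[1].strip()
            if rip is None:
                m = RIP_RE.search(line)
                if m:
                    rip = f"{m['sym']}+{m['off']}"
            if "Call Trace:" in line:
                in_trace = True
                continue
            if in_trace:
                m = FRAME_RE.search(line)
                if m is None:
                    in_trace = False  # first non-frame line ends the trace
                elif not m["guess"]:
                    trace.append(m["sym"])  # '?' frames are unreliable guesses; skip them
        return {"bug": bug, "rip": rip, "call_trace": trace}
    ```
    
    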


    Just wondering if this kind of thing is expected and par for the course, or if it's a sign of incompatible hardware, in which case I may need to move to a different platform?


    Obviously I want the system stable. The reason for such a high-end spec is that I wanted Plex to easily handle transcoding while other things are going on in the background, but so far I'm a bit concerned about the stability of the system.


    Look forward to any insights.


    Cheers,

    • Official post

    This can be a problem with bleeding edge hardware. What kernel is installed?


    • Official post

    - I don't know if this will help, but I'd do a clean rebuild with the new BIOS (KVM) settings. The (Debian) install process has a number of scripts designed to "sense" the state of the existing hardware. It appears that you altered the hardware profile with BIOS changes after your build was complete. This might be the reason behind the CPU hangs, etc.


    - If the option is available, it might be better to set your mobo to legacy BIOS instead of UEFI. Also, select MBR over GPT if available. (If either of these options is available, I'd try the OMV install ISO again.)


    - Search the forum and the net for booting from NVMe (terms: boot, NVMe, Debian, openmediavault, etc.). NVMe booting issues have cropped up before. You may be running into GRUB issues.


    - Just for experimentation, I'd try a build using the OMV ISO, installed to a USB stick of 16GB or larger. OMV will boot from a USB stick and it works well. (You could try installing to the SSD as well.)


    (Note: starting tomorrow, I'm out of town until next weekend, so I won't be able to respond.)

  • This can be a problem with bleeding edge hardware. What kernel is installed?

    It's running 4.16.0-0.bpo.1-amd64.
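    The question "What kernel is installed?" is normally answered with `uname -r`; in Python, `platform.release()` returns the same string on Linux. The sketch below splits a Debian-style release string into its parts, assuming Debian's naming scheme (`<version>-<abi>-<flavour>`, with `bpo` in the ABI part marking a backports kernel). The reported `4.16.0-0.bpo.1-amd64` is such a backports kernel, consistent with the `Debian 4.16.5-1~bpo9+1` tag in the oops above. The function name is hypothetical.

    ```python
    import platform
    import re

    # Debian kernel release strings look like "4.16.0-0.bpo.1-amd64" (backport) or "4.9.0-6-amd64".
    RELEASE_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)-(\S+?)-(\w+)$")


    def parse_debian_release(release: str):
        """Split a Debian-style kernel release into version, ABI part and flavour; None if it doesn't fit."""
        m = RELEASE_RE.match(release)
        if m is None:
            return None
        major, minor, patch, abi, flavour = m.groups()
        return {
            "version": (int(major), int(minor), int(patch)),
            "abi": abi,
            "flavour": flavour,
            "backport": "bpo" in abi.split("."),  # Debian backports kernels carry 'bpo'
        }


    if __name__ == "__main__":
        print(parse_debian_release(platform.release()))  # same string as 'uname -r' on Linux
    ```
    
    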

    As mentioned, I've reverted the BIOS settings to how they were when I did the install. So far I've not seen anything odd, but I've really only left it idling along, so time will tell. In hindsight that wasn't a good idea, but stupid Gigabyte thought it was a good idea to ship the board with virtualisation disabled and hide all the settings in a sub-sub-menu with an ambiguous name you'd never suspect... so I live and learn on that one.


    I don't want to go through a reinstall if I don't have to, purely because I've already spent the better part of a day setting everything up. Starting again, while it may be faster the second time around, is still... well, I'd rather not :)


    The Debian installer I used was the netinstall image copied to a USB stick via Rufus. It worked fine, but I'm positive it did a UEFI install.


    I found that with the OMV ISO I could install fine to the SSD (which is my plan), but once installed, the NVMe no longer shows up among the available disks (although it is visible as an install destination...).


    So you reckon it'd be worth trying again and installing to a USB stick just to see if I get the same bugs? (Although I'd probably still need to set up a few things first, as the bugs seemed to present themselves mostly when I'm clicking around in the plugins in the UI.)


    I believe the CPU hang happened when I tried deleting a folder through an SMB share, of all things (although it hasn't happened since reverting the BIOS settings).


    Cheers for replying btw!

    • Official post

    I'd try KVM ON (in BIOS), UEFI and GPT off in BIOS (if possible), and install to a USB stick. The build will take 15 to 20 minutes and you can leave your current install in place; just set your boot order to the USB stick OR disconnect the NVMe or SSD.


    This test build should indicate if the install will behave properly, without having to start from scratch.
    ___________________________________________________________


    (BTW: in OMV's standard update window, there's a firmware update for Realtek NICs. I don't know if it's the one you need.)


    @ryecoaaron is right about your bleeding-edge hardware. Being an OMV developer, he's the expert on that "stuff". :)
    The Linux world needs 6 months to a year for even the latest kernels to be adapted to cutting-edge processors, mobos, NICs, etc. (And it can take longer.)

    • Official post

    I don't know what your preferences are, but if you can have your boot device external to your server (USB3 stick, SSD on a USB3 adapter, etc.), backing up or cloning your OS build is a breeze. With that kind of backup (a clone), a restoration can be done in two minutes: simply swap out the boot device.


    Essentially, you can try a new build (a full upgrade, a new package install, new firmware, testing a Docker container, etc.) without consequence, because you'd be able to revert to a known working configuration. It's worth giving some thought.
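    The clone-and-swap approach comes down to a byte-for-byte copy of the boot device plus a check that the copy matches. This is usually done with `dd` or a dedicated imaging tool; below is a minimal Python sketch of the same idea, shown with image files. Assumptions: the file names are hypothetical, copying from or to a raw device needs root and an unmounted source, and writing to the wrong device destroys its contents. The hash check is limited to the number of bytes copied, because a target stick is often larger than the source.

    ```python
    import hashlib
    import os
    from typing import Optional, Tuple

    CHUNK_SIZE = 4 * 1024 * 1024  # read/write 4 MiB at a time


    def clone_image(src: str, dst: str, chunk_size: int = CHUNK_SIZE) -> Tuple[int, str]:
        """Copy src to dst chunk by chunk; return (bytes copied, SHA-256 of the copied data)."""
        digest = hashlib.sha256()
        copied = 0
        with open(src, "rb") as fin, open(dst, "wb") as fout:
            while True:
                block = fin.read(chunk_size)
                if not block:
                    break
                fout.write(block)
                digest.update(block)
                copied += len(block)
            fout.flush()
            os.fsync(fout.fileno())  # make sure the data has reached the target before returning
        return copied, digest.hexdigest()


    def sha256_of(path: str, limit: Optional[int] = None, chunk_size: int = CHUNK_SIZE) -> str:
        """Hash a file, or only its first `limit` bytes (useful when the target is larger)."""
        digest = hashlib.sha256()
        remaining = limit
        with open(path, "rb") as f:
            while remaining is None or remaining > 0:
                size = chunk_size if remaining is None else min(chunk_size, remaining)
                block = f.read(size)
                if not block:
                    break
                digest.update(block)
                if remaining is not None:
                    remaining -= len(block)
        return digest.hexdigest()


    if __name__ == "__main__":
        # Hypothetical file names for a boot-stick image and its backup copy.
        n, h = clone_image("omv-boot.img", "omv-boot-backup.img")
        if sha256_of("omv-boot-backup.img", limit=n) != h:
            raise SystemExit("verification failed: backup does not match source")
        print(f"copied {n} bytes, sha256 {h}")
    ```
    
    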
