Posts by chris_kmn

    I tried a few more things and reverted one setting that I had made earlier to overcome issues with nvidia HW transcoding.


    In GRUB I had set the following kernel parameters:


    GRUB_CMDLINE_LINUX_DEFAULT="text acpi_osi=! \"acpi_osi=Windows 2015\" mem_sleep_default=deep pcie_aspm=off"


    As far as I understand it, the acpi_osi parameters make the kernel identify itself to the ACPI firmware as Windows (2015) instead of Linux.
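    To see which of these parameters the running kernel was actually booted with (for example after update-grub and a reboot), one option is to read /proc/cmdline. A minimal sketch; the helper name has_kparam is hypothetical:

```shell
#!/bin/sh
# Sketch: report whether given kernel parameters are on the current boot command line.
# has_kparam is a hypothetical helper; /proc/cmdline is the standard Linux source.
set -f  # disable filename globbing while splitting the command line into words

has_kparam() {
    # $1 = parameter name (e.g. acpi_osi) or exact word (e.g. pcie_aspm=off)
    # $2 = optional command line to test instead of /proc/cmdline
    cmdline="${2:-$(cat /proc/cmdline)}"
    for word in $cmdline; do
        case "$word" in
            "$1" | "$1"=*) return 0 ;;
        esac
    done
    return 1
}

for p in acpi_osi mem_sleep_default pcie_aspm; do
    if has_kparam "$p"; then echo "$p: set"; else echo "$p: not set"; fi
done
```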



    After I changed it back to:


    GRUB_CMDLINE_LINUX_DEFAULT="text mem_sleep_default=deep"



    Everything seems to work: the autoshutdown service is active and HW transcoding is working.


    Does this make sense to you?

    I don't know how to put it, but after suspend (and now with autoshutdown running) my hardware transcoding isn't working anymore...


    So my guess is that there IS a conflict with the nvidia scripts from #43 :-(


    nvidia-smi isn't working anymore either, and to get it working again I have to reboot my system :-(

    Code
    root@MalibuNAS:~# nvidia-smi
    Unable to determine the device handle for GPU 0000:01:00.0: Unknown Error
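    Whether the driver survived a resume can be checked non-interactively, because nvidia-smi exits with a non-zero status when it cannot query the GPU. A minimal sketch, assuming it is run by hand after resume; gpu_ok is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: check whether the NVIDIA driver still answers after a resume.
# gpu_ok is a hypothetical helper name.
gpu_ok() {
    command -v nvidia-smi >/dev/null 2>&1 || return 2   # nvidia-smi not installed
    nvidia-smi >/dev/null 2>&1                           # non-zero exit when the GPU can't be queried
}

rc=0
gpu_ok || rc=$?
case $rc in
    0) echo "GPU OK" ;;
    2) echo "nvidia-smi not found" ;;
    *) echo "GPU not responding (nvidia-smi exit code $rc)" ;;
esac
```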



    Can't we go back to the original version (5.1.7?)?

    I'll give it a try. I had already removed the plugin as you described and started from scratch, but without purging. After removing it again, the result is:



    BUT:


    This time it seems to work... and one more "but": in the meantime I've changed my network setup from DHCP to static and also disabled IPv6 support in the Plex server running on my OMV.


    This is the log:

    There might be a (major) difference in my setup: I am using nvidia hardware transcoding, and therefore I had to install specific nvidia sleep and resume scripts. Maybe that helps to find the cause?!


    /etc/systemd/system/nvidia-hibernate.service:



    /etc/systemd/system/nvidia-resume.service:


    /etc/systemd/system/nvidia-suspend.service:


    /usr/bin/nvidia-sleep.sh:


    and /lib/systemd/system-sleep/nvidia:

    Bash
    #!/bin/sh
    case "$1" in
    post)
    /usr/bin/nvidia-sleep.sh "resume"
    ;;
    esac



    Does this make any sense?
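    systemd only runs executable files from the system-sleep directory, and the nvidia-*.service units call /usr/bin/nvidia-sleep.sh, so both helpers have to sit at exactly these paths and be executable. A quick sketch to check that; check_exec is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: verify the NVIDIA sleep helpers exist at the expected paths and are executable.
# check_exec is a hypothetical helper name.
check_exec() {
    for f in "$@"; do
        if [ ! -e "$f" ]; then
            echo "MISSING: $f"
        elif [ ! -x "$f" ]; then
            echo "NOT EXECUTABLE: $f (fix: chmod a+x $f)"
        else
            echo "OK: $f"
        fi
    done
}

check_exec /usr/bin/nvidia-sleep.sh /lib/systemd/system-sleep/nvidia
```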

    I tried it; the service doesn't start after suspend:


    Well... I think I misunderstood the problem with the path. I thought Dagobert had made the mistake and didn't realise that the mistake is in #26.


    So I copied the files to the correct path and made them executable. After suspend/resume there is still an error, but this time it shows up in the log:


    And this is the journal log:



    but I will repeat everything one more time...

    The path is correct, and I also made the files executable.


    After enabling verbose mode and deleting the old log, the log file was empty after resuming from suspend (?), and the service was inactive.


    The dmesg log shows that the interface is up after about 5 seconds:


    Interesting. After I deleted the spaces in FORCE_NIC as you advised, I'm having the problem again.


    cat...operstate for my enp2s0 says "up", so the interface should be working after resume.
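    Since dmesg shows the interface coming up only about 5 seconds after resume, one way to test whether the network check simply runs too early is to wait until the interface reports "up" and also has an IPv4 address (autoshutdown's _check_networkconfig() looks for interfaces with IPv4 addresses). A minimal sketch; iface_ready and wait_iface are hypothetical helpers, and the interface name is passed as an argument:

```shell
#!/bin/sh
# Sketch: wait until an interface is "up" AND has an IPv4 address, or give up.
# iface_ready and wait_iface are hypothetical helper names.

iface_ready() {
    # operstate comes from sysfs; "ip -4 -o addr" prints one line per IPv4 address
    [ "$(cat "/sys/class/net/$1/operstate" 2>/dev/null)" = "up" ] &&
        ip -4 -o addr show dev "$1" 2>/dev/null | grep -q ' inet '
}

wait_iface() {
    # $1 = interface (e.g. enp2s0), $2 = timeout in seconds (default 30)
    iface="$1"; timeout="${2:-30}"; waited=0
    while ! iface_ready "$iface"; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "$iface not ready after ${timeout}s"
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "$iface ready after ${waited}s"
}
```

    For example, running wait_iface enp2s0 30 right after a resume prints how many seconds the interface needed before it had an IPv4 address.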


    This is the log output:


    It is quite tricky to get Plex hardware transcoding working on OMV in a docker container with nvidia graphics cards.

    It took me several weeks to figure out all the little details, and I want to share them with you.


    My configuration is tested for:

    - Debian 10 Buster (which is OMV5)

    - OMV 5.6.1-1 Usul

    - Kernel Linux 5.10.0-0.bpo.3-amd64

    - Docker 5:20.10.5~3-0~debian-buster

    - Portainer 2.1.1

    - Nvidia driver 460.39

    - Cuda Version 11.2

    - Nvidia Quadro P1000 graphics card

    - Plex 1.22.0.4145


    First of all and very important:

    Hardware transcoding must already work on the Intel hardware! There are several tutorials for that. It is mainly about access rights, e.g. changing the "noexec" option to "exec" in fstab.


    Step 1:

    If you have already tried to install the nvidia driver: purge it!

    Code
    apt-get purge '*nvidia*'
    apt autoremove
    apt autoclean


    Step 2:

    Prepare your kernel header files. Very important!

    Code
    apt-get install module-assistant
    sudo m-a prepare



    Step 3: installing Nvidia Driver

    Follow the instructions in the Debian wiki:


    https://wiki.debian.org/Nvidia…28via_buster-backports.29



    in detail:


    Add buster-backports to your /etc/apt/sources.list, for example:

    Code
    # buster-backports
    deb http://deb.debian.org/debian buster-backports main contrib non-free


    then:


    Code
    apt update
    apt install -t buster-backports nvidia-driver firmware-misc-nonfree


    IMPORTANT:


    Watch the messages during installation. If there are any error messages, something is wrong and you might not have Debian Buster with backports set up! You have to solve this issue before proceeding!!!



    now we have to configure nvidia:

    Code
    apt install -t buster-backports nvidia-xconfig
    sudo nvidia-xconfig


    Since Docker 5:20.10.2 (I think), the way Docker gets access to hardware via cgroups has changed. You need this workaround in the kernel boot parameters:


    Code
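    mkdir -p /etc/default/grub.d
    # append to any existing GRUB_CMDLINE_LINUX instead of replacing it
    echo 'GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX systemd.unified_cgroup_hierarchy=false"' > /etc/default/grub.d/cgroup.cfg
    update-grub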


    now reboot and the nvidia driver should already work.
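    After the reboot, both changes can be sanity-checked: the filesystem type mounted at /sys/fs/cgroup is "cgroup2fs" when the unified (v2) hierarchy is active and "tmpfs" for the legacy/hybrid (v1) layout, and a loaded driver shows up in /proc/modules (the file lsmod reads). A minimal sketch; cgroup_mode and nvidia_loaded are hypothetical helper names:

```shell
#!/bin/sh
# Sketch: post-reboot checks for the cgroup workaround and the nvidia driver.
# cgroup_mode and nvidia_loaded are hypothetical helper names.

cgroup_mode() {
    # "stat -f" reports the filesystem type mounted at /sys/fs/cgroup
    case "$(stat -fc %T /sys/fs/cgroup/ 2>/dev/null)" in
        cgroup2fs) echo "unified (v2)" ;;
        tmpfs)     echo "legacy/hybrid (v1)" ;;
        *)         echo "unknown" ;;
    esac
}

nvidia_loaded() {
    # /proc/modules lists one loaded module per line, starting with its name
    grep -q '^nvidia ' /proc/modules 2>/dev/null
}

echo "cgroup hierarchy: $(cgroup_mode)"   # expected with the workaround: legacy/hybrid (v1)
if nvidia_loaded; then echo "nvidia module: loaded"; else echo "nvidia module: not loaded"; fi
```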


    Step 4: Install Nvidia container toolkit

    Follow the Installation guide by Nvidia:

    https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#setting-up-nvidia-container-toolkit


    in detail:


    Install curl if you don't have it already:

    Code
    sudo apt install curl

    Setup the stable repository and the GPG key:


    Code
    distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
    && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
    && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list


    Install the nvidia-docker2 package (and dependencies) after updating the package listing:

    Code
    sudo apt-get update
    apt install -t buster-backports nvidia-docker2


    Now install Nvidia encode library and nvidia-smi:

    Code
    apt install -t buster-backports libnvidia-encode1
    apt install -t buster-backports nvidia-smi


    Step 5: install Nvidia container runtime:

    Code
    apt install -t buster-backports nvidia-container-runtime


    Step 6: some modifications:

    Change/edit the Daemon configuration file

    /etc/docker/daemon.json :


    Code
    {
        "runtimes": {
            "nvidia": {
                "path": "/usr/bin/nvidia-container-runtime",
                "runtimeArgs": []
            }
        },
        "default-runtime": "nvidia",
        "data-root": "/var/lib/docker"
    }
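    dockerd refuses to start if /etc/docker/daemon.json is not valid JSON, so it can pay off to check the file before restarting Docker. A minimal sketch, assuming python3 is installed (its json.tool module is used as the validator); validate_json is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: validate daemon.json syntax before restarting Docker.
# Assumes python3 is available; validate_json is a hypothetical helper name.
validate_json() {
    if python3 -m json.tool "$1" >/dev/null 2>&1; then
        echo "valid JSON: $1"
    else
        echo "INVALID JSON: $1"
        return 1
    fi
}

validate_json /etc/docker/daemon.json || echo "fix daemon.json before restarting docker"
```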


    and the

    /etc/nvidia-container-runtime/config.toml

    to:



    Restart docker:

    Code
    sudo systemctl restart docker
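    Before setting up Plex, a throwaway container can show whether the nvidia runtime passes the GPU through at all. A minimal sketch, assuming the stock "ubuntu" image: with the "utility" capability the NVIDIA runtime mounts the host's nvidia-smi into the container, so the image itself needs no NVIDIA packages. gpu_container_test is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: smoke test that a container sees the GPU through the nvidia runtime (run as root).
# Assumption: the stock "ubuntu" image; nvidia-smi is injected from the host driver.
gpu_container_test() {
    docker run --rm --runtime=nvidia \
        -e NVIDIA_VISIBLE_DEVICES=all \
        -e NVIDIA_DRIVER_CAPABILITIES=compute,video,utility \
        ubuntu nvidia-smi
}

if command -v docker >/dev/null 2>&1; then
    gpu_container_test || echo "GPU not visible inside the container"
else
    echo "docker not found"
fi
```

    If this prints the usual nvidia-smi table, the runtime side works and any remaining problem is in the Plex container settings.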



    Step 7: starting Plex (in Portainer)

    In Portainer, add the following environment variables in the "Env" tab:

    - name: NVIDIA_DRIVER_CAPABILITIES, value: compute,video,utility

    - name: NVIDIA_VISIBLE_DEVICES, value: all


    In the "Runtime & Resources" tab, change the "Runtime" value from runc to nvidia!!!


    Privileged mode and Init are not set.




    Step 8: trial and error:

    This is how it worked for me. To check that it is working, you can display the GPU load with:


    Code
    watch -d -n 0.5 nvidia-smi


    or install:


    Code
    apt install nvtop


    and use:


    Code
    nvtop
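    For transcoding specifically, the encoder and decoder load is more telling than the overall GPU load. nvidia-smi dmon with the utilization metric group shows separate enc and dec columns. A minimal sketch; codec_load is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: sample GPU utilization including the encoder (enc) and decoder (dec) columns.
# codec_load is a hypothetical helper name.
codec_load() {
    command -v nvidia-smi >/dev/null 2>&1 || { echo "nvidia-smi not found"; return 1; }
    # -s u: utilization metrics (sm, mem, enc, dec); -c: number of samples (1 s apart by default)
    nvidia-smi dmon -s u -c "${1:-10}"
}

codec_load 10 || true
```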



    You can also use the docker command line interface (CLI) to start Plex:
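    A minimal sketch of such a command, mirroring the Portainer settings from Step 7. The official plexinc/pms-docker image, the container name and the host paths under /srv/plex are assumptions for illustration, not taken from this guide; run_plex is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical docker CLI equivalent of the Portainer settings in Step 7 (run as root).
# Assumptions: plexinc/pms-docker image, container name "plex", placeholder host paths.
run_plex() {
    docker run -d \
        --name plex \
        --network host \
        --runtime=nvidia \
        -e NVIDIA_VISIBLE_DEVICES=all \
        -e NVIDIA_DRIVER_CAPABILITIES=compute,video,utility \
        -v /srv/plex/config:/config \
        -v /srv/plex/transcode:/transcode \
        -v /srv/plex/media:/data \
        plexinc/pms-docker
}
```

    After adjusting the paths, calling run_plex once creates and starts the container; as in Portainer, no privileged mode is needed.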





    Step 9: get rid of the session limit:

    If you want to remove the NVENC session limit (my P1000 had a limit of 3), follow this link. It worked for me too:

    https://github.com/keylase/nvidia-patch


    in detail:


    if you don't have git, install it:


    Code
    apt install git

    then

    Code
    git clone https://github.com/keylase/nvidia-patch.git nvidia-patch


    Patch the nvidia driver:

    Code
    cd nvidia-patch
    bash ./patch.sh


    If you want to rollback:

    Code
    bash ./patch.sh -r


    Step 10: Nvidia Power Management:

    You need these modifications if you are using autoshutdown, because the nvidia driver (or the card's PCI bus?) falls off when resuming from hibernate or suspend mode!

    You can read more about that in the nvidia documentation:

    http://us.download.nvidia.com/…namicpowermanagement.html

    http://us.download.nvidia.com/…ADME/powermanagement.html

    http://us.download.nvidia.com/…/nvidia-persistenced.html



    First of all, you need NVIDIA's dedicated power management scripts, which you have to take from the NVIDIA driver install package:


    Download the nvidia install package from:

    https://www.nvidia.com/de-de/drivers/unix/


    or for the 460 driver the direct link:

    https://www.nvidia.de/Download/driverResults.aspx/170214/de


    Or on the system:

    Code
    wget http://us.download.nvidia.com/XFree86/Linux-x86_64/460.39/NVIDIA-Linux-x86_64-460.39.run


    Do not install the driver (!), just extract it:



    Code
    sh NVIDIA-Linux-x86_64-460.39.run --extract-only



    Next, search the extracted package for the following files and copy them to the given directories (I used an SSH client for this):


    Code
    /etc/systemd/system/nvidia-suspend.service
    /etc/systemd/system/nvidia-hibernate.service
    /etc/systemd/system/nvidia-resume.service
    /lib/systemd/system-sleep/nvidia
    /usr/bin/nvidia-sleep.sh
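    The copying can also be done on the server itself. A minimal sketch that locates the files by name inside the extracted directory with find, so it does not depend on the package's internal layout; it installs the hook and the script with execute permission, since systemd-sleep only runs executable files. find_one and install_nvidia_sleep_files are hypothetical helper names:

```shell
#!/bin/sh
# Sketch: copy NVIDIA's power-management helpers out of a package extracted with --extract-only.
# find_one and install_nvidia_sleep_files are hypothetical helper names.

find_one() {
    # $1 = directory to search, $2 = "find -path" pattern; prints the first match or fails
    match=$(find "$1" -type f -path "$2" 2>/dev/null | head -n 1)
    [ -n "$match" ] || { echo "not found: $2" >&2; return 1; }
    echo "$match"
}

install_nvidia_sleep_files() {
    # $1 = extracted package directory; must run as root
    src="$1"
    for unit in nvidia-suspend.service nvidia-hibernate.service nvidia-resume.service; do
        f=$(find_one "$src" "*/$unit") || return 1
        install -m 0644 "$f" /etc/systemd/system/ || return 1
    done
    f=$(find_one "$src" '*/nvidia-sleep.sh') || return 1
    install -m 0755 "$f" /usr/bin/nvidia-sleep.sh || return 1
    # systemd-sleep only runs executable files from this directory, hence mode 0755
    f=$(find_one "$src" '*/system-sleep/nvidia') || return 1
    install -m 0755 "$f" /lib/systemd/system-sleep/nvidia || return 1
    systemctl daemon-reload
}
```

    For example, as root: install_nvidia_sleep_files ./NVIDIA-Linux-x86_64-460.39, assuming that is the directory --extract-only created.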



    then enable the services:


    Code
    sudo systemctl enable nvidia-suspend.service
    sudo systemctl enable nvidia-hibernate.service
    sudo systemctl enable nvidia-resume.service
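    To confirm that the enable step took effect, systemctl is-enabled prints the state of each unit. A minimal sketch; unit_states is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: show the enablement state of the three NVIDIA sleep units.
# unit_states is a hypothetical helper name.
unit_states() {
    for unit in nvidia-suspend.service nvidia-hibernate.service nvidia-resume.service; do
        # is-enabled prints e.g. "enabled" or "disabled" (and exits non-zero if not enabled)
        state=$(systemctl is-enabled "$unit" 2>/dev/null) || true
        echo "$unit: ${state:-unknown}"
    done
}

unit_states
```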


    Change the nvidia kernel module options in:


    /etc/modprobe.d/nvidia-kernel-common.conf

    Code
    options nvidia NVreg_PreserveVideoMemoryAllocations=1
    options nvidia NVreg_DynamicPowerManagementVideoMemoryThreshold=100
    options nvidia NVreg_DynamicPowerManagement=0x02
    options nvidia NVreg_EnableMSI=0
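    After update-initramfs and a reboot, the loaded driver reports its active settings in /proc/driver/nvidia/params (names without the NVreg_ prefix), which shows whether these options were actually picked up. A minimal sketch; show_nvreg is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: confirm the driver picked up the NVreg_* options after a reboot.
# show_nvreg is a hypothetical helper name.
show_nvreg() {
    params=/proc/driver/nvidia/params
    [ -r "$params" ] || { echo "$params not readable (driver not loaded?)"; return 1; }
    # entries are "Name: value", listed without the NVreg_ prefix
    grep -E '^(PreserveVideoMemoryAllocations|DynamicPowerManagement|DynamicPowerManagementVideoMemoryThreshold|EnableMSI):' "$params"
}

show_nvreg || true
```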


    Now make nvidia-sleep.sh and the system-sleep hook executable and update the initramfs:


    Code
    chmod a+x /usr/bin/nvidia-sleep.sh
    chmod a+x /lib/systemd/system-sleep/nvidia
    sudo update-initramfs -u


    Remark:

    I still use the script files from the 450 driver and they still work with 460, so I think they are generic rather than driver-specific. But if you experience problems with hardware transcoding after a resume on a new driver version, you may have to extract them fresh from the new driver package.




    I don't know if this final part is really needed, but if you still have issues resuming from suspend, you can try disabling PCIe Active State Power Management (ASPM):


    Add the kernel parameter pcie_aspm=off to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub:

    Code
    GRUB_CMDLINE_LINUX_DEFAULT="quiet pcie_aspm=off"


    and after changing grub:


    Code
    sudo update-grub
    reboot
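    Per the kernel's parameter documentation, pcie_aspm=off tells the kernel not to touch the ASPM configuration at all, leaving whatever the firmware set up, so it can be worth checking the state the links actually ended up in. lspci -vv reports it in each device's LnkCtl line. A minimal sketch; aspm_disabled_links is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: count PCIe links whose Link Control register reports ASPM as disabled.
# Run as root: without root, lspci -vv may not show the capability details.
# aspm_disabled_links is a hypothetical helper name; prints 0 if lspci is not installed.
aspm_disabled_links() {
    command -v lspci >/dev/null 2>&1 || { echo 0; return 0; }
    lspci -vv 2>/dev/null | grep 'LnkCtl:' | grep -c 'ASPM Disabled'
}

echo "PCIe links with ASPM disabled: $(aspm_disabled_links)"
```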



    That was a long way :-)


    Please let me know if there are still issues with this guide. I'll try to keep it up to date...


    Good luck,


    Chris

    For me it still works. No error on startup after suspend:



    But I also updated autoshutdown to 5.1.10, which was available yesterday evening...

    Hello Everybody,


    I have the same problem, but I did not change anything in my config. It started after an update of OMV-Autoshutdown a few days ago.


    After every resume from suspend the autoshutdown-service is in error state:


    Code
    root: INFO: '_check_networkconfig(): Finding available network interfaces and their IPv4 addresses'
    root: ERR: '_check_networkconfig(): No valid network interface found, exiting ...'


    Neither the option FORCE_NIC="enp2s0" nor the change from #8 in autoshutdown solved the problem:

    local -r net_ifaces="${FORCE_NIC:-"en,eth,wlan,bond,usb,br"}"
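    The change from #8 relies on the shell's ${VAR:-default} expansion: the default interface list is used whenever FORCE_NIC is unset or empty; otherwise the value of FORCE_NIC is used verbatim, including any stray spaces. A minimal sketch of just that expansion; net_ifaces_for is a hypothetical helper:

```shell
#!/bin/sh
# Sketch of the ${FORCE_NIC:-...} fallback used in autoshutdown's net_ifaces line.
# net_ifaces_for is a hypothetical helper name.
net_ifaces_for() {
    FORCE_NIC="$1"
    echo "${FORCE_NIC:-"en,eth,wlan,bond,usb,br"}"
}

net_ifaces_for ""         # prints: en,eth,wlan,bond,usb,br
net_ifaces_for "enp2s0"   # prints: enp2s0
```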


    Reinstalling OMV-Autoshutdown (5.1.9) did not help either.


    So autoshutdown only works once and then stays in an error state.


    Is anyone else having this issue?