[HowTo] Nvidia hardware transcoding on OMV 5 in a Plex docker container

  • For OMV 6 proceed to the second post: OMV 6


    It is quite tricky to get Plex hardware transcoding working on OMV in a docker container with nvidia graphics cards.

    It took me some weeks to find out all the little details and I want to share it with you.


    My configuration is tested for:

    - Debian 10 Buster (which is OMV 5)

    - OMV 5.6.13-1 Usul

    - Kernel Linux 5.10.0-0.bpo.8-amd64

    - Docker 5:20.10.8~3-0~debian-buster

    - Portainer 2.6.0

    - Nvidia driver 460.73.01

    - Cuda Version 11.2

    - Nvidia Quadro P1000 graphics card

    - Plex 1.24.0.4897


    First of all and very important:

    Hardware transcoding must work on the Intel hardware! There are several tutorials for that. Mainly it is related to access rights, like changing the "noexec" option to "exec" in the fstab.
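
    For example, the relevant fstab line of the drive that holds the Plex config/transcode folder could look like this after the change (just a sketch with a hypothetical UUID and mount point; only the change from "noexec" to "exec" matters here):

    Code
    UUID=0815-abcd  /srv/dev-disk-by-uuid-0815-abcd  ext4  defaults,nofail,exec  0  2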


    Step 1:

    If you have already tried to install the nvidia driver: purge it first!

    Code
    apt-get purge *nvidia*
    
    apt autoremove
    
    apt autoclean


    Step 2:

    prepare your header files. Very important !

    Code
    apt-get install module-assistant
    
    sudo m-a prepare



    Step 3: installing Nvidia Driver

    follow the instructions given by Nvidia:


    NvidiaGraphicsDrivers - Debian Wiki



    in detail:


    Add buster-backports to your /etc/apt/sources.list, for example:

    Code
    # Buster-backports
    deb http://deb.debian.org/debian buster-backports main contrib non-free


    then:


    Code
    apt update
    apt install -t buster-backports nvidia-driver firmware-misc-nonfree


    IMPORTANT:


    Watch the messages during installation. If there are any error messages, something is wrong and you might not have Debian Buster with backports! You have to solve this issue first before proceeding!



    now we have to configure nvidia:

    Code
    apt install -t buster-backports nvidia-xconfig
    
    sudo nvidia-xconfig


    Since Docker 5:20.10.2 (I think) there was a change in how docker gets access to hardware via cgroups. You need this workaround in the kernel boot parameters:


    Code
    echo 'GRUB_CMDLINE_LINUX=systemd.unified_cgroup_hierarchy=false' > /etc/default/grub.d/cgroup.cfg
    
    update-grub


    now reboot and the nvidia driver should already work.
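
    You can verify this right away with nvidia-smi (it gets installed from backports in Step 4 anyway; it should list your card and the driver version):

    Code
    apt install -t buster-backports nvidia-smi
    nvidia-smi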


    Step 4: Install Nvidia container toolkit

    Follow the Installation guide by Nvidia:

    https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#setting-up-nvidia-container-toolkit


    in detail:


    Install curl if you don't have it already:

    Code
    sudo apt install curl



    Set up the stable repository and the GPG key - all in one command (use the copy button):

    If this doesn't work, please check for any changes under this link: nvidia container

    Code
    distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
          && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
          && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
                sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
                sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list



    Install the nvidia-docker2 package (and dependencies) after updating the package listing:

    Code
    sudo apt-get update
    
    apt install -t buster-backports nvidia-docker2


    Now install Nvidia encode library and nvidia-smi:

    Code
    apt install -t buster-backports libnvidia-encode1
    
    apt install -t buster-backports nvidia-smi


    Step 5: install Nvidia container runtime:

    Code
    apt install -t buster-backports nvidia-container-runtime


    Step 6: some modifications:

    Change/edit the Daemon configuration file

    /etc/docker/daemon.json :


    Code
    {
        "runtimes": {
            "nvidia": {
                "path": "/usr/bin/nvidia-container-runtime",
                "runtimeArgs": []
            }
        },
        "default-runtime": "nvidia",
        "data-root": "/var/lib/docker"
    }


    and the

    /etc/nvidia-container-runtime/config.toml

    to:
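
    The same settings as shown in the OMV 6 section further down should go here (you can leave out the comment lines starting with #):

    Code
    disable-require = false
    accept-nvidia-visible-devices-envvar-when-unprivileged = true

    [nvidia-container-cli]
    environment = []
    load-kmods = true
    ldconfig = "/sbin/ldconfig.real"

    [nvidia-container-runtime]
    #debug = "/var/log/nvidia-container-runtime.log"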



    Restart docker:

    Code
    sudo systemctl restart docker
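
    To check that docker picked up the nvidia runtime, you can look at the docker info output - it should list nvidia among the runtimes and as the default runtime:

    Code
    docker info | grep -i runtime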



    Step 7: starting Plex (in Portainer)

    in Portainer you have to add the following parameters in the "Env" tab:

    name: NVIDIA_DRIVER_CAPABILITIES
    Value: compute,video,utility

    name: NVIDIA_VISIBLE_DEVICES
    Value: all




    and in the tab "Runtime & Resources":

    change the "Runtime" Value from runc to nvidia !!!


    No Privileged mode and no Init set.




    Step 8: trial and error:

    This is how it worked for me. If you want to check operation, you can display the GPU load with:


    Code
    watch -d -n 0.5 nvidia-smi

    (If nvidia-smi is not installed, do apt install -t buster-backports nvidia-smi)


    or install:


    Code
    apt install nvtop


    and use:


    Code
    nvtop



    You can also try/use the docker command line interface (cli) to start Plex:
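
    For example, a minimal sketch of such a docker run command, assuming the linuxserver/plex image (as used in the OMV 6 section below) and hypothetical paths and IDs that you have to adapt to your system:

    Code
    docker run -d \
      --name=plex \
      --runtime=nvidia \
      --network=host \
      -e PUID=1000 -e PGID=1000 \
      -e VERSION=docker \
      -e NVIDIA_VISIBLE_DEVICES=all \
      -e NVIDIA_DRIVER_CAPABILITIES=compute,video,utility \
      -v /path/to/plex/config:/config \
      -v /path/to/media:/data \
      linuxserver/plex:latest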





    Step 9: get rid of the session limit:

    If you want to disable the session limit (my P1000 had a limit of 3) go ahead with this link. It worked for me also:

    GitHub - keylase/nvidia-patch: This patch removes restriction on maximum number of simultaneous NVENC video encoding sessions imposed by Nvidia to consumer-grade GPUs.


    in detail:


    if you don't have git, install it:


    Code
    apt install git

    then

    Code
    git clone https://github.com/keylase/nvidia-patch.git nvidia-patch


    Patch the nvidia driver:

    Code
    cd nvidia-patch
    bash ./patch.sh


    If you want to rollback:

    Code
    bash ./patch.sh -r


    Step 10: Nvidia Power Management:

    You need these modifications if you are using autoshutdown, as the nvidia driver (or the PCI bus of the card?) is falling off when resuming from hibernate or suspend mode!

    You can read more about that in the nvidia documentation:

    Chapter 22. PCI-Express Runtime D3 (RTD3) Power Management

    Chapter 21. Configuring Power Management Support

    Chapter 29. Using the nvidia-persistenced Utility



    First of all you need the dedicated nvidia scripts for power management and you have to find them in the nvidia driver install package:


    Download the nvidia install package from:

    Unix Drivers | NVIDIA


    or for the 460 driver the direct link:

    Linux x64 (AMD64/EM64T) Display Driver | 460.39 | Linux 64-bit | NVIDIA


    Or on the system:

    Code
    wget http://us.download.nvidia.com/XFree86/Linux-x86_64/460.39/NVIDIA-Linux-x86_64-460.39.run


    Now do not install the driver (!) - just extract it:



    Code
    sh NVIDIA-Linux-x86_64-460.39.run --extract-only



    In the next step you have to search for the following files and copy them to the given directories (I used an ssh client for this)


    Code
    /etc/systemd/system/nvidia-suspend.service
    /etc/systemd/system/nvidia-hibernate.service
    /etc/systemd/system/nvidia-resume.service
    /lib/systemd/system-sleep/nvidia
    /usr/bin/nvidia-sleep.sh
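
    As a sketch of how this can look on the shell - assuming the files sit in the systemd/ folder of the extracted NVIDIA-Linux-x86_64-460.39 directory (the layout may differ between driver versions, so locate the files with find first):

    Code
    cd NVIDIA-Linux-x86_64-460.39
    # locate the power management files inside the extracted package
    find . -name 'nvidia-sleep.sh' -o -name 'nvidia-*.service' -o -path '*system-sleep/nvidia'
    # copy them to the target locations listed above
    cp systemd/system/nvidia-suspend.service /etc/systemd/system/
    cp systemd/system/nvidia-hibernate.service /etc/systemd/system/
    cp systemd/system/nvidia-resume.service /etc/systemd/system/
    cp systemd/system-sleep/nvidia /lib/systemd/system-sleep/
    cp systemd/nvidia-sleep.sh /usr/bin/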



    then enable the services:


    Code
    sudo systemctl enable nvidia-suspend.service
    sudo systemctl enable nvidia-hibernate.service
    sudo systemctl enable nvidia-resume.service


    change nvidia kernel config:


    /etc/modprobe.d/nvidia-kernel-common.conf

    Code
    options nvidia NVreg_PreserveVideoMemoryAllocations=1
    options nvidia NVreg_DynamicPowerManagementVideoMemoryThreshold=100
    options nvidia NVreg_DynamicPowerManagement=0x02
    options nvidia NVreg_EnableMSI=0


    now make nvidia-sleep.sh executable and update modules:


    Code
    chmod a+x /usr/bin/nvidia-sleep.sh
    
    sudo update-initramfs -u


    Remark:

    I still use the script files from the 450 driver version and they still work with 460. So I think they are more general and not driver specific. But if you experience problems with hardware transcoding on a new driver version after a resume, maybe you have to extract them fresh from the new driver package.




    And I don't know if this final part is really needed, but if you still have issues with resume from suspend you can try to disable Active State Power Management (ASPM) on PCIe:


    Change/add kernel parameter pcie_aspm=off in /etc/default/grub to:

    Code
    GRUB_CMDLINE_LINUX_DEFAULT="quiet pcie_aspm=off"


    and after changing grub:


    Code
    sudo update-grub
    
    reboot



    That was a long way :)


    Please let me know if there are still issues with this guide. I'll try to keep it up to date...


    Good luck,


    Chris

  • ******************************* OMV6 & OMV 7**************************************


    If you are having issues with the new docker-compose-Plugin, read the comments regarding that issue.


    Tested and running on:

    - Linux Debian 11, Kernel 5.18.0-0.bpo.4-amd64, 5.18.0-0.deb11.4-amd64, 5.19.0-0.deb11.2-amd64, 5.19.17-1-pve, 6.1.15-1-pve, 6.2.16-11-bpo11-pve, 6.2.16-20-pve, 6.5.13-1-pve

    - OMV 6.9.1-1 (Shaitan), 7.0.1-1 (Sandworm)

    - docker-ce 5:25.0.4-1~debian.12~bookworm

    - docker-compose-plugin 2.24.7-1~debian.12~bookworm

    (- Portainer 2.18.2)

    - nvidia driver: 470.199.02, Cuda 11.4, 525.147.05-7~deb12u1 Cuda 12.0

    - Nvidia Quadro P1000 & T600 graphics card

    - Plex 1.40.1.8173


    Not working: nvidia driver 550.54.14, Cuda 12.4


    First of all and very important:

    Transcoding must work on the Intel hardware! There are several tutorials for that. Mainly it is related to access rights, like changing the "noexec" option to "exec" in the fstab, and the correct settings of the Plex docker container.


    Second advice:

    Check if your graphics card supports the codecs your videos are encoded with. If your card is older, hw-transcoding will not work even if the setup is done properly.

    Here is an overview provided by nvidia: Nvidia codec matrix


    And now let's start:

    Step 1:

    If you have already tried to install the nvidia driver: purge it first!

    Code
    apt-get purge *nvidia*
    
    apt autoremove
    
    apt autoclean


    You can also try this method if the above leads to errors:


    Code
    sudo apt-get remove --purge '^nvidia-.*'
    
    apt autoremove
    
    apt autoclean

    Step 2:

    prepare your header files:


    Code
    apt-get install module-assistant
    
    sudo m-a prepare


    Step 3:

    Instructions from nvidia. In detail:


    Add "contrib" and "non-free" components to /etc/apt/sources.list, for example:

    Code
    # Debian Bullseye
    deb http://deb.debian.org/debian/ bullseye main contrib non-free
    deb-src http://deb.debian.org/debian/ bullseye main contrib non-free


    or Bookworm:


    Code
    # Debian Bookworm
    deb http://deb.debian.org/debian/ bookworm main contrib non-free non-free-firmware
    deb-src http://deb.debian.org/debian/ bookworm main contrib non-free non-free-firmware



    You should check that there aren't any doubled entries! There might be additional other sources, depending on your setup.
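
    One simple way to list all enabled deb entries and spot duplicates (just a sketch):

    Code
    grep -rh '^deb' /etc/apt/sources.list /etc/apt/sources.list.d/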



    Then:


    Code
    apt update
    
    apt install nvidia-driver firmware-misc-nonfree


    Check that there aren't any error messages.


    If there are no errors, proceed:


    Code
    apt install nvidia-xconfig
    
    sudo nvidia-xconfig



    Since Docker 5:20.10.2 (I think) there was a change in how docker gets access to hardware via cgroups. You need this workaround in the kernel boot parameters:


    Do the following:


    Code
    echo 'GRUB_CMDLINE_LINUX=systemd.unified_cgroup_hierarchy=false' > /etc/default/grub.d/cgroup.cfg
    
    
    update-grub


    now reboot and the nvidia driver should already work.


    Check with


    Code
    apt install nvidia-smi
    
    nvidia-smi


    It should show your nvidia card and driver.

    Step 4: Install Nvidia container toolkit

    Follow the Installation guide by Nvidia:

    Installation Guide — NVIDIA Cloud Native Technologies documentation


    in detail:


    Install curl if you don't have it already:


    Code
    sudo apt install curl



    Set up the stable repository and the GPG key - all in one command (use the copy button):

    If this doesn't work, please check for any changes under this link: nvidia container


    Code
    curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
      && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
        sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
        sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
    Code
    sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list


    Install the nvidia-docker2 package (and dependencies) after updating the package listing:


    Code
    sudo apt-get update
    
    apt install nvidia-docker2


    Now install Nvidia encode library and nvidia-smi:


    Code
    apt install libnvidia-encode1


    Step 5: install Nvidia container runtime and configure it:

    (the configuration tool seems to be new. I've never used it before)

    Code
    sudo apt-get install -y nvidia-container-toolkit
    
    sudo nvidia-ctk runtime configure --runtime=docker
    
    sudo systemctl restart docker

    *******************************************************************************

    Important Comment if you are using the compose plugin:


    With the new docker compose plugin, every time you reconfigure the OMV settings, the plugin overwrites the daemon.json from Step 6.


    !!! To prevent this you have to go to /#/services/compose/settings in the OMV GUI and leave the entry for "Docker Storage" blank !!!
    (standard setting is /var/lib/docker)



    If overwriting happened, repeat the commands:


    Code
    sudo nvidia-ctk runtime configure --runtime=docker
    
    sudo systemctl restart docker

    ***********************************************************************************


    Step 6: some modifications:

    a.) Configuration for use without compose-Plugin:

    Change/edit the Daemon configuration file   /etc/docker/daemon.json :


    Code
    {
        "runtimes": {
            "nvidia": {
                "path": "/usr/bin/nvidia-container-runtime",
                "runtimeArgs": []
            }
        },
        "default-runtime": "nvidia",
        "data-root": "/var/lib/docker"
    }


    b.) Configuration for use with compose-Plugin:


    Code
    {
        "data-root": "/var/lib/docker",
        "runtimes": {
            "nvidia": {
                "args": [],
                "path": "nvidia-container-runtime"
            }
        }
    }


    c.) I haven't been able to figure out yet whether both configs are needed or whether either one works for both setups. Maybe it is not even necessary to manually edit the daemon.json anymore. I'll keep you updated -> it seems that both configs work the same. I'll keep both in the guide for a while and delete one later.



    and /etc/nvidia-container-runtime/config.toml to: (you can leave out the comment lines starting with #)


    Code
    disable-require = false
    accept-nvidia-visible-devices-envvar-when-unprivileged = true
    
    [nvidia-container-cli]
    environment = []
    load-kmods = true
    ldconfig = "/sbin/ldconfig.real"
    
    [nvidia-container-runtime]
    #debug = "/var/log/nvidia-container-runtime.log"



    Restart docker:


    Code
    sudo systemctl restart docker


    Step 7a: starting Plex (in Portainer)

    (Docker image: linuxserver/plex:latest on Docker Hub)

    in Portainer you have to add the following parameters in the "Env" tab:

    name: NVIDIA_DRIVER_CAPABILITIES
    Value: compute,video,utility

    name: NVIDIA_VISIBLE_DEVICES
    Value: all

    name: VERSION
    Value: plexpass


    and in the tab "Runtime & Resources":

    change the "Runtime" Value from runc to nvidia !!!



    No Privileged mode and no Init set.


    Check that you have set up the PUID and PGID values properly and changed the ownership of the Plex config directory to that user! (also needed for Intel HW transcoding)
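
    For example (a sketch with a hypothetical config path, assuming PUID/PGID 1000):

    Code
    chown -R 1000:1000 /path/to/plex/config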


    Step 7b: starting Plex with compose:

    Go to the compose-plugin and create a new compose-file under the files-section:
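
    A minimal sketch of such a compose file - assuming the linuxserver/plex image and hypothetical paths and IDs that you have to adapt; the environment variables and the nvidia runtime correspond to the Portainer settings from Step 7a:

    Code
    services:
      plex:
        image: linuxserver/plex:latest
        container_name: plex
        runtime: nvidia
        network_mode: host
        environment:
          - PUID=1000
          - PGID=1000
          - VERSION=plexpass
          - NVIDIA_VISIBLE_DEVICES=all
          - NVIDIA_DRIVER_CAPABILITIES=compute,video,utility
        volumes:
          - /path/to/plex/config:/config
          - /path/to/media:/data
        restart: unless-stopped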


    If you are getting a runtime error when starting the container, check if docker is running with nvidia:

    Code
    sudo docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi


    If it is not starting, repeat:


    Code
    sudo nvidia-ctk runtime configure --runtime=docker
    
    sudo systemctl restart docker


    Check that you have set up the PUID and PGID values properly and changed the ownership of the Plex config directory to that user! (also needed for Intel HW transcoding)



    Step 8: check the box

    This is how it worked for me. If you want to check operation, you can display the GPU load with:


    Code
    watch -d -n 0.5 nvidia-smi


    or install:


    Code
    apt install nvtop


    and use:


    Code
    nvtop



    You can also try/use the docker command line interface (cli) to start Plex (see the docker run example in the OMV 5 post above).





    Step 9: get rid of the session limit:

    If you want to disable the session limit (my P1000 had a limit of 3) go ahead with this link. It worked for me also:

    GitHub - keylase/nvidia-patch: This patch removes restriction on maximum number of simultaneous NVENC video encoding sessions imposed by Nvidia to consumer-grade GPUs.


    in detail:


    if you don't have git, install it:


    Code
    apt install git

    then

    Code
    git clone https://github.com/keylase/nvidia-patch.git nvidia-patch


    Patch the nvidia driver:

    Code
    cd nvidia-patch
    bash ./patch.sh


    If you want to rollback:

    Code
    bash ./patch.sh -r


    Please let me know if there are still issues with this guide. I'll try to keep it up to date...


    Good luck,


    Chris

  • Step 10: Nvidia Power Management:

    This issue might be solved in current drivers. But if you have problems with a "fallen off" graphics card after a suspend or hibernate (no hw-transcoding after suspend/hibernate) you can try this.

    You need these modifications if you are using autoshutdown, as the nvidia driver (or the PCI bus of the card?) is falling off when resuming from hibernate or suspend mode!

    You can read more about that in the nvidia documentation:

    http://us.download.nvidia.com/…namicpowermanagement.html

    http://us.download.nvidia.com/…ADME/powermanagement.html

    http://us.download.nvidia.com/…/nvidia-persistenced.html



    First of all you need the dedicated nvidia scripts for power management and you have to find them in the nvidia driver install package:


    Download the nvidia install package from:

    Unix Drivers | NVIDIA


    or for the 470 driver the direct link:

    470.103.01


    Or on the system:

    Code
    wget http://us.download.nvidia.com/XFree86/Linux-x86_64/470.103.01/NVIDIA-Linux-x86_64-470.103.01.run


    Now do not install the driver (!) - just extract it:


    Code
    sh NVIDIA-Linux-x86_64-470.103.01.run --extract-only



    In the next step you have to search for the following files and copy them to the given directories (I used an ssh client for this)


    Code
    /etc/systemd/system/nvidia-suspend.service
    /etc/systemd/system/nvidia-hibernate.service
    /etc/systemd/system/nvidia-resume.service
    /lib/systemd/system-sleep/nvidia
    /usr/bin/nvidia-sleep.sh



    then enable the services:


    Code
    sudo systemctl enable nvidia-suspend.service
    sudo systemctl enable nvidia-hibernate.service
    sudo systemctl enable nvidia-resume.service


    change nvidia kernel config /etc/modprobe.d/nvidia-kernel-common.conf:


    Code
    options nvidia NVreg_PreserveVideoMemoryAllocations=1
    options nvidia NVreg_DynamicPowerManagementVideoMemoryThreshold=100
    options nvidia NVreg_DynamicPowerManagement=0x02
    options nvidia NVreg_EnableMSI=0


    now make nvidia-sleep.sh executable and update modules:


    Code
    chmod a+x /usr/bin/nvidia-sleep.sh
    
    sudo update-initramfs -u


    Remark:

    I still use the script files from the 450 driver version and they still work with 460. So I think they are more general and not driver specific. But if you experience problems with hardware transcoding on a new driver version after a resume, maybe you have to extract them fresh from the new driver package.




    And I don't know if this final part is really needed, but if you still have issues with resume from suspend you can try to disable Active State Power Management (ASPM) on PCIe:


    Change/add kernel parameter pcie_aspm=off in /etc/default/grub to:

    Code
    GRUB_CMDLINE_LINUX_DEFAULT="quiet pcie_aspm=off"


    and after changing grub:


    Code
    sudo update-grub
    
    reboot
