Mar 01, 2025
Podman Play Kube 5.4.0 Resource Requests
And as a bonus: How To Set Up Extra NVIDIA Libraries (libcuDNN, libcuBLAS) for your containers.
Introduction
Note: I am using Fedora 41 at the time of writing this article.
This weekend, I was trying to set up NVIDIA GPU support inside containers using the new Podman 5.4.0 release, and I also needed to give those containers proper access to extra NVIDIA CUDA libraries. nvidia-ctk (the NVIDIA Container Toolkit) simplifies this, in theory, by providing the cdi generate command, enabling seamless integration of NVIDIA GPUs with containerized applications. However, it sometimes doesn't include every installed NVIDIA library in the resulting CDI file (which is itself a sort of manifest, telling the system what to provide to any container that asks for a "resource").
I ran into the need for these libraries (libcuDNN, libcuBLAS) when running Faster Whisper in my Home Assistant pod using podman play kube.
This post walks through setting up NVIDIA libraries inside containers using nvidia-ctk cdi generate, plus some manual corrections to ensure all the installed libraries you need are present inside your containers.
OK… let's get into it.
Prerequisites
Before proceeding, ensure you have:
- An NVIDIA GPU with up-to-date drivers (Nvidia Drivers). Just chmod the .run file and execute it; follow its instructions carefully.
- NVIDIA Container Toolkit (nvidia-ctk): Nvidia Installation Tutorial
- Podman 5.4.0 installed. I installed manually from here.
- CUDA installed on the host system (if you want to use CUDA): Nvidia CUDA
- nvidia-smi functioning correctly
- Required libraries installed (e.g., cuDNN, cuBLAS)
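A quick way to sanity-check most of these prerequisites from a shell (a rough sketch; exact library paths and package locations depend on your distribution):

```bash
# Driver and GPU visible?
nvidia-smi

# Container Toolkit and Podman versions
nvidia-ctk --version
podman --version

# Are cuDNN / cuBLAS present on the host? (paths vary by distro)
ldconfig -p | grep -E "cudnn|cublas"
```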
Step 1: Generating CDI Specification
The CDI (Container Device Interface) allows fine-grained control over GPU access inside containers. Use nvidia-ctk cdi generate to produce a first version (it may be missing some things, but you will add those yourself).
Step 1.1: Generating the Base CDI Specification
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
This creates the CDI configuration file (nvidia.yaml) that maps NVIDIA devices into containers.
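To double-check what the generator produced, nvidia-ctk can list the device names it found; you should see entries such as nvidia.com/gpu=0 and nvidia.com/gpu=all (the exact names depend on your hardware):

```bash
# List the CDI device names available from the generated spec(s)
nvidia-ctk cdi list
```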
Step 1.2: Including Additional CUDA Libraries (libcudnn, libcublas)
By default, nvidia-ctk cdi generate may not include libcudnn and libcublas, even if they are installed. You can manually generate .csv files and merge them into the final CDI configuration (/etc/cdi/nvidia.yaml).
Step 1.2.1: Generating .csv Files
Locate the required libraries using find or ls and format them into .csv files following this format. Note that this method might find too many files, and there may be a more efficient way to accomplish this. I also had an issue with my symlinks, and in the end I added them manually to the final /etc/cdi/nvidia.yaml. Here is what I used:
find /usr/ -name "*cudnn*" -o -lname "*cudnn*" | awk '{print (system("test -L " $1) == 0 ? "sym," : "lib,") $1}' > libcudnn.csv
Ensure the .csv files follow the required format:
lib,/usr/lib/x86_64-linux-gnu/libcudnn.so.8
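For libcublas.csv, the same pattern should work; here is an analogous sketch, with the same caveats about symlinks and over-matching:

```bash
# Same idea as the cuDNN command above: classify each hit as a symlink ("sym,") or a regular library ("lib,")
find /usr/ -name "*cublas*" -o -lname "*cublas*" \
  | awk '{print (system("test -L " $1) == 0 ? "sym," : "lib,") $1}' > libcublas.csv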
Step 1.2.2: Generating the Additional CDI Configuration
sudo nvidia-ctk cdi generate --mode=csv --csv-file libcudnn.csv --csv-file libcublas.csv --output=/tmp/nvidia_additional.yaml
Step 1.2.3: Merging with the Existing CDI Configuration
Manually merge the content from /tmp/nvidia_additional.yaml into /etc/cdi/nvidia.yaml to ensure these libraries are available inside the container.
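Concretely, that means copying the extra mounts entries from the containerEdits section of /tmp/nvidia_additional.yaml into the matching section of /etc/cdi/nvidia.yaml. A merged entry might look roughly like this (the path and version number below are illustrative; use whatever the generator emitted on your system):

```yaml
containerEdits:
  mounts:
    # Illustrative entry: bind-mount the host library into the container at the same path
    - hostPath: /usr/lib64/libcudnn.so.9
      containerPath: /usr/lib64/libcudnn.so.9
      options: ["ro", "nosuid", "nodev", "bind"]
```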
Step 2: Run the Test Container
For Podman
podman run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi
Step 3: Verifying Libraries Inside the Container
To ensure that libcudnn and libcublas are properly loaded inside the container, use podman exec to list the installed libraries and check for their presence:
podman exec <container_id> ls -lah /usr/lib/x86_64-linux-gnu/ | grep -E "cublas|cudnn"
Replace <container_id> with the actual running container ID. If the libraries are not found, verify that they were correctly added to /etc/cdi/nvidia.yaml and ensure the CDI specification is properly applied.
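Keep in mind that /usr/lib/x86_64-linux-gnu/ is a Debian/Ubuntu path; since CDI bind-mounts host paths, on a Fedora host the libraries will typically appear under /usr/lib64 inside the container instead. A path-agnostic check (just a sketch) avoids guessing:

```bash
# Search the whole container filesystem for the mounted libraries
podman exec <container_id> sh -c 'find / -name "libcudnn*" -o -name "libcublas*" 2>/dev/null'
```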
Step 4: Running GPU-Enabled Pods in play kube Format
With the CDI configuration in place, running GPU-enabled containers is straightforward.
Using Podman
I wanted to use Podman 5.4.0, since that is where the resources -> requests spec was implemented. However, the documentation is not very explicit regarding NVIDIA GPU usage. The format that worked for me is shown below; the key point is that the resource name must match a device name from your generated CDI file:
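```yaml
resources:
  requests:
    # The resource name is the CDI device name, including the "=0" (or "=all") suffix
    nvidia.com/gpu=0: 1
```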
Initially, I omitted =0, so the request did not match any of the devices in my nvidia-ctk-generated CDI configuration. I lost a lot of time on this, because the Kubernetes format for NVIDIA GPUs is not the same as the format used when you provide them through CDI.
The Kubernetes format (where the NVIDIA device plugin, not CDI, provides the GPUs) looks like this instead:
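```yaml
resources:
  limits:
    # Plain extended-resource name, no "=<index>" suffix; GPUs are typically requested under limits in Kubernetes
    nvidia.com/gpu: 1
```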
When using podman play kube or podman run, see: NVIDIA CDI Support.
When using Kubernetes, see: Nvidia/k8s-device-plugin.
podman play kube manifest.yaml
For example, a manifest.yaml along these lines (a trimmed-down sketch; the pod/container names and image are placeholders for your own workload):
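```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test                 # placeholder name
spec:
  containers:
    - name: gpu-test
      image: docker.io/library/ubuntu:latest   # placeholder image; any CUDA-using image works
      command: ["nvidia-smi"]
      resources:
        requests:
          # CDI device name from /etc/cdi/nvidia.yaml ("=all" also works if you want every GPU)
          nvidia.com/gpu=0: 1
```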
Troubleshooting
- No GPU detected in the container: Ensure nvidia-ctk cdi generate created the correct CDI file and that your runtime is configured correctly.
- Permission errors: Try running with sudo or checking container runtime permissions.
- Container runtime does not recognize CDI devices: Verify that the CDI specification file is located in /etc/cdi/ and referenced properly.
Conclusion
Using nvidia-ctk cdi generate simplifies GPU access in containerized environments, making it easier to run CUDA applications efficiently. By following these steps, you can ensure seamless NVIDIA GPU integration, improving performance and portability.