Mar 01, 2025
Podman Play Kube 5.4.0 Resource Requests
And as a bonus: How To Set Up Extra NVIDIA Libraries (libcuDNN, libcuBLAS) for your containers.
Introduction
Note: I am using Fedora 41 at the time of writing this article.
This weekend, I was trying to set up NVIDIA GPU support inside containers using the new Podman 5.4.0 release, and I also needed to give those containers proper access to extra NVIDIA CUDA libraries. nvidia-ctk (the NVIDIA Container Toolkit) simplifies this, in theory, by providing the cdi generate command, enabling seamless integration of NVIDIA GPUs with containerized applications. However, it sometimes doesn't include every installed NVIDIA library in the resulting CDI file (which is itself a sort of manifest, telling the system what to provide to any container that asks for a "resource").
I ran into the need for these libraries (libcuDNN, libcuBLAS) when running Faster Whisper in my Home Assistant pod using podman play kube.
This post walks through setting up NVIDIA libraries inside containers using nvidia-ctk cdi generate, plus some manual corrections to ensure all the installed libraries you need are present inside your containers.
OK… let's get into it.
Prerequisites
Before proceeding, ensure you have:
- An NVIDIA GPU with up-to-date drivers (Nvidia Drivers). Just chmod the .run file and execute it; follow its instructions carefully.
- NVIDIA Container Toolkit (nvidia-ctk): Nvidia Installation Tutorial
- Podman 5.4.0 installed. I installed manually from here.
- CUDA installed on the host system (if you want to use CUDA): Nvidia CUDA
- nvidia-smi functioning correctly
- Required libraries installed (e.g., cuDNN, cuBLAS)
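A quick way to sanity-check most of these prerequisites from a shell (a rough sketch; exact library paths and package locations depend on your distribution):

```bash
# Driver and GPU visible?
nvidia-smi

# Container Toolkit and Podman versions
nvidia-ctk --version
podman --version

# Are cuDNN / cuBLAS present on the host? (paths vary by distro)
ldconfig -p | grep -E "cudnn|cublas"
```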
Step 1: Generating CDI Specification
The CDI (Container Device Interface) allows fine-grained control over GPU access inside containers. Use nvidia-ctk cdi generate to produce a first version (it may be missing some things, but you will add those yourself).
Step 1.1: Generating the Base CDI Specification
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
This creates the CDI configuration file (nvidia.yaml) that maps NVIDIA devices into containers.
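To double-check what the generator produced, nvidia-ctk can list the device names it found; you should see entries such as nvidia.com/gpu=0 and nvidia.com/gpu=all (the exact names depend on your hardware):

```bash
# List the CDI device names available from the generated spec(s)
nvidia-ctk cdi list
```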
Step 1.2: Including Additional CUDA Libraries (libcudnn, libcublas)
By default, nvidia-ctk cdi generate may not include libcudnn and libcublas, even if they are installed. You can manually generate .csv files and merge them into the final CDI configuration (/etc/cdi/nvidia.yaml).
Step 1.2.1: Generating .csv Files
Locate the required libraries using find or ls and format them into .csv files following this format. Note that this method might find too many files, and there may be a more efficient way to accomplish this. I also had an issue with my symlinks, and in the end I added them manually to the final /etc/cdi/nvidia.yaml. Here is what I used:
find /usr/ -name "*cudnn*" -o -lname "*cudnn*" | awk '{print (system("test -L " $1) == 0 ? "sym," : "lib,") $1}' > libcudnn.csv
Ensure the .csv files follow the required format:
lib,/usr/lib/x86_64-linux-gnu/libcudnn.so.8
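For libcublas.csv, the same pattern should work; here is an analogous sketch, with the same caveats about symlinks and over-matching:

```bash
# Same idea as the cuDNN command above: classify each hit as a symlink ("sym,") or a regular library ("lib,")
find /usr/ -name "*cublas*" -o -lname "*cublas*" \
  | awk '{print (system("test -L " $1) == 0 ? "sym," : "lib,") $1}' > libcublas.csv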
Step 1.2.2: Generating the Additional CDI Configuration
sudo nvidia-ctk cdi generate --mode=csv --csv-file libcudnn.csv --csv-file libcublas.csv --output=/tmp/nvidia_additional.yaml
Step 1.2.3: Merging with the Existing CDI Configuration
Manually merge the content from /tmp/nvidia_additional.yaml into /etc/cdi/nvidia.yaml to ensure these libraries are available inside the container.
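Concretely, that means copying the extra mounts entries from the containerEdits section of /tmp/nvidia_additional.yaml into the matching section of /etc/cdi/nvidia.yaml. A merged entry might look roughly like this (the path and version number below are illustrative; use whatever the generator emitted on your system):

```yaml
containerEdits:
  mounts:
    # Illustrative entry: bind-mount the host library into the container at the same path
    - hostPath: /usr/lib64/libcudnn.so.9
      containerPath: /usr/lib64/libcudnn.so.9
      options: ["ro", "nosuid", "nodev", "bind"]
```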
Step 2: Run the Test Container
For Podman
podman run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi
Step 3: Verifying Libraries Inside the Container
To ensure that libcudnn and libcublas are properly loaded inside the container, use podman exec to list the installed libraries and check for their presence:
podman exec <container_id> ls -lah /usr/lib/x86_64-linux-gnu/ | grep -E "cublas|cudnn"
Replace <container_id> with the actual running container ID. If the libraries are not found, verify that they were correctly added to /etc/cdi/nvidia.yaml and ensure the CDI specification is properly applied.
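Keep in mind that /usr/lib/x86_64-linux-gnu/ is a Debian/Ubuntu path; since CDI bind-mounts host paths, on a Fedora host the libraries will typically appear under /usr/lib64 inside the container instead. A path-agnostic check (just a sketch) avoids guessing:

```bash
# Search the whole container filesystem for the mounted libraries
podman exec <container_id> sh -c 'find / -name "libcudnn*" -o -name "libcublas*" 2>/dev/null'
```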
Step 4: Running GPU-Enabled Pods in play kube Format
With the CDI configuration in place, running GPU-enabled containers is straightforward.
Using Podman
I wanted to use Podman 5.4.0, since that is where the resources -> requests spec was implemented. However, the documentation is not very explicit regarding NVIDIA GPU usage. The format that worked for me is shown below; the key point is that the resource name must match a device name from your generated CDI file:
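```yaml
resources:
  requests:
    # The resource name is the CDI device name, including the "=0" (or "=all") suffix
    nvidia.com/gpu=0: 1
```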
Initially, I omitted =0, so the request did not match any of the devices in my nvidia-ctk-generated CDI configuration. I lost a lot of time on this, because the Kubernetes format for NVIDIA GPUs is not the same as the format used when you provide them through CDI.
The Kubernetes format (where the NVIDIA device plugin, not CDI, provides the GPUs) looks like this instead:
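```yaml
resources:
  limits:
    # Plain extended-resource name, no "=<index>" suffix; GPUs are typically requested under limits in Kubernetes
    nvidia.com/gpu: 1
```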
When using podman play kube or podman run, see: NVIDIA CDI Support.
When using Kubernetes, see: Nvidia/k8s-device-plugin.
podman play kube manifest.yaml
For example, a manifest.yaml along these lines (a trimmed-down sketch; the pod/container names and image are placeholders for your own workload):
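```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test                 # placeholder name
spec:
  containers:
    - name: gpu-test
      image: docker.io/library/ubuntu:latest   # placeholder image; any CUDA-using image works
      command: ["nvidia-smi"]
      resources:
        requests:
          # CDI device name from /etc/cdi/nvidia.yaml ("=all" also works if you want every GPU)
          nvidia.com/gpu=0: 1
```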
Troubleshooting
- No GPU detected in the container: Ensure nvidia-ctk cdi generate created the correct CDI file and that your runtime is configured correctly.
- Permission errors: Try running with sudo or checking container runtime permissions.
- Container runtime does not recognize CDI devices: Verify that the CDI specification file is located in /etc/cdi/ and referenced properly.
Conclusion
Using nvidia-ctk cdi generate simplifies GPU access in containerized environments, making it easier to run CUDA applications efficiently. By following these steps, you can ensure seamless NVIDIA GPU integration, improving performance and portability.