How to restrict CUDA devices to be used during DCV stream encoding?
It is very common to share GPU devices among many services, but some applications use a significant amount of resources or need to switch between Compute and Graphics mode, which can cause issues with the DCV server and other non-DCV services. This is expected due to CUDA's design and can be mitigated by applying some rules so that two services do not conflict.
Linux
Edit the /etc/dcv/dcv.conf file and, under the [display] section, set:
cuda-devices=['0', '4']
This means that DCV can only use GPUs 0 and 4. If you leave the variable unset, DCV will try to use GPU 0 and, if that is not possible, it will try the next one (1), then (2), and so on, eventually wrapping back to 0, until it can pick an available device.
This configuration is similar to the CUDA_VISIBLE_DEVICES environment variable; the difference is that it applies only to the DCV service (dcv.conf).
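For comparison, this is roughly how the same restriction would be applied at the CUDA level for an arbitrary process (my_cuda_app is a placeholder name):
# Restrict a single process to GPUs 0 and 4
CUDA_VISIBLE_DEVICES=0,4 ./my_cuda_app
# Or export it so it applies to everything started from this shell
export CUDA_VISIBLE_DEVICES=0,4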
How to block the nouveau driver
The nvidia kernel module cannot be loaded while the nouveau module is loaded, so you need to block the nouveau driver.
Create the file /etc/modprobe.d/blacklist-nouveau.conf with this content:
blacklist nouveau
options nouveau modeset=0
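On some distributions the nouveau module is also embedded in the initramfs, so you may need to rebuild it (and reboot) for the blacklist to take effect; the exact command depends on your distribution:
# RHEL / CentOS / Amazon Linux
sudo dracut --force
# Debian / Ubuntu
sudo update-initramfs -u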
nvidia-persistenced is not running
If nvidia-persistenced is not running, you might encounter several issues with NVIDIA GPUs:
- Inconsistent GPU Performance: The daemon helps maintain persistence mode, which keeps the GPU initialized even when not in use. Without it, applications may experience higher latency during GPU initialization.
- Driver State Problems: The service maintains consistent driver state. Without it, the driver might need to reinitialize more frequently, causing performance fluctuations.
- Memory Leaks: One of its primary functions is to clean up GPU memory when processes terminate abnormally. Without it, orphaned memory allocations can accumulate over time.
- Multi-User Environment Issues: In systems with multiple users running GPU workloads, the absence of this service can lead to resource conflicts and unstable behavior.
- CUDA Application Failures: Some CUDA applications explicitly rely on the persistence daemon, and may fail to launch or operate correctly without it.
- Performance Degradation in AI/ML Workloads: Deep learning frameworks like TensorFlow and PyTorch may experience slower startup times and unstable performance.
- Docker/Container Issues: GPU passthrough to containers often works more reliably with the persistence daemon running.
- Higher Power Consumption: Without proper management, GPUs might not enter lower power states efficiently when idle.
For production environments running GPU workloads, especially in data centers or servers running AI applications, it’s generally recommended to ensure this service is properly configured and running.
Enabling the service:
# Enable the service to start at boot
sudo systemctl enable nvidia-persistenced
# Start the service immediately
sudo systemctl start nvidia-persistenced
# Verify it's running
sudo systemctl status nvidia-persistenced
Checking the logs:
sudo journalctl -u nvidia-persistenced
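You can also confirm that persistence mode is active directly from the driver report (the exact output format can vary by driver version):
nvidia-smi -q | grep -i "Persistence Mode"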
Creating an xorg.conf file with nvidia-xconfig
Here are the recommended steps to create an xorg.conf file using nvidia-xconfig.
You have two possible scenarios:
- Using physical display devices
- Using only virtual screens (without external monitors attached)
Virtual screens only
# 1. Remove any existing NVIDIA drivers completely
sudo nvidia-uninstall
# 2. Install the NVIDIA driver in headless mode
sudo sh NVIDIA-Linux-x86_64-XXX.XX.run --no-opengl-files
# 3. Generate xorg.conf with virtual screens only
sudo nvidia-xconfig --preserve-busid --enable-all-gpus --virtual=1920x1080 --use-display-device=none --allow-empty-initial-configuration
# 4. Restart X server
sudo systemctl isolate multi-user.target
sleep 3
sudo systemctl isolate graphical.target
# 5. Enable NVIDIA DCV for 3D acceleration
sudo dcvgladmin enable
# 6. Verify the installation
dcvgldiag
# 7. Check virtual displays
DISPLAY=:0 xrandr
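For reference, the generated /etc/X11/xorg.conf should contain entries along these lines; this is an illustrative sketch only, and the BusID and identifiers will differ on your system:
Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    BusID          "PCI:0:30:0"
EndSection
Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Option         "UseDisplayDevice" "none"
    Option         "AllowEmptyInitialConfiguration" "True"
    SubSection     "Display"
        Virtual     1920 1080
    EndSubSection
EndSection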
Physical displays connected
# 1. Remove any existing NVIDIA drivers completely
sudo nvidia-uninstall
# 2. Install the NVIDIA driver (use latest stable version if possible)
sudo sh NVIDIA-Linux-x86_64-XXX.XX.run
# 3. Generate xorg.conf with physical display support
sudo nvidia-xconfig --preserve-busid --enable-all-gpus --connected-monitor=DFP --allow-empty-initial-configuration
# 4. Restart X server
sudo systemctl isolate multi-user.target
sudo systemctl isolate graphical.target
# 5. Enable NVIDIA DCV for 3D acceleration
sudo dcvgladmin enable
# 6. Verify the installation
dcvgldiag
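As in the virtual-screen scenario, you can verify afterwards which monitors the X server detected:
# 7. Check connected physical displays
DISPLAY=:0 xrandr --query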
Vulkan loader ‘libvulkan.so.1’ not found
The dcvserver.service status can report the following warning when a DCV client is connected to a console session of a Linux server with an NVIDIA GPU on board:
WARNING: Could not find the Vulkan loader 'libvulkan.so.1' on this system.
Attempting to load the NVIDIA Vulkan ICD...
(This warning is non-fatal and can be suppressed with the NVFBC_NO_WARNING=1 environment variable.)
The solution is to install the libvulkan1, vulkan, or vulkan-loader package, depending on your Linux distribution. Search your repositories for "vulkan" packages and install the one that provides the Vulkan loader libraries.
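As a sketch, the install commands for common distributions look like this (package names can differ between releases):
# Debian / Ubuntu
sudo apt-get install libvulkan1
# RHEL / Fedora
sudo dnf install vulkan-loader
# SUSE
sudo zypper install libvulkan1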
How to check which NVIDIA encoders my GPU card supports?
You can check the NVIDIA encoder/decoder support matrix on NVIDIA's website, or run:
nvidia-smi -a
If the Encoder line shows "N/A", check the NVIDIA support matrix instead.
Note: actual support for in-hardware H.264 encoding also depends on the NVIDIA driver version.
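A quick way to isolate the relevant lines from the full nvidia-smi report:
nvidia-smi -a | grep -i -A 2 "Encoder"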
High virtual memory usage on Linux when CUDA is used
The Virtual Memory Issue with CUDA on Linux
CUDA-enabled programs on Linux appear to consume large amounts of virtual memory, approximately equal to the combined size of the GPU’s physical memory plus the system memory. This is a known behavior that can be concerning to users monitoring memory usage.
Overview
This behavior is directly related to CUDA’s Unified Virtual Addressing (UVA) implementation. To provide a unified address space, CUDA needs to:
- Reserve virtual memory space equivalent to the total physical GPU memory
- Plus the total system memory
- Plus a small additional amount for alignment purposes
In order to provide a unified address space, all physical memory (host system and GPUs) must be mapped into a single virtual space. As a result, CUDA’s virtual memory usage will look huge.
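For example, on a host with 64 GB of RAM and two 16 GB GPUs, a single CUDA process may reserve roughly 16 + 16 + 64 = 96 GB of virtual address space, even though its resident memory stays far smaller.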
Key Points to Understand
- This high virtual memory usage doesn’t typically cause performance issues.
- It’s not consuming actual physical memory – just address space.
- The behavior is by design and part of CUDA’s architecture on Linux.
Potential Workarounds
According to Robert Crovella, if you are using a multi-GPU system but only need some of the GPUs, you can use the CUDA_VISIBLE_DEVICES environment variable to reduce the GPU "footprint" and thus the virtual memory allocation.
Also, a better estimate of the actual physical memory in use can be obtained with this command:
ps -o rss $(pidof dcvagent)
However, there appears to be no direct controls over this virtual memory allocation, and it’s not formally documented in NVIDIA’s official materials. The large virtual memory usage is inherent to how CUDA implements unified memory addressing on Linux. This is normal behavior, though it can be misinterpreted by users not familiar with virtual memory concepts.
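For a side-by-side view of reserved virtual address space versus resident memory (assuming the dcvagent process is running):
# VSZ is the reserved virtual size; RSS is the memory actually resident
ps -o pid,vsz,rss,comm $(pidof dcvagent)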
Valid GRID license not found
If you are seeing the message below in your journal log:
nvidia-gridd[2345]: Valid GRID license not found. GPU features and performance are restricted. To enable full functionality please configure licensing details.
You need to set up your license to lift the resource restrictions being applied to your GPU.
More details are available in the NVIDIA licensing documentation.
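As an illustrative sketch only (option names and values are placeholders and depend on your driver generation and licensing model), legacy vGPU licensing is configured in /etc/nvidia/gridd.conf:
# /etc/nvidia/gridd.conf (placeholder values)
ServerAddress=license-server.example.com
ServerPort=7070
FeatureType=1
# Restart the licensing daemon afterwards
sudo systemctl restart nvidia-gridd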
NVIDIA license not working even with NVIDIA VGPU license server running
Check your current NVIDIA vGPU license server version against your driver version. It is common for the newest drivers not to be supported by older versions of the license server. Download the latest version from the NVIDIA licensing portal.
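To check the driver version currently installed:
nvidia-smi --query-gpu=driver_version --format=csv,noheader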
Prime render offload / “no matching fbconfig”
PRIME render offload is the ability to have an X screen rendered by one GPU, while choosing certain applications within that X screen to be rendered on a different GPU. More information is available in the NVIDIA driver documentation.
To configure a graphics application to be offloaded to the NVIDIA GPU screen, set the environment variable __NV_PRIME_RENDER_OFFLOAD to 1. If the graphics application uses Vulkan, that should be all that is needed. If the graphics application uses GLX, then also set the environment variable __GLX_VENDOR_LIBRARY_NAME to nvidia, so that GLVND loads the NVIDIA GLX driver. NVIDIA's EGL implementation does not yet support PRIME render offload.
For DCV, the recommended setting is generally __GLX_VENDOR_LIBRARY_NAME set to dcv, so that dcv-gl resources are used, especially if you are sharing your GPU. An example is shown below.
Sometimes, after certain applications start, they change these environment variables incorrectly, making the GPU unavailable to all other applications started afterwards. If your application is only using the CPU, needs an explicit parameter to look for the GPU, or prints messages like "no matching fbconfig", open a support ticket with the application vendor and ask them to fix the issue.
Until that is fixed, to guarantee that the GPU will be available for your applications, you can prefix the command with the variables, for example:
__NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=dcv glxinfo
to start your application correctly, or export the variables so that they are correctly available to other services:
export __NV_PRIME_RENDER_OFFLOAD=1
export __GLX_VENDOR_LIBRARY_NAME=dcv
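A minimal wrapper script can make this persistent for a specific application; this is a hypothetical sketch, so replace the final command with your own:
#!/bin/sh
# Launch any command with the DCV offload variables set
export __NV_PRIME_RENDER_OFFLOAD=1
export __GLX_VENDOR_LIBRARY_NAME=dcv
exec "$@"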
With __GLX_VENDOR_LIBRARY_NAME=nvidia, the application will not load dcv-gl and will use the NVIDIA GLX implementation directly. This can work better or worse depending on your hardware, driver version, application, and whether you have a single GPU or multiple GPUs. We recommend keeping __GLX_VENDOR_LIBRARY_NAME set to dcv and changing it only if you know what you are doing.