Setting up Nvidia GPU computation on executor¶
Prerequisite: Docker version 19.0.3+
. Check Docker versions and nvidia for details.
Below steps are for ubuntu primarily for other distros check the associated links.
Install nvidia drivers on hosts¶
Ubuntu provides packaged drivers for nvidia. Driver installation Guide
Recommended
ubuntu-drivers list --gpgpu
ubuntu-drivers install --gpgpu nvidia:535-server
Alternatively apt
can be used, but may require additional steps Manual install
# Check for the latest stable version
apt search nvidia-driver.*server
apt install -y nvidia-driver-535-server nvidia-utils-535-server
For other distros check Guide
Install Nvidia-container-toolkit¶
Add nvidia repo
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
apt install -y nvidia-container-toolkit
Configure docker with nvidia toolkit
nvidia-ctk runtime configure --runtime=docker
systemctl restart docker #Restart Docker
Verify installation¶
On Host
nvidia-smi -l
In docker container
docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.10 Driver Version: 535.86.10 CUDA Version: 12.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 On | 00000000:00:1E.0 Off | 0 |
| N/A 34C P8 9W / 70W | 0MiB / 15109MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Enable nvidia support on drove¶
Enable Nvidia support in drove-executor.yml and restart drove-executor
...
resources:
...
enableNvidiaGpu: true
...