GPU server: starting a Docker container fails with could not select device driver "" with capabilities: [[gpu]]
Environment
Component | Version |
---|---|
TencentOS | 3.1 |
Docker | 26.1.3 |
Kernel | 5.4.119-19.0009.56 |
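Troubleshooting process
Problem description
Starting a GPU model container fails with the error shown in the title: could not select device driver "" with capabilities: [[gpu]]. This usually means the NVIDIA Container Toolkit is not installed, so Docker has no driver that can provide the gpu capability.
Solution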
Prerequisites
- The NVIDIA GPU driver is installed (verify with the nvidia-smi command)
- The Docker service is installed and running (verify with the docker ps command)
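The two prerequisite checks above can be scripted as a small preflight. This is only a sketch: `check_cmd` is a hypothetical helper name, not part of Docker or the NVIDIA tooling. It reports whether a command is on PATH and whether running it succeeds.

```shell
#!/bin/sh
# Preflight sketch. check_cmd is a hypothetical helper (not from the original):
# it prints MISSING if the command is not on PATH, OK if it runs successfully,
# and FAILED if it exists but exits with an error.
check_cmd() {
    name="$1"
    shift
    if ! command -v "$name" >/dev/null 2>&1; then
        echo "MISSING: $name"
        return 1
    fi
    if "$name" "$@" >/dev/null 2>&1; then
        echo "OK: $name"
    else
        echo "FAILED: $name $*"
        return 2
    fi
}

# nvidia-smi succeeds only when the NVIDIA driver is installed and loaded.
check_cmd nvidia-smi || true
# docker ps succeeds only when the Docker daemon is reachable.
check_cmd docker ps || true
```

If either line reports MISSING or FAILED, fix that prerequisite before installing the toolkit.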
Installation steps
- Ubuntu
# Configure the Tencent Cloud mirror source
curl -fsSL https://nvidia.github.io/nvidia-docker/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/nvidia-docker/ubuntu20.04/nvidia-docker.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
# Replace the NVIDIA host with the Tencent Cloud mirror
sudo sed -i 's/nvidia.github.io/mirrors.tencentyun.com/g' /etc/apt/sources.list.d/nvidia-container-toolkit.list
# Install the toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
- CentOS
# Configure the Tencent Cloud source
curl -s -L https://nvidia.github.io/nvidia-docker/centos8/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
sudo sed -i 's/nvidia.github.io/mirrors.tencentyun.com/g' /etc/yum.repos.d/nvidia-container-toolkit.repo
# Install the toolkit
sudo yum install -y nvidia-container-toolkit
# Restart Docker
sudo systemctl restart docker
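Which of the two procedures applies depends on the package manager available on the host. The sketch below picks one by probing PATH. `detect_pkg_manager` is a hypothetical helper name introduced for illustration. Treating the TencentOS 3.1 host from the environment table as a yum-based system is an assumption, not something the original states.

```shell
#!/bin/sh
# detect_pkg_manager is a hypothetical helper (not from the original): it
# prints "apt" or "yum" depending on which package manager is on PATH,
# or "unknown" if neither is found.
detect_pkg_manager() {
    if command -v apt-get >/dev/null 2>&1; then
        echo "apt"
    elif command -v yum >/dev/null 2>&1 || command -v dnf >/dev/null 2>&1; then
        echo "yum"
    else
        echo "unknown"
    fi
}

case "$(detect_pkg_manager)" in
    apt) echo "Follow the Ubuntu steps" ;;
    yum) echo "Follow the CentOS steps" ;;
    *)   echo "No supported package manager found" ;;
esac
```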
Verify the installation
sudo dpkg -l | grep nvidia-container-toolkit # Ubuntu
sudo rpm -qa | grep nvidia-container-toolkit # CentOS
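The package queries above only confirm that the toolkit is installed. An end-to-end check is to start a GPU container and see whether the original error comes back, as sketched below. `is_driver_error` is a hypothetical helper, and the `ubuntu` image is an assumption. Any image should work, since the toolkit is expected to make `nvidia-smi` available inside the container.

```shell
#!/bin/sh
# is_driver_error is a hypothetical helper (not from the original): it reads
# Docker's error output on stdin and succeeds if it contains the
# driver-selection error this article is about.
is_driver_error() {
    grep -q 'could not select device driver'
}

# Only attempt the container test when both the driver and Docker are present.
if command -v nvidia-smi >/dev/null 2>&1 && command -v docker >/dev/null 2>&1; then
    # Capture stderr only; the ubuntu image name is an assumption.
    if err=$(docker run --rm --gpus all ubuntu nvidia-smi 2>&1 >/dev/null); then
        echo "GPU is visible inside the container"
    elif printf '%s\n' "$err" | is_driver_error; then
        echo "Docker still cannot select a GPU driver; recheck the install and restart Docker"
    else
        printf 'Container failed for another reason:\n%s\n' "$err"
    fi
else
    echo "Skipping container test: nvidia-smi or docker not found"
fi
```

If the driver-selection error persists after a restart, NVIDIA's documentation also describes registering the runtime with `sudo nvidia-ctk runtime configure --runtime=docker` and then restarting Docker again.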