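Ubuntu | Installing and Configuring the NVIDIA Driver, CUDA, and cuDNN / Common Problems and Fixes
Note: this is a compilation of articles on "Ubuntu | NVIDIA drivers".
Image quality is limited by the source images.
Lightly rearranged; not edited or deduplicated.
If anything looks wrong, refer to the original articles.
Using an NVIDIA GPU on Ubuntu 24.04: Installing the Driver and CUDA
universe_king, 2024-08-21, Zhejiang
Method 1: Use Ubuntu's own ubuntu-drivers tool
Pros: extremely simple; nothing extra to download
Cons: the driver version is quite old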
pon@M60GPU
----------
OS: Ubuntu noble 24.04 x86_64
Host: SYS-1028GR-TRT (123456789)
Kernel: Linux 6.8.0-38-generic
Uptime: 26 days, 7 hours, 59 mins
Packages: 1007 (dpkg)
Shell: zsh 5.9
Terminal: /dev/pts/12
CPU: Intel (R) Xeon (R) E5-2690 v4 (56) @ 3.50 GHz
GPU 1: NVIDIA Tesla M60 [Discrete]
GPU 2: NVIDIA Tesla M60 [Discrete]
GPU 3: ASPEED Technology, Inc. ASPEED Graphics Family
Memory: 4.16 GiB / 31.26 GiB (13%)
Swap: 256.00 KiB / 16.00 GiB (0%)
Disk (/): 27.05 GiB / 816.34 GiB (3%) - ext4
Disk (/mnt/data): 3.15 TiB / 4.51 TiB (70%) - ext4
Local IP (ens1f0): 192.168.38.233/24
Locale: en_US.UTF-8
First, run ubuntu-drivers devices to see which drivers can be installed:
╰─➤ ubuntu-drivers devices
udevadm hwdb is deprecated. Use systemd-hwdb instead.
udevadm hwdb is deprecated. Use systemd-hwdb instead.
udevadm hwdb is deprecated. Use systemd-hwdb instead.
udevadm hwdb is deprecated. Use systemd-hwdb instead.
ERROR:root:aplay command not found
== /sys/devices/pci0000:80/0000:80:03.0/0000:81:00.0/0000:82:08.0/0000:83:00.0 ==
modalias : pci:v000010DEd000013F2sv000010DEsd0000115Ebc03sc02i00
vendor : NVIDIA Corporation
model : GM204GL [Tesla M60]
driver : nvidia-driver-470 - distro non-free
driver : nvidia-driver-470-server - distro non-free
driver : nvidia-driver-535 - distro non-free recommended
driver : nvidia-driver-535-server - distro non-free
driver : xserver-xorg-video-nouveau - distro free builtin
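The listing above tags one driver as recommended. As a minimal sketch (not part of the original article), that entry can be extracted with awk. The function name recommended_driver is made up for illustration, and the sample rows are copied from the output above.

```shell
# recommended_driver: read `ubuntu-drivers devices` output on stdin and
# print the package name from the line tagged "recommended".
recommended_driver() {
    awk '$1 == "driver" && $NF == "recommended" { print $3 }'
}

# Rows copied from the listing above; on a real system, pipe in
# `ubuntu-drivers devices 2>/dev/null` instead of this here-document.
recommended_driver <<'EOF'
driver : nvidia-driver-470 - distro non-free
driver : nvidia-driver-535 - distro non-free recommended
driver : nvidia-driver-535-server - distro non-free
EOF
```

The printed package name could then be passed to sudo apt install, in the same way as the manual choice that follows.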
Pick one of the newest, nvidia-driver-535-server:
─➤ sudo apt install nvidia-driver-535-server 1 ↵
[sudo] password for pon:
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following packages were automatically installed and are no longer required:
  python3-cliapp python3-markdown python3-ttystatus python3-zombie-imp
Use 'sudo apt autoremove' to remove them.
The following additional packages will be installed:
  dkms libegl-mesa0 libegl1 libepoxy0 libgbm1 libnvidia-cfg1-535-server libnvidia-common-535-server libnvidia-compute-535-server libnvidia-decode-535-server libnvidia-encode-535-server
  libnvidia-extra-535-server libnvidia-fbc1-535-server libnvidia-gl-535-server libwayland-server0 libxaw7 libxcvt0 libxfont2 libxkbfile1 libxmu6 libxrandr2 nvidia-compute-utils-535-server
  nvidia-dkms-535-server nvidia-firmware-535-server-535.183.01 nvidia-kernel-common-535-server nvidia-kernel-source-535-server nvidia-utils-535-server x11-xkb-utils xcvt
  xfonts-base xserver-common xserver-xorg-core xserver-xorg-video-nvidia-535-server
Suggested packages:
  menu nvidia-settings nvidia-prime xfs | xserver xfonts-100dpi | xfonts-75dpi xfonts-scalable
Recommended packages:
  libnvidia-compute-535-server:i386 libnvidia-decode-535-server:i386 libnvidia-encode-535-server:i386 libnvidia-fbc1-535-server:i386 libnvidia-gl-535-server:i386
The following NEW packages will be installed:
  dkms libegl-mesa0 libegl1 libepoxy0 libgbm1 libnvidia-cfg1-535-server libnvidia-common-535-server libnvidia-compute-535-server libnvidia-decode-535-server libnvidia-encode-535-server
  libnvidia-extra-535-server libnvidia-fbc1-535-server libnvidia-gl-535-server libwayland-server0 libxaw7 libxcvt0 libxfont2 libxkbfile1 libxmu6 libxrandr2 nvidia-compute-utils-535-server
  nvidia-dkms-535-server nvidia-driver-535-server nvidia-firmware-535-server-535.183.01 nvidia-kernel-common-535-server nvidia-kernel-source-535-server nvidia-utils-535-server x11-xkb-utils xcvt
  xfonts-base xserver-common xserver-xorg-core xserver-xorg-video-nvidia-535-server
0 upgraded, 33 newly installed, 0 to remove and 5 not upgraded.
Need to get 333 MB of archives.
After this operation, 820 MB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 <https://mirrors.tuna.tsinghua.edu.cn/ubuntu> noble/main amd64 dkms all 3.0.11-1ubuntu13 [51.5 kB]
Get:2 <https://mirrors.tuna.tsinghua.edu.cn/ubuntu> noble/main amd64 libwayland-server0 amd64 1.22.0-2.1build1 [33.9 kB]
Get:3 <https://mirrors.tuna.tsinghua.edu.cn/ubuntu> noble-updates/main amd64 libgbm1 amd64 24.0.9-0ubuntu0.1 [42.7 kB]
Get:4 <https://mirrors.tuna.tsinghua.edu.cn/ubuntu> noble-updates/main amd64 libegl-mesa0 amd64 24.0.9-0ubuntu0.1 [115 kB]
Get:5 <https://mirrors.tuna.tsinghua.edu.cn/ubuntu> noble/main amd64 libepoxy0 amd64 1.5.10-1build1 [220 kB]
Get:6 <https://mirrors.tuna.tsinghua.edu.cn/ubuntu> noble-updates/restricted amd64 libnvidia-cfg1-535-server amd64 535.183.01-0ubuntu0.24.04.1 [107 kB]
Get:7 <https://mirrors.tuna.tsinghua.edu.cn/ubuntu> noble-updates/restricted amd64 libnvidia-common-535-server all 535.183.01-0ubuntu0.24.04.1 [15.4 kB]
Get:8 <https://mirrors.tuna.tsinghua.edu.cn/ubuntu> noble-updates/restricted amd64 libnvidia-compute-535-server amd64 535.183.01-0ubuntu0.24.04.1 [40.3 MB]
Get:9 <https://mirrors.tuna.tsinghua.edu.cn/ubuntu> noble-updates/restricted amd64 libnvidia-decode-535-server amd64 535.183.01-0ubuntu0.24.04.1 [1,884 kB]
Get:10 <https://mirrors.tuna.tsinghua.edu.cn/ubuntu> noble-updates/restricted amd64 libnvidia-encode-535-server amd64 535.183.01-0ubuntu0.24.04.1 [97.1 kB]
Get:11 <https://mirrors.tuna.tsinghua.edu.cn/ubuntu> noble-updates/restricted amd64 libnvidia-extra-535-server amd64 535.183.01-0ubuntu0.24.04.1 [71.1 kB]
Get:12 <https://mirrors.tuna.tsinghua.edu.cn/ubuntu> noble-updates/restricted amd64 libnvidia-fbc1-535-server amd64 535.183.01-0ubuntu0.24.04.1 [55.7 kB]
Get:13 <https://mirrors.tuna.tsinghua.edu.cn/ubuntu> noble/main amd64 libegl1 amd64 1.7.0-1build1 [28.7 kB]
Get:14 <https://mirrors.tuna.tsinghua.edu.cn/ubuntu> noble-updates/restricted amd64 libnvidia-gl-535-server amd64 535.183.01-0ubuntu0.24.04.1 [195 MB]
Get:15 <https://mirrors.tuna.tsinghua.edu.cn/ubuntu> noble/main amd64 libxmu6 amd64 2:1.1.3-3build2 [47.6 kB]
Get:16 <https://mirrors.tuna.tsinghua.edu.cn/ubuntu> noble/main amd64 libxaw7 amd64 2:1.0.14-1build2 [187 kB]
Get:17 <https://mirrors.tuna.tsinghua.edu.cn/ubuntu> noble/main amd64 libxcvt0 amd64 0.1.2-1build1 [5,684 B]
Get:18 <https://mirrors.tuna.tsinghua.edu.cn/ubuntu> noble/main amd64 libxfont2 amd64 1:2.0.6-1build1 [93.0 kB]
Get:19 <https://mirrors.tuna.tsinghua.edu.cn/ubuntu> noble/main amd64 libxkbfile1 amd64 1:1.1.0-1build4 [70.0 kB]
Get:20 <https://mirrors.tuna.tsinghua.edu.cn/ubuntu> noble/main amd64 libxrandr2 amd64 2:1.5.2-2build1 [19.7 kB]
Get:21 <https://mirrors.tuna.tsinghua.edu.cn/ubuntu> noble-updates/restricted amd64 nvidia-compute-utils-535-server amd64 535.183.01-0ubuntu0.24.04.1 [122 kB]
Get:22 <https://mirrors.tuna.tsinghua.edu.cn/ubuntu> noble-updates/restricted amd64 nvidia-kernel-source-535-server amd64 535.183.01-0ubuntu0.24.04.1 [45.1 MB]
Get:23 <https://mirrors.tuna.tsinghua.edu.cn/ubuntu> noble-updates/restricted amd64 nvidia-firmware-535-server-535.183.01 amd64 535.183.01-0ubuntu0.24.04.1 [39.6 MB]
Get:24 <https://mirrors.tuna.tsinghua.edu.cn/ubuntu> noble-updates/restricted amd64 nvidia-kernel-common-535-server amd64 535.183.01-0ubuntu0.24.04.1 [226 kB]
Get:25 <https://mirrors.tuna.tsinghua.edu.cn/ubuntu> noble-updates/restricted amd64 nvidia-dkms-535-server amd64 535.183.01-0ubuntu0.24.04.1 [51.5 kB]
Get:26 <https://mirrors.tuna.tsinghua.edu.cn/ubuntu> noble-updates/restricted amd64 nvidia-utils-535-server amd64 535.183.01-0ubuntu0.24.04.1 [405 kB]
Get:27 <https://mirrors.tuna.tsinghua.edu.cn/ubuntu> noble/main amd64 x11-xkb-utils amd64 7.7+8build2 [170 kB]
Get:28 <https://mirrors.tuna.tsinghua.edu.cn/ubuntu> noble/main amd64 xserver-common all 2:21.1.12-1ubuntu1 [33.3 kB]
Get:29 <https://mirrors.tuna.tsinghua.edu.cn/ubuntu> noble/main amd64 xserver-xorg-core amd64 2:21.1.12-1ubuntu1 [1,474 kB]
Get:30 <https://mirrors.tuna.tsinghua.edu.cn/ubuntu> noble-updates/restricted amd64 xserver-xorg-video-nvidia-535-server amd64 535.183.01-0ubuntu0.24.04.1 [1,586 kB]
Get:31 <https://mirrors.tuna.tsinghua.edu.cn/ubuntu> noble-updates/restricted amd64 nvidia-driver-535-server amd64 535.183.01-0ubuntu0.24.04.1 [482 kB]
Get:32 <https://mirrors.tuna.tsinghua.edu.cn/ubuntu> noble/main amd64 xcvt amd64 0.1.2-1build1 [6,982 B]
Get:33 <https://mirrors.tuna.tsinghua.edu.cn/ubuntu> noble/main amd64 xfonts-base all 1:1.0.5+nmu1 [5,941 kB]
Fetched 333 MB in 2min 42s (2,056 kB/s)
Extracting templates from packages: 100%
Selecting previously unselected package dkms.
(Reading database ... 147083 files and directories currently installed.)
Preparing to unpack .../00-dkms_3.0.11-1ubuntu13_all.deb ...
Unpacking dkms (3.0.11-1ubuntu13) ...
Selecting previously unselected package libwayland-server0:amd64.
Preparing to unpack .../01-libwayland-server0_1.22.0-2.1build1_amd64.deb ...
Unpacking libwayland-server0:amd64 (1.22.0-2.1build1) ...
Selecting previously unselected package libgbm1:amd64.
Preparing to unpack .../02-libgbm1_24.0.9-0ubuntu0.1_amd64.deb ...
Unpacking libgbm1:amd64 (24.0.9-0ubuntu0.1) ...
Selecting previously unselected package libegl-mesa0:amd64.
Preparing to unpack .../03-libegl-mesa0_24.0.9-0ubuntu0.1_amd64.deb ...
Unpacking libegl-mesa0:amd64 (24.0.9-0ubuntu0.1) ...
Selecting previously unselected package libepoxy0:amd64.
Preparing to unpack .../04-libepoxy0_1.5.10-1build1_amd64.deb ...
Unpacking libepoxy0:amd64 (1.5.10-1build1) ...
Selecting previously unselected package libnvidia-cfg1-535-server:amd64.
Preparing to unpack .../05-libnvidia-cfg1-535-server_535.183.01-0ubuntu0.24.04.1_amd64.deb ...
Unpacking libnvidia-cfg1-535-server:amd64 (535.183.01-0ubuntu0.24.04.1) ...
Selecting previously unselected package libnvidia-common-535-server.
Preparing to unpack .../06-libnvidia-common-535-server_535.183.01-0ubuntu0.24.04.1_all.deb ...
Unpacking libnvidia-common-535-server (535.183.01-0ubuntu0.24.04.1) ...
Selecting previously unselected package libnvidia-compute-535-server:amd64.
Preparing to unpack .../07-libnvidia-compute-535-server_535.183.01-0ubuntu0.24.04.1_amd64.deb ...
Unpacking libnvidia-compute-535-server:amd64 (535.183.01-0ubuntu0.24.04.1) ...
Selecting previously unselected package libnvidia-decode-535-server:amd64.
Preparing to unpack .../08-libnvidia-decode-535-server_535.183.01-0ubuntu0.24.04.1_amd64.deb ...
Unpacking libnvidia-decode-535-server:amd64 (535.183.01-0ubuntu0.24.04.1) ...
Selecting previously unselected package libnvidia-encode-535-server:amd64.
Preparing to unpack .../09-libnvidia-encode-535-server_535.183.01-0ubuntu0.24.04.1_amd64.deb ...
Unpacking libnvidia-encode-535-server:amd64 (535.183.01-0ubuntu0.24.04.1) ...
Selecting previously unselected package libnvidia-extra-535-server:amd64.
Preparing to unpack .../10-libnvidia-extra-535-server_535.183.01-0ubuntu0.24.04.1_amd64.deb ...
Unpacking libnvidia-extra-535-server:amd64 (535.183.01-0ubuntu0.24.04.1) ...
Selecting previously unselected package libnvidia-fbc1-535-server:amd64.
Preparing to unpack .../11-libnvidia-fbc1-535-server_535.183.01-0ubuntu0.24.04.1_amd64.deb ...
Unpacking libnvidia-fbc1-535-server:amd64 (535.183.01-0ubuntu0.24.04.1) ...
Selecting previously unselected package libegl1:amd64.
Preparing to unpack .../12-libegl1_1.7.0-1build1_amd64.deb ...
Unpacking libegl1:amd64 (1.7.0-1build1) ...
Selecting previously unselected package libnvidia-gl-535-server:amd64.
Preparing to unpack .../13-libnvidia-gl-535-server_535.183.01-0ubuntu0.24.04.1_amd64.deb ...
Unpacking libnvidia-gl-535-server:amd64 (535.183.01-0ubuntu0.24.04.1) ...
Selecting previously unselected package libxmu6:amd64.
Preparing to unpack .../14-libxmu6_2%3a1.1.3-3build2_amd64.deb ...
Unpacking libxmu6:amd64 (2:1.1.3-3build2) ...
Selecting previously unselected package libxaw7:amd64.
Preparing to unpack .../15-libxaw7_2%3a1.0.14-1build2_amd64.deb ...
Unpacking libxaw7:amd64 (2:1.0.14-1build2) ...
Selecting previously unselected package libxcvt0:amd64.
Preparing to unpack .../16-libxcvt0_0.1.2-1build1_amd64.deb ...
Unpacking libxcvt0:amd64 (0.1.2-1build1) ...
Selecting previously unselected package libxfont2:amd64.
Preparing to unpack .../17-libxfont2_1%3a2.0.6-1build1_amd64.deb ...
Unpacking libxfont2:amd64 (1:2.0.6-1build1) ...
Selecting previously unselected package libxkbfile1:amd64.
Preparing to unpack .../18-libxkbfile1_1%3a1.1.0-1build4_amd64.deb ...
Unpacking libxkbfile1:amd64 (1:1.1.0-1build4) ...
Selecting previously unselected package libxrandr2:amd64.
Preparing to unpack .../19-libxrandr2_2%3a1.5.2-2build1_amd64.deb ...
Unpacking libxrandr2:amd64 (2:1.5.2-2build1) ...
Selecting previously unselected package nvidia-compute-utils-535-server.
Preparing to unpack .../20-nvidia-compute-utils-535-server_535.183.01-0ubuntu0.24.04.1_amd64.deb ...
Unpacking nvidia-compute-utils-535-server (535.183.01-0ubuntu0.24.04.1) ...
Selecting previously unselected package nvidia-kernel-source-535-server.
Preparing to unpack .../21-nvidia-kernel-source-535-server_535.183.01-0ubuntu0.24.04.1_amd64.deb ...
Unpacking nvidia-kernel-source-535-server (535.183.01-0ubuntu0.24.04.1) ...
Selecting previously unselected package nvidia-firmware-535-server-535.183.01.
Preparing to unpack .../22-nvidia-firmware-535-server-535.183.01_535.183.01-0ubuntu0.24.04.1_amd64.deb ...
Unpacking nvidia-firmware-535-server-535.183.01 (535.183.01-0ubuntu0.24.04.1) ...
Selecting previously unselected package nvidia-kernel-common-535-server.
Preparing to unpack .../23-nvidia-kernel-common-535-server_535.183.01-0ubuntu0.24.04.1_amd64.deb ...
Unpacking nvidia-kernel-common-535-server (535.183.01-0ubuntu0.24.04.1) ...
Selecting previously unselected package nvidia-dkms-535-server.
Preparing to unpack .../24-nvidia-dkms-535-server_535.183.01-0ubuntu0.24.04.1_amd64.deb ...
Unpacking nvidia-dkms-535-server (535.183.01-0ubuntu0.24.04.1) ...
Selecting previously unselected package nvidia-utils-535-server.
Preparing to unpack .../25-nvidia-utils-535-server_535.183.01-0ubuntu0.24.04.1_amd64.deb ...
Unpacking nvidia-utils-535-server (535.183.01-0ubuntu0.24.04.1) ...
Selecting previously unselected package x11-xkb-utils.
Preparing to unpack .../26-x11-xkb-utils_7.7+8build2_amd64.deb ...
Unpacking x11-xkb-utils (7.7+8build2) ...
Selecting previously unselected package xserver-common.
Preparing to unpack .../27-xserver-common_2%3a21.1.12-1ubuntu1_all.deb ...
Unpacking xserver-common (2:21.1.12-1ubuntu1) ...
Selecting previously unselected package xserver-xorg-core.
Preparing to unpack .../28-xserver-xorg-core_2%3a21.1.12-1ubuntu1_amd64.deb ...
Unpacking xserver-xorg-core (2:21.1.12-1ubuntu1) ...
Selecting previously unselected package xserver-xorg-video-nvidia-535-server.
Preparing to unpack .../29-xserver-xorg-video-nvidia-535-server_535.183.01-0ubuntu0.24.04.1_amd64.deb ...
Unpacking xserver-xorg-video-nvidia-535-server (535.183.01-0ubuntu0.24.04.1) ...
Selecting previously unselected package nvidia-driver-535-server.
Preparing to unpack .../30-nvidia-driver-535-server_535.183.01-0ubuntu0.24.04.1_amd64.deb ...
Unpacking nvidia-driver-535-server (535.183.01-0ubuntu0.24.04.1) ...
Selecting previously unselected package xcvt.
Preparing to unpack .../31-xcvt_0.1.2-1build1_amd64.deb ...
Unpacking xcvt (0.1.2-1build1) ...
Selecting previously unselected package xfonts-base.
Preparing to unpack .../32-xfonts-base_1%3a1.0.5+nmu1_all.deb ...
Unpacking xfonts-base (1:1.0.5+nmu1) ...
Setting up libnvidia-fbc1-535-server:amd64 (535.183.01-0ubuntu0.24.04.1) ...
Setting up libnvidia-common-535-server (535.183.01-0ubuntu0.24.04.1) ...
Setting up libwayland-server0:amd64 (1.22.0-2.1build1) ...
Setting up libxmu6:amd64 (2:1.1.3-3build2) ...
Setting up nvidia-kernel-source-535-server (535.183.01-0ubuntu0.24.04.1) ...
Setting up libgbm1:amd64 (24.0.9-0ubuntu0.1) ...
Setting up libxaw7:amd64 (2:1.0.14-1build2) ...
Setting up dkms (3.0.11-1ubuntu13) ...
Setting up xfonts-base (1:1.0.5+nmu1) ...
Setting up libegl-mesa0:amd64 (24.0.9-0ubuntu0.1) ...
Setting up libepoxy0:amd64 (1.5.10-1build1) ...
Setting up libnvidia-extra-535-server:amd64 (535.183.01-0ubuntu0.24.04.1) ...
Setting up libxrandr2:amd64 (2:1.5.2-2build1) ...
Setting up libnvidia-cfg1-535-server:amd64 (535.183.01-0ubuntu0.24.04.1) ...
Setting up libnvidia-compute-535-server:amd64 (535.183.01-0ubuntu0.24.04.1) ...
Setting up libegl1:amd64 (1.7.0-1build1) ...
Setting up libxcvt0:amd64 (0.1.2-1build1) ...
Setting up nvidia-firmware-535-server-535.183.01 (535.183.01-0ubuntu0.24.04.1) ...
Setting up libxkbfile1:amd64 (1:1.1.0-1build4) ...
Setting up libxfont2:amd64 (1:2.0.6-1build1) ...
Setting up nvidia-compute-utils-535-server (535.183.01-0ubuntu0.24.04.1) ...
info: The home dir /nonexistent you specified can't be accessed: No such file or directory
info: Selecting UID from range 100 to 999 ...
info: Selecting GID from range 100 to 999 ...
info: Adding system user `nvidia-persistenced' (UID 110) ...
info: Adding new group `nvidia-persistenced' (GID 110) ...
info: Adding new user `nvidia-persistenced' (UID 110) with group `nvidia-persistenced' ...
info: Not creating `/nonexistent'.
Setting up x11-xkb-utils (7.7+8build2) ...
Setting up nvidia-utils-535-server (535.183.01-0ubuntu0.24.04.1) ...
Setting up xcvt (0.1.2-1build1) ...
Setting up libnvidia-gl-535-server:amd64 (535.183.01-0ubuntu0.24.04.1) ...
Setting up libnvidia-decode-535-server:amd64 (535.183.01-0ubuntu0.24.04.1) ...
Setting up nvidia-kernel-common-535-server (535.183.01-0ubuntu0.24.04.1) ...
update-initramfs: deferring update (trigger activated)
update-initramfs: Generating /boot/initrd.img-6.8.0-38-generic
I: The initramfs will attempt to resume from /dev/dm-0
I: (/dev/mapper/ubuntu--vg-lv--0)
I: Set the RESUME variable to override this.
Created symlink /etc/systemd/system/systemd-hibernate.service.wants/nvidia-hibernate.service → /usr/lib/systemd/system/nvidia-hibernate.service.
Created symlink /etc/systemd/system/systemd-suspend.service.wants/nvidia-resume.service → /usr/lib/systemd/system/nvidia-resume.service.
Created symlink /etc/systemd/system/systemd-hibernate.service.wants/nvidia-resume.service → /usr/lib/systemd/system/nvidia-resume.service.
Created symlink /etc/systemd/system/systemd-suspend.service.wants/nvidia-suspend.service → /usr/lib/systemd/system/nvidia-suspend.service.
Setting up xserver-common (2:21.1.12-1ubuntu1) ...
Setting up libnvidia-encode-535-server:amd64 (535.183.01-0ubuntu0.24.04.1) ...
Setting up nvidia-dkms-535-server (535.183.01-0ubuntu0.24.04.1) ...
update-initramfs: deferring update (trigger activated)
update-initramfs: Generating /boot/initrd.img-6.8.0-38-generic
I: The initramfs will attempt to resume from /dev/dm-0
I: (/dev/mapper/ubuntu--vg-lv--0)
I: Set the RESUME variable to override this.
nvidia.ko.zst:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/6.8.0-38-generic/updates/dkms/

nvidia-modeset.ko.zst:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/6.8.0-38-generic/updates/dkms/

nvidia-drm.ko.zst:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/6.8.0-38-generic/updates/dkms/

nvidia-uvm.ko.zst:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/6.8.0-38-generic/updates/dkms/

nvidia-peermem.ko.zst:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/6.8.0-38-generic/updates/dkms/
depmod.....
Building initial module for 6.8.0-40-generic
Done.

nvidia.ko.zst:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/6.8.0-40-generic/updates/dkms/

nvidia-modeset.ko.zst:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/6.8.0-40-generic/updates/dkms/

nvidia-drm.ko.zst:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/6.8.0-40-generic/updates/dkms/

nvidia-uvm.ko.zst:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/6.8.0-40-generic/updates/dkms/

nvidia-peermem.ko.zst:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/6.8.0-40-generic/updates/dkms/
depmod.....
Setting up xserver-xorg-core (2:21.1.12-1ubuntu1) ...
Setting up xserver-xorg-video-nvidia-535-server (535.183.01-0ubuntu0.24.04.1) ...
Setting up nvidia-driver-535-server (535.183.01-0ubuntu0.24.04.1) ...
Processing triggers for fontconfig (2.15.0-1.1ubuntu2) ...
Processing triggers for initramfs-tools (0.142ubuntu25.1) ...
update-initramfs: Generating /boot/initrd.img-6.8.0-40-generic
I: The initramfs will attempt to resume from /dev/dm-0
I: (/dev/mapper/ubuntu--vg-lv--0)
I: Set the RESUME variable to override this.
Processing triggers for libc-bin (2.39-0ubuntu8.2) ...
Processing triggers for man-db (2.12.0-4build2) ...
Scanning processes...
Scanning candidates...
Scanning processor microcode...
Scanning linux images...

Pending kernel upgrade!
Running kernel version:
  6.8.0-38-generic
Diagnostics:
  The currently running kernel version is not the expected kernel version 6.8.0-40-generic.

Restarting the system to load the new kernel will not be handled automatically, so you should consider rebooting.

The processor microcode seems to be up-to-date.

Restarting services...

Service restarts being deferred:
 /etc/needrestart/restart.d/dbus.service
 systemctl restart systemd-logind.service
 systemctl restart unattended-upgrades.service

No containers need to be restarted.
No user sessions are running outdated binaries.
No VM guests are running outdated hypervisor (qemu) binaries on this host.
╰─➤ nvidia-smi 130 ↵
Thu Aug 22 02:00:34 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla M60                      Off | 00000000:83:00.0 Off |                    0 |
| N/A   27C    P0              40W / 150W |      0MiB /  7680MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  Tesla M60                      Off | 00000000:84:00.0 Off |                    0 |
| N/A   39C    P0              40W / 150W |      0MiB /  7680MiB |    100%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
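The table above is meant for people to read. For scripts, nvidia-smi can also emit CSV through its --query-gpu and --format options. The sketch below is an illustration rather than part of the original article: gpu_summary is a made-up helper, and the sample rows are assumed to match the CSV form of the two Tesla M60 entries shown above.

```shell
# gpu_summary: read "name, driver_version, memory.total" CSV rows on stdin
# and print one summary line per GPU, numbered from 0.
gpu_summary() {
    awk -F', ' '{ printf "GPU %d: %s (driver %s, %s)\n", NR - 1, $1, $2, $3 }'
}

# On a machine with the driver loaded, the rows would come from:
#   nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader | gpu_summary
# Sample rows (assumed CSV form of the table above):
gpu_summary <<'EOF'
Tesla M60, 535.183.01, 7680 MiB
Tesla M60, 535.183.01, 7680 MiB
EOF
```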
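Method 2: Download the driver and CUDA from NVIDIA's website and install them yourself
Pros: you can get the latest driver version
Cons: slightly more work
Download and install the driver
Download page: https://www.nvidia.com/en-us/drivers/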
─➤ sudo apt install ./nvidia-driver-local-repo-ubuntu2404-550.90.07_1.0-1_amd64.deb
[sudo] password for pon:
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Note, selecting 'nvidia-driver-local-repo-ubuntu2404-550.90.07' instead of './nvidia-driver-local-repo-ubuntu2404-550.90.07_1.0-1_amd64.deb'
The following packages were automatically installed and are no longer required:
  python3-cliapp python3-markdown python3-ttystatus python3-zombie-imp
Use 'sudo apt autoremove' to remove them.
The following NEW packages will be installed:
  nvidia-driver-local-repo-ubuntu2404-550.90.07
0 upgraded, 1 newly installed, 0 to remove and 10 not upgraded.
Need to get 0 B/394 MB of archives.
After this operation, 395 MB of additional disk space will be used.
Get:1 /home/pon/Downloads/nvidia-driver-local-repo-ubuntu2404-550.90.07_1.0-1_amd64.deb nvidia-driver-local-repo-ubuntu2404-550.90.07 amd64 1.0-1 [394 MB]
Selecting previously unselected package nvidia-driver-local-repo-ubuntu2404-550.90.07.
(Reading database ... 148381 files and directories currently installed.)
Preparing to unpack .../nvidia-driver-local-repo-ubuntu2404-550.90.07_1.0-1_amd64.deb ...
Unpacking nvidia-driver-local-repo-ubuntu2404-550.90.07 (1.0-1) ...
Setting up nvidia-driver-local-repo-ubuntu2404-550.90.07 (1.0-1) ...

The public nvidia-driver-local-repo-ubuntu2404-550.90.07 GPG key does not appear to be installed.
To install the key, run this command:
sudo cp /var/nvidia-driver-local-repo-ubuntu2404-550.90.07/nvidia-driver-local-1844CAD6-keyring.gpg /usr/share/keyrings/

Scanning processes...
Scanning processor microcode...
Scanning linux images...

Running kernel seems to be up-to-date.
The processor microcode seems to be up-to-date.
No services need to be restarted.
No containers need to be restarted.
No user sessions are running outdated binaries.
No VM guests are running outdated hypervisor (qemu) binaries on this host.
N: Download is performed unsandboxed as root as file '/home/pon/Downloads/nvidia-driver-local-repo-ubuntu2404-550.90.07_1.0-1_amd64.deb' couldn't be accessed by user '_apt'. - pkgAcquire::Run (13: Permission denied)
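Download and install CUDA
Download page: https://developer.nvidia.com/cuda-downloads?target_os=Linux
For older versions: https://developer.nvidia.com/cuda-toolkit-archive
I prefer the runfile because everything is bundled inside it and it installs in a single run.
Add execute permission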
╭─pon@M60GPU ~/Downloads
╰─➤ chmod 777 cuda_12.6.0_560.28.03_linux.run
Run the installer
╰─➤ sudo ./cuda_12.6.0_560.28.03_linux.run 130 ↵
[sudo] password for pon:
===========
= Summary =
===========

Driver:   Installed
Toolkit:  Installed in /usr/local/cuda-12.6/

Please make sure that
 -   PATH includes /usr/local/cuda-12.6/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-12.6/lib64, or, add /usr/local/cuda-12.6/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-12.6/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall
Logfile is /var/log/cuda-installer.log
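The installer summary asks for two search-path entries. One possible way to add them without creating duplicates each time the shell starts is sketched below. The helper prepend_once and the variable CUDA_HOME are illustrative names, not something the installer provides; the directory is the one reported in the summary.

```shell
# prepend_once DIR CURRENT: print CURRENT with DIR prepended,
# or CURRENT unchanged when DIR is already one of its entries.
prepend_once() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;
        *)        printf '%s\n' "$1${2:+:$2}" ;;
    esac
}

# CUDA_HOME is an illustrative variable; the path comes from the summary.
CUDA_HOME=/usr/local/cuda-12.6
PATH=$(prepend_once "$CUDA_HOME/bin" "$PATH")
LD_LIBRARY_PATH=$(prepend_once "$CUDA_HOME/lib64" "${LD_LIBRARY_PATH:-}")
export PATH LD_LIBRARY_PATH
```

These lines would typically go at the end of ~/.bashrc or ~/.zshrc. Because an empty LD_LIBRARY_PATH is handled separately, no stray leading or trailing colon ends up in the variable.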
─➤ nvidia-smi
Tue Aug 27 15:58:31 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.28.03              Driver Version: 560.28.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla M60                      Off |   00000000:83:00.0 Off |                    0 |
| N/A   29C    P0              40W / 150W |        0MiB /  7680MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  Tesla M60                      Off |   00000000:84:00.0 Off |                    0 |
| N/A   39C    P0              37W / 150W |        0MiB /  7680MiB |     38%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
Note: the CUDA installer must be run as root or with sudo; otherwise it fails with errors like these:
[INFO]: Driver not installed.
[INFO]: Checking compiler version...
[INFO]: gcc location: /usr/bin/gcc
[INFO]: gcc version: gcc version 13.2.0 (Ubuntu 13.2.0-23ubuntu4)
[INFO]: Initializing menu
[INFO]: nvidia-fs.setKOVersion (2.22.3)
[WARNING]: Unable to write to directory: /usr/share/applications/
[INFO]: Setup complete
[INFO]: Installing: Driver
[ERROR]: Driver installation must be run as root.
[ERROR]: Install of Driver failed, quitting
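The failure above only appears after the installer has already started. A wrapper script could check for root first; the sketch below shows one way to do that. Both function names are hypothetical and are not part of the CUDA installer.

```shell
# is_root [UID]: succeed when UID (default: the current user's id) is 0.
is_root() {
    [ "${1:-$(id -u)}" -eq 0 ]
}

# run_installer FILE: hypothetical wrapper that refuses to start a runfile
# unless it is running as root.
run_installer() {
    if ! is_root; then
        echo "Run this with sudo or as root: $1" >&2
        return 1
    fi
    sh "$1"
}

if is_root; then
    echo "running as root"
else
    echo "not root: the driver step of the installer would fail"
fi
```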
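Updated 2024-08-29
Installing the NVIDIA GPU Driver + CUDA + cuDNN on Ubuntu 24.04 to Set Up an AI Deep-Learning Training Environment
Published by 一文解千机 on 2024-08-14 17:34:29
This guide installs the NVIDIA GPU driver, CUDA, and cuDNN on Ubuntu 24.04 to set up an AI deep-learning training environment, in simple, easy-to-follow steps.
1. Check the GPU model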
lspci | grep -i nvidia
Output:
01:00.0 3D controller: NVIDIA Corporation GM108M [GeForce 920MX] (rev a2)
Here, GeForce 920MX is the GPU model.
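To pick out the model name automatically instead of reading it by eye, the bracketed part of the lspci line can be extracted with sed. This is a small sketch; gpu_model is a made-up helper name, and the sample line is the output shown above.

```shell
# gpu_model: read lspci output on stdin and print the bracketed model name
# of every NVIDIA device, one per line.
gpu_model() {
    sed -n 's/.*NVIDIA Corporation[^[]*\[\([^]]*\)].*/\1/p'
}

# On a real system: lspci | gpu_model
gpu_model <<'EOF'
01:00.0 3D controller: NVIDIA Corporation GM108M [GeForce 920MX] (rev a2)
EOF
```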
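2. Download the NVIDIA GPU driver
Official site: NVIDIA driver download page
Search for a driver by GPU model:
Click “Find”
Download the latest driver
3. Install the GPU driver
Install the build tools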
sudo apt update
sudo apt install gcc make
Run the installer
chmod +x NVIDIA-Linux-x86_64-560.31.02.run
sudo ./NVIDIA-Linux-x86_64-560.31.02.run
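Select “Continue installation” and press Enter
Compilation begins…
Note:
Compilation may abort with an error at this point. The author's explanation is that the newest driver has to be built with a newer gcc: the default gcc on their system was version 11, below the version 12 that the build needed. The fix is to install gcc 12 and point the gcc alternative at it. Check gcc --version first, because the Ubuntu 24.04 system in the first article reports gcc 13.2.0 in its installer log.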
sudo apt install gcc-12
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/x86_64-linux-gnu-gcc-12 20
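Before switching compilers it may help to confirm that the active gcc really is too old. The sketch below compares major versions only. The helper names are made up, and the threshold of 12 is simply the version named in the note above, not a value read from the driver.

```shell
# gcc_major VERSION: print the major component of a version such as "13.2.0".
gcc_major() {
    printf '%s\n' "${1%%.*}"
}

# gcc_too_old HAVE NEED: succeed when the major version of HAVE is below NEED.
gcc_too_old() {
    [ "$(gcc_major "$1")" -lt "$(gcc_major "$2")" ]
}

required=12   # version the note above says the build needs
current=$(gcc -dumpversion 2>/dev/null || echo 0)   # 0 when gcc is missing
if gcc_too_old "$current" "$required"; then
    echo "gcc $current is older than $required; install gcc-$required first"
else
    echo "gcc $current is new enough"
fi
```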
Run the installer again:
sudo ./NVIDIA-Linux-x86_64-560.31.02.run
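Once compilation succeeds, the installer moves on to the next prompt; select “Yes”
During installation you will be asked whether to disable the nouveau driver. Choose yes: the NVIDIA installer blacklists nouveau automatically, so there is no need to disable it by hand.
Unless you have specific requirements, you can answer “Yes” to every yes/no prompt during installation.
Run nvidia-smi in a terminal:
The output shows the driver version and the GPU memory:
Installation is complete; reboot.
4. Install CUDA
Check the highest CUDA version the installed driver supports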
nvidia-smi
Output:
The highest usable CUDA version is 12.6
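The "CUDA Version" field in the nvidia-smi header is the upper limit for the toolkit you install. As an illustration, the sketch below reads that field and checks a candidate toolkit version against it with GNU sort -V. Both helper names are made up, and the sample header line is the one from the first article's nvidia-smi output.

```shell
# driver_cuda_version: read nvidia-smi output on stdin and print the
# "CUDA Version" value from its header line.
driver_cuda_version() {
    sed -n 's/.*CUDA Version: \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# toolkit_supported MAX WANT: succeed when toolkit version WANT is not
# newer than MAX (version-aware comparison via GNU sort -V).
toolkit_supported() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# On a real system: max=$(nvidia-smi | driver_cuda_version)
max=$(echo '| NVIDIA-SMI 560.28.03 Driver Version: 560.28.03 CUDA Version: 12.6 |' | driver_cuda_version)
if toolkit_supported "$max" 12.2.2; then
    echo "CUDA 12.2.2 is within the driver limit ($max)"
fi
```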
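Download CUDA from the official site: https://developer.nvidia.com/cuda-toolkit-archive
Note: according to the author, downloading CUDA requires registering and logging in on NVIDIA's website.
Any version up to and including 12.6 will work; choose the options that match your system:
The page then shows the download URL and the install commands:
In this case:
wget https://developer.download.nvidia.com/compute/cuda/12.2.2/local_installers/cuda_12.2.2_535.104.05_linux.run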
sudo sh cuda_12.2.2_535.104.05_linux.run
This walkthrough installs CUDA Toolkit 12.2.2 as an example:
chmod +x cuda_12.2.2_535.104.05_linux.run
sudo ./cuda_12.2.2_535.104.05_linux.run
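Select “Continue”
Type “accept”
Deselect the “Driver” option, because the GPU driver is already installed and does not need to be installed again, then select “Install”.
Wait for the installation to finish.
Configure the environment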
nano ~/.bashrc
Append the following to the end of the file:
export PATH=$PATH:/usr/local/cuda-12.2/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-12.2/lib64
Reload the configuration
source ~/.bashrc
Verify the installation
nvcc -V
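If nvcc prints its version information, the installation succeeded:
5. Install cuDNN
Download from the official site: https://developer.nvidia.com/rdp/cudnn-download
After you select the matching options, the page generates the download commands:
wget https://developer.download.nvidia.com/compute/cudnn/9.3.0/local_installers/cudnn-local-repo-ubuntu2404-9.3.0_1.0-1_amd64.deb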
sudo dpkg -i cudnn-local-repo-ubuntu2404-9.3.0_1.0-1_amd64.deb
sudo cp /var/cudnn-local-repo-ubuntu2404-9.3.0/cudnn-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cudnn
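The latest GPU driver generally works with the latest cuDNN. If the latest cuDNN does not suit your current CUDA version, install an older release.
To see which CUDA versions the latest cuDNN supports, see the Support Matrix in the NVIDIA cuDNN v9.3.0 documentation.
To find older cuDNN releases and the CUDA versions they support:
Select “cuDNN 8.x-1.x”
Then pick a suitable version, download its deb package, and install it.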
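After installation, one way to confirm which cuDNN version was installed is to read the version macros (CUDNN_MAJOR, CUDNN_MINOR, CUDNN_PATCHLEVEL) from its version header. This sketch makes two assumptions: the helper name cudnn_version is invented, and the header is assumed to live somewhere under /usr/include with a name starting with cudnn_version, which can differ between packages.

```shell
# cudnn_version HEADER: print MAJOR.MINOR.PATCH assembled from the
# CUDNN_MAJOR / CUDNN_MINOR / CUDNN_PATCHLEVEL macros in HEADER.
cudnn_version() {
    awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
         $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
         $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
         END { if (major != "") print major "." minor "." patch }' "$1"
}

# Assumed location: any cudnn_version*.h below /usr/include.
for header in $(find /usr/include -name 'cudnn_version*.h' 2>/dev/null); do
    echo "$header: $(cudnn_version "$header")"
done
```

If the loop prints nothing, the header is either absent or installed elsewhere; dpkg -L on the installed cuDNN development package lists its actual files.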
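How to Install the NVIDIA GPU Driver, CUDA, and the NVIDIA Container Toolkit on Ubuntu 24.04 LTS
zhangfangzhou, 2025-03-11
1. Install the NVIDIA GPU driver
If the machine has an NVIDIA GPU, Ubuntu installs the open-source nouveau driver by default. Confirm this with sudo lshw -C display; the driver field will show “nouveau”.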
# Remove any NVIDIA driver packages that are already installed
sudo apt update
sudo apt upgrade
sudo apt purge '*nvidia*'
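Use ubuntu-drivers list to list the driver versions available for your NVIDIA GPU.
# Let Ubuntu pick the recommended driver version automatically
sudo ubuntu-drivers install
# Or specify the version by hand, giving the NVIDIA driver version number to install.
sudo ubuntu-drivers install nvidia:570
After installation, nouveau should be blacklisted automatically so that it no longer loads. Reboot, then run sudo lshw -C display again to confirm the installation; the driver field should now show “nvidia”.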
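The same check can be scripted. lshw reports the bound kernel driver as driver=... on its configuration line, so a small filter is enough. The helper name display_drivers is made up, and the sample line is an illustrative configuration line rather than output captured from the author's machine.

```shell
# display_drivers: read `lshw -C display` output on stdin and print the
# kernel driver named on each "configuration:" line.
display_drivers() {
    sed -n 's/.*configuration:.*driver=\([^ ]*\).*/\1/p'
}

# On a real system: sudo lshw -C display | display_drivers
# Illustrative sample line:
display_drivers <<'EOF'
       configuration: driver=nvidia latency=0
EOF
```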
2. Dual-GPU laptops
On dual-GPU laptops such as Intel + NVIDIA, 3D rendering may stay on the Intel GPU even after the NVIDIA driver is installed, which gives poor 3D performance.
In that case, use prime-select to make the NVIDIA GPU handle rendering.
sudo prime-select nvidia
After rebooting, run vulkaninfo --summary to see which GPU is the primary one.
3. Install CUDA on Ubuntu with the CUDA Toolkit Installer
Installation Instructions:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-8
用 nvcc --version
确认 CUDA 的版本,如果显示 Command nvcc not found,则编辑 ~/.bashrc
vim ~/.bashrc
export PATH = /usr/local/cuda - 12.8/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH = /usr/local/cuda - 12.8/lib64:${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
更新变量
source ~/.bashrc
nvcc --version
nvcc : NVIDIA (R) Cuda compiler driver
Copyright (c) 2005 - 2025 NVIDIA Corporation
Built on Fri_Feb_21_20:23:50_PST_2025
Cuda compilation tools , release 12.8 , V12.8.93
Build cuda_12.8.r12.8/compiler.35583870_0
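For scripts that need only the release number, the "release" field can be extracted from the nvcc output shown above. A small sketch, with a hypothetical function name:

```shell
# Read `nvcc --version` output on stdin and print the release number.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Usage: nvcc --version | nvcc_release
```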
4. Install NVIDIA Container Toolkit
This is Nvidia's tooling for Docker and Podman containers; it lets containers use CUDA for computation.
Containers can use CUDA even when the host has no CUDA installed, which makes it easy to run different CUDA versions in containers without being affected by the host's CUDA version.
The proprietary Nvidia driver must be installed before NVIDIA Container Toolkit.
(1) Install Docker on Ubuntu
(2) Add the NVIDIA Container Toolkit package repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
Install NVIDIA Container Toolkit
sudo apt update
sudo apt install nvidia-container-toolkit
Register the Nvidia runtime with Docker
sudo nvidia-ctk runtime configure --runtime=docker
Restart Docker
sudo systemctl restart docker
Run an Ubuntu container to test whether the Nvidia GPU information appears
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
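If the test container fails to start, it can help to check that the runtime registration step actually changed Docker's configuration. The sketch below is a rough text check, not a JSON parser. It assumes the configuration file is /etc/docker/daemon.json and that the registered entry mentions nvidia-container-runtime; both are assumptions, and the function name is hypothetical.

```shell
# Succeed when the Docker daemon config file ($1) mentions the NVIDIA runtime.
has_nvidia_runtime() {
    [ -r "$1" ] && grep -q 'nvidia-container-runtime' "$1"
}

# Usage (path is an assumption):
# has_nvidia_runtime /etc/docker/daemon.json && echo "nvidia runtime registered"
```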
5. Install TensorRT, Nvidia's deep learning inference platform
CUDA must be installed before TensorRT. https://developer.nvidia.com/nvidia-tensorrt-download
Install the TensorRT deb file to add the package repository
# Set the OS version
os="ubuntu2204"
# Set the TensorRT version
tag="10.5.0.x-1+cuda12.6"
sudo dpkg -i nv-tensorrt-local-repo-${os}-${tag}_1.0-1_amd64.deb
sudo cp /var/nv-tensorrt-local-repo-${os}-${tag}/*-keyring.gpg /usr/share/keyrings/
sudo apt update
Install TensorRT
sudo apt install tensorrt
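A Complete, Detailed Walkthrough of Installing the NVIDIA Driver on Ubuntu 18.04 LTS
fengyuechengshi495, published 2018-12-14 19:08:57
Foreword
My previous post covered installing Ubuntu 18.04 LTS alongside Windows 10. One step before that installation was disabling the Nouveau driver, because dual-boot Linux installs (typically on machines with an Nvidia GPU plus integrated graphics and a 1080p display) often hang at the installer logo, usually because of the open-source nouveau GPU driver shipped by the Linux distribution. After installing the system I never dealt with this properly and never installed the NVIDIA driver in the standard way. As a result, yesterday my machine would not get past the login screen even with the correct password. So here is the complete process I followed today to install the NVIDIA driver, for reference.
Before installation
Before installing, disable the Nouveau driver. The change made in the previous step only disabled it temporarily for the installer. If the driver is not disabled permanently, you may be unable to get into Ubuntu after installing the NVIDIA driver (the login screen does not let you in even with the correct password).
This requires a change in the grub configuration file:
In a terminal, run: $ sudo gedit /boot/grub/grub.cfg
Then search the file for quiet splash, add acpi_osi=linux nomodeset after it, and save the file (see the screenshot in the original post). Note that /boot/grub/grub.cfg is regenerated whenever update-grub runs, so an edit made only there can be overwritten later.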
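Since /boot/grub/grub.cfg is regenerated by update-grub, a more durable place for these parameters is the GRUB_CMDLINE_LINUX_DEFAULT line in /etc/default/grub, followed by sudo update-grub. This is a different file from the one the author edited. The sketch below only prints the modified file so the result can be reviewed first; the function name is hypothetical, and it assumes the parameters contain no "/" or "&" characters.

```shell
# Print file $1 with the parameters $2 appended inside the quotes of the
# GRUB_CMDLINE_LINUX_DEFAULT line; all other lines pass through unchanged.
add_grub_params() {
    sed "/^GRUB_CMDLINE_LINUX_DEFAULT=/ s/\"\$/ $2\"/" "$1"
}

# Review the result before changing anything:
# add_grub_params /etc/default/grub 'acpi_osi=linux nomodeset'
```

After reviewing the output, the same change can be made by hand in /etc/default/grub and applied with sudo update-grub.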
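During installation
First, detect your NVIDIA GPU model and the recommended driver. Run the following command:
$ ubuntu-drivers devices
(See the screenshot in the original post.)
The output shows that the system has an Nvidia GeForce GTX 1050 Ti and that the recommended driver is nvidia-415.
Next comes the driver installation itself. Because my system was freshly installed, I simply updated all software and drivers, which included installing the NVIDIA driver. The steps were:
Open Software & Updates and, on the first tab, select the first two options (see the screenshot in the original post).
Then, on the third tab, also select the top two options (see the screenshot in the original post).
Then run these commands in a terminal, in order:
$ sudo apt-get update
$ sudo apt-get upgrade
The system then starts updating software and drivers. The whole process takes about ten minutes, so be patient.
When the NVIDIA driver installation begins, the system switches to a pink-and-white configuration screen (I took a screenshot, but for some reason it was not saved). This screen asks you to set a password of 8 to 16 characters. I chose a simple one, "123456789", and was then returned to the terminal. When the update finishes, reboot to complete the installation:
$ sudo reboot
A blue screen appears during boot. This is expected; it is part of the installation. Pay attention at this point: use the arrow keys to select Enroll MOK (see the screenshot in the original post).
After confirming, select Continue on the next screen, then Yes. A password prompt appears; enter the password you set earlier to continue (mine was "123456789").
Then select Reboot. After the restart, the NVIDIA driver installation is complete.
After installation
After booting again, verify the installation with:
$ nvidia-smi
If a list of GPUs appears, the driver was installed successfully (see the screenshot in the original post).
PS: I attached a screenshot of the successful install as a reward for a day that started at six in the morning.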
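Fixing a GPU Driver and Kernel Version Mismatch on Ubuntu 22.04
柃歌, last modified 2024-06-07 12:16:30
If Ubuntu's automatic updates are left enabled, running nvidia-smi on a GPU server sometimes fails with:
Failed to initialize NVML: Driver/library version mismatch
NVML library version: 535.161
This means an automatic update has left the Nvidia user-space driver libraries and the loaded kernel module at different versions. Check the kernel module version:
cat /proc/driver/nvidia/version
The output looks like this:
NVRM version: NVIDIA UNIX x86_64 Kernel Module 535.154.05 Thu Dec 28 15:37:48 UTC 2023
GCC version: gcc version 12.3.0 (Ubuntu 12.3.0-1ubuntu1~22.04)
So the driver library version, 535.161, is newer than the loaded kernel module version, 535.154.05. Rebooting the server may be enough to restore normal operation. If a reboot is not possible, try the methods below.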
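The version of the loaded kernel module can be pulled out of /proc/driver/nvidia/version for comparison with the library version reported by nvidia-smi. A sketch, assuming the "NVRM version: ... Kernel Module <version>" line format shown above (other driver flavors may word this line differently); the function name is hypothetical.

```shell
# Read /proc/driver/nvidia/version on stdin and print the version of the
# kernel module that is currently loaded.
kernel_module_version() {
    sed -n 's/^NVRM version:.*Kernel Module  *\([0-9][0-9.]*\).*/\1/p'
}

# Usage: kernel_module_version < /proc/driver/nvidia/version
# The version of the module installed on disk can be compared with:
# modinfo -F version nvidia
```

If the loaded version differs from the one on disk, the module in memory predates the update, which is the situation described above.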
1. Unload the kernel modules
First unload the current kernel module:
sudo rmmod nvidia
This may fail with: rmmod: ERROR: Module nvidia is in use by: nvidia_uvm nvidia_modeset
In that case, unload the dependent modules first:
sudo rmmod nvidia_uvm
sudo rmmod nvidia_modeset
Unloading the second one reports: rmmod: ERROR: Module nvidia_modeset is in use by: nvidia_drm
Follow the message and unload that module as well:
sudo rmmod nvidia_drm
If you get: rmmod: ERROR: Module nvidia_drm is in use
do the following:
sudo lsof -n -w /dev/nvidia* # list the processes using /dev/nvidia*
sudo kill <ID> # terminate those processes
lsmod | grep nvidia # show the kernel module dependencies
sudo systemctl isolate multi-user.target # switch to the non-graphical multi-user target
sudo rmmod nvidia_drm
sudo rmmod nvidia_modeset
sudo rmmod nvidia
Finally, check the GPU information again:
nvidia-smi
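The unload order follows from the dependency errors above: nvidia_uvm and nvidia_drm must go before nvidia_modeset and nvidia. A sketch that wraps this order in one function, with a dry-run mode for review; the function name and the DRY_RUN variable are hypothetical.

```shell
# Unload the NVIDIA kernel modules in dependency order.
# With DRY_RUN=1 the commands are printed instead of executed.
unload_nvidia_modules() {
    for mod in nvidia_uvm nvidia_drm nvidia_modeset nvidia; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "rmmod $mod"
        elif grep -q "^$mod " /proc/modules; then
            # Only modules that are actually loaded are removed.
            sudo rmmod "$mod" || return 1
        fi
    done
}

# Preview: DRY_RUN=1 unload_nvidia_modules
```

The function stops at the first module that is still in use, at which point the lsof and kill steps above apply.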
2. Reinstall the GPU driver
Remove the current GPU driver:
sudo apt-get purge 'nvidia*'
Then look up the available driver versions:
ubuntu-drivers devices
The output looks like this:
== /sys/devices/pci0000:00/0000:00:02.0/0000:02:00.0/0000:03:0c.0/0000:06:00.0 ==
modalias : pci:v000010DEd00001B02sv000010DEsd000011DFbc03sc00i00
vendor : NVIDIA Corporation
model : GP102 [TITAN Xp]
driver : nvidia-driver-545 - distro non-free
driver : nvidia-driver-390 - distro non-free
driver : nvidia-driver-450-server - distro non-free
driver : nvidia-driver-535-server - distro non-free
driver : nvidia-driver-418-server - distro non-free
driver : nvidia-driver-535 - third-party non-free
driver : nvidia-driver-470-server - distro non-free
driver : nvidia-driver-470 - distro non-free recommended
driver : xserver-xorg-video-nouveau - distro free builtin
Install the driver version you need:
sudo apt-get update
sudo apt-get install nvidia-driver-535-server
Finally, check the GPU information again:
nvidia-smi
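When scripting this step, the package that ubuntu-drivers marks as recommended can be picked out of the listing above. A sketch with a hypothetical function name; it relies on the "driver : <package> - ... recommended" line format shown in the output.

```shell
# Read `ubuntu-drivers devices` output on stdin and print the package name
# on the line marked "recommended".
recommended_driver() {
    awk '$1 == "driver" && /recommended/ { print $3 }'
}

# Usage: ubuntu-drivers devices 2>/dev/null | recommended_driver
```

For the listing above this yields nvidia-driver-470, whereas the author chose nvidia-driver-535-server; the recommended entry is a suggestion, not a requirement.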
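Handling a Mismatch Between the NVIDIA Driver Version and the Linux Kernel Version
WelchWang21, last modified 2024-09-24 09:11:28
Background
This example uses Ubuntu 20.04. The Linux kernel was automatically upgraded to 5.15.0-122-generic; the original kernel was 5.15.0-105-generic, and the Nvidia driver version is 535.154.05.
1. Check the installed Linux kernels
Show the kernel version currently in use
uname -r
List the installed kernel headers
dpkg --get-selections | grep linux-headers
dpkg --list | grep linux-headers
List the installed kernel images
dpkg --get-selections | grep linux-image
dpkg --list | grep linux-image
2. Check the GPU driver version
nvidia-smi
3. Cause
Once a Linux machine is online, it updates its kernel automatically from time to time, which is how the nvidia driver and the Linux kernel end up mismatched. The fix is either to install the latest nvidia driver or to downgrade the Linux kernel.
4. Method 1: install the latest nvidia driver
Download the latest driver yourself from the nvidia website: Latest official NVIDIA drivers
5. Method 2: downgrade the Linux kernel
5.1 Install the older Linux kernel
With network access, install the older kernel as follows
sudo apt-get install linux-image-5.15.0-105-generic
sudo apt-get install linux-headers-5.15.0-105-generic
Without network access, download the offline packages on a networked spare machine with the same configuration, copy them to this machine, and install them (for downloading and installing offline packages, see "ubuntu apt 一键下载所有依赖包 - CSDN 博客"). The offline packages are:
linux-headers-5.15.0-105-generic_5.15.0-105.115~20.04.1_amd64.deb
linux-image-5.15.0-105-generic_5.15.0-105.115~20.04.1_amd64.deb
Dependency packages that may also be needed:
linux-hwe-5.15-headers-5.15.0-105_5.15.0-105.115~20.04.1_all.deb
linux-image-unsigned-5.15.0-105-generic_5.15.0-105.115~20.04.1_amd64.deb
linux-modules-5.15.0-105-generic_5.15.0-105.115~20.04.1_amd64.deb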
5.2 Switch the default kernel and update the boot configuration
Find the boot entry names
cat /boot/grub/grub.cfg | grep menuentry
Edit the default boot entry
sudo vi /etc/default/grub
GRUB_DEFAULT="Advanced options for Ubuntu>Ubuntu, with Linux 5.15.0-105-generic"
Update the boot configuration
sudo update-grub
Reboot for the change to take effect
sudo reboot
Check that the kernel change took effect
uname -r
5.3 Prevent kernel updates
sudo apt-mark hold linux-image-5.15.0-105
sudo apt-mark hold linux-image-5.15.0-105-generic
echo "linux-image-5.15.0-105-generic hold" | sudo dpkg --set-selections
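The commands above hold only the kernel image. Holding the matching headers and modules packages as well is one way to keep the set consistent; that extension is an assumption here, not something the original steps include. The sketch prints the package names for a given kernel version so they can be passed to apt-mark; the function name is hypothetical.

```shell
# Print the image, headers and modules package names for kernel version $1.
kernel_packages() {
    for prefix in linux-image linux-headers linux-modules; do
        echo "$prefix-$1"
    done
}

# Usage (version string as reported by `uname -r`):
# sudo apt-mark hold $(kernel_packages 5.15.0-105-generic)
# apt-mark showhold    # list the packages currently on hold
```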
5.4 Remove newer kernels
Remove the kernels you no longer need. Do not remove the kernel currently in use, or you will be unable to log in to the system.
sudo apt-get purge linux-image-5.15.0-122
sudo apt-get purge linux-image-5.15.0-116
...
5.5 Uninstall the driver and reinstall it
Uninstall the old driver
sudo /usr/bin/nvidia-uninstall
sudo apt-get --purge remove 'nvidia-*'
sudo apt-get purge 'nvidia*'
sudo apt-get purge 'libnvidia*'
Check that everything was removed
dpkg --list | grep nvidia
Reinstall the driver
sudo sh NVIDIA-Linux-x86_64-535.154.05.run
References:
linux 卸载内核 - CSDN 博客
Linux 内核卸载和禁止更新 - CSDN 博客
GPU 驱动安装,CUDA 安装相关问题_there appears to already be a driver installed on -CSDN 博客
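Fixing a Kernel Mismatch Caused by Installing the nvidia GPU Driver on Ubuntu 18.04: Repairing the Kernel Without Reinstalling the System
liyiersan123, last modified 2024-03-27 16:00:08
I. Problem description
Yesterday I updated the GPU driver on Ubuntu 18.04 to support newer versions of CUDA and PyTorch. While installing the new driver, the GPU driver and the system kernel version ended up mismatched, and I could no longer get into the system. I then followed the post "解决 Ubuntu 显卡驱动升级导致的显卡驱动和内核版本不匹配的问题", which not only failed to fix the problem but also damaged the kernel that was already installed. That post is poorly written: the layout is bad and the caveats are not spelled out. In practice, it is best to leave the Linux kernel alone. A kernel is only a few hundred megabytes, so keeping old ones costs little space and there is no need to uninstall them. After installing a new kernel, configuring which one boots is enough. At this point I was in real trouble: after a reboot the system would not start at all.
II. Repair attempts
(1) Trying to install a kernel online
I switched to a text console (Ctrl+Alt+F1) and tried sudo apt install to install a specific kernel version (5.11.0-34), following my earlier tutorial. It kept failing. Because the system language was set to Chinese, the error messages in the text console were garbled, so there was no way to pinpoint the error. Looking back, kernel 5.11.0-34 targets Ubuntu 20.04 LTS, so apt install on Ubuntu 18.04 cannot find a matching package.
(2) Trying to install a kernel offline
Since the kernel was the problem, another option is an offline install: download the files for a specific kernel version and install them locally. But because the kernel had been damaged, a bus error occurred during boot and the inserted USB drive was not recognized. This approach did not work either.
(3) Getting readable error messages and installing a kernel
At this point we need error messages and prompts to display correctly so the problem can be located. For the procedure, see: "Ubuntu 系统误删内核后修复方法". The steps are summarized below:
- Prepare an empty USB drive and download an Ubuntu image to make a boot drive. I recommend the open-source tool ventoy. It is very convenient: there is no need to reformat the drive repeatedly, it can hold several images of different kinds at once, and the drive remains usable for normal storage.
- Change the BIOS boot order, and when the live system starts, choose "try Ubuntu without installing".
- In the live Ubuntu system, mount the original system's root directory, boot directory, and the required system directories. This step uses the fdisk -l and mount commands.
- Use the chroot command to change the root to the mounted original root directory, which puts you inside the original system.
- Inside the original system, install a new kernel with apt (online) or dpkg (offline).
- After the installation, run sudo update-grub to update the boot configuration.
- Reboot, enter the Ubuntu advanced options, and select the newly installed kernel to boot into the system.
- Once in the system, remember to change the default boot kernel. My earlier tutorial describes three ways to do that.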
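The mount and chroot steps in the list above can be sketched as a command preview. The device names passed in are placeholders to be replaced with what fdisk -l reports, the mount point /mnt is an assumption, and the function name is hypothetical. The function only prints the commands so they can be reviewed before being run as root in the live session.

```shell
# Print the rescue commands for a root partition ($1) and an optional
# separate boot partition ($2). Nothing is executed.
print_chroot_steps() {
    echo "mount $1 /mnt"
    if [ -n "${2:-}" ]; then
        echo "mount $2 /mnt/boot"
    fi
    # The chroot needs the live system's device, process and sysfs trees.
    for fs in dev proc sys; do
        echo "mount --bind /$fs /mnt/$fs"
    done
    echo "chroot /mnt"
}

# Usage (placeholder devices): print_chroot_steps /dev/sda2 /dev/sda1
```

Any other directory that lives on its own partition has to be mounted under /mnt before the chroot line as well.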
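Pitfalls I ran into along the way:
- When mounting the original system, the root directory was on one partition and /usr on another. At first I mounted only the root partition and not the /usr partition, so after chroot neither apt nor dpkg would work. I only found the cause after asking GPT-4. (GPT is all you need!)
- apt-get install linux-image-generic downloaded and installed kernel 4.15.0, which is fairly old. After booting into that kernel I had no network access, and I did not dig into why. I then installed kernel 5.15.0 offline; for the offline method see "Linux 系统 - Ubuntu 安装指定版本的内核". Note that you need the four files without low-latency in their names, not the three mentioned in that tutorial; see the advice GPT gave. The installation notes are:
  - When choosing a kernel version, it should be neither too new nor too old; 5.8.0 worked in my testing. If the version is too new, such as 5.15.0, the headers package may fail to install because of a dependency problem (the 5.15.0 kernel headers require libc6 >= 2.34, but Ubuntu 18.04 ships libc6 2.27). The GPU driver installation then fails with: ERROR: Unable to find the kernel source tree for the currently running kernel. This error means the headers were not installed successfully. One blog post covers this point well, but its fix did not work in my environment. I got it working by installing the 5.8.0 headers manually.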
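The libc6 requirement can be checked before downloading kernel headers. A minimal sketch using GNU sort -V for version comparison; the function name is hypothetical, and the 2.34 threshold is the one quoted above for the 5.15.0 headers.

```shell
# Succeed when version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage: compare the installed libc6 with the version the headers require.
# installed=$(dpkg-query -W -f='${Version}' libc6)
# version_ge "$installed" 2.34 || echo "libc6 $installed is too old for these headers"
```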
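III. Install the Nvidia driver and CUDA, and configure cuDNN
When installing the driver, remember to add --no-x-check --no-nouveau-check --no-opengl-files.
One caution: if you add the cuda environment variables by editing /etc/profile, check the file very carefully. A mistake there can cause a login loop, where the password prompt keeps reappearing. The fix is simple: switch to a text console, log in with your username and password, revert /etc/profile, and reboot. For a single-user machine, adding the cuda paths to a per-user file such as ~/.bashrc is the best choice.
IV. Summary
All of this took me an afternoon plus an evening. At one point I nearly gave up and reinstalled the system, but the cost of losing data was too high, so I kept going. In future I will ask GPT more often: when the situation is described clearly, GPT-4's answers are reliable and largely match what you would find by searching online, which saves a lot of time.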
via:
- 解决 Ubuntu 显卡驱动升级导致的 显卡驱动和内核版本不匹配的问题_ubuntu 更改显卡驱动后内核读不到根目录 - CSDN 博客
  https://blog.csdn.net/liuyang_xyz/article/details/120684134
- Ubuntu22.04 显卡驱动与内核版本不一致解决方案_ubuntu 内核和显卡驱动对应版本 - CSDN 博客
  https://blog.csdn.net/m0_51755720/article/details/139511177
- linux - 在 ubuntu24.04 上使用英伟达显卡 —— 安装驱动和 cuda - SegmentFault 思否
  https://segmentfault.com/a/1190000045196008
  - 给 linux 的 NVIDIA GPU 安装 CUDA Toolkit - SegmentFault 思否
    https://segmentfault.com/a/1190000044229852
  - cuda - nv 显卡安装驱动以及周边日志 - SegmentFault 思否
    https://segmentfault.com/a/1190000043955289
- Ubuntu 24.04 LTS 如何安装 Nvidia 显卡驱动、CUDA、NVIDIA Container Toolkit 套件 – 方舟笔记
  https://www.zhangfangzhou.cn/ubuntu-24-04-lts-install-nvidia-cuda-nvidia-container-toolkit.html
- ubuntu 24.04 安装 Nvidia 显卡驱动 + CUDA + cuDNN,配置 AI 深度学习训练环境,简单易懂,一看就会!- CSDN 博客
  https://blog.csdn.net/u010912615/article/details/141195878
- 深度好文:解决 Ubuntu 18.04 安装 nvidia 显卡驱动,导致内核不匹配:无需重装系统修复内核_ubuntu 内核与显卡驱动不兼容 - CSDN 博客
  https://blog.csdn.net/weixin_42364196/article/details/137080030
- NVIDIA 驱动版本与 Linux 内核版本不一致处理办法_内核版本和 nvidia-CSDN 博客
  https://blog.csdn.net/WelchWang21/article/details/142450637
- ubuntu18.04 安装 nvidia 显卡驱动_ubuntu18.04 安装 nvidia 显卡驱动 - CSDN 博客
  https://blog.csdn.net/weixin_44583856/article/details/120909281
- Ubuntu18.04 安装 Nvidia 驱动【全网不坑,超全步骤】(亲测~)_ubuntu18.04 安装 nvidia 显卡驱动 - CSDN 博客
  https://blog.csdn.net/weixin_44348719/article/details/125049064
- ubuntu18.04 安装 NVIDIA 驱动的心酸(失败)经历及解决方法(换系统成功)_sudo apt-get purge nvidia*-CSDN 博客
  https://blog.csdn.net/xiaojinger_123/article/details/120888777_
- 超详细!Ubuntu 18.04 安装 NVIDIA 显卡驱动 - 星河赵 - 博客园(XUNGE’s Blog)
  https://www.cnblogs.com/zhaoyingjie/p/15380694.html
- 【傻瓜教程】Ubuntu18.04LTS 安装 NVIDIA 驱动详细完整过程_nvidia 截图怎么安装 - CSDN 博客
  https://blog.csdn.net/fengyuechengshi495/article/details/85008398#commentBox
  - Ubuntu 18.04 NVIDIA 驱动安装总结_ubuntu18.04nvidia 安装 - CSDN 博客
    https://blog.csdn.net/tjuyanming/article/details/80862290
  - Ubuntu 18.04 安装 NVIDIA 显卡驱动教程_ubuntu18.04 安装 nvidia 显卡驱动 - CSDN 博客
    https://blog.csdn.net/new_delete_/article/details/81544438
  - 解决 Linux 双系统安装卡在启动 Logo<_麒麟双系统 卡 logo-CSDN 博客
    https://blog.csdn.net/tjuyanming/article/details/79267984
  - E: 无法修正错误,因为您要求某些软件包保持现状 ubuntu 18_yum : 依赖: python-rpm 但是它将不会被安装 e: 无法修正错误,因为您要求某些软件 - CSDN 博客
    https://blog.csdn.net/wfivenx/article/details/82121404
- Ubuntu 系统手动安装 nvidia 显卡驱动全流程_nvidia proprietary-CSDN 博客
  https://blog.csdn.net/2402_83234606/article/details/145950672