Problems encountered while installing AlphaFold2 on a server (1)
I made a mistake: I trusted DeepSeek too readily and deleted cuDNN 8.9.7 by accident.
[root@localhost ~]# cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 9
#define CUDNN_PATCHLEVEL 7
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)/* cannot use constexpr here since this is a C-only file */
[root@localhost ~]# ldconfig -p | grep libcudnn.so.8
libcudnn.so.8 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcudnn.so.8
libcudnn.so.8 (libc6,x86-64) => /lib64/libcudnn.so.8
[root@localhost ~]# export LD_PRELOAD=/usr/local/cuda/lib64/libcudnn.so.8
[root@localhost ~]# cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 9
#define CUDNN_PATCHLEVEL 7
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)/* cannot use constexpr here since this is a C-only file */
[root@localhost ~]# cat /usr/include/cudnn_version.h
cat: /usr/include/cudnn_version.h: No such file or directory
#???
cat /usr/include/cudnn.h
[root@localhost ~]# ls /usr/lib64/libcudnn.so*
/usr/lib64/libcudnn.so.8
[root@localhost ~]# dnf list installed | grep cudnn
cudnn-local-repo-rhel8-9.10.0.x86_64 1.0-1 @System
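Before removing anything, it helps to record which cuDNN version is actually in place and where it came from. The following is a minimal sketch that reads the three version macros shown in the grep output above and prints them as a single version string. `cudnn_version_from_header` is a hypothetical helper name introduced here for illustration, not part of CUDA or cuDNN.

```shell
# cudnn_version_from_header HEADER
# Print MAJOR.MINOR.PATCH as defined in a cudnn_version.h file.
# Fails (non-zero exit) if any of the three macros is missing.
cudnn_version_from_header() {
  awk '
    /^#define CUDNN_MAJOR /      { major = $3 }
    /^#define CUDNN_MINOR /      { minor = $3 }
    /^#define CUDNN_PATCHLEVEL / { patch = $3 }
    END {
      if (major == "" || minor == "" || patch == "") exit 1
      print major "." minor "." patch
    }
  ' "$1"
}
```

On this machine, `cudnn_version_from_header /usr/local/cuda/include/cudnn_version.h` would be expected to report the 8.9.7 seen above. A complementary check is `rpm -qf /usr/lib64/libcudnn.so.8`: if rpm reports that no package owns the file, the library was copied in by hand (for example from the tarball archive), so `dnf remove` will not find it and a manual `rm` cannot be undone by reinstalling a package.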
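This is where I wrecked things!!!
[root@localhost ~]# dnf remove -y libcudnn* libcudnn8* libcudnn-devel*
No match for argument: libcudnn*
No match for argument: libcudnn8*
No match for argument: libcudnn-devel*
No packages marked for removal.
Dependencies resolved.
Nothing to do.
Complete!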
[root@localhost ~]# rm -f /usr/local/cuda/include/cudnn*.h
[root@localhost ~]# rm -f /usr/local/cuda/lib64/libcudnn*
[root@localhost ~]# ldconfig
[root@localhost ~]# find / -name "*cudnn*" 2>/dev/null
/home/Softwares/AlphaFold2/cudnn-local-repo-rhel8-9.10.0-1.0-1.x86_64.rpm
/home/Softwares/AlphaFold2/cudnn-linux-x86_64-8.9.7.29_cuda12-archive.tar.xz
/home/Softwares/AlphaFold2/cudnn-linux-x86_64-8.9.7.29_cuda12-archive
…………………………
/var/cache/PackageKit/8.7/hawkey/cudnn-local-rhel8-9.10.0.solv
/var/cache/PackageKit/8.7/hawkey/cudnn-local-rhel8-9.10.0-filenames.solvx
/var/cache/dnf/cudnn-local-rhel8-9.10.0-903bc33f34604e66
/var/cache/dnf/cudnn-local-rhel8-9.10.0.solv
/var/cache/dnf/cudnn-local-rhel8-9.10.0-filenames.solvx
/var/cudnn-local-repo-rhel8-9.10.0
/var/cudnn-local-repo-rhel8-9.10.0/cudnn-9.10.0-1.x86_64.rpm
/var/cudnn-local-repo-rhel8-9.10.0/cudnn-jit-9.10.0-1.x86_64.rpm
/var/cudnn-local-repo-rhel8-9.10.0/cudnn-local-D3C757D7-keyring.gpg
/var/cudnn-local-repo-rhel8-9.10.0/cudnn9-9.10.0-1.x86_64.rpm
/var/cudnn-local-repo-rhel8-9.10.0/cudnn9-cuda-11-8-9.10.0.56-1.x86_64.rpm
/var/cudnn-local-repo-rhel8-9.10.0/cudnn9-cuda-11-9.10.0.56-1.x86_64.rpm
/var/cudnn-local-repo-rhel8-9.10.0/cudnn9-cuda-12-9-9.10.0.56-1.x86_64.rpm
/var/cudnn-local-repo-rhel8-9.10.0/cudnn9-cuda-12-9.10.0.56-1.x86_64.rpm
/var/cudnn-local-repo-rhel8-9.10.0/cudnn9-jit-9.10.0-1.x86_64.rpm
/var/cudnn-local-repo-rhel8-9.10.0/cudnn9-jit-cuda-11-8-9.10.0.56-1.x86_64.rpm
/var/cudnn-local-repo-rhel8-9.10.0/cudnn9-jit-cuda-11-9.10.0.56-1.x86_64.rpm
/var/cudnn-local-repo-rhel8-9.10.0/cudnn9-jit-cuda-12-9-9.10.0.56-1.x86_64.rpm
/var/cudnn-local-repo-rhel8-9.10.0/cudnn9-jit-cuda-12-9.10.0.56-1.x86_64.rpm
/var/cudnn-local-repo-rhel8-9.10.0/libcudnn9-cuda-11-9.10.0.56-1.x86_64.rpm
/var/cudnn-local-repo-rhel8-9.10.0/libcudnn9-cuda-12-9.10.0.56-1.x86_64.rpm
/var/cudnn-local-repo-rhel8-9.10.0/libcudnn9-devel-cuda-11-9.10.0.56-1.x86_64.rpm
/var/cudnn-local-repo-rhel8-9.10.0/libcudnn9-devel-cuda-12-9.10.0.56-1.x86_64.rpm
/var/cudnn-local-repo-rhel8-9.10.0/libcudnn9-headers-cuda-11-9.10.0.56-1.x86_64.rpm
/var/cudnn-local-repo-rhel8-9.10.0/libcudnn9-headers-cuda-12-9.10.0.56-1.x86_64.rpm
/var/cudnn-local-repo-rhel8-9.10.0/libcudnn9-jit-cuda-11-9.10.0.56-1.x86_64.rpm
/var/cudnn-local-repo-rhel8-9.10.0/libcudnn9-jit-cuda-12-9.10.0.56-1.x86_64.rpm
/var/cudnn-local-repo-rhel8-9.10.0/libcudnn9-jit-devel-cuda-11-9.10.0.56-1.x86_64.rpm
/var/cudnn-local-repo-rhel8-9.10.0/libcudnn9-jit-devel-cuda-12-9.10.0.56-1.x86_64.rpm
/var/cudnn-local-repo-rhel8-9.10.0/libcudnn9-samples-9.10.0.56-1.noarch.rpm
/var/cudnn-local-repo-rhel8-9.10.0/libcudnn9-static-cuda-11-9.10.0.56-1.x86_64.rpm
/var/cudnn-local-repo-rhel8-9.10.0/libcudnn9-static-cuda-12-9.10.0.56-1.x86_64.rpm
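1. Where the files come from:
These files are usually created automatically when cuDNN is installed from NVIDIA's official .rpm package
The /var/cudnn-local-repo-* directory holds the local repository metadata and the cached installer packages
2. What the files do:
cudnn-local-repo-rhel8-9.10.0 is the local repository directory
cudnn-9.10.0-1.x86_64.rpm is the original downloaded installer package
They can be deleted safely, because:
They are only cache files and repository configuration left over from installation
Deleting them does not affect cuDNN library files already installed into system directories
If you need to reinstall, you can download the package again from the NVIDIA website
How to delete them safely
# Delete the entire local repository directory
sudo rm -rf /var/cudnn-local-repo-rhel8-9.10.0
# Or delete only a selected RPM package
sudo rm -f /var/cudnn-local-repo-rhel8-9.10.0/cudnn-9.10.0-1.x86_64.rpm
After deleting
- If you plan to reinstall the same version later:
You can keep the .rpm file for reuse
But downloading the latest version from the official site is usually recommended
- After cleaning up, rebuild the repository cache:
sudo dnf clean all
sudo dnf makecache
- Verify that the installed cuDNN still works:
ldconfig -p | grep libcudnn
Other similar files that can be deleted
Similar files that can also be deleted safely include:
/var/cuda-repo-* directories
/var/nvidia-driver-* directories
Any directory under /var/ with -repo- in its name
These are all temporary repository directories created while installing NVIDIA software; deleting them does not affect the installed software.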
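Since `rm -f` and `rm -rf` cannot be undone, a more cautious variant of the "safe deletion" above is to move the files into a holding directory first and delete that directory only once everything still works. The sketch below is one way to do that; `stash_files` and the backup path in the usage note are hypothetical names chosen for illustration.

```shell
# stash_files DEST PATH...
# Move each existing PATH (file, symlink, or directory) into DEST
# instead of deleting it. Unmatched glob patterns are skipped silently.
stash_files() {
  dest="$1"
  shift
  mkdir -p "$dest" || return 1
  for f in "$@"; do
    # -e is false for dangling symlinks, so test -L as well
    if [ -e "$f" ] || [ -L "$f" ]; then
      mv -- "$f" "$dest"/ || return 1
    fi
  done
}
```

For example, `stash_files /root/cudnn-backup /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*` would have kept the cuDNN 8.9.7 headers and libraries recoverable, and the same helper works for a `/var/cudnn-local-repo-*` directory.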
Because of the deletion above, problems start showing up here
[root@localhost ~]# ls -l /var/cudnn-local-repo-rhel8-9.10.0/libcudnn9-static-cuda-12-9.10.0.56-1.x86_64.rpm
ERROR: ld.so: object '/usr/local/cuda/lib64/libcudnn.so.8' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
-rw-r--r--. 1 root root 749479982 May  5 13:25 /var/cudnn-local-repo-rhel8-9.10.0/libcudnn9-static-cuda-12-9.10.0.56-1.x86_64.rpm
[root@localhost ~]# ls -l /home/Softwares/AlphaFold2/cudnn-linux-x86_64-8.9.7.29_cuda12-archive/include/cudnn_version.h
ERROR: ld.so: object '/usr/local/cuda/lib64/libcudnn.so.8' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
-rw-r--r--. 1 25503 2174 4019 Nov 30  2023 /home/Softwares/AlphaFold2/cudnn-linux-x86_64-8.9.7.29_cuda12-archive/include/cudnn_version.h
[root@localhost ~]# cd /home/
[root@localhost home]# ls
ERROR: ld.so: object '/usr/local/cuda/lib64/libcudnn.so.8' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
Softwares user wulu1 yhm
[root@localhost home]# cd Softwares/
[root@localhost Softwares]# ls
ERROR: ld.so: object '/usr/local/cuda/lib64/libcudnn.so.8' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
AlphaFold2
[root@localhost Softwares]# cd AlphaFold2/
[root@localhost AlphaFold2]# ls
ERROR: ld.so: object '/usr/local/cuda/lib64/libcudnn.so.8' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
cuda-repo-rhel8-12-9-local-12.9.0_575.51.03-1.x86_64.rpm cudnn-linux-x86_64-8.9.7.29_cuda12-archive.tar.xz
cudnn-linux-x86_64-8.9.7.29_cuda12-archive cudnn-local-repo-rhel8-9.10.0-1.0-1.x86_64.rpm
[root@localhost AlphaFold2]# cd cudnn-linux-x86_64-8.9.7.29_cuda12-archive/
[root@localhost ~]# ldconfig -p | grep cudnn
ERROR: ld.so: object '/usr/local/cuda/lib64/libcudnn.so.8' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
I wanted to comment out the environment variable
[root@localhost ~]# vi /ect/profile
ERROR: ld.so: object '/usr/local/cuda/lib64/libcudnn.so.8' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
[root@localhost ~]# vi ~/.bashrc
ERROR: ld.so: object '/usr/local/cuda/lib64/libcudnn.so.8' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
[root@localhost ~]# vi /etc/profile
ERROR: ld.so: object '/usr/local/cuda/lib64/libcudnn.so.8' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
[root@localhost ~]# rm -rf /var/cudnn-local-repo*
ERROR: ld.so: object '/usr/local/cuda/lib64/libcudnn.so.8' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
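This error message looks unrelated to the delete operation; in fact the system is trying to load a CUDA/cuDNN library file that has been deleted or is no longer available.
Cause of the error
Symptom: when the rm command runs, the system reports that it cannot preload libcudnn.so.8
This is not an error from the deletion itself but a problem with the LD_PRELOAD environment variable
Root cause:
Some program (possibly a previously installed CUDA-related tool) set LD_PRELOAD=/usr/local/cuda/lib64/libcudnn.so.8 in the environment
1. Temporary fix (current session only)
# Clear the LD_PRELOAD setting
unset LD_PRELOAD
# Then run the delete command again
rm -rf /var/cudnn-local-repo*
2. Permanent fix (recommended)
# Check which configuration files set LD_PRELOAD
grep -r "LD_PRELOAD" /etc/ ~/.bash* ~/.profile* /etc/profile.d/
# Once the file containing the setting is found, edit it and delete or comment out the line
sudo nano /etc/profile # or whichever file was found
# Comment out a line like this: export LD_PRELOAD=/usr/local/cuda/lib64/libcudnn.so.8
# Apply the change
source ~/.bashrc # or whichever file was modified
3. If the file really does need to be kept
If libcudnn.so.8 is required by other programs:
# Reinstall the correct library files
sudo dnf install libcudnn8
# Or create the correct symbolic link
sudo ln -s /usr/local/cuda/lib64/libcudnn.so.8 /lib64/
sudo ldconfig
The library file has since been deleted or moved, so the dynamic loader can no longer load it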
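To confirm that the warning comes from a stale LD_PRELOAD entry rather than from the command being run, each preloaded path can be checked for existence. The sketch below does that; `missing_preloads` is a hypothetical helper name, not a standard tool. It takes the value as an argument so it can be tried on any string.

```shell
# missing_preloads VALUE
# VALUE is an LD_PRELOAD-style list (entries separated by colons or spaces).
# Print every entry that does not exist on disk, one per line.
missing_preloads() {
  printf '%s\n' "$1" | tr ': ' '\n\n' | while IFS= read -r lib; do
    [ -z "$lib" ] && continue
    [ -e "$lib" ] || printf '%s\n' "$lib"
  done
}
```

Calling `missing_preloads "$LD_PRELOAD"` in the affected shell would be expected to list `/usr/local/cuda/lib64/libcudnn.so.8`. Note that in this session the variable was exported by hand earlier with `export LD_PRELOAD=/usr/local/cuda/lib64/libcudnn.so.8`, so `unset LD_PRELOAD` in the same shell is enough unless the line was also written into a startup file.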
After commenting out the environment variable, I got a whole pile of errors instead
[root@localhost ~]# source /etc/profile
ERROR: ld.so: object '/usr/local/cuda/lib64/libcudnn.so.8' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
ERROR: ld.so: object '/usr/local/cuda/lib64/libcudnn.so.8' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
ERROR: ld.so: object '/usr/local/cuda/lib64/libcudnn.so.8' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
ERROR: ld.so: object '/usr/local/cuda/lib64/libcudnn.so.8' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
ERROR: ld.so: object '/usr/local/cuda/lib64/libcudnn.so.8' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
ERROR: ld.so: object '/usr/local/cuda/lib64/libcudnn.so.8' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
ERROR: ld.so: object '/usr/local/cuda/lib64/libcudnn.so.8' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
ERROR: ld.so: object '/usr/local/cuda/lib64/libcudnn.so.8' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
ERROR: ld.so: object '/usr/local/cuda/lib64/libcudnn.so.8' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
ERROR: ld.so: object '/usr/local/cuda/lib64/libcudnn.so.8' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
ERROR: ld.so: object '/usr/local/cuda/lib64/libcudnn.so.8' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
ERROR: ld.so: object '/usr/local/cuda/lib64/libcudnn.so.8' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
ERROR: ld.so: object '/usr/local/cuda/lib64/libcudnn.so.8' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
ERROR: ld.so: object '/usr/local/cuda/lib64/libcudnn.so.8' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
What was actually needed here was to unset the environment variable
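The pile of warnings has a simple explanation: commenting out an `export` line and re-sourcing the file does not remove a variable that is already exported in the running shell, so every external command launched by the profile scripts still inherits LD_PRELOAD and prints the ld.so warning. The sketch below reproduces the effect with a harmless stand-in variable; `DEMO_PRELOAD`, `/tmp/libdemo.so`, and the function name are hypothetical and used only for illustration.

```shell
# Show that re-sourcing a file with the export commented out
# leaves an already-exported variable in place; only unset removes it.
demo_resource_does_not_unset() {
  tmp=$(mktemp)
  printf 'export DEMO_PRELOAD=/tmp/libdemo.so\n' > "$tmp"
  . "$tmp"                                   # variable is now exported
  printf '# export DEMO_PRELOAD=/tmp/libdemo.so\n' > "$tmp"
  . "$tmp"                                   # line is commented out, file re-sourced
  printf 'after re-source: %s\n' "${DEMO_PRELOAD:-<unset>}"
  unset DEMO_PRELOAD
  printf 'after unset: %s\n' "${DEMO_PRELOAD:-<unset>}"
  rm -f "$tmp"
}
```

Running the function prints `after re-source: /tmp/libdemo.so` followed by `after unset: <unset>`, which matches what happens in the transcript: the warnings stop only after `unset LD_PRELOAD`.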
[root@localhost ~]# vi /etc/profile
ERROR: ld.so: object '/usr/local/cuda/lib64/libcudnn.so.8' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
[root@localhost ~]# ls
ERROR: ld.so: object '/usr/local/cuda/lib64/libcudnn.so.8' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
公共 模板 视频 图片 文档 下载 音乐 桌面 anaconda-ks.cfg initial-setup-ks.cfg NVIDIA-Linux-x86_64-550.144.03.run
[root@localhost ~]# unset LD_PRELOAD
[root@localhost ~]# ls
公共 模板 视频 图片 文档 下载 音乐 桌面 anaconda-ks.cfg initial-setup-ks.cfg NVIDIA-Linux-x86_64-550.144.03.run
[root@localhost ~]# rm -rf /var/cudnn-local-repo*
[root@localhost ~]# ls
公共 模板 视频 图片 文档 下载 音乐 桌面 anaconda-ks.cfg initial-setup-ks.cfg NVIDIA-Linux-x86_64-550.144.03.run
[root@localhost ~]# cd /home/
[root@localhost home]# ls
Softwares user wulu1 yhm
[root@localhost home]# cd Softwares/
[root@localhost Softwares]# ls
AlphaFold2
[root@localhost Softwares]# cd AlphaFold2/
[root@localhost AlphaFold2]# ls
cuda-repo-rhel8-12-9-local-12.9.0_575.51.03-1.x86_64.rpm cudnn-linux-x86_64-8.9.7.29_cuda12-archive.tar.xz
cudnn-linux-x86_64-8.9.7.29_cuda12-archive cudnn-local-repo-rhel8-9.10.0-1.0-1.x86_64.rpm
Manually delete the leftover files
[root@localhost Softwares]# rm -rf /etc/yum.repos.d/cudnn-local-rhel8-9.10.0.repo
[root@localhost Softwares]# rm -rf /var/cache/PackageKit/8.7/hawkey/cudnn-local-rhel8-9.10.0.solv
[root@localhost Softwares]# rm -rf /var/cache/PackageKit/8.7/hawkey/cudnn-local-rhel8-9.10.0-filenames.solvx
[root@localhost Softwares]# rm -rf /var/cache/dnf/cudnn-local-rhel8-9.10.0-903bc33f34604e66
[root@localhost Softwares]# rm -rf /var/cache/dnf/cudnn-local-rhel8-9.10.0.solv
[root@localhost Softwares]# rm -rf /var/cache/dnf/cudnn-local-rhel8-9.10.0-filenames.solvx
File path | Type | Deletable | Notes |
---|---|---|---|
/etc/yum.repos.d/cudnn-local-rhel8-9.10.0.repo | Repository configuration file | ✅ Yes | Run dnf clean all afterwards to refresh the cache |
/var/cache/PackageKit/8.7/hawkey/cudnn-local-rhel8-9.10.0.solv | Package manager cache | ✅ Yes | PackageKit dependency-resolution cache |
/var/cache/PackageKit/8.7/hawkey/cudnn-local-rhel8-9.10.0-filenames.solvx | Package manager cache | ✅ Yes | File name index cache |
/var/cache/dnf/cudnn-local-rhel8-9.10.0-903bc33f34604e66 | DNF cache directory | ✅ Yes | Contains the downloaded metadata |
/var/cache/dnf/cudnn-local-rhel8-9.10.0.solv | DNF dependency-resolution cache | ✅ Yes | Dependency data in binary format |
/var/cache/dnf/cudnn-local-rhel8-9.10.0-filenames.solvx | DNF file name index | ✅ Yes | Index file that speeds up package searches |
(from DeepSeek)
[root@localhost Softwares]# cd AlphaFold2/
[root@localhost AlphaFold2]# ls
cuda-repo-rhel8-12-9-local-12.9.0_575.51.03-1.x86_64.rpm cudnn-linux-x86_64-8.9.7.29_cuda12-archive.tar.xz
cudnn-linux-x86_64-8.9.7.29_cuda12-archive cudnn-local-repo-rhel8-9.10.0-1.0-1.x86_64.rpm
[root@localhost AlphaFold2]# cd cudnn-linux-x86_64-8.9.7.29_cuda12-archive/
[root@localhost cudnn-linux-x86_64-8.9.7.29_cuda12-archive]# ls
include lib LICENSE
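The extracted archive still contains `include` and `lib`, so one plausible way back to a working cuDNN 8.9.7 is to copy those files into the CUDA tree again, as in a tarball installation. The sketch below assumes that the original 8.9.7 install was done this way and that `/usr/local/cuda/lib64` is the right target on this machine; `restore_cudnn` is a hypothetical helper name, and nothing here is taken from the steps the author eventually used.

```shell
# restore_cudnn ARCHIVE_DIR CUDA_ROOT
# Copy cuDNN headers and libraries from an extracted tarball archive
# into CUDA_ROOT/include and CUDA_ROOT/lib64.
restore_cudnn() {
  archive="$1"
  cuda_root="$2"
  [ -d "$archive/include" ] && [ -d "$archive/lib" ] || {
    echo "not a cuDNN archive: $archive" >&2
    return 1
  }
  mkdir -p "$cuda_root/include" "$cuda_root/lib64" || return 1
  # -P copies symlinks as symlinks, so version links inside lib/ are preserved
  cp -P "$archive"/include/cudnn*.h "$cuda_root/include/" || return 1
  cp -P "$archive"/lib/libcudnn* "$cuda_root/lib64/" || return 1
  # Make the copied files readable by all users
  chmod a+r "$cuda_root"/include/cudnn*.h "$cuda_root"/lib64/libcudnn*
}
```

With the paths from this log, the call would be `restore_cudnn /home/Softwares/AlphaFold2/cudnn-linux-x86_64-8.9.7.29_cuda12-archive /usr/local/cuda`, followed by `ldconfig` and `ldconfig -p | grep libcudnn` to check that the loader sees the library again.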
Verify that the cleanup is complete
# Check the repository configuration
ls /etc/yum.repos.d/ | grep -i cudnn
# Check the cache files
ls /var/cache/{dnf,PackageKit}/* | grep -i cudnn
[root@localhost cudnn-linux-x86_64-8.9.7.29_cuda12-archive]# ls /etc/yum.repos.d/ | grep -i cudnn
[root@localhost cudnn-linux-x86_64-8.9.7.29_cuda12-archive]# ls /var/cache/{dnf,PackageKit}/* | grep -i cudnn
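With these files removed, the system is cleaner, and the other CUDA/cuDNN components are not affected. To reinstall the same version later, download the official installer package again to restore the repository configuration.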
[root@localhost cudnn-linux-x86_64-8.9.7.29_cuda12-archive]# dnf clean all
31 files removed
[root@localhost cudnn-linux-x86_64-8.9.7.29_cuda12-archive]# dnf makecache
Rocky Linux 8 - AppStream 1.5 MB/s | 18 MB 00:11
Rocky Linux 8 - BaseOS 4.0 MB/s | 23 MB 00:05
Rocky Linux 8 - Extras 13 kB/s | 15 kB 00:01
cuda-rhel8-12-9-local 81 MB/s | 121 kB 00:00
Metadata cache created.
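Complete steps for removing cuDNN leftovers
- First fix the LD_PRELOAD problem:
unset LD_PRELOAD
- Then delete the target files:
sudo rm -rf /var/cudnn-local-repo*
- Clean up the other leftovers:
sudo rm -f /usr/local/cuda/lib64/libcudnn*
sudo rm -f /usr/include/cudnn.h
sudo ldconfig
Verify the result
# Check that the files have been deleted
ls /var/cudnn-local-repo* 2>/dev/null
# Check that library loading is normal
ldconfig -p | grep cudnn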
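The individual checks above can be folded into one search over the directories where leftovers appeared in this log. This is only a sketch: `find_cudnn_leftovers` is a hypothetical helper name, and the directory list in the usage note is taken from the paths seen earlier rather than from any official cleanup procedure.

```shell
# find_cudnn_leftovers DIR...
# Print every path under the given directories whose name contains
# "cudnn" (case-insensitive). Directories that do not exist are skipped.
find_cudnn_leftovers() {
  for dir in "$@"; do
    [ -d "$dir" ] || continue
    find "$dir" -iname '*cudnn*' 2>/dev/null
  done
}
```

For example, `find_cudnn_leftovers /etc/yum.repos.d /var/cache/dnf /var/cache/PackageKit /usr/local/cuda` prints nothing when the cleanup is complete. Keep in mind that it would also list cuDNN files that are meant to stay, so empty output is only the goal when cuDNN is being removed entirely.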