Notes on fixing a bitsandbytes error
My environment
cuda=12.4
python=3.8
torch=2.4.0+cu124 (from https://download.pytorch.org/whl/)
torchaudio==2.4.0+cu124 (from https://download.pytorch.org/whl/)
torchvision==0.19.0+cu124 (from https://download.pytorch.org/whl/)
transformers=4.36.2
peft=0.13.2
tokenizers=0.15.2
bitsandbytes=0.42.0
accelerate=0.28.0
Error overview
The error looks like this:
RuntimeError:
CUDA Setup failed despite GPU being available. Please run the following command to get more information:
python -m bitsandbytes
Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues
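This error message tells you very little and can be ignored: it is the generic error that shows up for almost any CUDA setup problem.
What matters are the warnings printed before it: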
/data/XXX/miniconda3/envs/cqr38/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:167:
UserWarning: /data/XXX/miniconda3/envs/cqr38 did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
The following directories listed in your path were found to be non-existent: {PosixPath('/usr/lib/wsl/lib')}
/data/lujiarui/miniconda3/envs/cqr38/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:167:
UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/usr/local/cuda-12.4/targets/x86_64-linux/lib/libcudart.so')}..
We select the PyTorch default libcudart.so, which is {torch.version.cuda},but this might missmatch with the CUDA version that is needed for bitsandbytes.To override this behavior set the BNB_CUDA_VERSION=<version string, e.g. 122> environmental variableFor example, if you want to use the CUDA version 122BNB_CUDA_VERSION=122 python ...
OR set the environmental variable in your .bashrc: export BNB_CUDA_VERSION=122In the case of a manual override, make sure you set the LD_LIBRARY_PATH, e.g.export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.2
warn(msg)
/data/lujiarui/miniconda3/envs/cqr38/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:167:
UserWarning: /usr/local/cuda-12.4/targets/x86_64-linux/lib:/usr/local/cuda/lib64:/usr/lib/wsl/lib:/usr/local/cuda/lib64:/usr/local/cuda/lib64:/usr/lib/wsl/lib:/usr/local/cuda/lib64:/opt/OpenBLAS/lib:/monchickey/ffmpeg/lib did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
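The second and third warnings point at two path problems: /usr/lib/wsl/lib is listed but does not exist, and /usr/local/cuda/lib64 appears several times in LD_LIBRARY_PATH. A minimal POSIX sh sketch that reports both for any colon-separated list; the function name check_path_list is hypothetical and not part of bitsandbytes.

```shell
# Hypothetical helper: print "missing:" and "duplicate:" lines for the
# entries of a colon-separated path list such as LD_LIBRARY_PATH.
check_path_list() {
    local seen dir IFS
    seen=":"
    IFS=:    # local IFS: split $1 on ":" only inside this function
    for dir in $1; do
        [ -z "$dir" ] && continue    # skip empty entries, e.g. from "::"
        case "$seen" in
            *":$dir:"*) echo "duplicate: $dir"; continue ;;
        esac
        seen="$seen$dir:"
        [ -d "$dir" ] || echo "missing: $dir"
    done
}
```

Usage: check_path_list "$LD_LIBRARY_PATH". A duplicate is reported only once per extra occurrence, and a missing directory only on its first occurrence.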
Solution
The first warning, "/data/XXX/miniconda3/envs/cqr38 did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected!", means that none of libcudart.so, libcudart.so.11.0 and libcudart.so.12.0 was found in the cqr38 virtual environment. The fix is therefore to make at least one of these three files available in that directory so that bitsandbytes can find it.
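The check described by this warning can be sketched as follows: a directory passes if it contains at least one of the three file names listed in the message. The function name has_cudart is hypothetical, and it mirrors only the warning text, not bitsandbytes' actual search code.

```shell
# Hypothetical helper: succeed (and print the match) if DIR contains any of
# the libcudart file names listed in the bitsandbytes warning.
has_cudart() {
    local name
    for name in libcudart.so libcudart.so.11.0 libcudart.so.12.0; do
        # -e follows symlinks, so a dangling link does not count as found
        if [ -e "$1/$name" ]; then
            echo "$1/$name"
            return 0
        fi
    done
    return 1
}
```

With the conda environment activated, has_cudart "$CONDA_PREFIX" || echo "no libcudart in $CONDA_PREFIX" checks the same directory the first warning complains about. Note that a file named only libcudart.so.12.4.99 does not match any of the three names.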
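My approach has three steps: ① find the path of libcudart.so; ② create a symbolic link to that libcudart inside the virtual environment; ③ set the environment variables.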
① Find the library:
sudo find / -name "libcudart.so"
Look for a hit under /usr/local/cuda/ or a path with that prefix. Suppose the path found is /usr/local/cuda/lib64/libcudart.so. Listing /usr/local/cuda/lib64/ (ls -l) shows:
lrwxrwxrwx. 1 root root 15 Mar 6 23:39 libcudart.so -> libcudart.so.12
lrwxrwxrwx. 1 root root 20 Mar 6 23:39 libcudart.so.12 -> libcudart.so.12.4.99
-rwxr-xr-x. 1 root root 707904 Mar 6 23:39 libcudart.so.12.4.99
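Step ① can be scripted: find every libcudart.so below a root directory and follow its symlink chain (libcudart.so -> libcudart.so.12 -> libcudart.so.12.4.99) to the real file. This sketch assumes GNU readlink -f; the function name find_cudart and the default root /usr/local are hypothetical choices, not from the original steps.

```shell
# Hypothetical helper: list every file named libcudart.so under ROOT
# (default /usr/local) and the real file its symlink chain ends at.
find_cudart() {
    local root
    root=${1:-/usr/local}
    # 2>/dev/null hides "Permission denied" noise from unreadable dirs
    find "$root" -name 'libcudart.so' 2>/dev/null | while IFS= read -r path; do
        # readlink -f follows the whole symlink chain to the final file
        printf '%s -> %s\n' "$path" "$(readlink -f "$path")"
    done
}
```

On the machine above, find_cudart /usr/local/cuda/lib64 should report libcudart.so.12.4.99 as the real file, which is the one linked in step ②.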
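② Inside /data/XXX/miniconda3/envs/cqr38, create a "shortcut", i.e. a symbolic link, to the real file /usr/local/cuda/lib64/libcudart.so.12.4.99. Name the link libcudart.so so that it matches one of the names bitsandbytes looks for: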
ln -s /usr/local/cuda/lib64/libcudart.so.12.4.99 /data/XXX/miniconda3/envs/cqr38/libcudart.so
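Step ② as a reusable function: resolve the real library file first, check both paths, then create the link. The name link_cudart and the fallback to $CONDA_PREFIX (the directory conda sets for the active environment) are assumptions; ln -sf replaces an existing link instead of failing, which the plain ln -s above does not do.

```shell
# Hypothetical helper: create ENV_DIR/libcudart.so pointing at the real
# file behind SRC. ENV_DIR defaults to the active conda environment.
link_cudart() {
    local src env_dir real
    src=$1
    env_dir=${2:-$CONDA_PREFIX}
    # Resolve the symlink chain so the new link points at the real file
    real=$(readlink -f "$src" 2>/dev/null) || real=
    if [ -z "$real" ] || [ ! -f "$real" ]; then
        echo "link_cudart: '$src' does not resolve to a file" >&2
        return 1
    fi
    if [ -z "$env_dir" ] || [ ! -d "$env_dir" ]; then
        echo "link_cudart: target directory '$env_dir' not found" >&2
        return 1
    fi
    # -f replaces an existing libcudart.so link instead of failing
    ln -sf "$real" "$env_dir/libcudart.so"
}
```

For example, link_cudart /usr/local/cuda/lib64/libcudart.so /data/XXX/miniconda3/envs/cqr38 should produce the same link as the command above.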
③ Set the environment variables for your own user (e.g. XXX, not root). Open ~/.bashrc:
vim ~/.bashrc
and add these lines:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
export BNB_CUDA_VERSION=122
Save the file and reload it:
source ~/.bashrc
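The third warning shows /usr/local/cuda/lib64 several times in LD_LIBRARY_PATH. One way this happens is an export line like the one above: every time ~/.bashrc is sourced again in the same shell, it appends the directory once more. A sketch of an append that skips directories already present; path_append is a hypothetical name and would have to be defined in ~/.bashrc before it is used.

```shell
# Hypothetical helper: print LIST with DIR appended, unless DIR is already
# one of its colon-separated entries. An empty LIST gives just DIR, with
# no stray leading ":" (which would be an empty entry).
path_append() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;
        *) printf '%s\n' "${1:+$1:}$2" ;;
    esac
}
```

The export line would then become export LD_LIBRARY_PATH="$(path_append "$LD_LIBRARY_PATH" /usr/local/cuda/lib64)", which leaves the variable unchanged no matter how often the file is sourced.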
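I am not sure why BNB_CUDA_VERSION has to be 122 specifically. According to the second warning above, this variable overrides the CUDA version that bitsandbytes would otherwise take from PyTorch, written without the dot (122 means CUDA 12.2).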
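The warning above describes the format BNB_CUDA_VERSION expects: major and minor version concatenated, e.g. 122 for CUDA 12.2. A small sketch that derives this string from a version such as 12.4 or 12.4.99; the function name bnb_cuda_version is hypothetical, and it does not check whether your bitsandbytes install actually provides a library for the resulting version.

```shell
# Hypothetical helper: turn a CUDA version such as "12.2", "12.4" or
# "12.4.99" into the major+minor form used by BNB_CUDA_VERSION.
bnb_cuda_version() {
    local major rest minor
    case $1 in
        *.*) ;;
        *) echo "bnb_cuda_version: expected MAJOR.MINOR, got '$1'" >&2; return 1 ;;
    esac
    major=${1%%.*}       # "12.4.99" -> "12"
    rest=${1#*.}         # "12.4.99" -> "4.99"
    minor=${rest%%.*}    # "4.99"    -> "4"
    printf '%s%s\n' "$major" "$minor"
}
```

For example, bnb_cuda_version "$(python -c 'import torch; print(torch.version.cuda)')" gives 124 in this environment, which per the warning is the PyTorch default that bitsandbytes would pick; the setting above overrides it with 122.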
After these steps, running the program again no longer produces the error.