
Bug Debugging Log

news 2025/7/17 8:55:01

Contents

Code Debugging Bug Log
- First calculation
- Second calculation

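Code Debugging Bug Log

I seem to have stepped into a serious pitfall here…
I had never debugged an LLM in much depth before. This time I wanted to run an interpretability experiment to see how much each token fed into the LLM contributes to the output, so I used integrated gradients.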
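
For readers unfamiliar with the method, here is a minimal, framework-free sketch of integrated gradients. It is not the code used in this experiment: the toy `model`, its weights `w`, the all-zero `baseline`, and the step count are all assumptions chosen for illustration. The attribution of each input is (input minus baseline) times the average gradient along the straight path from baseline to input, and the attributions should sum to roughly `model(x) - model(baseline)`.

```python
import math

def model(x, w):
    """Toy scalar model (assumption): sigmoid of a weighted sum."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

def model_grad(x, w):
    """Analytic gradient of the toy model with respect to its input x."""
    y = model(x, w)
    return [y * (1.0 - y) * wi for wi in w]

def integrated_gradients(x, baseline, w, steps=200):
    """Approximate the path integral with a midpoint Riemann sum."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        # Point on the straight line between baseline and input.
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = model_grad(point, w)
        total = [t + g for t, g in zip(total, grad)]
    # Scale the average gradient by the distance from the baseline.
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]

# Hypothetical values, for illustration only.
w = [0.5, -1.0, 2.0]
x = [1.0, 2.0, 0.5]
baseline = [0.0, 0.0, 0.0]

attr = integrated_gradients(x, baseline, w)
# Completeness check: attributions should add up to the change in output.
gap = abs(sum(attr) - (model(x, w) - model(baseline, w)))
print(attr, gap)
```

In a real LLM setup the gradient would come from autograd on the input embeddings rather than from an analytic formula, but the completeness check at the end is a useful sanity test either way.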

First calculation

My input tokens consist of four parts:

BLIP token
SlowFast token
Swin3D token
Text token

The final output was the gradient contribution of these token groups:

BLIP: -0.0053
Swin3D: 0.1027
SlowFast: 0.3611

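But I wanted to see whether I could get the contribution of each individual token.

Second calculation

This time the output has one row per token, formatted like a table: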

--- Detailed contribution of each token ---
      type      token  contribution
0     Text          T      0.001234
1     Text          he     0.002345
...
8     Text          :      0.000123
9     BLIP    Token_0     -0.000567
10    BLIP    Token_1      0.001789
...
16    BLIP    Token_7     -0.000987
17    Text          .      0.000012
...
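
A table like this could be assembled roughly as follows. This is a sketch under assumptions, not the original script: it assumes the attributions arrive as a `[num_tokens][hidden_dim]` matrix and that each token's score is the sum over the hidden dimension, which is a common convention. The helper names and all numbers are hypothetical.

```python
def per_token_contributions(attributions):
    """Collapse a [num_tokens][hidden_dim] matrix to one score per token."""
    return [sum(row) for row in attributions]

def build_rows(types, tokens, attributions):
    """Pair each token with its type label and its summed attribution."""
    scores = per_token_contributions(attributions)
    return [
        {"type": ty, "token": tok, "contribution": score}
        for ty, tok, score in zip(types, tokens, scores)
    ]

def group_totals(rows):
    """Roll per-token scores up into one total per token type."""
    totals = {}
    for row in rows:
        totals[row["type"]] = totals.get(row["type"], 0.0) + row["contribution"]
    return totals

def format_table(rows):
    """Render the rows as a fixed-width text table."""
    lines = [f"{'':<4}{'type':<10}{'token':<12}{'contribution':>12}"]
    for i, row in enumerate(rows):
        lines.append(
            f"{i:<4}{row['type']:<10}{row['token']:<12}{row['contribution']:>12.6f}"
        )
    return "\n".join(lines)

# Hypothetical values, for illustration only.
types = ["Text", "Text", "BLIP", "BLIP"]
tokens = ["T", "he", "Token_0", "Token_1"]
attributions = [[0.001, 0.0], [0.002, 0.001], [-0.001, 0.0005], [0.0, 0.002]]

rows = build_rows(types, tokens, attributions)
print(format_table(rows))
print(group_totals(rows))
```

Summing the per-token scores by type, as `group_totals` does, gives group-level numbers of the kind reported in the first calculation.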

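But when I ran it, every contribution came out as NaN…
I asked an LLM, which said FP16 might be overflowing and suggested trying FP32. I tried that, and it was clearly not workable, because FP32 is guaranteed to run out of GPU memory here.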
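
Whether FP16 overflow was really the cause here is unconfirmed, but the mechanism itself is easy to reproduce. The sketch below emulates FP16 rounding with the standard library's half-precision `struct` format; `to_fp16` and `fp16_mul` are hypothetical helpers, not part of any library. Once a value exceeds the FP16 maximum of 65504 it becomes `inf`, and ordinary follow-up operations on `inf` produce NaN.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite half-precision value

def to_fp16(value):
    """Round a Python float to the nearest FP16 value; overflow becomes +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", value))[0]
    except OverflowError:
        # struct refuses to pack finite values beyond the FP16 range.
        return math.copysign(math.inf, value)

def fp16_mul(a, b):
    """Multiply two numbers, rounding inputs and result to FP16."""
    return to_fp16(to_fp16(a) * to_fp16(b))

big = fp16_mul(300.0, 300.0)   # 90000 exceeds 65504, so it becomes inf
diff = to_fp16(big - big)      # inf - inf is NaN
zeroed = to_fp16(big * 0.0)    # inf * 0 is NaN

print(big, diff, zeroed)
```

If this is what happens inside the model, a single overflowing activation or gradient is enough to make every downstream attribution NaN, which matches the symptom of the whole column being NaN.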
When I changed it back, I got:

CUDA Setup failed despite CUDA being available. Please run the following command to get more information:

python -m bitsandbytes

Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
and open an issue at: https://github.com/bitsandbytes-foundation/bitsandbytes/issues

It looks like something is wrong with CUDA…
I'll see how to resolve it later; tomorrow I plan to repair the environment and try again.
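
As a first step when repairing the environment, the error message's hint about LD_LIBRARY_PATH can be checked directly. The sketch below is a hypothetical helper, not a bitsandbytes API: it lists which directories on LD_LIBRARY_PATH contain a CUDA runtime library. The filename pattern `libcudart.so*` is an assumption about what to look for on Linux.

```python
import glob
import os

def find_cuda_runtime(search_path=None, pattern="libcudart.so*"):
    """Return {directory: [matching files]} for directories on the search path."""
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = {}
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing separator
        matches = sorted(glob.glob(os.path.join(directory, pattern)))
        if matches:
            hits[directory] = matches
    return hits

found = find_cuda_runtime()
if found:
    for directory, files in found.items():
        print(directory, files)
else:
    print("No CUDA runtime library found on LD_LIBRARY_PATH")
```

An empty result would be consistent with the error above, though the output of `python -m bitsandbytes` remains the more authoritative diagnostic.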
