
[LLM Error Fix] cublasLt ran into an error!

Error Description

I hit this error while training a LLaMA-2 model for a multi-span reading-comprehension task, on an H20-NVLink GPU (96 GB of VRAM). The full output is shown below:

trainable params: 4,194,304 || all params: 6,742,618,112 || trainable%: 0.06220586618327525
Training Epoch: 1:   0%|          | 0/2615 [00:00<?, ?it/s]
You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
/root/miniconda3/lib/python3.10/site-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
  warnings.warn(
/root/miniconda3/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py:298: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization
  warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
cuBLAS API failed with status 15
A: torch.Size([658, 4096]), B: torch.Size([4096, 4096]), C: (658, 4096); (lda, ldb, ldc): (c_int(21056), c_int(131072), c_int(21056)); (m, n, k): (c_int(658), c_int(4096), c_int(4096))
error detected
Traceback (most recent call last):
  File "/root/QASE-main/train/train_qase.py", line 63, in <module>
    model.train_model(
  File "/root/QASE-main/train/../qase.py", line 183, in train_model
    output_dict = self(batch)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/QASE-main/train/../qase.py", line 88, in forward
    outputs = self.model(
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/peft/peft_model.py", line 918, in forward
    return self.base_model(
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 94, in forward
    return self.model.forward(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 688, in forward
    outputs = self.model(
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 570, in forward
    layer_outputs = torch.utils.checkpoint.checkpoint(
  File "/root/miniconda3/lib/python3.10/site-packages/torch/_compile.py", line 24, in inner
    return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 328, in _fn
    return fn(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 17, in inner
    return fn(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 451, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/autograd/function.py", line 539, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 230, in forward
    outputs = run_function(*args)
  File "/root/miniconda3/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 566, in custom_forward
    return module(*inputs, output_attentions, None)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 292, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 194, in forward
    query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/peft/tuners/lora.py", line 1149, in forward
    result = super().forward(x)
  File "/root/miniconda3/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 242, in forward
    out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
  File "/root/miniconda3/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 488, in matmul
    return MatMul8bitLt.apply(A, B, out, bias, state)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/autograd/function.py", line 539, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/root/miniconda3/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 377, in forward
    out32, Sout32 = F.igemmlt(C32A, state.CxB, SA, state.SB)
  File "/root/miniconda3/lib/python3.10/site-packages/bitsandbytes/functional.py", line 1410, in igemmlt
    raise Exception('cublasLt ran into an error!')
Exception: cublasLt ran into an error!
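
For context, the training setup combines 8-bit quantization (bitsandbytes) with LoRA (peft); that is what produces both the "trainable params" line and the MatMul8bitLt frames in the traceback. Below is a minimal sketch of such a setup, reconstructed under common-library assumptions; this is not the actual QASE code, and the model id and LoRA target modules are guesses. Note that r=8 on q_proj/v_proj works out to 32 layers × 2 modules × 8 × (4096 + 4096) = 4,194,304 trainable parameters, matching the log exactly.

import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base model with bitsandbytes 8-bit weights; every nn.Linear is
# replaced by bnb.nn.Linear8bitLt, which is the layer that later fails.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",   # assumed model id
    load_in_8bit=True,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)  # also enables gradient checkpointing

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # assumed targets; the count matches the log
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # -> trainable params: 4,194,304 || ...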

Cause of the Error

I went through a lot of posts online. Some said the installed bitsandbytes version was too old, so fine, I upgraded it: pip install --upgrade bitsandbytes. That error went away, but a new one appeared (which I forgot to record); searching again suggested that one was caused by bitsandbytes being too new. So I rolled the version back and ended up exactly where I started...

I also saw claims (from searching "UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization site:github.com") that this warning indicates a dtype mismatch during 8-bit quantized training. Following a tutorial, I reworked the code quite a bit, only to find that the original code had been correct all along and needed no changes...

Others said the torch and CUDA versions were mismatched. I checked: torch 2.1.0 with CUDA 12.1, which is a valid pairing.
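
To check all of these version facts in one go, a short diagnostic like the following (my own sketch, not from any of those posts) is handy:

import torch
import bitsandbytes

print("torch:", torch.__version__)                # e.g. 2.1.0
print("CUDA used by torch:", torch.version.cuda)  # e.g. 12.1
print("bitsandbytes:", bitsandbytes.__version__)
print("GPU:", torch.cuda.get_device_name(0))
print("compute capability:", torch.cuda.get_device_capability(0))  # H20 reports (9, 0)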

Others said the GPU memory was insufficient and I should switch cards. But 96 GB should be plenty, and AutoDL doesn't seem to offer anything larger anyway, so that lead went nowhere.

Finally, a genuinely useful lead: a post saying that only certain GPUs (it named the 3090 and A100) support this 8-bit path, and that other cards don't. I switched to an A40 (no A100 instances were free), and the error was gone.

Putting it all together, my conclusion on the root cause: the 8-bit (LLM.int8) kernels in the bitsandbytes version I had installed don't support the GPU I was using, so switching cards was necessary. The log is consistent with this: "cuBLAS API failed with status 15" corresponds to CUBLAS_STATUS_NOT_SUPPORTED, meaning cuBLASLt rejected the requested int8 matmul configuration on that card; it is not an out-of-memory error.
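
If you want to verify this on a given card without launching a whole training run, one cheap option is a smoke test that pushes a tiny batch through a single Linear8bitLt layer, exercising the same MatMul8bitLt/igemmlt path as training. This is a hypothetical check of my own, assuming only the public bitsandbytes API:

import torch
import bitsandbytes as bnb

def int8_smoke_test() -> bool:
    """Return False if this GPU + bitsandbytes combination rejects the int8 matmul path."""
    try:
        # Weights are quantized to int8 when the layer is moved to the GPU.
        layer = bnb.nn.Linear8bitLt(64, 64, has_fp16_weights=False).cuda()
        x = torch.randn(4, 64, dtype=torch.float16, device="cuda")
        layer(x)  # same code path that raised "cublasLt ran into an error!" above
        return True
    except Exception as exc:
        print(f"int8 path unavailable here: {exc}")
        return False

if __name__ == "__main__":
    print("int8 OK" if int8_smoke_test() else "int8 NOT supported on this setup")

If the H20 failure really is an unsupported int8 configuration, a test like this should reproduce it in seconds instead of after a long training setup.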

Solution

As described above, the fix was simply to swap the H20-NVLink for an A40, going from a 96 GB high-memory, high-cost card to a cheap 40 GB one. After the swap, training ran to completion with no errors at all. Sometimes the simplest answer really is the right one...

I hope this helps!
