[LLM Error Fix] cublasLt ran into an error!
Error Description
I hit this error while training a llama2 model for a multi-span reading comprehension task. The GPU was an H20-NVLink (96 GB VRAM), and the full error output is shown below:
trainable params: 4,194,304 || all params: 6,742,618,112 || trainable%: 0.06220586618327525
Training Epoch: 1: 0%| | 0/2615 [00:00<?, ?it/s]
You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
/root/miniconda3/lib/python3.10/site-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
  warnings.warn(
/root/miniconda3/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py:298: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization
  warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
cuBLAS API failed with status 15
A: torch.Size([658, 4096]), B: torch.Size([4096, 4096]), C: (658, 4096); (lda, ldb, ldc): (c_int(21056), c_int(131072), c_int(21056)); (m, n, k): (c_int(658), c_int(4096), c_int(4096))
error detected
Traceback (most recent call last):
  File "/root/QASE-main/train/train_qase.py", line 63, in <module>
    model.train_model(
  File "/root/QASE-main/train/../qase.py", line 183, in train_model
    output_dict = self(batch)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/QASE-main/train/../qase.py", line 88, in forward
    outputs = self.model(
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/peft/peft_model.py", line 918, in forward
    return self.base_model(
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 94, in forward
    return self.model.forward(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 688, in forward
    outputs = self.model(
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 570, in forward
    layer_outputs = torch.utils.checkpoint.checkpoint(
  File "/root/miniconda3/lib/python3.10/site-packages/torch/_compile.py", line 24, in inner
    return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 328, in _fn
    return fn(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 17, in inner
    return fn(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 451, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/autograd/function.py", line 539, in apply
    return super().apply(*args, **kwargs) # type: ignore[misc]
  File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 230, in forward
    outputs = run_function(*args)
  File "/root/miniconda3/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 566, in custom_forward
    return module(*inputs, output_attentions, None)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 292, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 194, in forward
    query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/peft/tuners/lora.py", line 1149, in forward
    result = super().forward(x)
  File "/root/miniconda3/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 242, in forward
    out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
  File "/root/miniconda3/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 488, in matmul
    return MatMul8bitLt.apply(A, B, out, bias, state)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/autograd/function.py", line 539, in apply
    return super().apply(*args, **kwargs) # type: ignore[misc]
  File "/root/miniconda3/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 377, in forward
    out32, Sout32 = F.igemmlt(C32A, state.CxB, SA, state.SB)
  File "/root/miniconda3/lib/python3.10/site-packages/bitsandbytes/functional.py", line 1410, in igemmlt
    raise Exception('cublasLt ran into an error!')
Exception: cublasLt ran into an error!
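For context, the traceback shows exactly which pieces are involved: a peft LoRA adapter wrapped around a llama2 loaded in 8-bit through bitsandbytes, with gradient checkpointing enabled. Below is a minimal sketch of that kind of setup; the checkpoint name and LoRA hyperparameters are illustrative placeholders, not the actual QASE configuration.

```python
from transformers import LlamaForCausalLM
from peft import LoraConfig, get_peft_model

# Sketch only: loading llama2 with 8-bit weights routes its Linear layers
# through bitsandbytes' MatMul8bitLt, the code path that raises the error above.
model = LlamaForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",   # placeholder checkpoint
    load_in_8bit=True,
    device_map="auto",
)
model.gradient_checkpointing_enable()  # source of the `use_cache=False` message above

# Attach a LoRA adapter via peft; the values here are placeholders.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # prints a "trainable params: ..." line like the one above
```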
Cause of the Error
I went through plenty of tutorials online. Some said the installed bitsandbytes was too old, so fine, I upgraded it with pip install --upgrade bitsandbytes. The original error went away, but a new one appeared (which I forgot to record); searching again, that one turned out to be caused by bitsandbytes being too new. So I rolled the version back and ended up right where I started...
Others said the warning "UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization" points to a dtype mismatch during 8-bit quantized training, so I reworked the code following one of those tutorials, only to find that the original code was correct all along and no changes were needed...
Some said the torch and CUDA versions were mismatched, but I checked: torch was 2.1.0 and CUDA was 12.1, which is a valid combination.
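If you want to verify this yourself, the relevant versions and the GPU's compute capability can all be read from Python in a few lines (a small diagnostic sketch; the commented values are just examples):

```python
import torch
import bitsandbytes as bnb

print(torch.__version__)                    # e.g. 2.1.0
print(torch.version.cuda)                   # CUDA version torch was built against, e.g. 12.1
print(torch.cuda.get_device_name(0))        # e.g. NVIDIA H20
print(torch.cuda.get_device_capability(0))  # Hopper cards such as the H20 report (9, 0)
print(bnb.__version__)                      # installed bitsandbytes version
```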
Others said it was simply out of GPU memory and I needed a bigger card, but 96 GB is already plenty, and AutoDL doesn't seem to offer anything larger anyway, so that was a dead end.
Finally, I found a genuinely useful hint: reportedly only cards like the 3090 and A100 support this 8-bit path, and other GPUs do not. I switched to an A40 (no A100 instances were available), and the error was gone.
Putting it all together, my conclusion is that the error came from the GPU: the H20 is not supported by the bitsandbytes 8-bit matmul path used here, so the fix is to switch cards.
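Based on that conclusion, a crude pre-flight check can at least fail fast before a long training run starts on an unsupported card. This is only a sketch built from the experience above; the "known good" list is anecdotal, not an official bitsandbytes compatibility list.

```python
import torch

# Cards that worked for me (A40) or were reported to work (3090, A100) with
# this 8-bit training setup; purely anecdotal, adjust to your own experience.
KNOWN_GOOD = ("A100", "A40", "3090")

gpu_name = torch.cuda.get_device_name(0)
if not any(tag in gpu_name for tag in KNOWN_GOOD):
    print(f"Warning: {gpu_name} may hit 'cublasLt ran into an error!' on the bitsandbytes 8-bit path.")
```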
Solution
As described above, the fix was simply to swap the H20-NVLink for an A40: going from a 96 GB, high-cost card to a cheaper 48 GB card. After the switch, training ran through with no errors at all. Sometimes the simplest answer really is the right one...
Finally, I hope this helps anyone who runs into the same error~