llama.cpp fails to use the GPU
llama.cpp was built with CUDA support, yet it still fails to use the GPU.
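For reference, a CUDA-enabled build is typically configured as below (a sketch based on the llama.cpp build docs; GGML_CUDA is the current flag, older releases used LLAMA_CUBLAS):

# Configure with the CUDA backend enabled
cmake -B build -DGGML_CUDA=ON
# Build everything in release mode, using all available cores
cmake --build build --config Release -j

The server is then launched with -ngl 40 (-ngl is short for --gpu-layers, the number of model layers to offload to the GPU):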
./llama-server -m ../../../../../model/hf_models/qwen/qwen3-4b-q8_0.gguf -ngl 40
The error output is as follows:
ggml_cuda_init: failed to initialize CUDA: forward compatibility was attempted on non supported HW
warning: no usable GPU found, --gpu-layers option will be ignored
warning: one possible reason is that llama.cpp was compiled without GPU support
warning: consult docs/build.md for compilation instructions
The warning is misleading: the binary does have GPU support. Running nvidia-smi reveals the real cause, a failure at the driver level:
$ nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
NVML library version: 550.144
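The mismatch can be confirmed by comparing the version of the loaded kernel module against the 550.144 userspace library reported above (a diagnostic sketch; /proc/driver/nvidia/version is standard on Linux):

# Version of the NVIDIA kernel module currently loaded
cat /proc/driver/nvidia/version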
This Driver/library version mismatch typically shows up after the NVIDIA driver has been upgraded while the old kernel module is still loaded. Rebooting fixes it.
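If a reboot is inconvenient, reloading the kernel modules may achieve the same result (a sketch; this only works once every process using the GPU, including any display server, has been stopped):

# Unload the stale modules, dependents first
sudo rmmod nvidia_uvm nvidia_drm nvidia_modeset nvidia
# Load the freshly installed driver version
sudo modprobe nvidia

After the reboot (or module reload), the same command now offloads to the GPU: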
./llama-server -m ../../../../../model/hf_models/qwen/qwen3-4b-q8_0.gguf -ngl 40
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce GTX 1660 Ti, compute capability 7.5, VMM: yes
...
load_tensors: offloading 36 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 37/37 layers to GPU
load_tensors: CUDA0 model buffer size = 4076.43 MiB
load_tensors: CPU_Mapped model buffer size = 394.12 MiB
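To confirm the offload, VRAM usage can be checked while the server runs; the figure should roughly match the 4076.43 MiB CUDA0 buffer reported above (standard nvidia-smi query flags):

# Report current GPU memory usage
nvidia-smi --query-gpu=memory.used,memory.total --format=csv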