llama.cpp fails to use the GPU
llama.cpp was built with CUDA support, yet it still fails to use the GPU.
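For reference, a CUDA-enabled build is typically configured as below (a sketch based on the llama.cpp build docs; GGML_CUDA is the current flag, older releases used LLAMA_CUBLAS):

# Configure with the CUDA backend enabled
cmake -B build -DGGML_CUDA=ON
# Build everything in release mode, using all available cores
cmake --build build --config Release -j

The server is then launched with -ngl 40 (-ngl is short for --gpu-layers, the number of model layers to offload to the GPU):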
./llama-server -m ../../../../../model/hf_models/qwen/qwen3-4b-q8_0.gguf -ngl 40
The error output is as follows:
ggml_cuda_init: failed to initialize CUDA: forward compatibility was attempted on non supported HW
warning: no usable GPU found, --gpu-layers option will be ignored
warning: one possible reason is that llama.cpp was compiled without GPU support
warning: consult docs/build.md for compilation instructions
The warning is misleading: the binary does have GPU support. Running nvidia-smi reveals the real cause, a failure at the driver level:
$ nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
NVML library version: 550.144
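The mismatch can be confirmed by comparing the version of the loaded kernel module against the 550.144 userspace library reported above (a diagnostic sketch; /proc/driver/nvidia/version is standard on Linux):

# Version of the NVIDIA kernel module currently loaded
cat /proc/driver/nvidia/version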
This Driver/library version mismatch typically shows up after the NVIDIA driver has been upgraded while the old kernel module is still loaded. Rebooting fixes it.
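If a reboot is inconvenient, reloading the kernel modules may achieve the same result (a sketch; this only works once every process using the GPU, including any display server, has been stopped):

# Unload the stale modules, dependents first
sudo rmmod nvidia_uvm nvidia_drm nvidia_modeset nvidia
# Load the freshly installed driver version
sudo modprobe nvidia

After the reboot (or module reload), the same command now offloads to the GPU: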
./llama-server -m ../../../../../model/hf_models/qwen/qwen3-4b-q8_0.gguf -ngl 40
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce GTX 1660 Ti, compute capability 7.5, VMM: yes
...
load_tensors: offloading 36 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 37/37 layers to GPU
load_tensors: CUDA0 model buffer size = 4076.43 MiB
load_tensors: CPU_Mapped model buffer size = 394.12 MiB
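To confirm the offload, VRAM usage can be checked while the server runs; the figure should roughly match the 4076.43 MiB CUDA0 buffer reported above (standard nvidia-smi query flags):

# Report current GPU memory usage
nvidia-smi --query-gpu=memory.used,memory.total --format=csv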