Part 45: Deploying and Testing QwQ-32B on a Tesla P40
Environment
OS: CentOS 7
CPU: 14C28T (14 cores, 28 threads)
GPU: Tesla P40 24G
Driver: 515
CUDA: 11.7
cuDNN: 8.9.2.26
Ollama
ollama run qwq:32b --verbose
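The same model can also be queried from a script instead of the interactive prompt. The following is a minimal sketch, assuming the Ollama server is listening on its default local address (http://localhost:11434) and that the /api/generate response carries eval_count and eval_duration (in nanoseconds). The helper names build_request, generate, and eval_rate are made up for this example.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="qwq:32b"):
    """Build a non-streaming generate request for the given prompt."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt):
    """Send the prompt and return the decoded JSON response (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())


def eval_rate(stats):
    """Generation speed in tokens/s; eval_duration is assumed to be in nanoseconds."""
    return stats["eval_count"] / (stats["eval_duration"] / 1e9)


# Example usage (only works with the server running):
#   result = generate("Why is the sky blue?")
#   print(result["response"])
#   print(f"eval rate: {eval_rate(result):.2f} tokens/s")
```

With the server up, the figure returned by eval_rate should be comparable to the "eval rate" line that --verbose prints.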
GPU memory
Fri Mar  7 21:26:43 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla P40                      Off | 00000000:03:00.0 Off |                    0 |
| N/A   41C    P0             176W / 250W |  21446MiB / 23040MiB |     95%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      2921      C   ...unners/cuda_v12/ollama_llama_server    21444MiB |
+---------------------------------------------------------------------------------------+
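To make the memory headroom explicit, here is a small sketch that pulls the used and total figures out of the Memory-Usage cell above and works out how much of the card is occupied. The function name parse_memory_usage is just for illustration, and the regular expression assumes the "NNNNMiB / NNNNMiB" format shown in this output.

```python
import re


def parse_memory_usage(line):
    """Return (used_mib, total_mib) from text like '21446MiB / 23040MiB'."""
    match = re.search(r"(\d+)MiB\s*/\s*(\d+)MiB", line)
    if match is None:
        raise ValueError(f"no memory usage found in: {line!r}")
    return int(match.group(1)), int(match.group(2))


# The row copied from the nvidia-smi output above.
row = "| N/A 41C P0 176W / 250W | 21446MiB / 23040MiB | 95% Default |"
used, total = parse_memory_usage(row)
free = total - used
print(f"used {used} MiB of {total} MiB ({used / total:.1%}), {free} MiB free")
# prints: used 21446 MiB of 23040 MiB (93.1%), 1594 MiB free
```

So the loaded model takes up roughly 93% of the card, leaving about 1.5 GiB free.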
Speed
total duration: 14.132483694s
load duration: 46.562043ms
prompt eval count: 28 token(s)
prompt eval duration: 293ms
prompt eval rate: 95.56 tokens/s
eval count: 131 token(s)
eval duration: 13.791s
eval rate: 9.50 tokens/s
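The two rates above are simply token count divided by duration. As a quick sanity check, this sketch recomputes them from the printed counts and durations and then estimates how long a longer answer would take at the same speed. The 1000-token answer length is an arbitrary assumption for illustration, not a measurement.

```python
def tokens_per_second(token_count, duration_seconds):
    """Throughput as tokens divided by elapsed seconds."""
    return token_count / duration_seconds


# Figures taken from the --verbose output above.
prompt_rate = tokens_per_second(28, 0.293)   # prompt eval
gen_rate = tokens_per_second(131, 13.791)    # generation (eval)

print(f"prompt eval rate: {prompt_rate:.2f} tokens/s")  # prints 95.56
print(f"eval rate: {gen_rate:.2f} tokens/s")            # prints 9.50

# Hypothetical: time to generate a 1000-token answer at the same speed.
seconds_for_1000 = 1000 / gen_rate
print(f"about {seconds_for_1000:.0f} s for 1000 tokens")  # prints about 105 s
```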
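Summary
At about 9.5 tokens/s for generation, the speed is acceptable and fine for everyday questions. Other aspects of performance still need more testing.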