Part 47 - Tesla P40 + Qwen3-30B-A3B: Deployment and Testing
Environment
System: CentOS 7
CPU: 14 cores / 28 threads
GPU: Tesla P40 24 GB
Driver: 535
CUDA: 12.2
Ollama
Model: Qwen3-30B-A3B
ollama run qwen3:30b --verbose
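The interactive command above is convenient for a quick test. For scripted benchmarking, Ollama also exposes an HTTP API, and a non-streaming /api/generate response carries the same counters that --verbose prints (eval_count, eval_duration and so on, with durations in nanoseconds). The sketch below is a minimal example under two assumptions: the server listens on the default localhost:11434, and the model tag is qwen3:30b as above. The helper names build_request, generate and eval_rate are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama's server is listening on its default address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 300.0) -> dict:
    """Send the request and return the decoded JSON response (needs a running server)."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def eval_rate(response: dict) -> float:
    """Generation speed in tokens/s; the API reports durations in nanoseconds."""
    return response["eval_count"] / (response["eval_duration"] / 1e9)
```

With the server running, eval_rate(generate("qwen3:30b", "Briefly introduce the Tesla P40.")) should give a figure comparable to the eval rate printed by --verbose. Non-streaming mode is used only so that all statistics arrive in a single JSON object.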
VRAM usage (nvidia-smi with the model loaded)
Tue May 27 23:50:56 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla P40                      Off | 00000000:03:00.0 Off |                  Off |
| N/A   36C    P0              50W / 250W |  19092MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      4168      C   /usr/local/bin/ollama                     19090MiB |
+---------------------------------------------------------------------------------------+
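Instead of reading the table by eye, nvidia-smi can print just the memory counters with --query-gpu=memory.used,memory.total --format=csv,noheader,nounits (one line per GPU, values in MiB). A minimal sketch that turns such a line into a utilisation figure follows; the function names are hypothetical, and it only handles the single-GPU case used here.

```python
from typing import Tuple


def parse_memory_csv(line: str) -> Tuple[int, int]:
    """Parse one 'used, total' line (MiB) as printed by
    nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits
    """
    used, total = (int(field.strip()) for field in line.split(","))
    return used, total


def memory_summary(used_mib: int, total_mib: int) -> dict:
    """Return the percentage of VRAM in use and the remaining headroom in MiB."""
    return {
        "used_pct": round(100.0 * used_mib / total_mib, 1),
        "free_mib": total_mib - used_mib,
    }
```

Fed the numbers from this run ("19092, 24576"), it reports about 77.7% of VRAM in use and 5484 MiB free.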
Speed
total duration: 11.091046885s
load duration: 47.799704ms
prompt eval count: 424 token(s)
prompt eval duration: 171.063992ms
prompt eval rate: 2478.60 tokens/s
eval count: 413 token(s)
eval duration: 10.845275175s
eval rate: 38.08 tokens/s
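The rates reported by --verbose are token counts divided by the corresponding durations: 413 tokens / 10.845 s ≈ 38.08 tokens/s for generation, and 424 tokens / 0.171 s ≈ 2478.6 tokens/s for prompt processing. The sketch below parses the statistics block above and recomputes both rates. It assumes the "key: value" layout shown here and only handles single-unit durations such as 171.063992ms or 10.845275175s; parse_duration, parse_verbose, token_count and rates are hypothetical helper names.

```python
import re

# Seconds per unit. Assumption: only these single-unit suffixes occur in the output.
UNIT_SECONDS = {"ns": 1e-9, "ms": 1e-3, "s": 1.0}
DURATION_RE = re.compile(r"([0-9.]+)(ns|ms|s)")


def parse_duration(text: str) -> float:
    """Convert a duration such as '171.063992ms' or '10.845275175s' to seconds."""
    match = DURATION_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = match.groups()
    return float(value) * UNIT_SECONDS[unit]


def parse_verbose(block: str) -> dict:
    """Map each 'key: value' line of the --verbose statistics to its raw value."""
    stats = {}
    for line in block.strip().splitlines():
        key, _, value = line.partition(":")
        stats[key.strip()] = value.strip()
    return stats


def token_count(stats: dict, key: str) -> int:
    """Extract the integer from values like '424 token(s)'."""
    return int(stats[key].split()[0])


def rates(stats: dict) -> dict:
    """Recompute tokens/s as token count divided by duration in seconds."""
    return {
        "prompt eval rate": token_count(stats, "prompt eval count")
        / parse_duration(stats["prompt eval duration"]),
        "eval rate": token_count(stats, "eval count")
        / parse_duration(stats["eval duration"]),
    }
```

Applied to this run, both recomputed values agree with what Ollama printed (38.08 and 2478.60 tokens/s) to two decimal places.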
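Summary
The speed is quite respectable: about 38 tokens/s for generation and about 2,479 tokens/s for prompt processing, with 19,092 MiB of the card's 24,576 MiB of VRAM in use. It looks like the P40 still has some life left in it.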