Part 45 - Tesla P40: Disabling GPU ECC to Free Up Some VRAM
Environment
OS: CentOS-7
CPU: 14C28T
GPU: Tesla P40 24G
Driver: 515
CUDA: 11.7
cuDNN: 8.9.2.26
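Pros and cons of enabling vs. disabling ECC
Tesla-series GPUs ship with ECC (error-correcting code) enabled by default.
Enabling ECC improves data reliability, at the cost of less usable VRAM and some performance loss.
Disabling ECC frees the full VRAM and improves performance, but increases the risk of data errors.
Tesla P40: before disabling ECC
Command to check ECC status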
nvidia-smi -q -d ecc
Command to enable ECC
nvidia-smi -i 0 -e 1
Timestamp : Tue Mar 11 20:59:38 2025
Driver Version : 535.129.03
CUDA Version : 12.2
Attached GPUs : 1
GPU 00000000:03:00.0
ECC Mode
Current : Enabled
Pending : Enabled
|=========================================+======================+======================|
| 0 Tesla P40 Off | 00000000:03:00.0 Off | 0 |
| N/A 14C P8 8W / 250W | 2MiB / 23040MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
Tesla P40: after disabling ECC
Command to disable ECC (requires root; the new mode takes effect after a reboot)
nvidia-smi -i 0 -e 0
GPU 00000000:03:00.0
ECC Mode
Current : Disabled
Pending : Disabled
|=========================================+======================+======================|
| 0 Tesla P40 Off | 00000000:03:00.0 Off | Off |
| N/A 15C P8 8W / 250W | 2MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
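The Current and Pending fields shown above can also be checked from a script, which is handy for telling whether a reboot is still needed after changing the ECC setting. This is only a minimal sketch for a POSIX shell: `ecc_reboot_needed` is a hypothetical helper name, and it assumes the `ecc.mode.current` and `ecc.mode.pending` query fields listed by `nvidia-smi --help-query-gpu`.

```shell
# ecc_reboot_needed: hypothetical helper, not part of nvidia-smi.
# Takes one CSV line of the form "current, pending" and succeeds (exit 0)
# when the pending ECC mode differs from the current one.
ecc_reboot_needed() {
    # Drop spaces and tabs so "Enabled, Disabled" becomes "Enabled,Disabled".
    modes=$(printf '%s' "$1" | tr -d ' \t')
    current=${modes%%,*}   # text before the first comma
    pending=${modes#*,}    # text after the first comma
    [ "$current" != "$pending" ]
}

# Only query the GPU when nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    line=$(nvidia-smi -i 0 --query-gpu=ecc.mode.current,ecc.mode.pending --format=csv,noheader)
    if ecc_reboot_needed "$line"; then
        echo "ECC change is pending; reboot to apply it."
    else
        echo "ECC mode already in effect: $line"
    fi
fi
```

The comparison is kept in its own function so it can be tried on plain strings such as "Enabled, Disabled" without a GPU present.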
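Conclusion
In my test, disabling ECC freed about 1.5 GB of VRAM (23040 MiB to 24576 MiB). Even so, I recommend leaving ECC enabled, since it makes the GPU more reliable.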
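The figure of roughly 1.5G can be re-derived from the two memory totals in the nvidia-smi tables above (23040 MiB with ECC on, 24576 MiB with ECC off). A small sketch of that arithmetic, assuming a POSIX shell with awk available; `vram_delta` is a hypothetical helper name used only for illustration.

```shell
# vram_delta: hypothetical helper.
# $1 = total VRAM in MiB with ECC enabled, $2 = total VRAM in MiB with ECC disabled.
# Prints the difference in MiB, in GiB, and as a percentage of the full VRAM.
vram_delta() {
    awk -v on="$1" -v off="$2" 'BEGIN {
        delta = off - on
        printf "%d MiB (%.2f GiB, %.2f%% of %d MiB)\n", delta, delta / 1024, delta * 100 / off, off
    }'
}

# Totals taken from the nvidia-smi output in this post.
vram_delta 23040 24576
# prints: 1536 MiB (1.50 GiB, 6.25% of 24576 MiB)
```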