RuntimeError: CUDA error: invalid device function
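Cause: the CUDA kernels were compiled for a GPU architecture that is incompatible with the current GPU. The CMake configure log shows that an old compiler was picked up (nvcc 11.5 predates sm_89 support, which arrived in CUDA 11.8):
-- The CUDA compiler identification is NVIDIA 11.5.119 (should be 12.6)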
Solution:
1. Check the GPU's compute capability
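One way to do this check is sketched below in Python. It assumes `nvidia-smi` is on the PATH and that the driver is recent enough to support the `compute_cap` query field. The helper names `cap_to_cmake_arch` and `query_compute_caps` are made up for this note.

```python
import subprocess


def cap_to_cmake_arch(compute_cap: str) -> str:
    """Convert a compute capability such as "8.9" into the CMake value "89"."""
    major, minor = compute_cap.strip().split(".")
    return f"{int(major)}{int(minor)}"


def query_compute_caps() -> list[str]:
    """Return the compute capability of every visible GPU, or [] on failure.

    Assumption: the installed nvidia-smi understands --query-gpu=compute_cap.
    """
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        # nvidia-smi is missing or rejected the query.
        return []
    return [line.strip() for line in out.splitlines() if line.strip()]


if __name__ == "__main__":
    for cap in query_compute_caps():
        print(f"compute capability {cap} -> CUDA_ARCHITECTURES {cap_to_cmake_arch(cap)}")
```

If PyTorch is already installed, `torch.cuda.get_device_capability()` should give the same information as a `(major, minor)` tuple.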
2. Edit CMakeLists.txt
set_target_properties(my_library PROPERTIES
    CUDA_ARCHITECTURES 89  # the key fix
)
3. Edit the Makefile (add -DCMAKE_CUDA_ARCHITECTURES=89 to explicitly target sm_89)
build:
	mkdir -p build
	cmake -Bbuild -DCMAKE_BUILD_TYPE=$(TYPE) -DUSE_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=89
	make -j -C build
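4. Remove the old nvcc (the old version lives under `/usr/bin`, while the new one is probably in `/usr/local/cuda-12.6/bin`)
Commands:
# Remove the nvcc and toolchain of the old CUDA 11.5 (use with caution!)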
sudo rm /usr/bin/nvcc
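# Or, more safely, uninstall the old CUDA package with apt (if it was installed via apt).
# The package name depends on how it was installed; `dpkg -S /usr/bin/nvcc` shows which package owns the file.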
sudo apt purge cuda-toolkit-11-5
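Before deleting anything, it may help to confirm which nvcc the build actually picks up. The sketch below assumes that CMake finds nvcc through the PATH and that `nvcc --version` prints a line ending in something like `V12.6.85`. The function names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_nvcc_version(version_text: str) -> Optional[str]:
    """Extract a version such as "12.6.85" from the output of `nvcc --version`."""
    match = re.search(r"\bV(\d+\.\d+\.\d+)", version_text)
    return match.group(1) if match else None


def nvcc_on_path() -> Optional[tuple[str, Optional[str]]]:
    """Return (path, version) of the first nvcc on PATH, or None if there is none."""
    path = shutil.which("nvcc")
    if path is None:
        return None
    try:
        out = subprocess.run(
            [path, "--version"], capture_output=True, text=True, check=True
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        # nvcc exists but could not be run; report the path only.
        return path, None
    return path, parse_nvcc_version(out)


if __name__ == "__main__":
    print(nvcc_on_path())
```

If this reports `/usr/bin/nvcc` with an 11.5 version, the old toolchain is still shadowing the new one. A less destructive alternative to deleting it is to put `/usr/local/cuda-12.6/bin` first in PATH, or to pass `-DCMAKE_CUDA_COMPILER=/usr/local/cuda-12.6/bin/nvcc` to cmake (assuming that is where the new toolkit is installed).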
Result:
-- The CUDA compiler identification is NVIDIA 12.6.85
Problem solved; the tests now complete successfully.
ubun22:/mnt/c/Users/lms/Desktop/cuda$ /usr/bin/env /usr/bin/python /home/ubun22/.vscode-server/extensions/ms-python.debugpy-2025.6.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher 48006 -- /mnt/c/Users/lms/Desktop/cuda/hpc2torch1/test/gather.py --device cuda
Testing Gather on cuda with x_shape:(3, 2) , indice_shape:(2, 2), axis:0 ,dtype:torch.float32
2025-04-09 20:36:15,539 Pytorch: 0.04505600035190582 ms, kernel: 0.03885599970817566 ms [-13.76%]
absolute error:0.0000e+00
relative error:0.0000e+00
Testing Gather on cuda with x_shape:(3, 2) , indice_shape:(1, 2), axis:1 ,dtype:torch.float32
2025-04-09 20:36:15,547 Pytorch: 0.04580639898777008 ms, kernel: 0.042956799268722534 ms [-6.22%]
absolute error:0.0000e+00
relative error:0.0000e+00
Testing Gather on cuda with x_shape:(3, 2, 3) , indice_shape:(1, 2, 1), axis:1 ,dtype:torch.float32
2025-04-09 20:36:15,552 Pytorch: 0.029182401299476624 ms, kernel: 0.021587200462818146 ms [-26.03%]
absolute error:0.0000e+00
relative error:0.0000e+00
Testing Gather on cuda with x_shape:(50257, 768) , indice_shape:(16, 1024), axis:0 ,dtype:torch.float32
2025-04-09 20:36:15,596 Pytorch: 0.40238080024719236 ms, kernel: 0.4157440185546875 ms [+3.32%]
absolute error:0.0000e+00
relative error:0.0000e+00
Testing Gather on cuda with x_shape:(1024, 512, 32, 4) , indice_shape:(128, 2, 2), axis:1 ,dtype:torch.float32
2025-04-09 20:36:16,877 Pytorch: 1.8170879364013672 ms, kernel: 1.8123775482177735 ms [-0.26%]
absolute error:0.0000e+00
relative error:0.0000e+00
Testing Gather on cuda with x_shape:(3, 2) , indice_shape:(2, 2), axis:0 ,dtype:torch.float16
2025-04-09 20:36:22,590 Pytorch: 0.014347200095653535 ms, kernel: 0.01908479928970337 ms [+33.02%]
absolute error:0.0000e+00
relative error:0.0000e+00
Testing Gather on cuda with x_shape:(3, 2) , indice_shape:(1, 2), axis:1 ,dtype:torch.float16
2025-04-09 20:36:22,593 Pytorch: 0.02247679978609085 ms, kernel: 0.00979039967060089 ms [-56.44%]
absolute error:0.0000e+00
relative error:0.0000e+00
Testing Gather on cuda with x_shape:(50257, 768) , indice_shape:(16, 1024), axis:0 ,dtype:torch.float16
2025-04-09 20:36:22,613 Pytorch: 0.2537472009658813 ms, kernel: 0.20623359680175782 ms [-18.72%]
absolute error:0.0000e+00
relative error:0.0000e+00
Testing Gather on cuda with x_shape:(512, 128, 4, 128) , indice_shape:(4, 1, 1), axis:2 ,dtype:torch.float16
2025-04-09 20:36:23,894 Pytorch: 0.6146048069000244 ms, kernel: 0.6029823780059814 ms [-1.89%]
absolute error:0.0000e+00
relative error:0.0000e+00
ubun22:/mnt/c/Users/lms/Desktop/cuda$
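The bracketed percentage in each log line appears to be the custom kernel's time relative to PyTorch's, so a negative value means the kernel was faster. This is inferred from the printed numbers, not from the test script itself. A minimal sketch that reproduces two of the values (the function name `relative_change` is made up here):

```python
def relative_change(pytorch_ms: float, kernel_ms: float) -> float:
    """Percentage difference of the kernel time relative to the PyTorch time."""
    return (kernel_ms - pytorch_ms) / pytorch_ms * 100.0


if __name__ == "__main__":
    # Timings copied from the log above (float32, x_shape (3, 2), axis 0).
    print(f"{relative_change(0.04505600035190582, 0.03885599970817566):+.2f}%")  # -13.76%
    # float32, x_shape (50257, 768), axis 0.
    print(f"{relative_change(0.40238080024719236, 0.4157440185546875):+.2f}%")  # +3.32%
```

On the tiny shapes the timings are a few hundredths of a millisecond, so those percentages are likely dominated by launch overhead and measurement noise. The larger shapes are the more meaningful comparison.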