How can data be transferred efficiently between host_vector and device_vector in Thrust, and is there an asynchronous option?

news 2025/11/1 22:11:22

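In Thrust, data can move between a host_vector and a device_vector in several ways, both synchronous and asynchronous. The sections below cover the efficient options and show how to make a transfer asynchronous.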

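1. Synchronous transfers

(1) Direct assignment

Thrust copies implicitly when one vector is constructed from, or assigned to, another, so a plain assignment is enough to move the data: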

#include <thrust/host_vector.h>
#include <thrust/device_vector.h>

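// Initializer-list construction may need a reasonably recent Thrust release.
thrust::host_vector<int> h_vec = {1, 2, 3};
thrust::device_vector<int> d_vec = h_vec; // host to device (blocking)
thrust::host_vector<int> h_vec2 = d_vec;  // device to host (blocking)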

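(2) Explicit `thrust::copy`

thrust::copy makes the copy explicit. It does not resize the destination, so d_vec and h_vec2 must already hold at least as many elements as the source range:

thrust::copy(h_vec.begin(), h_vec.end(), d_vec.begin()); // host to device
thrust::copy(d_vec.begin(), d_vec.end(), h_vec2.begin()); // device to host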
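
The sizing requirement is easy to miss, so here is a minimal sketch of a blocking round trip that allocates each destination before copying into it. The helper name `round_trip` is an assumption made for illustration; it is not part of Thrust.

```cuda
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>

// Hypothetical helper (not part of Thrust): host -> device -> host round trip.
thrust::host_vector<int> round_trip(const thrust::host_vector<int>& h_in) {
    // thrust::copy never resizes its destination, so size it first.
    thrust::device_vector<int> d_vec(h_in.size());
    thrust::copy(h_in.begin(), h_in.end(), d_vec.begin());   // blocking host -> device

    thrust::host_vector<int> h_out(d_vec.size());
    thrust::copy(d_vec.begin(), d_vec.end(), h_out.begin()); // blocking device -> host
    return h_out;
}
```

Calling `round_trip` on a host_vector should return a vector with the same contents. Both copies finish before the function returns.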

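2. Asynchronous transfers

Container assignment and thrust::copy without a stream policy block the calling thread. A common way to get asynchronous transfers is to combine a CUDA stream with raw pointers taken from the Thrust containers:

(1) Get the raw pointers and call the CUDA API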

#include <cuda_runtime.h>

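// Create a CUDA stream
cudaStream_t stream;
cudaStreamCreate(&stream);

// Get the raw pointer behind the device vector
int* d_ptr = thrust::raw_pointer_cast(d_vec.data());

// Asynchronous host-to-device copy.
// Note: h_vec is pageable here, so the copy may not overlap with other work;
// use pinned host memory (see section 3) for real overlap.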
cudaMemcpyAsync(
    d_ptr, 
    h_vec.data(), 
    h_vec.size() * sizeof(int), 
    cudaMemcpyHostToDevice, 
    stream
);

// Asynchronous device-to-host copy (queued after the copy above on the same stream)
cudaMemcpyAsync(
    h_vec2.data(), 
    d_ptr, 
    d_vec.size() * sizeof(int), 
    cudaMemcpyDeviceToHost, 
    stream
);

// Synchronize the stream (wait for the queued copies to finish)
cudaStreamSynchronize(stream);
cudaStreamDestroy(stream);
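
For the copies to be truly asynchronous, the host side generally needs to be page-locked. The following is a minimal sketch under that assumption: it stages the data through buffers from cudaMallocHost and checks every return code. The helper name `pinned_round_trip` and the use of std::vector for input and output are illustrative choices, not something the steps above prescribe.

```cuda
#include <cuda_runtime.h>
#include <thrust/device_vector.h>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: sends `input` to the GPU and back through pinned
// staging buffers on one stream. Returns false if any CUDA call fails.
bool pinned_round_trip(const std::vector<int>& input, std::vector<int>& output) {
    const std::size_t n = input.size();
    const std::size_t bytes = n * sizeof(int);
    output.assign(n, 0);
    if (n == 0) return true;

    int* h_in = nullptr;
    int* h_out = nullptr;
    cudaStream_t stream = nullptr;
    bool ok = cudaMallocHost(&h_in, bytes) == cudaSuccess
           && cudaMallocHost(&h_out, bytes) == cudaSuccess
           && cudaStreamCreate(&stream) == cudaSuccess;
    if (ok) {
        std::memcpy(h_in, input.data(), bytes);
        thrust::device_vector<int> d_vec(n);
        int* d_ptr = thrust::raw_pointer_cast(d_vec.data());
        // Both copies are queued on the same stream, so they run in order.
        ok = cudaMemcpyAsync(d_ptr, h_in, bytes, cudaMemcpyHostToDevice, stream) == cudaSuccess
          && cudaMemcpyAsync(h_out, d_ptr, bytes, cudaMemcpyDeviceToHost, stream) == cudaSuccess
          && cudaStreamSynchronize(stream) == cudaSuccess;  // wait before reading h_out
        if (ok) std::memcpy(output.data(), h_out, bytes);
    }
    if (stream) cudaStreamDestroy(stream);
    if (h_out) cudaFreeHost(h_out);
    if (h_in) cudaFreeHost(h_in);
    return ok;
}
```

Between the two cudaMemcpyAsync calls and the cudaStreamSynchronize, the host thread is free to do unrelated work. That free interval is the practical benefit of the asynchronous form.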

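(2) Combining Thrust with CUDA streams

To run a Thrust algorithm on a specific stream, pass an execution policy such as thrust::cuda::par.on(stream). The policy tells Thrust that every iterator refers to device-accessible memory, so use it with device data rather than with iterators of a pageable host_vector. Depending on the Thrust version, the call may still wait for the stream before returning; newer releases also provide thrust::cuda::par_nosync for algorithms that need not block.

#include <thrust/execution_policy.h>

// Destination buffer on the device (same size as d_vec)
thrust::device_vector<int> d_out(d_vec.size());

// Run a Thrust algorithm on the stream (device-to-device copy)
thrust::copy(
    thrust::cuda::par.on(stream), 
    d_vec.begin(), 
    d_vec.end(), 
    d_out.begin()
);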
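
As a slightly fuller illustration of the stream policy, the sketch below runs thrust::transform on a caller-created stream, with device iterators on both sides. The functor `DoubleIt` and the helper `double_on_stream` are hypothetical names introduced for this example only.

```cuda
#include <cuda_runtime.h>
#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>
#include <thrust/host_vector.h>
#include <thrust/transform.h>

// Illustrative functor (not from the original article): doubles each element.
struct DoubleIt {
    __host__ __device__ int operator()(int x) const { return 2 * x; }
};

// Hypothetical helper: runs a Thrust algorithm on a caller-chosen stream.
thrust::host_vector<int> double_on_stream(const thrust::host_vector<int>& h_in) {
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    thrust::device_vector<int> d_in = h_in;          // blocking host -> device copy
    thrust::device_vector<int> d_out(d_in.size());

    // Every iterator passed alongside the CUDA policy refers to device memory.
    thrust::transform(thrust::cuda::par.on(stream),
                      d_in.begin(), d_in.end(), d_out.begin(), DoubleIt());

    cudaStreamSynchronize(stream);                   // make sure the stream's work has finished
    thrust::host_vector<int> h_out = d_out;          // blocking device -> host copy
    cudaStreamDestroy(stream);
    return h_out;
}
```

The explicit cudaStreamSynchronize is kept even though some Thrust versions already wait inside the algorithm. It costs little and keeps the sketch correct if the policy is later switched to a non-blocking one.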

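3. Tips for efficient transfers

Batch transfers: reduce the number of transfers by merging many small copies into one large copy.
Page-locked (pinned) memory: allocate host buffers with cudaMallocHost, or give thrust::host_vector a pinned allocator, to raise transfer bandwidth and let cudaMemcpyAsync overlap with other work. A default thrust::host_vector uses pageable memory. The allocator below lives in an experimental namespace and may be deprecated or missing in newer Thrust releases.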
```
#include <thrust/system/cuda/experimental/pinned_allocator.h>
thrust::host_vector<int, thrust::cuda::experimental::pinned_allocator<int>> h_pinned_vec;
```
Overlap computation and transfers: use multiple streams so that kernels and copies can run concurrently.
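
The multi-stream tip can be sketched as a chunked pipeline: each chunk is copied in, processed, and copied back on its own stream, so one chunk's copy may overlap another chunk's kernel. Everything named here (`add_one`, `add_one_pipelined`, the two-stream pool, the 256-thread blocks) is an assumption chosen for illustration. The sketch also assumes `h_data` points to pinned memory. A hand-written kernel is used instead of a Thrust algorithm so that the host loop does not block on any stream.

```cuda
#include <cuda_runtime.h>
#include <thrust/device_vector.h>
#include <algorithm>
#include <cstddef>

// Illustrative kernel: adds 1 to each element of a chunk.
__global__ void add_one(int* data, std::size_t n) {
    std::size_t i = blockIdx.x * static_cast<std::size_t>(blockDim.x) + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Hypothetical helper: processes `h_data` (assumed pinned) in chunks,
// rotating over a small pool of streams.
bool add_one_pipelined(int* h_data, std::size_t n, std::size_t chunk) {
    if (chunk == 0) return false;
    if (n == 0) return true;

    const int kStreams = 2;   // arbitrary pool size for the sketch
    cudaStream_t streams[kStreams];
    for (int s = 0; s < kStreams; ++s) cudaStreamCreate(&streams[s]);

    thrust::device_vector<int> d_vec(n);
    int* d_data = thrust::raw_pointer_cast(d_vec.data());

    int k = 0;
    for (std::size_t off = 0; off < n; off += chunk, ++k) {
        const std::size_t len = std::min(chunk, n - off);
        cudaStream_t st = streams[k % kStreams];
        // Copy in, compute, copy out: ordered within one stream,
        // free to overlap with the other stream's work.
        cudaMemcpyAsync(d_data + off, h_data + off, len * sizeof(int),
                        cudaMemcpyHostToDevice, st);
        const unsigned blocks = static_cast<unsigned>((len + 255) / 256);
        add_one<<<blocks, 256, 0, st>>>(d_data + off, len);
        cudaMemcpyAsync(h_data + off, d_data + off, len * sizeof(int),
                        cudaMemcpyDeviceToHost, st);
    }

    // Wait for all streams; this also reports errors from the queued work.
    const bool ok = cudaDeviceSynchronize() == cudaSuccess;
    for (int s = 0; s < kStreams; ++s) cudaStreamDestroy(streams[s]);
    return ok;
}
```

Whether the copies and kernels actually overlap depends on the GPU's copy engines and on the chunk size, so it is worth confirming with a profiler rather than assuming it.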