TensorRT Notes (3): Dissecting the BufferManager Class from the Samples
In https://blog.csdn.net/ouliten/article/details/154490047?spm=1001.2014.3001.5502#t14 I only mentioned BufferManager in passing; there is quite a lot to it, so it gets its own article here.
For comparison, see cuda编程笔记(21)-- TensorRT-CSDN博客: the inference flow in that article allocates and frees GPU memory by hand. So that GPU memory does not have to be managed explicitly, the official samples wrap it all in the BufferManager class and avoid the clutter.
All of the source code in this article can be found in samples/common/buffers.h: TensorRT/samples/common/buffers.h at main · NVIDIA/TensorRT
GenericBuffer
```cpp
//!
//! \brief The GenericBuffer class is a templated class for buffers.
//!
//! \details This templated RAII (Resource Acquisition Is Initialization) class handles the allocation,
//!          deallocation, querying of buffers on both the device and the host.
//!          It can handle data of arbitrary types because it stores byte buffers.
//!          The template parameters AllocFunc and FreeFunc are used for the
//!          allocation and deallocation of the buffer.
//!          AllocFunc must be a functor that takes in (void** ptr, size_t size)
//!          and returns bool. ptr is a pointer to where the allocated buffer address should be stored.
//!          size is the amount of memory in bytes to allocate.
//!          The boolean indicates whether or not the memory allocation was successful.
//!          FreeFunc must be a functor that takes in (void* ptr) and returns void.
//!          ptr is the allocated buffer address. It must work with nullptr input.
//!
template <typename AllocFunc, typename FreeFunc>
class GenericBuffer
{
public:
    //!
    //! \brief Construct an empty buffer.
    //!
    GenericBuffer(nvinfer1::DataType type = nvinfer1::DataType::kFLOAT)
        : mSize(0), mCapacity(0), mType(type), mBuffer(nullptr)
    {
    }

    //!
    //! \brief Construct a buffer with the specified allocation size in bytes.
    //!
    GenericBuffer(size_t size, nvinfer1::DataType type)
        : mSize(size), mCapacity(size), mType(type)
    {
        if (!allocFn(&mBuffer, this->nbBytes()))
        {
            throw std::bad_alloc();
        }
    }

    GenericBuffer(GenericBuffer&& buf)
        : mSize(buf.mSize), mCapacity(buf.mCapacity), mType(buf.mType), mBuffer(buf.mBuffer)
    {
        buf.mSize = 0;
        buf.mCapacity = 0;
        buf.mType = nvinfer1::DataType::kFLOAT;
        buf.mBuffer = nullptr;
    }

    GenericBuffer& operator=(GenericBuffer&& buf)
    {
        if (this != &buf)
        {
            freeFn(mBuffer);
            mSize = buf.mSize;
            mCapacity = buf.mCapacity;
            mType = buf.mType;
            mBuffer = buf.mBuffer;
            // Reset buf: remember to clear the moved-from object's resources.
            buf.mSize = 0;
            buf.mCapacity = 0;
            buf.mBuffer = nullptr;
        }
        return *this;
    }

    //!
    //! \brief Returns pointer to underlying array.
    //!
    void* data()
    {
        return mBuffer;
    }

    //!
    //! \brief Returns pointer to underlying array.
    //!
    const void* data() const
    {
        return mBuffer;
    }

    //!
    //! \brief Returns the size (in number of elements) of the buffer.
    //!
    size_t size() const
    {
        return mSize;
    }

    //!
    //! \brief Returns the size (in bytes) of the buffer.
    //!
    size_t nbBytes() const
    {
        return this->size() * samplesCommon::getElementSize(mType);
    }

    //!
    //! \brief Resizes the buffer. This is a no-op if the new size is smaller than or equal to the current capacity.
    //!
    void resize(size_t newSize) // grow-on-demand logic
    {
        mSize = newSize;
        if (mCapacity < newSize)
        {
            freeFn(mBuffer);
            if (!allocFn(&mBuffer, this->nbBytes()))
            {
                throw std::bad_alloc{};
            }
            mCapacity = newSize;
        }
    }

    //!
    //! \brief Overload of resize that accepts Dims
    //!
    void resize(const nvinfer1::Dims& dims)
    {
        return this->resize(samplesCommon::volume(dims));
    }

    ~GenericBuffer() // RAII in action: the memory is released automatically
    {
        freeFn(mBuffer);
    }

private:
    size_t mSize{0}, mCapacity{0};
    nvinfer1::DataType mType;
    void* mBuffer;
    AllocFunc allocFn;
    FreeFunc freeFn;
};
```
This class gives unified management of host and device memory: whether the backing calls are malloc/free or cudaMalloc/cudaFree, everything is wrapped into a single "buffer object".
Why is it called GenericBuffer?
Because the class does not depend on any concrete allocation method; that is decided by its template parameters.
AllocFunc and FreeFunc here are template parameters. They are not function types but functor types: classes that overload operator() to customize how memory is allocated and freed. Since CPU memory and GPU memory are allocated and released in different ways, this is how the difference is abstracted away (the CPU and GPU allocator/deleter classes are shown later).
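As a quick illustration of this functor-based design (my own sketch, not part of the official buffers.h), a hypothetical allocator/deleter pair for pinned (page-locked) host memory could be plugged in like this:

```cpp
// Hypothetical example: a custom AllocFunc/FreeFunc pair for GenericBuffer.
// Pinned (page-locked) host memory can speed up host<->device copies.
class PinnedHostAllocator
{
public:
    bool operator()(void** ptr, size_t size) const
    {
        return cudaMallocHost(ptr, size) == cudaSuccess;
    }
};

class PinnedHostFree
{
public:
    void operator()(void* ptr) const
    {
        // FreeFunc must tolerate nullptr (the default-constructed buffer holds no memory).
        if (ptr)
        {
            cudaFreeHost(ptr);
        }
    }
};

using PinnedHostBuffer = GenericBuffer<PinnedHostAllocator, PinnedHostFree>;
```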
nvinfer1::DataType
The enumeration of data types.
```cpp
//!
//! \enum DataType
//! \brief The type of weights and tensors.
//!
enum class DataType : int32_t
{
    //! 32-bit floating point format.
    kFLOAT = 0,

    //! IEEE 16-bit floating-point format -- has a 5 bit exponent and 11 bit significand.
    kHALF = 1,

    //! Signed 8-bit integer representing a quantized floating-point value.
    kINT8 = 2,

    //! Signed 32-bit integer format.
    kINT32 = 3,

    //! 8-bit boolean. 0 = false, 1 = true, other values undefined.
    kBOOL = 4,

    //! Unsigned 8-bit integer format.
    //! Cannot be used to represent quantized floating-point values.
    //! Use the IdentityLayer to convert kUINT8 network-level inputs to {kFLOAT, kHALF} prior
    //! to use with other TensorRT layers, or to convert intermediate output
    //! before kUINT8 network-level outputs from {kFLOAT, kHALF} to kUINT8.
    //! kUINT8 conversions are only supported for {kFLOAT, kHALF}.
    //! kUINT8 to {kFLOAT, kHALF} conversion will convert the integer values
    //! to equivalent floating point values.
    //! {kFLOAT, kHALF} to kUINT8 conversion will convert the floating point values
    //! to integer values by truncating towards zero. This conversion has undefined behavior for
    //! floating point values outside the range [0.0F, 256.0F) after truncation.
    //! kUINT8 conversions are not supported for {kINT8, kINT32, kBOOL}.
    kUINT8 = 5,

    //! Signed 8-bit floating point with
    //! 1 sign bit, 4 exponent bits, 3 mantissa bits, and exponent-bias 7.
    kFP8 = 6,

    //! Brain float -- has an 8 bit exponent and 8 bit significand.
    kBF16 = 7,

    //! Signed 64-bit integer type.
    kINT64 = 8,

    //! Signed 4-bit integer type.
    kINT4 = 9,
};
```
samplesCommon::getElementSize
Returns the actual size in bytes for each of the enum values above.
```cpp
//#include "common.h"
inline uint32_t getElementSize(nvinfer1::DataType t) noexcept
{
    switch (t)
    {
    case nvinfer1::DataType::kINT64: return 8;
    case nvinfer1::DataType::kINT32:
    case nvinfer1::DataType::kFLOAT: return 4;
    case nvinfer1::DataType::kBF16:
    case nvinfer1::DataType::kHALF: return 2;
    case nvinfer1::DataType::kBOOL:
    case nvinfer1::DataType::kUINT8:
    case nvinfer1::DataType::kINT8:
    case nvinfer1::DataType::kFP8: return 1;
    case nvinfer1::DataType::kINT4:
        ASSERT(false && "Element size is not implemented for sub-byte data-types");
    }
    return 0;
}
```
The automatic memory-management classes
```cpp
class DeviceAllocator
{
public:
    bool operator()(void** ptr, size_t size) const
    {
        return cudaMalloc(ptr, size) == cudaSuccess;
    }
};

class DeviceFree
{
public:
    void operator()(void* ptr) const
    {
        cudaFree(ptr);
    }
};

class HostAllocator
{
public:
    bool operator()(void** ptr, size_t size) const
    {
        *ptr = malloc(size);
        return *ptr != nullptr;
    }
};

class HostFree
{
public:
    void operator()(void* ptr) const
    {
        free(ptr);
    }
};

using DeviceBuffer = GenericBuffer<DeviceAllocator, DeviceFree>;
using HostBuffer = GenericBuffer<HostAllocator, HostFree>;

//!
//! \brief The ManagedBuffer class groups together a pair of corresponding device and host buffers.
//!
class ManagedBuffer
{
public:
    DeviceBuffer deviceBuffer;
    HostBuffer hostBuffer;
};
```
This part is fairly easy to follow. If you are not familiar with CUDA, see my post cuda编程笔记(2)--传递参数、设备属性_结构体参数的核函数-CSDN博客.
BufferManager
```cpp
class BufferManager
{
public:
    static const size_t kINVALID_SIZE_VALUE = ~size_t(0);

    //!
    //! \brief Create a BufferManager for handling buffer interactions with engine, when the I/O tensor volumes
    //!        are provided
    //!
    BufferManager(
        std::shared_ptr<nvinfer1::ICudaEngine> engine, std::vector<int64_t> const& volumes, int32_t batchSize = 0)
        : mEngine(engine)
        , mBatchSize(batchSize)
    {
        // Create host and device buffers
        for (int32_t i = 0; i < mEngine->getNbIOTensors(); i++)
        {
            auto const name = engine->getIOTensorName(i);
            mNames[name] = i;

            nvinfer1::DataType type = mEngine->getTensorDataType(name);

            std::unique_ptr<ManagedBuffer> manBuf{new ManagedBuffer()};
            manBuf->deviceBuffer = DeviceBuffer(volumes[i], type);
            manBuf->hostBuffer = HostBuffer(volumes[i], type);
            void* deviceBuffer = manBuf->deviceBuffer.data();
            mDeviceBindings.emplace_back(deviceBuffer);
            mManagedBuffers.emplace_back(std::move(manBuf));
        }
    }

    //!
    //! \brief Create a BufferManager for handling buffer interactions with engine.
    //!
    BufferManager(std::shared_ptr<nvinfer1::ICudaEngine> engine, int32_t const batchSize = 0,
        nvinfer1::IExecutionContext const* context = nullptr)
        : mEngine(engine)
        , mBatchSize(batchSize)
    {
        // Create host and device buffers
        for (int32_t i = 0, e = mEngine->getNbIOTensors(); i < e; i++)
        {
            auto const name = engine->getIOTensorName(i);
            mNames[name] = i;

            auto dims = context ? context->getTensorShape(name) : mEngine->getTensorShape(name);
            size_t vol = context || !mBatchSize ? 1 : static_cast<size_t>(mBatchSize);
            nvinfer1::DataType type = mEngine->getTensorDataType(name);
            int32_t vecDim = mEngine->getTensorVectorizedDim(name);
            if (-1 != vecDim) // i.e., 0 != lgScalarsPerVector
            {
                int32_t scalarsPerVec = mEngine->getTensorComponentsPerElement(name);
                dims.d[vecDim] = divUp(dims.d[vecDim], scalarsPerVec);
                vol *= scalarsPerVec;
            }
            vol *= samplesCommon::volume(dims);

            std::unique_ptr<ManagedBuffer> manBuf{new ManagedBuffer()};
            manBuf->deviceBuffer = DeviceBuffer(vol, type);
            manBuf->hostBuffer = HostBuffer(vol, type);
            void* deviceBuffer = manBuf->deviceBuffer.data();
            mDeviceBindings.emplace_back(deviceBuffer);
            mManagedBuffers.emplace_back(std::move(manBuf));
        }
    }

    //!
    //! \brief Returns a vector of device buffers that you can use directly as
    //!        bindings for the execute and enqueue methods of IExecutionContext.
    //!
    std::vector<void*>& getDeviceBindings()
    {
        return mDeviceBindings;
    }

    //!
    //! \brief Returns a vector of device buffers.
    //!
    std::vector<void*> const& getDeviceBindings() const
    {
        return mDeviceBindings;
    }

    //!
    //! \brief Returns the device buffer corresponding to tensorName.
    //!        Returns nullptr if no such tensor can be found.
    //!
    void* getDeviceBuffer(std::string const& tensorName) const
    {
        return getBuffer(false, tensorName);
    }

    //!
    //! \brief Returns the host buffer corresponding to tensorName.
    //!        Returns nullptr if no such tensor can be found.
    //!
    void* getHostBuffer(std::string const& tensorName) const
    {
        return getBuffer(true, tensorName);
    }

    //!
    //! \brief Returns the size of the host and device buffers that correspond to tensorName.
    //!        Returns kINVALID_SIZE_VALUE if no such tensor can be found.
    //!
    size_t size(std::string const& tensorName) const
    {
        auto record = mNames.find(tensorName);
        if (record == mNames.end())
            return kINVALID_SIZE_VALUE;
        return mManagedBuffers[record->second]->hostBuffer.nbBytes();
    }

    //!
    //! \brief Templated print function that dumps buffers of arbitrary type to std::ostream.
    //!        rowCount parameter controls how many elements are on each line.
    //!        A rowCount of 1 means that there is only 1 element on each line.
    //!
    template <typename T>
    void print(std::ostream& os, void* buf, size_t bufSize, size_t rowCount)
    {
        assert(rowCount != 0);
        assert(bufSize % sizeof(T) == 0);
        T* typedBuf = static_cast<T*>(buf);
        size_t numItems = bufSize / sizeof(T);
        for (int32_t i = 0; i < static_cast<int>(numItems); i++)
        {
            // Handle rowCount == 1 case
            if (rowCount == 1 && i != static_cast<int>(numItems) - 1)
                os << typedBuf[i] << std::endl;
            else if (rowCount == 1)
                os << typedBuf[i];
            // Handle rowCount > 1 case
            else if (i % rowCount == 0)
                os << typedBuf[i];
            else if (i % rowCount == rowCount - 1)
                os << " " << typedBuf[i] << std::endl;
            else
                os << " " << typedBuf[i];
        }
    }

    //!
    //! \brief Copy the contents of input host buffers to input device buffers synchronously.
    //!
    void copyInputToDevice()
    {
        memcpyBuffers(true, false, false);
    }

    //!
    //! \brief Copy the contents of output device buffers to output host buffers synchronously.
    //!
    void copyOutputToHost()
    {
        memcpyBuffers(false, true, false);
    }

    //!
    //! \brief Copy the contents of input host buffers to input device buffers asynchronously.
    //!
    void copyInputToDeviceAsync(cudaStream_t const& stream = 0)
    {
        memcpyBuffers(true, false, true, stream);
    }

    //!
    //! \brief Copy the contents of output device buffers to output host buffers asynchronously.
    //!
    void copyOutputToHostAsync(cudaStream_t const& stream = 0)
    {
        memcpyBuffers(false, true, true, stream);
    }

    ~BufferManager() = default;

private:
    void* getBuffer(bool const isHost, std::string const& tensorName) const
    {
        auto record = mNames.find(tensorName);
        if (record == mNames.end())
            return nullptr;
        return (isHost ? mManagedBuffers[record->second]->hostBuffer.data()
                       : mManagedBuffers[record->second]->deviceBuffer.data());
    }

    bool tenosrIsInput(const std::string& tensorName) const
    {
        return mEngine->getTensorIOMode(tensorName.c_str()) == nvinfer1::TensorIOMode::kINPUT;
    }

    void memcpyBuffers(bool const copyInput, bool const deviceToHost, bool const async, cudaStream_t const& stream = 0)
    {
        for (auto const& n : mNames)
        {
            void* dstPtr = deviceToHost ? mManagedBuffers[n.second]->hostBuffer.data()
                                        : mManagedBuffers[n.second]->deviceBuffer.data();
            void const* srcPtr = deviceToHost ? mManagedBuffers[n.second]->deviceBuffer.data()
                                              : mManagedBuffers[n.second]->hostBuffer.data();
            size_t const byteSize = mManagedBuffers[n.second]->hostBuffer.nbBytes();
            const cudaMemcpyKind memcpyType = deviceToHost ? cudaMemcpyDeviceToHost : cudaMemcpyHostToDevice;
            if ((copyInput && tenosrIsInput(n.first)) || (!copyInput && !tenosrIsInput(n.first)))
            {
                if (async)
                    CHECK(cudaMemcpyAsync(dstPtr, srcPtr, byteSize, memcpyType, stream));
                else
                    CHECK(cudaMemcpy(dstPtr, srcPtr, byteSize, memcpyType));
            }
        }
    }

    std::shared_ptr<nvinfer1::ICudaEngine> mEngine;              //!< The pointer to the engine
    int mBatchSize;                                              //!< The batch size for legacy networks, 0 otherwise.
    std::vector<std::unique_ptr<ManagedBuffer>> mManagedBuffers; //!< The vector of pointers to managed buffers
    std::vector<void*> mDeviceBindings;                          //!< The vector of device buffers needed for engine execution
    std::unordered_map<std::string, int32_t> mNames;             //!< The map of tensor name and index pairs
};
```
Member variables

```cpp
std::shared_ptr<nvinfer1::ICudaEngine> mEngine;              //!< The pointer to the engine
int mBatchSize;                                              //!< The batch size for legacy networks, 0 otherwise.
std::vector<std::unique_ptr<ManagedBuffer>> mManagedBuffers; //!< The vector of pointers to managed buffers
std::vector<void*> mDeviceBindings;                          //!< The vector of device buffers needed for engine execution
std::unordered_map<std::string, int32_t> mNames;             //!< The map of tensor name and index pairs
```

```
BufferManager
├── mEngine          // the TensorRT engine
├── mBatchSize       // the batch size
├── mManagedBuffers  // the host/device buffer pair for each tensor
│   ├── ManagedBuffer
│   │   ├── HostBuffer hostBuffer;
│   │   └── DeviceBuffer deviceBuffer;
├── mDeviceBindings  // pointers to all of the device buffers
└── mNames           // map from tensor name to index
```
Every tensor corresponds to a pair of buffers, one on the host and one on the device.
"Tensor" here does not mean the intermediate activation tensors; it refers only to the input and output tensors at the two ends of the network.
Private member functions

```cpp
// Return the host or device memory belonging to the tensor named tensorName
void* getBuffer(bool const isHost, std::string const& tensorName) const
{
    auto record = mNames.find(tensorName);
    if (record == mNames.end())
        return nullptr;
    return (isHost ? mManagedBuffers[record->second]->hostBuffer.data()
                   : mManagedBuffers[record->second]->deviceBuffer.data());
}

// Check whether tensorName refers to an input tensor
bool tenosrIsInput(const std::string& tensorName) const
{
    return mEngine->getTensorIOMode(tensorName.c_str()) == nvinfer1::TensorIOMode::kINPUT;
}

// Copy between device and host memory in either direction; optionally asynchronous
void memcpyBuffers(bool const copyInput, bool const deviceToHost, bool const async, cudaStream_t const& stream = 0)
{
    for (auto const& n : mNames)
    {
        void* dstPtr = deviceToHost ? mManagedBuffers[n.second]->hostBuffer.data()
                                    : mManagedBuffers[n.second]->deviceBuffer.data();
        void const* srcPtr = deviceToHost ? mManagedBuffers[n.second]->deviceBuffer.data()
                                          : mManagedBuffers[n.second]->hostBuffer.data();
        size_t const byteSize = mManagedBuffers[n.second]->hostBuffer.nbBytes();
        const cudaMemcpyKind memcpyType = deviceToHost ? cudaMemcpyDeviceToHost : cudaMemcpyHostToDevice;
        if ((copyInput && tenosrIsInput(n.first)) || (!copyInput && !tenosrIsInput(n.first)))
        {
            if (async)
                CHECK(cudaMemcpyAsync(dstPtr, srcPtr, byteSize, memcpyType, stream));
            else
                CHECK(cudaMemcpy(dstPtr, srcPtr, byteSize, memcpyType));
        }
    }
}
```
Constructors

```cpp
//!
//! \brief Create a BufferManager for handling buffer interactions with engine, when the I/O tensor volumes
//!        are provided
//!
BufferManager(
    std::shared_ptr<nvinfer1::ICudaEngine> engine, std::vector<int64_t> const& volumes, int32_t batchSize = 0)
    : mEngine(engine)
    , mBatchSize(batchSize)
{
    // Create host and device buffers
    for (int32_t i = 0; i < mEngine->getNbIOTensors(); i++)
    {
        auto const name = engine->getIOTensorName(i);
        mNames[name] = i;

        nvinfer1::DataType type = mEngine->getTensorDataType(name);

        std::unique_ptr<ManagedBuffer> manBuf{new ManagedBuffer()};
        manBuf->deviceBuffer = DeviceBuffer(volumes[i], type);
        manBuf->hostBuffer = HostBuffer(volumes[i], type);
        void* deviceBuffer = manBuf->deviceBuffer.data();
        mDeviceBindings.emplace_back(deviceBuffer);
        mManagedBuffers.emplace_back(std::move(manBuf));
    }
}

//!
//! \brief Create a BufferManager for handling buffer interactions with engine.
//!
BufferManager(std::shared_ptr<nvinfer1::ICudaEngine> engine, int32_t const batchSize = 0,
    nvinfer1::IExecutionContext const* context = nullptr)
    : mEngine(engine)
    , mBatchSize(batchSize)
{
    // Create host and device buffers
    for (int32_t i = 0, e = mEngine->getNbIOTensors(); i < e; i++)
    {
        auto const name = engine->getIOTensorName(i);
        mNames[name] = i;

        // If a context is supplied (i.e. a dynamic-shape model), take the current actual shape
        // from the execution context; otherwise (static shape) use the shape recorded in the engine.
        auto dims = context ? context->getTensorShape(name) : mEngine->getTensorShape(name);
        // For dynamic-shape models or when batchSize is 0 (e.g. explicit-batch mode), start vol at 1;
        // otherwise start with batchSize.
        size_t vol = context || !mBatchSize ? 1 : static_cast<size_t>(mBatchSize);
        nvinfer1::DataType type = mEngine->getTensorDataType(name);
        // getTensorVectorizedDim(): check whether vectorized access is enabled
        // (e.g. Tensor Core optimizations for FP16 or INT8).
        int32_t vecDim = mEngine->getTensorVectorizedDim(name);
        if (-1 != vecDim) // one dimension of the tensor was packed into vectors (e.g. groups of 4 FP16 values)
        {
            // That dimension of the shape becomes "original size / scalars per vector";
            // the total volume is then multiplied back by scalarsPerVec so the memory size stays correct.
            int32_t scalarsPerVec = mEngine->getTensorComponentsPerElement(name);
            dims.d[vecDim] = divUp(dims.d[vecDim], scalarsPerVec);
            vol *= scalarsPerVec;
        }
        // samplesCommon::volume(dims) computes the product of all dimensions, i.e. ∏ dims.d[i]
        vol *= samplesCommon::volume(dims);

        std::unique_ptr<ManagedBuffer> manBuf{new ManagedBuffer()};
        manBuf->deviceBuffer = DeviceBuffer(vol, type);
        manBuf->hostBuffer = HostBuffer(vol, type);
        void* deviceBuffer = manBuf->deviceBuffer.data();
        mDeviceBindings.emplace_back(deviceBuffer);
        mManagedBuffers.emplace_back(std::move(manBuf));
    }
}
```
- The first overload is for tensors with fixed, already-known sizes (the volumes are passed in directly), e.g. a simple network like MNIST.
- The second overload is for networks with dynamic shapes or a batch dimension; it has to query each tensor's actual dimensions via IExecutionContext::getTensorShape().

In the end both do the same thing: compute the total number of bytes from the shape and the DataType, then allocate the host/device buffers. A usage sketch follows below.
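To see where these constructors fit in practice, here is a rough usage sketch; the tensor names "input"/"output" and the `engine`/`context` variables are illustrative assumptions, not taken from a particular sample:

```cpp
// Sketch of a typical workflow, assuming `engine` is a std::shared_ptr<nvinfer1::ICudaEngine>
// and `context` an already created IExecutionContext with one input tensor named "input"
// and one output tensor named "output".
samplesCommon::BufferManager buffers(engine); // second constructor: shapes queried from the engine

// 1. Fill the input host buffer.
float* hostInput = static_cast<float*>(buffers.getHostBuffer("input"));
// ... write preprocessed data into hostInput ...

// 2. Host -> device copy for all input tensors.
buffers.copyInputToDevice();

// 3. Run inference with the device bindings.
bool ok = context->executeV2(buffers.getDeviceBindings().data());

// 4. Device -> host copy for all output tensors, then read the results.
buffers.copyOutputToHost();
float const* hostOutput = static_cast<float const*>(buffers.getHostBuffer("output"));
```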
Vectorization (vectorized)
During optimization, TensorRT may apply vectorized access along one dimension of a tensor. For example:

- if the weights or activations are FP16 or INT8,
- TensorRT will automatically try to use wider load/store instructions (such as float4, half2, int4),
- which improves bandwidth utilization and reduces the number of memory-access instructions;
- vectorization usually happens on the last dimension (the channel dimension C), but it can occur on other dimensions as well.
For more on float4, see cuda编程笔记(10)--memory access 优化_coalesced memory access-CSDN博客.
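To make "wider load/store instructions" concrete, here is a tiny standalone CUDA sketch (illustrative only, unrelated to TensorRT's internal kernels) that copies data with float4 loads, i.e. 16 bytes per memory instruction instead of 4:

```cpp
// Each thread loads/stores one float4 (16 bytes) instead of one float (4 bytes),
// so fewer memory instructions are issued for the same amount of data.
__global__ void copyVec4(const float4* __restrict__ in, float4* __restrict__ out, int n4)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n4)
    {
        out[i] = in[i];
    }
}
```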
The tensor dimensions in TensorRT depend on the network's format. Two common layouts are:

| Format | Meaning | Dimension order | Example (image) |
|---|---|---|---|
| kLINEAR (default) | plain linear layout | [N, C, H, W] | batch, channels, height, width |
| kHWC8, kCHW32, kCHW4, etc. | vectorized layouts | the last dimension is "packed" in groups of 8/16/32, etc. | the C (channel) dimension is split out into an extra dimension |
| Tensor shape | Layout | Vectorized-dim return value | Meaning |
|---|---|---|---|
| [N, C, H, W], not vectorized | kLINEAR | -1 | plain float |
| [N, C/4, H, W, 4] | kCHW4 | 1 | the C dimension is packed in groups of 4 |
| [N, H, W, C/8, 8] | kHWC8 | 3 | the last dimension is packed |
| [N, C/32, H, W, 32] | kCHW32 | 1 | the channel dimension is vectorized by 32 |
Suppose your tensor is [1, 128, 224, 224] and each element is FP16 (2 bytes). TensorRT may:

- merge every 2 channels into one vector unit (half2);
- so the shape logically becomes [1, 64, 224, 224, 2];
- with vecDim = 1 (the channel dimension) and scalarsPerVec = 2.
However, when we read the tensor's dimensions, what gets returned is still the logical shape (the un-vectorized shape).
| Concept | Meaning | Example |
|---|---|---|
| Logical shape | the tensor dimensions seen at the network level (the shape from the original model definition) | [1, 64, 7, 7] |
| Physical shape | how the data is actually laid out in memory (after vectorization/alignment) | [1, 16, 7, 7] (each element is a vec4 containing 4 scalars) |
So the dims we originally read in is [1, 64, 7, 7]; to match the actual memory layout we have to adjust the dimensions of dims ourselves.
getTensorVectorizedDim
```cpp
int32_t ICudaEngine::getTensorVectorizedDim(char const* tensorName) const noexcept;
```

- Parameter: the tensor name.
- Return value:
  - -1 → not vectorized;
  - >= 0 → the index (0-based) of the vectorized dimension.
getTensorComponentsPerElement
```cpp
int32_t ICudaEngine::getTensorComponentsPerElement(char const* tensorName) const noexcept;
```

- Return value:
  - -1 → not vectorized;
  - >= 0 → the number of scalar elements in one vector when vectorized; e.g. a float4 contains 4 floats.
divUp
```cpp
template <typename A, typename B>
inline A divUp(A x, B n)
{
    return (x + n - 1) / n;
}
```

Integer division that rounds up (ceiling division).
samplesCommon::volume
```cpp
inline int64_t volume(nvinfer1::Dims const& d)
{
    return std::accumulate(d.d, d.d + d.nbDims, int64_t{1}, std::multiplies<int64_t>{});
}
```

Multiplies all of the dimensions together.
The processing logic
Inside the constructor:
```cpp
auto dims = context ? context->getTensorShape(name) : mEngine->getTensorShape(name);
size_t vol = context || !mBatchSize ? 1 : static_cast<size_t>(mBatchSize);
nvinfer1::DataType type = mEngine->getTensorDataType(name);
int32_t vecDim = mEngine->getTensorVectorizedDim(name);
if (-1 != vecDim) // i.e., 0 != lgScalarsPerVector
{
    int32_t scalarsPerVec = mEngine->getTensorComponentsPerElement(name);
    dims.d[vecDim] = divUp(dims.d[vecDim], scalarsPerVec);
    vol *= scalarsPerVec;
}
vol *= samplesCommon::volume(dims);
```
This block is where vectorization is handled.
1️⃣ Get the tensor shape

- If a context (execution context) is supplied, the current tensor shape is taken from the context. This covers dynamic shapes or a dynamically bound batch.
- Otherwise the engine's tensor shape is used (the logical shape from the original model definition).
- The resulting dims is the logical shape, e.g. [1, 64, 7, 7].
I honestly have not fully figured out the mBatchSize logic here. If context is non-null or mBatchSize == 0, vol starts at 1, which is fine. What I don't understand is that batchSize can also be set manually: doesn't the dims obtained from mEngine already include the batch dimension? (Presumably this path is for TensorRT's legacy implicit-batch networks, whose tensor shapes do not contain a batch dimension, so the per-sample volume has to be multiplied by batchSize.)
2️⃣ Initialize the volume (vol)

- If there is a context, or batchSize == 0 (not a legacy network), vol starts at 1.
- Otherwise batchSize is used as the initial volume.
- vol accumulates the total number of elements in the final buffer (it may be multiplied by the vectorization factor later).
3️⃣ Get the data type
4️⃣ Check whether the tensor is vectorized

- vecDim: TensorRT reports which dimension is vectorized (e.g. the channel dimension); -1 means not vectorized.
- scalarsPerVec: how many scalar elements each vector contains (e.g. an FP16 vec4 → 4 scalars).
- dims.d[vecDim] = divUp(dims.d[vecDim], scalarsPerVec)
  - adjusts the size of the vectorized dimension, rounding up to get the number of vectors actually needed.
  - Example: 64 logical channels with vec4 → divUp(64, 4) = 16 vectors.
- vol *= scalarsPerVec: multiplies back the number of scalars per vector, giving the total number of scalars in the physical buffer.

The vol *= scalarsPerVec step is needed because we first divided the dimension with divUp and only afterwards multiply all dimensions together with samplesCommon::volume(dims); without multiplying scalarsPerVec back in, we would undercount the elements.
5️⃣ Compute the total number of elements

- samplesCommon::volume(dims) computes the tensor's volume (the product of all dimensions in dims).
- The final vol is the total number of scalars in the buffer (used to allocate the host/device buffers).
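Putting the pieces together with the hypothetical FP16 example from earlier (logical shape [1, 128, 224, 224], channel dimension packed as half2), the arithmetic works out roughly like this:

```cpp
// Worked example with the hypothetical numbers from above.
nvinfer1::Dims4 dims{1, 128, 224, 224};                 // logical shape
int32_t vecDim = 1;                                     // channel dimension is vectorized
int32_t scalarsPerVec = 2;                              // half2: 2 FP16 scalars per vector
size_t vol = 1;                                         // explicit-batch / context case

dims.d[vecDim] = divUp(dims.d[vecDim], scalarsPerVec);  // 128 -> 64 vectors
vol *= scalarsPerVec;                                   // vol = 2
vol *= samplesCommon::volume(dims);                     // 2 * (1*64*224*224) = 6,422,528 scalars

size_t bytes = vol * samplesCommon::getElementSize(nvinfer1::DataType::kHALF); // 12,845,056 bytes
```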
getDeviceBindings
```cpp
//!
//! \brief Returns a vector of device buffers that you can use directly as
//!        bindings for the execute and enqueue methods of IExecutionContext.
//!
std::vector<void*>& getDeviceBindings()
{
    return mDeviceBindings;
}

//!
//! \brief Returns a vector of device buffers.
//!
std::vector<void*> const& getDeviceBindings() const
{
    return mDeviceBindings;
}
```
There is not much to say about these two: they return the device memory needed for inference, i.e. the device buffers corresponding to the input and output tensors.
In practice, though, this interface only applies to IExecutionContext's executeV2 API; the newer inference APIs no longer use it. For details see: https://blog.csdn.net/ouliten/article/details/151794803?spm=1001.2014.3001.5502#t21
More commonly you just use getDeviceBuffer to obtain the corresponding device memory.
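For reference, here is a minimal sketch of how BufferManager could be combined with the newer name-based API (setTensorAddress + enqueueV3); the stream handling and the loop are illustrative assumptions, and `engine`, `context`, and `buffers` are assumed to exist already:

```cpp
// With the name-based API, each I/O tensor's device address is set explicitly
// and inference is launched with enqueueV3 on a stream.
cudaStream_t stream;
CHECK(cudaStreamCreate(&stream));

for (int32_t i = 0; i < engine->getNbIOTensors(); ++i)
{
    char const* name = engine->getIOTensorName(i);
    context->setTensorAddress(name, buffers.getDeviceBuffer(name));
}

buffers.copyInputToDeviceAsync(stream);
bool ok = context->enqueueV3(stream);
buffers.copyOutputToHostAsync(stream);
CHECK(cudaStreamSynchronize(stream));
```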
