当前位置：首页 > wzjs >正文

网站优化软件排名技术百度网站排名优化软件

wzjs 2025/8/7 4:31:35

网站优化软件排名技术,百度网站排名优化软件,wordpress论坛主题模板,网络小说网站推广策划方案目录前言 1. 结果对比 1.1 时间对比 1.2 CPU和NPU占用对比 2. RGA实现YOLO前处理 2.1 实现思路 2.2 处理类的声明 2.3 处理类的实现总结前言 RK平台上有RGA (Raster Graphic Acceleration Unit) 加速，使用RGA可以减少资源占用、加速图片处理速度。因此…

前言

1. 结果对比

1.1 时间对比

1.2 CPU和NPU占用对比

2. RGA实现YOLO前处理

2.1 实现思路

2.2 处理类的声明

2.3 处理类的实现

总结

前言

RK平台上有RGA (Raster Graphic Acceleration Unit) 加速，使用RGA可以减少资源占用、加速图片处理速度。因此，在部署YOLOv8是针对RGA和OpenCV的分别进行了实现，并对性能、速度和资源占用进行对比。

1. 结果对比

1.1 时间对比

总共跑100次计算平均时间。

纯OpenCV实现：

OPencv实现resize+pad
[Convert] Step1: Check input pointer => 0 us
[Convert] Step2: define intermediate Mat => 37 us
[Convert] Step3: cv::resize => 9564 us
[Convert] Step4: create pad_img => 1629 us
[Convert] Step5: compute position => 61 us
[Convert] Step6: copyTo => 340 us
[Convert] Step7: return => 54 us
INFO: image resize time 12.15 ms
INFO: total infer time 22.71 ms: model time is 22.45 ms and postprocess time is 0.26ms
Iteration 100 - time: 35.034000 ms
Total execution time: 3770.930000 ms
Average execution time: 37.709300 ms

纯RGA实现：

RGA实现
[Convert] Step1: Check input pointer => 1 us
[Convert] Step2: Set format/bpp => 92 us
[Convert] Step3: Calculate buffer sizes => 13 us
[Convert] Step4: Define variables => 12 us
[Convert] Step5: Compute border => 13 us
[Convert] Step6: Alloc & memcpy src => 10048 us
[Convert] Step7: Alloc resized buffer => 477 us
[Convert] Step8: Alloc dst buffer => 269 us
[Convert] Step9: importbuffer_fd => 3494 us
[Convert] Step10: wrapbuffer_handle => 80 us
[Convert] Step11: imresize => 2714 us
[Convert] Step12: immakeBorder => 1154 us
[Convert] Step13: copy result => 428 us
[Convert] Step14: cleanup => 2607 us
INFO: image resize time 24.26 ms
INFO: total infer time 22.10 ms: model time is 21.84 ms and postprocess time is 0.26ms
Iteration 100 - time: 46.496000 ms
Total execution time: 4398.143000 ms
Average execution time: 43.981430 ms

总结：

（1）从上可以看到OpenCV最占时间的是resize步骤，需要9~10ms，而RGA只要2~3ms。

（2）但使用RGA，如果图片数据不在DMA缓冲区，则需要进行拷贝，导致耗时太久。

（3）最终，导致RGA实现resize要比OpenCV慢了6~7ms。

1.2 CPU和NPU占用对比

跑单个模型持续推理。

纯OpenCV实现：

CPU占用率：120%~140%

NPU占用率：43%~48%

纯RGA实现：

CPU占用率：50%~60%

NPU占用率：35%~40%

总结：

（1）OPenCV使用CPU多线程计算差值，导致CPU占用率较高。

（2）RGA在DMA缓冲区使用硬件计算，减少对CPU依赖。

（3）RGA比OpenCV减少了60%的CPU占用和10%的NPU占用。

2. RGA实现YOLO前处理

参考代码：

[1] https://github.com/airockchip/librga

[2] rga_resize_demo.cpp

[3] rga_padding_demo.cpp

2.1 实现思路

在DMA缓冲区，每有一个形状，便需要一个CPU指针、句柄和缓冲区，导致分配内存、处理起来极其麻烦，且宽度需要16倍对齐（高度没有要求）、同时还存在着4G范围寻址问题。所以，实际应用要起来还是很麻烦的（尤其是自定义的任意尺寸输入）。

所以，这里仅使用RGA对合规的图片做resize，再使用OpenCV做pad填充。

由于工资预算有限，这里仅实现摄像头输入的原图尺寸（宽度满足16倍长，主要以1920×1080为主）的图片的前处理。

补充（实现任意尺寸处理的思路）：

(1) 设传入的宽和高为：[H，W]，目标宽高（模型输入大小）为：[H_T, W_T]，INT_UP表示向上取整函数。
(2) 原16倍宽度 W_16 为: INT_UP(W / 16) * 16，则现将原图先填充（仅右边填充）到尺寸[H, W_16]。
(3) 计算放缩比 R = min(H / H_T, W_16 / W_t)。
(4) 继续计算得到目标放缩尺寸宽度的16倍 W_T_16 为：INT_UP(W_T_16 / 16) * 16。
(5) 然后更新放缩比 R = min(H / H_T, W_T_16 / R / W_T)，此过程可能需要迭代。
(6) 得到放缩比 R 后，计算原图真正要填充到的尺寸为：[H_T / R, W_T_16 / R]。
(7) 然后对原图仅做右边和下面的填充，这样就把原图对齐到16倍长（放缩前后均是）。

2.2 处理类的声明

#include <iostream>
#include <memory>
#include <numeric>
#include <vector>
#include <algorithm>
#include "opencv2/opencv.hpp"
#include "common.h"
// 增加RGA库实现pad resize
#include <rga/RgaApi.h>
#include <rga/im2d.hpp>
#include <rga/rga.h>
#include <rga/RgaUtils.h>
#include <dma/dma_alloc.h>
// 打印时间
#include <sys/time.h>
#include <stdint.h>
#include <stdio.h>// RGA 版本的 Pad Resize 处理类
class ImagePreProcessRGA {public:// 构造函数：输入图像为 width x height，目标尺寸为正方形 target_size x target_sizeImagePreProcessRGA(int width, int height, int target_size);// 构造函数：输入图像为 width x height，目标尺寸为 target_width x target_heightImagePreProcessRGA(int width, int height, int target_width, int target_height);// 对输入图像数据进行pad resize处理，返回处理后的图像数据（unique_ptr管理）std::unique_ptr<uint8_t[]> Convert(const det_model_input& input);// 获取letterbox信息const letterbox_t &get_letter_box() { return letterbox_; }private:double scale_;  // 缩放比例int input_width_, input_height_;            // 输入图像尺寸int real_input_width_, real_input_height_;  // 输入的实际尺寸int target_width_, target_height_;          // 目标图像尺寸int new_width_, new_height_;                // 缩放后图像的尺寸（填输入的尺寸缩放后，经过填充才能变成目标尺寸）int padding_x_, padding_y_;                 // pad的总尺寸（左右、上下）letterbox_t letterbox_;     // letterbox信息记录缩放比例及左右/上下填充（一般为居中填充）
};

2.3 处理类的实现

实现RGB的3通道或者4通道的图片。

// 构造函数1：只传一个target_size，默认目标是正方形
ImagePreProcessRGA::ImagePreProcessRGA(int width, int height, int target_size): input_width_(width), input_height_(height), target_width_(target_size), target_height_(target_size){// ------------------【Step1：根据最大边计算放缩比例】---------------------// 如果原图是 (width x height)，目标是 (target_size x target_size)，则 scale = target_size / max(width, height)scale_ = static_cast<double>(target_size) / std::max(input_width_, input_height_);// ------------------【Step2：计算缩放后尺寸】----------------------------new_width_  = static_cast<int>(input_width_  * scale_);new_height_ = static_cast<int>(input_height_ * scale_);// ------------------【Step3：计算在目标图像中的剩余填充】------------------padding_x_ = target_size - new_width_;padding_y_ = target_size - new_height_;// ------------------【Step4：更新 letterbox】--------------------------letterbox_.scale = scale_;letterbox_.x_pad = padding_x_ / 2;letterbox_.y_pad = padding_y_ / 2;// ------------------【可选：打印结果】-----------------------------------// printf(">>> After => new_width_=%d, new_height_=%d, scale=%.3f\n", new_width_, new_height_, scale_);// printf(">>> padding_x_=%d, padding_y_=%d\n", padding_x_, padding_y_);
}// 构造函数2：传入独立的target_width和target_height，可能目标不是正方形
ImagePreProcessRGA::ImagePreProcessRGA(int width, int height, int target_width, int target_height): input_width_(width), input_height_(height), target_width_(target_width), target_height_(target_height){// ------------------【Step1：分别计算宽高缩放比例】-----------------------double width_scale  = static_cast<double>(target_width_)  / input_width_;double height_scale = static_cast<double>(target_height_) / input_height_;// 取较小的缩放比例scale_ = std::min(width_scale, height_scale);// ------------------【Step2：计算缩放后尺寸】----------------------------new_width_  = static_cast<int>(input_width_  * scale_);new_height_ = static_cast<int>(input_height_ * scale_);// ------------------【Step3：计算填充大小】------------------------------padding_x_ = target_width_  - new_width_;padding_y_ = target_height_ - new_height_;// ------------------【Step4：更新 letterbox】---------------------------letterbox_.scale = scale_;letterbox_.x_pad = padding_x_ / 2;letterbox_.y_pad = padding_y_ / 2;// ------------------【可选：打印结果】-----------------------------------// printf(">>> After => new_width_=%d, new_height_=%d, scale=%.3f\n", new_width_, new_height_, scale_);// printf(">>> padding_x_=%d, padding_y_=%d\n", padding_x_, padding_y_);
}// 核心函数：基于RGA对输入图像进行pad resize，并返回处理后的图像数据
std::unique_ptr<uint8_t[]> ImagePreProcessRGA::Convert(const det_model_input& input){// -------------------【Step0：在函数开头声明所有变量】-------------------// 中间处理函数返回值int ret = 0;// DMA fdint src_dma_fd     = -1;int resized_dma_fd = -1;// CPU指针uint8_t *src_buf     = nullptr;uint8_t *resized_buf = nullptr;// RGA handlerga_buffer_handle_t src_handle     = 0;rga_buffer_handle_t resized_handle = 0;// RGA bufferrga_buffer_t rga_src;rga_buffer_t rga_resized;memset(&rga_src, 0, sizeof(rga_src));memset(&rga_resized, 0, sizeof(rga_resized));// 最终的返回结果std::unique_ptr<uint8_t[]> final_data;// 其他局部变量int bpp_src    = 0;int bpp_dst    = 3;  // 目标一定3通道int src_format = 0;int dst_format = RK_FORMAT_RGB_888;// 源图大小和resize后大小int src_size     = 0;  int resized_size = 0;  // 用于 pad 的边界int left=0, right=0, top=0, bottom=0;// 常用114作为灰度cv::Scalar pad_color(114,114,114);// -------------------【Step1：基础检查】------------------------------------if (!input.data || input.width <= 0 || input.height <= 0 || (input.channel != 3 && input.channel != 4)){fprintf(stderr, "ERROR: invalid input data or channel.\n");return nullptr;}// 根据通道数决定 bpp & formatbpp_src    = (input.channel == 4) ? 4 : 3;src_format = (input.channel == 4) ? RK_FORMAT_RGBA_8888 : RK_FORMAT_RGB_888;// 源图大小src_size   = input.width  * input.height  * bpp_src;// resize后大小int out_size_w = new_width_;   // 由构造函数算好int out_size_h = new_height_;  // 由构造函数算好resized_size = out_size_w * out_size_h * bpp_dst;std::unique_ptr<uint8_t[]> resized_cpu(new uint8_t[resized_size]);// 用于后面 OpenCV padleft   = padding_x_ / 2;right  = padding_x_ - left;top    = padding_y_ / 2;bottom = padding_y_ - top;// -------------------【Step2：分配 src_buf, resized_buf】-------------------ret = dma_buf_alloc(DMA_HEAP_DMA32_PATH, src_size, &src_dma_fd, (void**)&src_buf);if (ret < 0 || !src_buf) {fprintf(stderr, "ERROR: alloc src_buf failed.\n");return nullptr;}ret = dma_buf_alloc(DMA_HEAP_DMA32_PATH, resized_size, &resized_dma_fd, (void**)&resized_buf);if (ret < 0 || !resized_buf) {fprintf(stderr, "ERROR: alloc resized_buf failed.\n");goto cleanup;}// 不考虑16对齐等，只做最原始的 YOLO思路。只需将 input.data 拷贝到 src_bufmemcpy(src_buf, input.data, src_size);// -------------------【Step3：import & wrap】-------------------------------src_handle     = importbuffer_fd(src_dma_fd,     src_size);resized_handle = importbuffer_fd(resized_dma_fd, resized_size);if (!src_handle || !resized_handle) {fprintf(stderr, "ERROR: importbuffer_fd failed.\n");ret = -1;goto cleanup;}rga_src     = wrapbuffer_handle(src_handle,     input.width,  input.height,  src_format);rga_resized = wrapbuffer_handle(resized_handle, out_size_w,   out_size_h,    dst_format);// -------------------【Step4：RGA仅做 resize or color convert+resize】-------------------if (input.channel == 4) {// RGBA => color convert => resizedIM_STATUS st_cvt = imcvtcolor(rga_src, rga_resized, RK_FORMAT_RGBA_8888, RK_FORMAT_RGB_888);if (st_cvt != IM_STATUS_SUCCESS) {fprintf(stderr, "ERROR: imcvtcolor failed: %s.\n", imStrError(st_cvt));ret = -1;goto cleanup;}}else {// channel=3 => 直接 resizeIM_STATUS st_resize = imresize(rga_src, rga_resized, 0, 0, INTER_LINEAR);if (st_resize != IM_STATUS_SUCCESS) {fprintf(stderr, "ERROR: imresize failed: %s.\n", imStrError(st_resize));ret = -1;goto cleanup;}}// -------------------【Step5：将 resized_buf 拷回 CPU】-------------------// 拿到 resize 后的 RGB 图像数据memcpy(resized_cpu.get(), resized_buf, resized_size);// -------------------【Step6：用OpenCV进行 pad】-------------------{// 1) 构造一个 cv::Mat 指向 resized_cpucv::Mat resized_mat(new_height_, new_width_, CV_8UC3, resized_cpu.get());// 2) 构造一个 pad_mat (target_height_ x target_width_)，初始颜色(114,114,114)cv::Mat pad_mat(target_height_, target_width_, CV_8UC3, pad_color);// 3) 计算在 pad_mat 中的放置位置// left=padding_x_/2, top=padding_y_/2cv::Rect roi(left, top, resized_mat.cols, resized_mat.rows);// 4) 拷贝 resized_mat 到 pad_mat 对应区域resized_mat.copyTo(pad_mat(roi));// 5) 将 pad_mat 拷到 final_dataint final_size = target_width_ * target_height_ * 3; // 3通道final_data.reset(new uint8_t[final_size]);memcpy(final_data.get(), pad_mat.data, final_size);}cleanup:// -------------------【Step7：释放资源】-------------------if (src_handle)     releasebuffer_handle(src_handle);if (resized_handle) releasebuffer_handle(resized_handle);if (src_buf)     dma_buf_free(src_size,   &src_dma_fd,     src_buf);if (resized_buf) dma_buf_free(resized_size,&resized_dma_fd, resized_buf);// 若 ret!=0, 返回 nullptrif (ret != 0) {return nullptr;}// 否则返回 final_data，即 "pad后" 的图像return final_data;
}

总结

这样实现资源占用还是约等于纯RGA实现，在推理单张图的时候，速度还可能更快一些：

RGA-resize+OpenCV-pad:
>>> After => new_width_=640, new_height_=360, scale=0.333
>>> padding_x_=0, padding_y_=24
[Convert] Step1: check => 9 usOriginal Input => width=1920, height=1080, channel=3Resize => new_width_=640, new_height_=360Pad => target_width_=640, target_height_=384
[Convert] Step2: alloc => 187 us
[Convert] StepFILL => copy input => 6220800 bytes
[Convert] StepFILL => 1709 us
rga_api version 1.10.1_[0]
[Convert] Step3: wrap => src=(1920x1080), resized=(640x360)
[Convert] Step3 => 892 us
[Convert] Step4: imresize => done
[Convert] Step4 => 2294 us
[Convert] Step5: copy resized => 691200 bytes
[Convert] Step5 => 543 us
[Convert] Step6: OpenCV pad => final_size=737280
[Convert] Step6 => 1314 us
[Convert] Step7: cleanup => 648 us
INFO: image resize time 7.64 ms
INFO: total infer time 28.11 ms: model time is 27.93 ms and postprocess time is 0.18ms
Iteration 1 - time: 35.846000 ms
Total execution time: 35.846000 ms
Average execution time: 35.846000 msOpenCV:
INFO: image resize time 9.49 ms
INFO: total infer time 23.12 ms: model time is 23.09 ms and postprocess time is 0.03ms
Iteration 1 - time: 32.688000 ms

查看全文

http://www.dtcms.com/wzjs/249220.html