A detailed walkthrough of the full YOLOv8/YOLO11 detect pipeline: environment setup, data annotation, training, validation/prediction, and ONNX deployment (C++/Python)
Table of Contents
- 1. Environment Setup
- 2. Data Annotation
- 3. Model Training
- 4. Validation and Prediction
- 5. ONNX Deployment
  - C++ version
  - Python version
1. Environment Setup
I do all of my training and deployment on Linux. Before training, the environment has to be set up: Anaconda, the GPU driver, CUDA, cuDNN, PyTorch, and so on. Training on Linux is recommended, since it makes full use of the GPU and speeds up training.
[Environment setup] Setting up a deep-learning environment and common libraries on Ubuntu/Debian from scratch (GPU driver, CUDA, cuDNN, PyTorch, OpenCV, PCL, CMake, ...) [continuously maintained]
[Ultralytics YOLO GitHub repo] https://github.com/ultralytics/ultralytics
[YOLO documentation] https://docs.ultralytics.com/zh/
git clone https://github.com/ultralytics/ultralytics.git
# The Python version is not fixed; I usually use the second-newest release, e.g. 3.11 when 3.12 is the latest
conda create -n yolov11 python==3.10 -y
conda activate yolov11
# Replace with the versions that match your own machine
#pip install torch==2.0.0+cu118 torchvision==0.15.1+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
#pip install ultralytics
pip install ultralytics -i https://pypi.tuna.tsinghua.edu.cn/simple
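After installing, a quick sanity check confirms that PyTorch can see the GPU and that ultralytics imports cleanly. This is a minimal sketch; run it inside the activated conda environment:

import torch
import ultralytics

# Installed versions and CUDA visibility
print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))

# ultralytics ships a built-in environment check that prints a summary
ultralytics.checks()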
2. Data Annotation
Here I strongly recommend X-AnyLabeling. On Windows you can simply download the .exe; on Linux you run the executable directly, which is very convenient. It also integrates the SAM/SAM2 segmentation models and AI auto-labeling, so annotation can be done very quickly.
Project page: https://github.com/CVHub520/X-AnyLabeling
Usage guide: https://blog.csdn.net/CVHub/article/details/144595664?spm=1001.2014.3001.5502
Just download it and run it directly.
If you install from source, note the onnxruntime version requirement, onnxruntime>=1.16.0, and launch it with the command below:
# We usually just run python app.py; when running from source, however, it must be launched from the directory one level above app.py
python anylabeling/app.py
Data conversion
After annotation is done, the label files are in JSON format and need to be converted to YOLO-format txt files. Run the commands below (a rough sketch of what the conversion produces follows them); a YOLO-format dataset named YOLODataset is generated automatically, the data is already split, and its dataset.yaml config file is created for you.
# Install first
pip install labelme2yolov8
# The data folder holds the images and the JSON files together
labelme2yolo --json_dir ./data/ --val_size 0.1 --test_size 0.0
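For reference, the conversion essentially turns each labeled shape into one line of "class_id cx cy w h", normalized by the image size. Below is a simplified, hypothetical sketch of that logic, not the actual labelme2yolo source; labelme_json_to_yolo_txt and class_to_id are illustrative names:

import json

def labelme_json_to_yolo_txt(json_path, txt_path, class_to_id):
    """Convert one LabelMe JSON file to a YOLO-format txt file (bounding boxes only)."""
    with open(json_path, "r", encoding="utf-8") as f:
        data = json.load(f)
    img_w, img_h = data["imageWidth"], data["imageHeight"]
    lines = []
    for shape in data["shapes"]:
        xs = [p[0] for p in shape["points"]]
        ys = [p[1] for p in shape["points"]]
        # Axis-aligned box around the annotated points, normalized to [0, 1]
        cx = (min(xs) + max(xs)) / 2 / img_w
        cy = (min(ys) + max(ys)) / 2 / img_h
        w = (max(xs) - min(xs)) / img_w
        h = (max(ys) - min(ys)) / img_h
        cls = class_to_id[shape["label"]]
        lines.append(f"{cls} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    with open(txt_path, "w") as f:
        f.write("\n".join(lines))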
The dataset.yaml content is shown below. If the generated path contains no Chinese characters it can be used as-is; otherwise, change it to the path of the dataset you just generated. names holds the classes, i.e. the class indices and names. The class count must match the number of classes you annotated; the names themselves can be anything, they are not especially important.
path: \xxx\YOLODataset
train: images/train
val: images/val
test:
names:
  0: yuantong
  1: gunzi
  2: taotong
Examples: AI auto-labeling and SAM-based labeling
Once you learn to use these two features, annotation becomes very convenient; the models first need to be converted to ONNX (a small export sketch is given at the end of this subsection).
1. AI auto-labeling
The prerequisite is that you already have a model pretrained on your targets, in ONNX format. Load it and it can label a whole batch in one pass. A typical workflow: annotate a small portion first, train a model on it, then use that model to label everything automatically and just fine-tune the results. If you want to add more data for a second round of training, this feature is ideal.
1. First write the config file yolov8n_seg.yaml
type: yolov8_seg
name: yolov8n-seg-r20230620
display_name: YOLOv8n-Seg-My-Model Ultralytics # name shown in the UI
#model_path: https://github.com/CVHub520/X-AnyLabeling/releases/download/v0.1.0/yolov8n-seg.onnx
model_path: /home/xxx/yolov11/yolo_base/model/yolo11n-seg.onnx # this path is the key setting
nms_threshold: 0.45
confidence_threshold: 0.25
classes:
  - person
  - bicycle
  - car
  - motorcycle
  - airplane
  - bus
  - train
  - truck
  - boat
2. Load the model, which simply means loading this yaml file, and then label all the files in one pass.
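To produce the ONNX file that model_path points to from your own trained weights, you can export it with the ultralytics API. A minimal sketch; the best.pt path is a placeholder:

from ultralytics import YOLO

# Export trained weights to ONNX so X-AnyLabeling can load them for auto-labeling
model = YOLO("runs/detect/train/weights/best.pt")
model.export(format="onnx", opset=17, simplify=True)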
2. SAM-based labeling
Open the X-AnyLabeling interface and, under Auto, select the model marked (ViT-Base Quant). This needs internet access (possibly through a proxy); the related ONNX files are downloaded automatically, one decoder and one encoder.
If you download them by hand, write a config file and choose ... Load Custom Model; the file you select there is a .yaml file that you write yourself, mainly changing the model paths. Its contents look roughly like this:
type: segment_anything
name: segment_anything_vit_b_quant-r20230520
display_name: Segment Anything (ViT-Base Quant)
# encoder_model_path: https://github.com/CVHub520/X-AnyLabeling/releases/download/v0.2.0/sam_vit_b_01ec64.encoder.quant.onnx
# decoder_model_path: https://github.com/CVHub520/X-AnyLabeling/releases/download/v0.2.0/sam_vit_b_01ec64.decoder.quant.onnx
encoder_model_path: D:\CvHub_YoLo_obb\sam_vit_h_4b8939.encoder.quant.onnx
decoder_model_path: D:\CvHub_YoLo_obb\sam_vit_h_4b8939.decoder.quant.onnx
input_size: 1024
max_width: 1024
max_height: 682
3. Model Training
Download a pretrained model by following the [Ultralytics YOLO11 documentation]. If you do not specify a variant when training, the YOLO11n size is used by default; choose n/s/m/l/x according to your needs.
The key training parameters are explained below; for the complete list see ./ultralytics/cfg/default.yaml.
Parameter | Description |
---|---|
data | Path to the dataset config file, e.g. coco128.yaml |
model | Path to the model file, e.g. yolov8n_cls.pt |
epochs | Number of training epochs |
batch | Images per batch; usually 1, 2, 4, 8 or 16. Try from large to small: too large runs out of memory, too small under-uses it |
imgsz | Input image size; defaults to the pretrained model's input size. If the images are large (e.g. 1920), you can train at a smaller size such as 1024 to speed up inference, with a slight drop in accuracy |
lr0 | Initial learning rate, default 0.01; if the model does not perform well, try lowering it |
device | With a single device, leave the default; with several, specify them, e.g. device=0,1,2,3 |
optimizer | Optimizer to use; options are [SGD, Adam, Adamax, AdamW, NAdam, RAdam, RMSProp, auto] |
patience | Maximum number of epochs without improvement before early stopping |
workers | Number of data-loading worker threads, default 8 |
mosaic | Mosaic augmentation; if objects take up a large share of the image you can set it low. Enabling it does not necessarily improve the model |
cd into the downloaded yolo/ultralytics-main folder and run the training command in a terminal (I prefer running it from the terminal). The commands for detection, segmentation, etc. are very similar; just change the keyword, e.g. detect -> segment.
yolo detect train data=dataset.yaml model=yolo11n.pt epochs=300 imgsz=1920 amp=False batch=2 lr0=0.001 mosaic=0.05 patience=200
The training results are stored under runs/detect. Look at the loss curves to get a rough sense of how training went. If the results are not good enough, you can run a second round of training starting from the first model, or simply train for more epochs, e.g. 500 or 1000.
yolo detect train data=dataset.yaml model=./runs/detect/train/weights/best.pt epochs=300 imgsz=1920 amp=False batch=2 lr0=0.001 mosaic=0.05 patience=200
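The same run can also be launched from the Python API instead of the CLI. A minimal sketch, assuming dataset.yaml and the weights are in the working directory; the parameters mirror the CLI flags above:

from ultralytics import YOLO

# Load a pretrained checkpoint (or a previous best.pt to continue fine-tuning)
model = YOLO("yolo11n.pt")

model.train(
    data="dataset.yaml",
    epochs=300,
    imgsz=1920,
    batch=2,
    lr0=0.001,
    mosaic=0.05,
    patience=200,
    amp=False,
)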
4. Validation and Prediction
Point the predict command at a folder of test images; the results are saved under runs/detect/predict. conf is the confidence threshold, and the IoU threshold is set to 0.5 here.
yolo detect predict model=runs/detect/train4/weights/best.pt source=/xxximages/test save=true conf=0.4 iou=0.5
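The equivalent from Python, as a small sketch; the paths are placeholders:

from ultralytics import YOLO

model = YOLO("runs/detect/train4/weights/best.pt")

# Run prediction on a folder; results are saved under runs/detect/predict
results = model.predict(source="/xxximages/test", save=True, conf=0.4, iou=0.5)
for r in results:
    # r.boxes holds xyxy coordinates, confidences, and class ids
    print(r.boxes.xyxy, r.boxes.conf, r.boxes.cls)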
5. ONNX Deployment
After training, the model is usually deployed on a PC or an embedded device, so export it to ONNX. The opset version here is 17; use whatever matches the version you have installed. The corresponding best.onnx file is generated in the weights folder. opset=17 specifies the opset version, which is backward compatible; in my own tests a higher opset can shorten initialization time, though that is not guaranteed.
yolo export model=/xxx/yolov11/runs/detect/train4/weights/best.pt format=onnx opset=17 simplify=True
You can use the online viewer https://netron.app/ to inspect the inputs and outputs of the exported ONNX model.
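Besides Netron, you can query the input/output names and shapes with onnxruntime itself, which is a handy check that the deployment code later on uses the same input size as the export. A minimal sketch; best.onnx is a placeholder path:

import onnxruntime as ort

session = ort.InferenceSession("best.onnx", providers=["CPUExecutionProvider"])

# Typically one input like images: [1, 3, 1920, 1920] and one output like output0: [1, 4+nc, anchors]
for inp in session.get_inputs():
    print("input:", inp.name, inp.shape, inp.type)
for out in session.get_outputs():
    print("output:", out.name, out.shape, out.type)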
C++ version
You need the onnxruntime inference library. ONNX Runtime is quite friendly: there is no need to build from source, since the official releases page provides prebuilt library packages. That is one reason ONNX is so widely used: although on CPU it is usually slower than OpenVINO and on GPU slower than TensorRT, it is more general and relatively simple to deploy.
Download the matching onnxruntime package from the releases page, https://github.com/microsoft/onnxruntime/releases/. The latest release is currently 1.22; I use 1.17. Pick whatever fits your setup, and it must be compatible with the model exported above; installing a newer version is fine, since it is backward compatible.
The code is mainly based on this open-source repo: https://github.com/UNeedCryDear/yolov8-opencv-onnxruntime-cpp, with small changes: model initialization is moved into the main program and inference into the detection call, so initialization does not have to run every time. Five files are involved: main.cpp, yolov8_utils.h, yolov8_onnx.h, yolov8_utils.cpp, yolov8_onnx.cpp.
yolov8_utils.h mainly holds the result structs, the drawing/display helpers, and so on:
#pragma once
#include<iostream>
#include <numeric>
#include <unistd.h>
#include<opencv2/opencv.hpp>

#define ORT_OLD_VISON 13  // versions of onnxruntime before 1.12.0 use the old API

struct PoseKeyPoint {
    float x = 0;
    float y = 0;
    float confidence = 0;
};

struct OutputParams {
    int id;                               // class id of the result
    float confidence;                     // confidence of the result
    cv::Rect box;                         // bounding box
    cv::RotatedRect rotatedBox;           // obb rotated bounding box
    cv::Mat boxMask;                      // mask inside the box, saves memory and speeds things up
    std::vector<PoseKeyPoint> keyPoints;  // pose key points
};

struct MaskParams {
    //int segChannels = 32;
    //int segWidth = 160;
    //int segHeight = 160;
    int netWidth = 1920;   //640
    int netHeight = 1920;  //640
    float maskThreshold = 0.5;
    cv::Size srcImgShape;
    cv::Vec4d params;
};

struct PoseParams {
    float kptThreshold = 0.5;
    int kptRadius = 5;
    bool isDrawKptLine = true;  // If True, the function will draw lines connecting keypoint for human pose. Default is True.
    cv::Scalar personColor = cv::Scalar(0, 0, 255);
    std::vector<std::vector<int>> skeleton = {
        {16, 14}, {14, 12}, {17, 15}, {15, 13}, {12, 13}, {6, 12}, {7, 13}, {6, 7},
        {6, 8}, {7, 9}, {8, 10}, {9, 11}, {2, 3}, {1, 2}, {1, 3}, {2, 4}, {3, 5}, {4, 6}, {5, 7}
    };
    std::vector<cv::Scalar> posePalette = {
        cv::Scalar(255, 128, 0), cv::Scalar(255, 153, 51), cv::Scalar(255, 178, 102), cv::Scalar(230, 230, 0),
        cv::Scalar(255, 153, 255), cv::Scalar(153, 204, 255), cv::Scalar(255, 102, 255), cv::Scalar(255, 51, 255),
        cv::Scalar(102, 178, 255), cv::Scalar(51, 153, 255), cv::Scalar(255, 153, 153), cv::Scalar(255, 102, 102),
        cv::Scalar(255, 51, 51), cv::Scalar(153, 255, 153), cv::Scalar(102, 255, 102), cv::Scalar(51, 255, 51),
        cv::Scalar(0, 255, 0), cv::Scalar(0, 0, 255), cv::Scalar(255, 0, 0), cv::Scalar(255, 255, 255)
    };
    std::vector<int> limbColor = { 9, 9, 9, 9, 7, 7, 7, 0, 0, 0, 0, 0, 16, 16, 16, 16, 16, 16, 16 };
    std::vector<int> kptColor = { 16, 16, 16, 16, 16, 0, 0, 0, 0, 0, 0, 9, 9, 9, 9, 9, 9 };
    std::map<unsigned int, std::string> kptBodyNames{
        {0, "Nose"},
        {1, "left_eye"},      {2, "right_eye"},
        {3, "left_ear"},      {4, "right_ear"},
        {5, "left_shoulder"}, {6, "right_shoulder"},
        {7, "left_elbow"},    {8, "right_elbow"},
        {9, "left_wrist"},    {10, "right_wrist"},
        {11, "left_hip"},     {12, "right_hip"},
        {13, "left_knee"},    {14, "right_knee"},
        {15, "left_ankle"},   {16, "right_ankle"}
    };
};

bool CheckModelPath(std::string modelPath);
bool CheckParams(int netHeight, int netWidth, const int* netStride, int strideSize);
void DrawPred(cv::Mat& img, std::vector<OutputParams> result, std::vector<std::string> classNames, std::vector<cv::Scalar> color, bool isVideo = false);
void DrawPredPose(cv::Mat& img, std::vector<OutputParams> result, PoseParams& poseParams, bool isVideo = false);
void DrawRotatedBox(cv::Mat& srcImg, cv::RotatedRect box, cv::Scalar color, int thinkness);
void LetterBox(const cv::Mat& image, cv::Mat& outImage,
    cv::Vec4d& params,                              //[ratio_x,ratio_y,dw,dh]
    const cv::Size& newShape = cv::Size(640, 640),  //640,640
    bool autoShape = false,
    bool scaleFill = false,
    bool scaleUp = true,
    int stride = 32,
    const cv::Scalar& color = cv::Scalar(114, 114, 114));
void GetMask(const cv::Mat& maskProposals, const cv::Mat& maskProtos, std::vector<OutputParams>& output, const MaskParams& maskParams);
void GetMask2(const cv::Mat& maskProposals, const cv::Mat& maskProtos, OutputParams& output, const MaskParams& maskParams);
int BBox2Obb(float centerX, float centerY, float boxW, float boxH, float angle, cv::RotatedRect& rotatedRect);
yolov8_utils.cpp handles drawing and display; you can adapt it to your own style:
#pragma once
#include "yolov8_utils.h"
//using namespace cv;
//using namespace std;
bool CheckParams(int netHeight, int netWidth, const int* netStride, int strideSize) {
    if (netHeight % netStride[strideSize - 1] != 0 || netWidth % netStride[strideSize - 1] != 0) {
        std::cout << "Error:_netHeight and _netWidth must be multiple of max stride " << netStride[strideSize - 1] << "!" << std::endl;
        return false;
    }
    return true;
}
bool CheckModelPath(std::string modelPath) {
    if (0 != access(modelPath.c_str(), 0)) {
        std::cout << "Model path does not exist, please check " << modelPath << std::endl;
        return false;
    }
    else
        return true;
}
void LetterBox(const cv::Mat& image, cv::Mat& outImage, cv::Vec4d& params, const cv::Size& newShape,bool autoShape, bool scaleFill, bool scaleUp, int stride, const cv::Scalar& color)
{if (false) {int maxLen = MAX(image.rows, image.cols);outImage = cv::Mat::zeros(cv::Size(maxLen, maxLen), CV_8UC3);image.copyTo(outImage(cv::Rect(0, 0, image.cols, image.rows)));params[0] = 1;params[1] = 1;params[3] = 0;params[2] = 0;}cv::Size shape = image.size();float r = std::min((float)newShape.height / (float)shape.height,(float)newShape.width / (float)shape.width);if (!scaleUp)r = std::min(r, 1.0f);float ratio[2]{ r, r };int new_un_pad[2] = { (int)std::round((float)shape.width * r),(int)std::round((float)shape.height * r) };auto dw = (float)(newShape.width - new_un_pad[0]);auto dh = (float)(newShape.height - new_un_pad[1]);if (autoShape){dw = (float)((int)dw % stride);dh = (float)((int)dh % stride);}else if (scaleFill){dw = 0.0f;dh = 0.0f;new_un_pad[0] = newShape.width;new_un_pad[1] = newShape.height;ratio[0] = (float)newShape.width / (float)shape.width;ratio[1] = (float)newShape.height / (float)shape.height;}dw /= 2.0f;dh /= 2.0f;if (shape.width != new_un_pad[0] && shape.height != new_un_pad[1]){cv::resize(image, outImage, cv::Size(new_un_pad[0], new_un_pad[1]));}else {outImage = image.clone();}int top = int(std::round(dh - 0.1f));int bottom = int(std::round(dh + 0.1f));int left = int(std::round(dw - 0.1f));int right = int(std::round(dw + 0.1f));params[0] = ratio[0];params[1] = ratio[1];params[2] = left;params[3] = top;cv::copyMakeBorder(outImage, outImage, top, bottom, left, right, cv::BORDER_CONSTANT, color);
}void GetMask(const cv::Mat& maskProposals, const cv::Mat& maskProtos, std::vector<OutputParams>& output, const MaskParams& maskParams) {//std::cout << maskProtos.size << std::endl;int net_width = maskParams.netWidth;int net_height = maskParams.netHeight;int seg_channels = maskProtos.size[1];int seg_height = maskProtos.size[2];int seg_width = maskProtos.size[3];float mask_threshold = maskParams.maskThreshold;cv::Vec4f params = maskParams.params;cv::Size src_img_shape = maskParams.srcImgShape;cv::Mat protos = maskProtos.reshape(0, { seg_channels,seg_width * seg_height });cv::Mat matmul_res = (maskProposals * protos).t();cv::Mat masks = matmul_res.reshape(output.size(), { seg_width,seg_height });std::vector<cv::Mat> maskChannels;split(masks, maskChannels);for (int i = 0; i < output.size(); ++i) {cv::Mat dest, mask;//sigmoidcv::exp(-maskChannels[i], dest);dest = 1.0 / (1.0 + dest);cv::Rect roi(int(params[2] / net_width * seg_width), int(params[3] / net_height * seg_height), int(seg_width - params[2] / 2), int(seg_height - params[3] / 2));dest = dest(roi);resize(dest, mask, src_img_shape, cv::INTER_NEAREST);//cropcv::Rect temp_rect = output[i].box;mask = mask(temp_rect) > mask_threshold;output[i].boxMask = mask;}
}void GetMask2(const cv::Mat& maskProposals, const cv::Mat& maskProtos, OutputParams& output, const MaskParams& maskParams) {int net_width = maskParams.netWidth;int net_height = maskParams.netHeight;int seg_channels = maskProtos.size[1];int seg_height = maskProtos.size[2];int seg_width = maskProtos.size[3];float mask_threshold = maskParams.maskThreshold;cv::Vec4f params = maskParams.params;cv::Size src_img_shape = maskParams.srcImgShape;cv::Rect temp_rect = output.box;//crop from mask_protosint rang_x = floor((temp_rect.x * params[0] + params[2]) / net_width * seg_width);int rang_y = floor((temp_rect.y * params[1] + params[3]) / net_height * seg_height);int rang_w = ceil(((temp_rect.x + temp_rect.width) * params[0] + params[2]) / net_width * seg_width) - rang_x;int rang_h = ceil(((temp_rect.y + temp_rect.height) * params[1] + params[3]) / net_height * seg_height) - rang_y;//�������� mask_protos(roi_rangs).clone()λ�ñ�����˵�����output.box���ݲ��ԣ����߾��ο��1�����صģ����������ע�Ͳ��ַ�ֹ������rang_w = MAX(rang_w, 1);rang_h = MAX(rang_h, 1);if (rang_x + rang_w > seg_width) {if (seg_width - rang_x > 0)rang_w = seg_width - rang_x;elserang_x -= 1;}if (rang_y + rang_h > seg_height) {if (seg_height - rang_y > 0)rang_h = seg_height - rang_y;elserang_y -= 1;}std::vector<cv::Range> roi_rangs;roi_rangs.push_back(cv::Range(0, 1));roi_rangs.push_back(cv::Range::all());roi_rangs.push_back(cv::Range(rang_y, rang_h + rang_y));roi_rangs.push_back(cv::Range(rang_x, rang_w + rang_x));//cropcv::Mat temp_mask_protos = maskProtos(roi_rangs).clone();cv::Mat protos = temp_mask_protos.reshape(0, { seg_channels,rang_w * rang_h });cv::Mat matmul_res = (maskProposals * protos).t();cv::Mat masks_feature = matmul_res.reshape(1, { rang_h,rang_w });cv::Mat dest, mask;//sigmoidcv::exp(-masks_feature, dest);dest = 1.0 / (1.0 + dest);int left = floor((net_width / seg_width * rang_x - params[2]) / params[0]);int top = floor((net_height / seg_height * rang_y - params[3]) / params[1]);int width = ceil(net_width / seg_width * rang_w / params[0]);int height = ceil(net_height / seg_height * rang_h / params[1]);resize(dest, mask, cv::Size(width, height), cv::INTER_NEAREST);cv::Rect mask_rect = temp_rect - cv::Point(left, top);mask_rect &= cv::Rect(0, 0, width, height);mask = mask(mask_rect) > mask_threshold;if (mask.rows != temp_rect.height || mask.cols != temp_rect.width) { //https://github.com/UNeedCryDear/yolov8-opencv-onnxruntime-cpp/pull/30resize(mask, mask, temp_rect.size(), cv::INTER_NEAREST);}output.boxMask = mask;}
int BBox2Obb(float centerX, float centerY, float boxW, float boxH, float angle, cv::RotatedRect& rotatedRect) {
    rotatedRect = cv::RotatedRect(cv::Point2f(centerX, centerY), cv::Size2f(boxW, boxH), angle);
    return 0;
}
void DrawRotatedBox(cv::Mat &srcImg, cv::RotatedRect box, cv::Scalar color, int thinkness) {cv::Point2f p[4];box.points(p);for (int l = 0; l < 4; ++l) {line(srcImg, p[l], p[(l + 1) % 4], color, thinkness, 8);}
}void DrawPred(cv::Mat& img, std::vector<OutputParams> result, std::vector<std::string> classNames, std::vector<cv::Scalar> color, bool isVideo) {cv::Mat mask = img.clone();for (int i = 0; i < result.size(); i++) {int left=0, top=0;int color_num = i;if (result[i].box.area() > 0) {rectangle(img, result[i].box, color[result[i].id], 2, 8);left = result[i].box.x;top = result[i].box.y;}if (result[i].rotatedBox.size.width * result[i].rotatedBox.size.height > 0) {DrawRotatedBox(img, result[i].rotatedBox, color[result[i].id], 2);left = result[i].rotatedBox.center.x;top = result[i].rotatedBox.center.y;}if (result[i].boxMask.rows && result[i].boxMask.cols > 0)mask(result[i].box).setTo(color[result[i].id], result[i].boxMask);std::string label = classNames[result[i].id] + ":" + std::to_string(result[i].confidence);int baseLine;cv::Size labelSize = cv::getTextSize(label, cv::FONT_HERSHEY_SIMPLEX, 0.5, 1, &baseLine);top = MAX(top, labelSize.height);//rectangle(frame, cv::Point(left, top - int(1.5 * labelSize.height)), cv::Point(left + int(1.5 * labelSize.width), top + baseLine), cv::Scalar(0, 255, 0), FILLED);putText(img, label, cv::Point(left, top), cv::FONT_HERSHEY_SIMPLEX, 1, color[result[i].id], 2);}//std::cout<<"end draw_img"<<std::endl;cv::addWeighted(img, 0.5, mask, 0.5, 0, img); //add mask to srccv::imwrite("pre_img.png",img);cv::imshow("pre_img", img);// if (!isVideo)// cv::waitKey();//destroyAllWindows();
}void DrawPredPose(cv::Mat& img, std::vector<OutputParams> result, PoseParams& poseParams, bool isVideo) {for (int i = 0; i < result.size(); i++) {int left, top;int color_num = i;if (result[i].box.area() > 0) {rectangle(img, result[i].box, poseParams.personColor, 2, 8);left = result[i].box.x;top = result[i].box.y;}else continue;std::string label = "person:" + std::to_string(result[i].confidence);int baseLine;cv::Size labelSize = cv::getTextSize(label, cv::FONT_HERSHEY_SIMPLEX, 0.5, 1, &baseLine);top = MAX(top, labelSize.height);//rectangle(frame, cv::Point(left, top - int(1.5 * labelSize.height)), cv::Point(left + int(1.5 * labelSize.width), top + baseLine), cv::Scalar(0, 255, 0), FILLED);putText(img, label, cv::Point(left, top), cv::FONT_HERSHEY_SIMPLEX, 1, poseParams.personColor, 2);if (result[i].keyPoints.size()!= poseParams.kptBodyNames.size())continue;for (int j = 0; j < result[i].keyPoints.size(); ++j) {PoseKeyPoint kpt = result[i].keyPoints[j];if (kpt.confidence < poseParams.kptThreshold)continue;cv::Scalar kptColor = poseParams.posePalette[poseParams.kptColor[j]];cv::circle(img, cv::Point(kpt.x, kpt.y), poseParams.kptRadius, kptColor, -1, 8);}if (poseParams.isDrawKptLine) {for (int j = 0; j < poseParams.skeleton.size(); ++j) {PoseKeyPoint kpt0 = result[i].keyPoints[poseParams.skeleton[j][0]-1];PoseKeyPoint kpt1 = result[i].keyPoints[poseParams.skeleton[j][1]-1];if (kpt0.confidence < poseParams.kptThreshold || kpt1.confidence < poseParams.kptThreshold)continue;cv::Scalar kptColor= poseParams.posePalette[poseParams.limbColor[j]];cv::line(img, cv::Point(kpt0.x, kpt0.y),cv::Point(kpt1.x, kpt1.y), kptColor, 2, 8);}}}cv::imshow("1", img);if (!isVideo)cv::waitKey();//destroyAllWindows();
}
yolov8_onnx.h — be sure to change the model input size, const int _netWidth and _netHeight, so that it matches the input size the model was exported with.
#pragma once
#include <iostream>
#include<memory>
#include <opencv2/opencv.hpp>
#include "yolov8_utils.h"//#include<onnxruntime_cxx_api.h>
#include <onnxruntime_cxx_api.h>
//#include <tensorrt_provider_factory.h> //if use OrtTensorRTProviderOptionsV2
//#include <onnxruntime_c_api.h>

class Yolov8Onnx {
public:
    Yolov8Onnx() :_OrtMemoryInfo(Ort::MemoryInfo::CreateCpu(OrtAllocatorType::OrtDeviceAllocator, OrtMemType::OrtMemTypeCPUOutput)) {};
    ~Yolov8Onnx() {
        if (_OrtSession != nullptr)
            delete _OrtSession;
    };  // delete _OrtMemoryInfo;

public:
    /** \brief Read onnx-model
    * \param[in] modelPath:onnx-model path
    * \param[in] isCuda:if true,use Ort-GPU,else run it on cpu.
    * \param[in] cudaID:if isCuda==true,run Ort-GPU on cudaID.
    * \param[in] warmUp:if isCuda==true,warm up GPU-model.
    */
    bool ReadModel(const std::string& modelPath, bool isCuda = false, int cudaID = 0, bool warmUp = true);
    /** \brief detect.
    * \param[in] srcImg:a 3-channels image.
    * \param[out] output:detection results of input image.
    */
    bool OnnxDetect(cv::Mat& srcImg, std::vector<OutputParams>& output);
    /** \brief detect,batch size= _batchSize
    * \param[in] srcImg:A batch of images.
    * \param[out] output:detection results of input images.
    */
    bool OnnxBatchDetect(std::vector<cv::Mat>& srcImg, std::vector<std::vector<OutputParams>>& output);

private:
    template <typename T>
    T VectorProduct(const std::vector<T>& v) {
        return std::accumulate(v.begin(), v.end(), 1, std::multiplies<T>());
    };
    int Preprocessing(const std::vector<cv::Mat>& srcImgs, std::vector<cv::Mat>& outSrcImgs, std::vector<cv::Vec4d>& params);

    const int _netWidth = 1920;   //ONNX-net-input-width 640
    const int _netHeight = 1920;  //ONNX-net-input-height
    int _batchSize = 1;           //if multi-batch,set this
    bool _isDynamicShape = false; //onnx support dynamic shape
    float _classThreshold = 0.3;
    float _nmsThreshold = 0.45;
    float _maskThreshold = 0.5;

    //ONNXRUNTIME
    Ort::Env _OrtEnv = Ort::Env(OrtLoggingLevel::ORT_LOGGING_LEVEL_ERROR, "Yolov8");
    Ort::SessionOptions _OrtSessionOptions = Ort::SessionOptions();
    Ort::Session* _OrtSession = nullptr;
    Ort::MemoryInfo _OrtMemoryInfo;
#if ORT_API_VERSION < ORT_OLD_VISON
    char* _inputName, * _output_name0;
#else
    std::shared_ptr<char> _inputName, _output_name0;
#endif
    std::vector<char*> _inputNodeNames;   // input node names
    std::vector<char*> _outputNodeNames;  // output node names
    size_t _inputNodesNum = 0;            // number of input nodes
    size_t _outputNodesNum = 0;           // number of output nodes
    ONNXTensorElementDataType _inputNodeDataType;  // data types
    ONNXTensorElementDataType _outputNodeDataType;
    std::vector<int64_t> _inputTensorShape;  // input tensor shape
    std::vector<int64_t> _outputTensorShape;

public:
    std::vector<std::string> _className = {
        "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat", "traffic light",
        "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow",
        "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee",
        "skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard",
        "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple",
        "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch",
        "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote", "keyboard", "cell phone",
        "microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear",
        "hair drier", "toothbrush"
    };
    // std::vector<std::string> _Name = {
    //     "9.4","9.2","9.3","9.1","5.1","5.2","8.3","4.1","4.2","7.1",
    //     "3.1","8.1","8.2","1.2","5.3","5.4","1.1","4.5","4.6","3.2","4.3","4.4","2.1","6.3","6.1",
    //     "6.2","5.5","5.6","2.2"
    // };
    std::vector<std::string> _Name = { "0", "1", "2" };
};
yolov8_onnx.cpp is the core of the whole inference process:
#include "yolov8_onnx.h"
//using namespace std;
//using namespace cv;
//using namespace cv::dnn;
using namespace Ort;
bool Yolov8Onnx::ReadModel(const std::string& modelPath, bool isCuda, int cudaID, bool warmUp) {if (_batchSize < 1) _batchSize = 1;try{if (!CheckModelPath(modelPath))return false;std::vector<std::string> available_providers = GetAvailableProviders();auto cuda_available = std::find(available_providers.begin(), available_providers.end(), "CUDAExecutionProvider");if (isCuda && (cuda_available == available_providers.end())){std::cout << "Your ORT build without GPU. Change to CPU." << std::endl;std::cout << "************* Infer model on CPU! *************" << std::endl;}else if (isCuda && (cuda_available != available_providers.end())){std::cout << "************* Infer model on GPU! *************" << std::endl;
#if ORT_API_VERSION < ORT_OLD_VISONOrtCUDAProviderOptions cudaOption;cudaOption.device_id = cudaID;_OrtSessionOptions.AppendExecutionProvider_CUDA(cudaOption);
#elseOrtStatus* status = OrtSessionOptionsAppendExecutionProvider_CUDA(_OrtSessionOptions, cudaID);
#endif}else{std::cout << "************* Infer model on CPU! *************" << std::endl;}//_OrtSessionOptions.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_EXTENDED);#ifdef _WIN32std::wstring model_path(modelPath.begin(), modelPath.end());_OrtSession = new Ort::Session(_OrtEnv, model_path.c_str(), _OrtSessionOptions);
#else_OrtSession = new Ort::Session(_OrtEnv, modelPath.c_str(), _OrtSessionOptions);
#endifOrt::AllocatorWithDefaultOptions allocator;//init input_inputNodesNum = _OrtSession->GetInputCount();
#if ORT_API_VERSION < ORT_OLD_VISON_inputName = _OrtSession->GetInputName(0, allocator);_inputNodeNames.push_back(_inputName);
#else_inputName = std::move(_OrtSession->GetInputNameAllocated(0, allocator));_inputNodeNames.push_back(_inputName.get());
#endif//std::cout << _inputNodeNames[0] << std::endl;Ort::TypeInfo inputTypeInfo = _OrtSession->GetInputTypeInfo(0);auto input_tensor_info = inputTypeInfo.GetTensorTypeAndShapeInfo();_inputNodeDataType = input_tensor_info.GetElementType();_inputTensorShape = input_tensor_info.GetShape();if (_inputTensorShape[0] == -1){_isDynamicShape = true;_inputTensorShape[0] = _batchSize;}if (_inputTensorShape[2] == -1 || _inputTensorShape[3] == -1) {_isDynamicShape = true;_inputTensorShape[2] = _netHeight;_inputTensorShape[3] = _netWidth;}//init output_outputNodesNum = _OrtSession->GetOutputCount();
#if ORT_API_VERSION < ORT_OLD_VISON_output_name0 = _OrtSession->GetOutputName(0, allocator);_outputNodeNames.push_back(_output_name0);
#else_output_name0 = std::move(_OrtSession->GetOutputNameAllocated(0, allocator));_outputNodeNames.push_back(_output_name0.get());
#endifOrt::TypeInfo type_info_output0(nullptr);type_info_output0 = _OrtSession->GetOutputTypeInfo(0); //output0auto tensor_info_output0 = type_info_output0.GetTensorTypeAndShapeInfo();_outputNodeDataType = tensor_info_output0.GetElementType();_outputTensorShape = tensor_info_output0.GetShape();//_outputMaskNodeDataType = tensor_info_output1.GetElementType(); //the same as output0//_outputMaskTensorShape = tensor_info_output1.GetShape();//if (_outputTensorShape[0] == -1)//{// _outputTensorShape[0] = _batchSize;// _outputMaskTensorShape[0] = _batchSize;//}//if (_outputMaskTensorShape[2] == -1) {// //size_t ouput_rows = 0;// //for (int i = 0; i < _strideSize; ++i) {// // ouput_rows += 3 * (_netWidth / _netStride[i]) * _netHeight / _netStride[i];// //}// //_outputTensorShape[1] = ouput_rows;// _outputMaskTensorShape[2] = _segHeight;// _outputMaskTensorShape[3] = _segWidth;//}//warm upif (isCuda && warmUp) {//draw runstd::cout << "Start warming up" << std::endl;size_t input_tensor_length = VectorProduct(_inputTensorShape);float* temp = new float[input_tensor_length];std::vector<Ort::Value> input_tensors;std::vector<Ort::Value> output_tensors;input_tensors.push_back(Ort::Value::CreateTensor<float>(_OrtMemoryInfo, temp, input_tensor_length, _inputTensorShape.data(),_inputTensorShape.size()));for (int i = 0; i < 3; ++i) {output_tensors = _OrtSession->Run(Ort::RunOptions{ nullptr },_inputNodeNames.data(),input_tensors.data(),_inputNodeNames.size(),_outputNodeNames.data(),_outputNodeNames.size());}delete[]temp;}}catch (const std::exception&) {return false;}return true;
}int Yolov8Onnx::Preprocessing(const std::vector<cv::Mat>& srcImgs, std::vector<cv::Mat>& outSrcImgs, std::vector<cv::Vec4d>& params) {outSrcImgs.clear();cv::Size input_size = cv::Size(_netWidth, _netHeight);for (int i = 0; i < srcImgs.size(); ++i) {cv::Mat temp_img = srcImgs[i];cv::Vec4d temp_param = {1,1,0,0};if (temp_img.size() != input_size) {cv::Mat borderImg;LetterBox(temp_img, borderImg, temp_param, input_size, false, false, true, 32);//std::cout << borderImg.size() << std::endl;outSrcImgs.push_back(borderImg);params.push_back(temp_param);}else {outSrcImgs.push_back(temp_img);params.push_back(temp_param);}}int lack_num = _batchSize- srcImgs.size();if (lack_num > 0) {for (int i = 0; i < lack_num; ++i) {cv::Mat temp_img = cv::Mat::zeros(input_size, CV_8UC3);cv::Vec4d temp_param = { 1,1,0,0 };outSrcImgs.push_back(temp_img);params.push_back(temp_param);}}return 0;}
bool Yolov8Onnx::OnnxDetect(cv::Mat& srcImg, std::vector<OutputParams>& output) {
    std::vector<cv::Mat> input_data = { srcImg };
    std::vector<std::vector<OutputParams>> tenp_output;
    if (OnnxBatchDetect(input_data, tenp_output)) {
        output = tenp_output[0];
        return true;
    }
    else
        return false;
}
bool Yolov8Onnx::OnnxBatchDetect(std::vector<cv::Mat>& srcImgs, std::vector<std::vector<OutputParams>>& output) {std::vector<cv::Vec4d> params;std::vector<cv::Mat> input_images;cv::Size input_size(_netWidth, _netHeight);//preprocessingPreprocessing(srcImgs, input_images, params);cv::Mat blob = cv::dnn::blobFromImages(input_images, 1 / 255.0, input_size, cv::Scalar(0, 0, 0), true, false);int64_t input_tensor_length = VectorProduct(_inputTensorShape);std::vector<Ort::Value> input_tensors;std::vector<Ort::Value> output_tensors;input_tensors.push_back(Ort::Value::CreateTensor<float>(_OrtMemoryInfo, (float*)blob.data, input_tensor_length, _inputTensorShape.data(), _inputTensorShape.size()));output_tensors = _OrtSession->Run(Ort::RunOptions{ nullptr },_inputNodeNames.data(),input_tensors.data(),_inputNodeNames.size(),_outputNodeNames.data(),_outputNodeNames.size());//post-processfloat* all_data = output_tensors[0].GetTensorMutableData<float>();_outputTensorShape = output_tensors[0].GetTensorTypeAndShapeInfo().GetShape();int net_width = _outputTensorShape[1];int socre_array_length = net_width - 4;int64_t one_output_length = VectorProduct(_outputTensorShape) / _outputTensorShape[0];for (int img_index = 0; img_index < srcImgs.size(); ++img_index) {cv::Mat output0 = cv::Mat(cv::Size((int)_outputTensorShape[2], (int)_outputTensorShape[1]), CV_32F, all_data).t(); //[bs,116,8400]=>[bs,8400,116]all_data += one_output_length;float* pdata = (float*)output0.data;int rows = output0.rows;std::vector<int> class_ids;//���id����std::vector<float> confidences;//���ÿ��id��Ӧ���Ŷ�����std::vector<cv::Rect> boxes;//ÿ��id���ο�for (int r = 0; r < rows; ++r) { //stridecv::Mat scores(1, socre_array_length, CV_32F, pdata + 4);cv::Point classIdPoint;double max_class_socre;minMaxLoc(scores, 0, &max_class_socre, 0, &classIdPoint);max_class_socre = (float)max_class_socre;if (max_class_socre >= _classThreshold) {//rect [x,y,w,h]float x = (pdata[0] - params[img_index][2]) / params[img_index][0]; //xfloat y = (pdata[1] - params[img_index][3]) / params[img_index][1]; //yfloat w = pdata[2] / params[img_index][0]; //wfloat h = pdata[3] / params[img_index][1]; //hint left = MAX(int(x - 0.5 * w + 0.5), 0);int top = MAX(int(y - 0.5 * h + 0.5), 0);class_ids.push_back(classIdPoint.x);confidences.push_back(max_class_socre);boxes.push_back(cv::Rect(left, top, int(w + 0.5), int(h + 0.5)));}pdata += net_width;//��һ��}std::vector<int> nms_result;cv::dnn::NMSBoxes(boxes, confidences, _classThreshold, _nmsThreshold, nms_result);std::vector<std::vector<float>> temp_mask_proposals;cv::Rect holeImgRect(0, 0, srcImgs[img_index].cols, srcImgs[img_index].rows);std::vector<OutputParams> temp_output;for (int i = 0; i < nms_result.size(); ++i) {int idx = nms_result[i];OutputParams result;result.id = class_ids[idx];result.confidence = confidences[idx];result.box = boxes[idx] & holeImgRect;temp_output.push_back(result);}output.push_back(temp_output);}if (output.size())return true;elsereturn false;
}
main.cpp shows how everything is used:
#include <iostream>
#include<opencv2/opencv.hpp>
#include<math.h>
#include "yolov8_onnx.h"
#include<time.h>
#include <chrono>  // for the timing code in main()
//#define VIDEO_OPENCV //if define, use opencv for video.

using namespace std;
using namespace cv;
using namespace dnn;

template<typename _Tp>
std::vector<OutputParams> yolov8_onnx(_Tp& task, cv::Mat& img, std::string& model_path)
{
    // if (task.ReadModel(model_path, false, 0, true)) {
    //     std::cout << "read net ok!" << std::endl;
    // }

    // generate random colors
    std::vector<cv::Scalar> color;
    srand(time(0));
    for (int i = 0; i < 80; i++) {
        int b = rand() % 256;
        int g = rand() % 256;
        int r = rand() % 256;
        color.push_back(cv::Scalar(b, g, r));
    }

    std::vector<OutputParams> result;
    if (task.OnnxDetect(img, result)) {
        std::cout << "111" << std::endl;
        // the drawing routine turns out to take quite a lot of time
        //DrawPred(img, result, task._Name, color, false);
    }
    else {
        std::cout << "Detect Failed!" << std::endl;
    }
    //system("pause");
    return result;
}

int main() {
    std::string img_path = "../images/_20250609_144509.bmp";
    std::string model_path_detect = "../models/20250612.onnx";

    cv::Mat src = imread(img_path);
    cv::Mat img = src.clone();

    Yolov8Onnx task_detect_ort;
    if (task_detect_ort.ReadModel(model_path_detect, false, 0, true)) {
        std::cout << "read net ok!" << std::endl;
    }

    std::vector<OutputParams> results_detect;
    long long startTime = std::chrono::system_clock::now().time_since_epoch().count(); //ns
    results_detect = yolov8_onnx(task_detect_ort, img, model_path_detect);
    long long timeNow = std::chrono::system_clock::now().time_since_epoch().count();
    double timeuse = (timeNow - startTime) * 0.000001;
    //std::cout<<"end detect"<<endl;
    std::cout << (timeNow - startTime) * 0.000001 << "ms\n";

    std::cout << "num: " << results_detect.size() << endl;
    OutputParams out_result;
    for (int i = 0; i < results_detect.size(); i++) {
        cout << results_detect[i].id << " " << task_detect_ort._Name[results_detect[i].id]
             << " conf: " << results_detect[i].confidence
             << " rect: " << results_detect[i].box << endl;
    }
    cv::waitKey(0);
    return 0;
}
And here is a CMakeLists.txt to go with it:
CMAKE_MINIMUM_REQUIRED(VERSION 3.0.0)
project(YOLOv8)
SET (ONNXRUNTIME_DIR /home/xxx/src/onnxruntime-linux-x64-gpu-1.17.0)
find_package(OpenCV REQUIRED)
include_directories(${OpenCV_INCLUDE_DIRS})
# Print OpenCV information
message(STATUS "OpenCV library status:")
message(STATUS " config: ${OpenCV_DIR}")
message(STATUS " version: ${OpenCV_VERSION}")
message(STATUS " libraries: ${OpenCV_LIBS}")
message(STATUS " include path: ${OpenCV_INCLUDE_DIRS}")# 添加PCL环境
find_package(PCL REQUIRED)
add_definitions(${PCL_DEFINITIONS})
include_directories(${PCL_INCLUDE_DIRS})
link_directories(${PCL_LIBRARY_DIRS})
find_package(VTK REQUIRED)
ADD_EXECUTABLE(YOLOv8 yolov8_utils.h yolov8_onnx.h main.cpp yolov8_utils.cpp yolov8_onnx.cpp )
SET(CMAKE_CXX_STANDARD 14)
SET(CMAKE_CXX_STANDARD_REQUIRED ON)
TARGET_INCLUDE_DIRECTORIES(YOLOv8 PRIVATE "${ONNXRUNTIME_DIR}/include")
TARGET_COMPILE_FEATURES(YOLOv8 PRIVATE cxx_std_14)
TARGET_LINK_LIBRARIES(YOLOv8 ${OpenCV_LIBS}
)
# TARGET_LINK_LIBRARIES(YOLOv8 "${ONNXRUNTIME_DIR}/lib/libonnxruntime.so")
# TARGET_LINK_LIBRARIES(YOLOv8 "${ONNXRUNTIME_DIR}/lib/libonnxruntime.so.1.13.1")
if (WIN32)
    TARGET_LINK_LIBRARIES(YOLOv8 "${ONNXRUNTIME_DIR}/lib/onnxruntime.lib")
endif(WIN32)
if (UNIX)
    TARGET_LINK_LIBRARIES(YOLOv8 "${ONNXRUNTIME_DIR}/lib/libonnxruntime.so")
endif(UNIX)
Change the image and model paths, build the executable with cmake and make, and then run ./YOLOv8 from the executable's directory in a Linux terminal.
Python version
Python deployment is comparatively simple: change the model and image paths below and run the script directly.
import cv2
import onnxruntime as ort
from PIL import Image
import numpy as np

# confidence threshold
confidence_thres = 0.35
# IoU threshold
iou_thres = 0.5
# class names
classes = {0: 'person', 1: 'bicycle', 2: 'car', 3: 'motorcycle', 4: 'airplane', 5: 'bus','...'}
# random colors
color_palette = np.random.uniform(100, 255, size=(len(classes), 3))

# Choose whether to run on GPU or CPU
providers = [('CUDAExecutionProvider', {'device_id': 0, # 可以选择GPU设备ID,如果你有多个GPU}),'CPUExecutionProvider', # 也可以设置CPU作为备选
]def calculate_iou(box, other_boxes):"""计算给定边界框与一组其他边界框之间的交并比(IoU)。参数:- box: 单个边界框,格式为 [x1, y1, width, height]。- other_boxes: 其他边界框的数组,每个边界框的格式也为 [x1, y1, width, height]。返回值:- iou: 一个数组,包含给定边界框与每个其他边界框的IoU值。"""# 计算交集的左上角坐标x1 = np.maximum(box[0], np.array(other_boxes)[:, 0])y1 = np.maximum(box[1], np.array(other_boxes)[:, 1])# 计算交集的右下角坐标x2 = np.minimum(box[0] + box[2], np.array(other_boxes)[:, 0] + np.array(other_boxes)[:, 2])y2 = np.minimum(box[1] + box[3], np.array(other_boxes)[:, 1] + np.array(other_boxes)[:, 3])# 计算交集区域的面积intersection_area = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)# 计算给定边界框的面积box_area = box[2] * box[3]# 计算其他边界框的面积other_boxes_area = np.array(other_boxes)[:, 2] * np.array(other_boxes)[:, 3]# 计算IoU值iou = intersection_area / (box_area + other_boxes_area - intersection_area)return ioudef custom_NMSBoxes(boxes, scores, confidence_threshold, iou_threshold):# 如果没有边界框,则直接返回空列表if len(boxes) == 0:return []# 将得分和边界框转换为NumPy数组scores = np.array(scores)boxes = np.array(boxes)# 根据置信度阈值过滤边界框mask = scores > confidence_thresholdfiltered_boxes = boxes[mask]filtered_scores = scores[mask]# 如果过滤后没有边界框,则返回空列表if len(filtered_boxes) == 0:return []# 根据置信度得分对边界框进行排序sorted_indices = np.argsort(filtered_scores)[::-1]# 初始化一个空列表来存储选择的边界框索引indices = []# 当还有未处理的边界框时,循环继续while len(sorted_indices) > 0:# 选择得分最高的边界框索引current_index = sorted_indices[0]indices.append(current_index)# 如果只剩一个边界框,则结束循环if len(sorted_indices) == 1:break# 获取当前边界框和其他边界框current_box = filtered_boxes[current_index]other_boxes = filtered_boxes[sorted_indices[1:]]# 计算当前边界框与其他边界框的IoUiou = calculate_iou(current_box, other_boxes)# 找到IoU低于阈值的边界框,即与当前边界框不重叠的边界框non_overlapping_indices = np.where(iou <= iou_threshold)[0]# 更新sorted_indices以仅包含不重叠的边界框sorted_indices = sorted_indices[non_overlapping_indices + 1]# 返回选择的边界框索引return indicesdef draw_detections(img, box, score, class_id):"""在输入图像上绘制检测到的对象的边界框和标签。参数:img: 要在其上绘制检测结果的输入图像。box: 检测到的边界框。score: 对应的检测得分。class_id: 检测到的对象的类别ID。返回:无"""# 提取边界框的坐标x1, y1, w, h = box# 根据类别ID检索颜色color = color_palette[class_id]# 在图像上绘制边界框cv2.rectangle(img, (int(x1), int(y1)), (int(x1 + w), int(y1 + h)), color, 2)# 创建标签文本,包括类名和得分label = f'{classes[class_id]}: {score:.2f}'# 计算标签文本的尺寸(label_width, label_height), _ = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, 0.5, 1)# 计算标签文本的位置label_x = x1label_y = y1 - 10 if y1 - 10 > label_height else y1 + 10# 绘制填充的矩形作为标签文本的背景cv2.rectangle(img, (label_x, label_y - label_height), (label_x + label_width, label_y + label_height), color, cv2.FILLED)# 在图像上绘制标签文本cv2.putText(img, label, (label_x, label_y), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 1, cv2.LINE_AA)def preprocess(img, input_width, input_height):"""在执行推理之前预处理输入图像。返回:image_data: 为推理准备好的预处理后的图像数据。"""# 获取输入图像的高度和宽度img_height, img_width = img.shape[:2]# 将图像颜色空间从BGR转换为RGBimg = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)# 将图像大小调整为匹配输入形状img = cv2.resize(img, (input_width, input_height))# 通过除以255.0来归一化图像数据image_data = np.array(img) / 255.0# 转置图像,使通道维度为第一维image_data = np.transpose(image_data, (2, 0, 1)) # 通道首# 扩展图像数据的维度以匹配预期的输入形状image_data = np.expand_dims(image_data, axis=0).astype(np.float32)# 返回预处理后的图像数据return image_data, img_height, img_widthdef postprocess(input_image, output, input_width, input_height, img_width, img_height):"""对模型输出进行后处理,提取边界框、得分和类别ID。参数:input_image (numpy.ndarray): 输入图像。output (numpy.ndarray): 模型的输出。input_width (int): 模型输入宽度。input_height (int): 模型输入高度。img_width (int): 原始图像宽度。img_height (int): 原始图像高度。返回:numpy.ndarray: 绘制了检测结果的输入图像。"""# 转置和压缩输出以匹配预期的形状outputs = np.transpose(np.squeeze(output[0]))# 获取输出数组的行数rows = 
outputs.shape[0]# 用于存储检测的边界框、得分和类别ID的列表boxes = []scores = []class_ids = []# 计算边界框坐标的缩放因子x_factor = img_width / input_widthy_factor = img_height / input_height# 遍历输出数组的每一行for i in range(rows):# 从当前行提取类别得分classes_scores = outputs[i][4:]# 找到类别得分中的最大得分max_score = np.amax(classes_scores)# 如果最大得分高于置信度阈值if max_score >= confidence_thres:# 获取得分最高的类别IDclass_id = np.argmax(classes_scores)# 从当前行提取边界框坐标x, y, w, h = outputs[i][0], outputs[i][1], outputs[i][2], outputs[i][3]# 计算边界框的缩放坐标left = int((x - w / 2) * x_factor)top = int((y - h / 2) * y_factor)width = int(w * x_factor)height = int(h * y_factor)# 将类别ID、得分和框坐标添加到各自的列表中class_ids.append(class_id)scores.append(max_score)boxes.append([left, top, width, height])# 应用非最大抑制过滤重叠的边界框indices = custom_NMSBoxes(boxes, scores, confidence_thres, iou_thres)# 遍历非最大抑制后的选定索引for i in indices:# 根据索引获取框、得分和类别IDbox = boxes[i]score = scores[i]class_id = class_ids[i]# 在输入图像上绘制检测结果draw_detections(input_image, box, score, class_id)# 返回修改后的输入图像return input_imagedef init_detect_model(model_path):# 使用ONNX模型文件创建一个推理会话,并指定执行提供者session = ort.InferenceSession(model_path, providers=providers)# 获取模型的输入信息model_inputs = session.get_inputs()# 获取输入的形状,用于后续使用input_shape = model_inputs[0].shape# 从输入形状中提取输入宽度input_width = input_shape[2]# 从输入形状中提取输入高度input_height = input_shape[3]# 返回会话、模型输入信息、输入宽度和输入高度return session, model_inputs, input_width, input_heightdef detect_object(image, session, model_inputs, input_width, input_height):# 如果输入的图像是PIL图像对象,将其转换为NumPy数组if isinstance(image, Image.Image):result_image = np.array(image)else:# 否则,直接使用输入的图像(假定已经是NumPy数组)result_image = image# 预处理图像数据,调整图像大小并可能进行归一化等操作img_data, img_height, img_width = preprocess(result_image, input_width, input_height)# 使用预处理后的图像数据进行推理outputs = session.run(None, {model_inputs[0].name: img_data})# 对推理结果进行后处理,例如解码检测框,过滤低置信度的检测等output_image = postprocess(result_image, outputs, input_width, input_height, img_width, img_height)return output_image
if __name__ == '__main__':
    model_path = "yolov8n.onnx"
    session, model_inputs, input_width, input_height = init_detect_model(model_path)
    image_data = cv2.imread("test.png")
    result_image = detect_object(image_data, session, model_inputs, input_width, input_height)
    cv2.imwrite("result_image.png", result_image)
    cv2.imshow('Output', result_image)
    cv2.waitKey(0)
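If you are unsure whether the CUDA provider is actually being used, onnxruntime can report which execution providers are available in the installed build and which ones a session ended up with. A small sketch; the model path is a placeholder:

import onnxruntime as ort

# Providers compiled into the installed onnxruntime build
print(ort.get_available_providers())

# Providers actually selected for a given session (falls back to CPU if CUDA is unavailable)
session = ort.InferenceSession("yolov8n.onnx", providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
print(session.get_providers())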