Pose Estimation with Deep Learning in Practice: Deploying YOLOv8 Pose with ONNX Runtime

Category: news | Source: original | 2025/6/6 9:08:03

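This article shows how to deploy a YOLOv8 pose estimation model with ONNX Runtime, without depending on the official Ultralytics YOLO environment. It covers model loading, image preprocessing (letterbox resizing and padding), running inference, decoding the output (bounding boxes and keypoints), non-maximum suppression (NMS) and visualizing the results. It also touches on performance and common pitfalls in deployment.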

1. Introduction

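Pose estimation is an important computer vision task that locates human body keypoints in images or videos. YOLOv8 Pose is Ultralytics' real-time pose estimation model. It combines object detection and keypoint estimation in a single network, so a single forward pass predicts each box together with its keypoints. To deploy the model efficiently across different environments, we use ONNX Runtime (ORT). ORT runs inference on multiple platforms, on both CPU and GPU, without depending on the original training framework.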

2. Model Loading and Initialization

The initialization method of the YOLOv8Pose class loads the ONNX model and configures the inference session:

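```python
import onnxruntime as ort


class YOLOv8Pose:
    def __init__(self, model_path, conf_thres=0.1, iou_thres=0.45, providers=None):
        self.conf_thres = conf_thres
        self.iou_thres = iou_thres
        # Initialize ONNX Runtime. Since ORT 1.9, builds that ship several execution
        # providers (e.g. onnxruntime-gpu) require providers to be set explicitly;
        # pass ["CUDAExecutionProvider", "CPUExecutionProvider"] to run on a GPU.
        self.session = ort.InferenceSession(
            model_path, providers=providers or ["CPUExecutionProvider"]
        )
        self.input_name = self.session.get_inputs()[0].name
        self.output_name = self.session.get_outputs()[0].name
        self.input_shape = self.session.get_inputs()[0].shape[2:]  # (h, w)
```

Notes:

model_path: path to the ONNX model file.
conf_thres: confidence threshold for discarding low-confidence boxes.
iou_thres: IoU threshold used in NMS.
providers: ONNX Runtime execution providers. The default is the CPU provider.
The input height and width come from the model's input shape [batch, 3, h, w], typically 640x640. This assumes the model was exported with a fixed input size. With dynamic axes, these entries are symbolic names rather than integers.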
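Reading the size with shape[2:] only works for a fixed-size export. With dynamic axes, ONNX Runtime reports those dimensions as symbolic names (strings such as 'height') or as None. The letterbox arithmetic would then fail on them.

Below is a minimal sketch of a fallback. It assumes 640x640 is the size you want whenever the model does not fix one. resolve_input_hw is a hypothetical helper, not part of ONNX Runtime.

```python
def resolve_input_hw(shape, default_hw=(640, 640)):
    """Return (h, w) from an ONNX input shape such as [1, 3, 640, 640].

    Assumption: dynamic dimensions show up as strings (e.g. 'height') or None,
    and are replaced by the corresponding value from default_hw.
    """
    resolved = []
    for dim, fallback in zip(shape[2:4], default_hw):
        # Keep concrete integer sizes, substitute symbolic or unknown ones
        resolved.append(dim if isinstance(dim, int) else fallback)
    return tuple(resolved)


print(resolve_input_hw([1, 3, "height", "width"]))  # (640, 640)
```

In __init__ the last line would then read: self.input_shape = resolve_input_hw(self.session.get_inputs()[0].shape).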

3. Image Preprocessing: Letterbox Resizing and Padding

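The model has a fixed input size, but input images come in all sizes. Each image is therefore resized to the model input size while preserving its aspect ratio, so the content is not distorted. The letterbox algorithm does this in two steps. First, it scales the image by a single factor so it fits inside the input. Second, it pads the remaining border with the constant gray value (114, 114, 114):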

```python
    def preprocess(self, img):
        # Original image size
        self.orig_h, self.orig_w = img.shape[:2]
        # Scale ratio: take the smaller ratio so the whole image fits inside the
        # model input (the limiting side fills the input, the other side is padded)
        scale = min(self.input_shape[0] / self.orig_h, self.input_shape[1] / self.orig_w)
        # Size after resizing, as (w, h) for cv2.resize
        self.new_unpad = (int(self.orig_w * scale), int(self.orig_h * scale))
        # Padding per side needed to reach the model input size
        self.dw = (self.input_shape[1] - self.new_unpad[0]) / 2  # horizontal padding
        self.dh = (self.input_shape[0] - self.new_unpad[1]) / 2  # vertical padding
        # Resize only if the size actually changes
        if (self.new_unpad[0], self.new_unpad[1]) != (self.orig_w, self.orig_h):
            img = cv2.resize(img, self.new_unpad, interpolation=cv2.INTER_LINEAR)
        # Add padding (top, bottom, left, right); the +-0.1 sends an odd extra
        # pixel to the bottom/right side
        top, bottom = int(round(self.dh - 0.1)), int(round(self.dh + 0.1))
        left, right = int(round(self.dw - 0.1)), int(round(self.dw + 0.1))
        img = cv2.copyMakeBorder(img, top, bottom, left, right,
                                 cv2.BORDER_CONSTANT, value=(114, 114, 114))
        # Channel conversion and normalization
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # BGR -> RGB
        img = img.transpose(2, 0, 1)  # HWC -> CHW
        img = np.ascontiguousarray(img, dtype=np.float32) / 255.0  # scale to [0, 1]
        return np.expand_dims(img, axis=0)  # add batch dimension -> [1, 3, H, W]
```
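To check the letterbox arithmetic by hand, the sketch below repeats the size and padding calculations from preprocess() without touching any pixels. letterbox_params is a hypothetical helper introduced only for illustration. It assumes a 640x640 model input.

```python
def letterbox_params(orig_h, orig_w, in_h=640, in_w=640):
    """Scale, resized (w, h) and (top, bottom, left, right) padding, as in preprocess()."""
    # Smaller ratio so the whole image fits inside the model input
    scale = min(in_h / orig_h, in_w / orig_w)
    new_w, new_h = int(orig_w * scale), int(orig_h * scale)
    dw = (in_w - new_w) / 2  # horizontal padding per side (may end in .5)
    dh = (in_h - new_h) / 2  # vertical padding per side (may end in .5)
    # The +-0.1 trick puts an odd leftover pixel on the bottom/right side
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    return scale, (new_w, new_h), (top, bottom, left, right)


scale, new_size, pads = letterbox_params(720, 1280)
print(scale, new_size, pads)  # 0.5 (640, 360) (140, 140, 0, 0)
```

A 1280x720 frame is halved to 640x360 and gets 140 rows of gray on top and bottom. For an image of height 321 and width 640, the vertical padding is 159.5 per side, so 159 rows go on top and 160 on the bottom. Either way, the padded result is exactly 640x640.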

4. Model Inference

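Because the input data is already prepared, inference itself is straightforward:

```python
# In the main program:
input_tensor = model.preprocess(img)
outputs = model.session.run([model.output_name], {model.input_name: input_tensor})
```

Note: session.run takes two arguments. The first is a list of output names (only one output is needed here). The second is a dictionary that maps input names to input tensors. It returns a list of NumPy arrays, one per requested output, so outputs[0] is the raw prediction tensor.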

5. Post-processing: Parsing the Model Output

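The model output is a tensor of shape [1, 11, 8400] (for the model used in this article), where:

1: the batch size.
11: the values per prediction. These are 4 bounding box values (cx, cy, w, h), 1 confidence score and 6 keypoint values. Each keypoint has 3 values (x, y, score), so two keypoints give 6 values. In general this dimension is 4 + number of classes + 3 × number of keypoints. This model has a single class, so there is exactly one confidence column.
8400: the number of candidate predictions, one per grid cell of the model's output feature maps for a 640x640 input.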
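Both numbers can be reproduced with a little arithmetic. This sketch assumes the standard YOLOv8 head, which predicts on three feature maps with strides 8, 16 and 32, and an input size divisible by 32. yolov8_pose_output_shape is a hypothetical helper used only to illustrate the formula.

```python
def yolov8_pose_output_shape(in_h=640, in_w=640, num_classes=1, num_kpts=2,
                             strides=(8, 16, 32)):
    """Expected raw output shape [1, C, A] of a YOLOv8 pose model (assumed strides)."""
    # One prediction per grid cell on each feature map
    anchors = sum((in_h // s) * (in_w // s) for s in strides)
    # 4 box values + one score per class + (x, y, score) per keypoint
    channels = 4 + num_classes + 3 * num_kpts
    return (1, channels, anchors)


print(yolov8_pose_output_shape())             # (1, 11, 8400): 80*80 + 40*40 + 20*20
print(yolov8_pose_output_shape(num_kpts=17))  # (1, 56, 8400), e.g. a 17-keypoint COCO model
```

If your exported model reports a different channel count, check the number of classes and keypoints it was trained with before adjusting the slicing in postprocess().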
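The post-processing steps are:

Transpose the output to an [8400, 11] matrix.
Discard predictions whose confidence is below the threshold.
Convert boxes from (cx, cy, w, h) to (x1, y1, x2, y2).
Parse the keypoints by reshaping them to [N, 2, 3], where 2 is the number of keypoints and each keypoint has x, y and score.
Map coordinates from the model input size back to the original image size, undoing the scaling and padding from preprocessing.
Apply non-maximum suppression (NMS) to remove redundant boxes.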
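The coordinate mapping step is the exact inverse of the letterbox transform. Preprocessing maps an original point to input = original * scale + (dw, dh). Postprocessing therefore computes original = (input - (dw, dh)) / scale.

The sketch below checks this round trip on a 1280x720 image (scale 0.5, dw 0, dh 140). to_input_coords and to_original_coords are hypothetical helper names.

```python
import numpy as np


def to_input_coords(xy, scale, dw, dh):
    """Map original-image (x, y) points into letterboxed model-input coordinates."""
    xy = np.asarray(xy, dtype=np.float32)
    return xy * scale + np.array([dw, dh], dtype=np.float32)


def to_original_coords(xy, scale, dw, dh):
    """Inverse mapping, as in postprocess(): remove padding, then undo scaling."""
    xy = np.asarray(xy, dtype=np.float32)
    return (xy - np.array([dw, dh], dtype=np.float32)) / scale


p = to_input_coords([1000, 500], 0.5, 0.0, 140.0)
print(p, to_original_coords(p, 0.5, 0.0, 140.0))  # [500. 390.] [1000.  500.]
```

One subtlety: preprocess() pads with the rounded values top and left, while the mapping subtracts the unrounded dw and dh. When the total padding is odd, the recovered coordinates can be off by half a pixel in model-input space.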

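```python
    def postprocess(self, outputs):
        predictions = outputs[0][0].T  # transpose [11, 8400] -> [8400, 11]
        # 1. Filter by confidence threshold
        conf_mask = predictions[:, 4] > self.conf_thres
        predictions = predictions[conf_mask]
        if predictions.shape[0] == 0:
            return [], [], []  # no detections
        # 2. Box conversion (cx, cy, w, h) -> (x1, y1, x2, y2)
        boxes = predictions[:, :4].copy()
        boxes[:, 0] = boxes[:, 0] - boxes[:, 2] / 2  # x1 = cx - w/2
        boxes[:, 1] = boxes[:, 1] - boxes[:, 3] / 2  # y1 = cy - h/2
        boxes[:, 2] = boxes[:, 0] + boxes[:, 2]      # x2 = x1 + w
        boxes[:, 3] = boxes[:, 1] + boxes[:, 3]      # y2 = y1 + h
        # 3. Keypoints: reshape the 6 values (2 keypoints) to [N, 2, 3]
        keypoints = predictions[:, 5:].reshape(-1, 2, 3).copy()
        # 4. Map coordinates back to the original image size
        scale = min(self.input_shape[0] / self.orig_h, self.input_shape[1] / self.orig_w)
        boxes[:, [0, 2]] -= self.dw  # remove horizontal padding
        boxes[:, [1, 3]] -= self.dh  # remove vertical padding
        boxes /= scale               # undo resizing
        keypoints[:, :, 0] -= self.dw    # keypoint x: remove horizontal padding
        keypoints[:, :, 1] -= self.dh    # keypoint y: remove vertical padding
        keypoints[:, :, :2] /= scale     # only x, y; column 2 is the score
        boxes = boxes.round().astype(int)
        # Keep keypoints as floats so the scores are not truncated to 0/1;
        # only the x, y columns are rounded (cast to int when drawing)
        keypoints[:, :, :2] = keypoints[:, :, :2].round()
        # 5. NMS: cv2.dnn.NMSBoxes expects boxes as [x, y, w, h]
        scores = predictions[:, 4]
        xywh = boxes.copy()
        xywh[:, 2:] -= xywh[:, :2]  # (x2, y2) -> (w, h)
        indices = cv2.dnn.NMSBoxes(xywh.tolist(), scores.tolist(),
                                   self.conf_thres, self.iou_thres)
        # Depending on the OpenCV version this is an [N, 1] array, a 1-D array
        # or an empty tuple, so normalize it to a flat integer array
        indices = np.array(indices, dtype=int).reshape(-1)
        if indices.size == 0:
            return [], [], []
        return boxes[indices], scores[indices], keypoints[indices]
```

Notes:

When converting coordinates, subtract the padding (dw and dh) first, then divide by the scale factor.
Box coordinates are rounded to integers with round().astype(int). Keypoints stay as floats, with x and y rounded. Casting the whole [N, 2, 3] array to int would also truncate each keypoint's score to 0 or 1.
NMS uses OpenCV's cv2.dnn.NMSBoxes, which takes boxes in (x, y, w, h) format and returns the indices of the boxes to keep. Passing (x1, y1, x2, y2) directly would make it treat x2 and y2 as width and height, so the overlaps would be computed incorrectly.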
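You may prefer not to depend on cv2.dnn for NMS, or you may want to see what the suppression actually does. In either case, greedy NMS fits in a few lines of NumPy.

The sketch below is a plain textbook version that works directly on (x1, y1, x2, y2) boxes. It is not OpenCV's implementation, so results can differ slightly at the threshold boundary. It applies no score threshold, because postprocess() has already filtered by conf_thres. nms_xyxy is a hypothetical helper name.

```python
import numpy as np


def nms_xyxy(boxes, scores, iou_thres=0.45):
    """Greedy NMS on boxes in (x1, y1, x2, y2) format; returns kept indices by descending score."""
    boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 4)
    scores = np.asarray(scores, dtype=np.float32)
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order = scores.argsort()[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of the kept box with every remaining box
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[rest] - inter + 1e-9)
        # Discard boxes that overlap the kept box too much
        order = rest[iou <= iou_thres]
    return np.array(keep, dtype=np.int64)


boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]]
print(nms_xyxy(boxes, [0.9, 0.8, 0.7]))  # [0 2]: box 1 overlaps box 0 with IoU of about 0.68
```

In postprocess() it could replace the cv2.dnn.NMSBoxes call as indices = nms_xyxy(boxes, scores, self.iou_thres).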

6. Visualizing the Results

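The visualization function draws the bounding boxes and keypoints on the image:

```python
    def visualize(self, image, boxes, keypoints):
        # Draw bounding boxes
        for box in boxes:
            x1, y1, x2, y2 = (int(v) for v in box)
            cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)
        # Draw keypoints and the line between them
        for kpts in keypoints:
            # Draw each keypoint (first one red, second one blue)
            for i, (x, y, score) in enumerate(kpts):
                if score > 0.5:  # keypoint confidence threshold
                    color = (0, 0, 255) if i == 0 else (255, 0, 0)
                    # int() because OpenCV drawing functions need integer pixels
                    cv2.circle(image, (int(x), int(y)), 5, color, -1)
            # Connect the two keypoints if both are confident
            if len(kpts) == 2 and all(kpts[:, 2] > 0.5):
                x1, y1 = int(kpts[0][0]), int(kpts[0][1])
                x2, y2 = int(kpts[1][0]), int(kpts[1][1])
                cv2.line(image, (x1, y1), (x2, y2), (0, 255, 255), 2)
        return image
```

Explanation:

Bounding boxes are drawn as green rectangles.
The first keypoint (index 0) is drawn as a red dot and the second (index 1) as a blue dot.
If both keypoints have a confidence above 0.5, a yellow line is drawn between them.
Colors are given in OpenCV's BGR order, so (0, 0, 255) is red and (255, 0, 0) is blue.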
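The line drawing above is hard-coded for exactly two keypoints. A model with more keypoints needs a list of index pairs to connect, often called a skeleton.

The sketch below separates that logic from OpenCV, so it can be tested on its own. SKELETON and visible_segments are hypothetical names. The single pair (0, 1) matches the two-keypoint model in this article. Any other skeleton has to come from the dataset your model was trained on.

```python
import numpy as np

# Hypothetical skeleton: pairs of keypoint indices to connect.
# For this article's 2-keypoint model it is just [(0, 1)].
SKELETON = [(0, 1)]


def visible_segments(kpts, skeleton=SKELETON, kpt_thres=0.5):
    """Return ((x1, y1), (x2, y2)) integer segments whose two endpoints are confident.

    kpts: array-like of shape [K, 3] holding (x, y, score) for one detection.
    """
    kpts = np.asarray(kpts, dtype=np.float32)
    segments = []
    for a, b in skeleton:
        if kpts[a, 2] > kpt_thres and kpts[b, 2] > kpt_thres:
            p1 = (int(round(kpts[a, 0])), int(round(kpts[a, 1])))
            p2 = (int(round(kpts[b, 0])), int(round(kpts[b, 1])))
            segments.append((p1, p2))
    return segments


print(visible_segments([[10.2, 20.7, 0.9], [30.0, 40.4, 0.8]]))  # [((10, 21), (30, 40))]
```

Inside visualize(), each returned pair could be drawn with cv2.line(image, p1, p2, (0, 255, 255), 2).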

7. Main Program Flow

```python
if __name__ == "__main__":
    model_path = "./runs/pose/train16/weights/best.onnx"
    image_path = "./input/test.png"
    # Read the image (OpenCV loads it in BGR order)
    img = cv2.imread(image_path)
    if img is None:
        raise ValueError(f"Error: Unable to read image from {image_path}")
    # Create the model instance
    model = YOLOv8Pose(model_path)
    # Preprocess
    input_tensor = model.preprocess(img)
    # Inference
    outputs = model.session.run([model.output_name], {model.input_name: input_tensor})
    # Post-process
    boxes, scores, keypoints = model.postprocess(outputs)
    # Visualize on a copy so the original image stays unchanged
    result = model.visualize(img.copy(), boxes, keypoints)
    cv2.imshow("Result", result)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
```
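Before tuning performance (for example, switching execution providers or input sizes), it helps to measure inference latency. The sketch below is a small timer built on time.perf_counter. benchmark is a hypothetical helper, and the warm-up and run counts are arbitrary choices. Warm-up calls are excluded because the first runs of an inference session are often slower, due to memory allocation and lazy initialization.

```python
import time


def benchmark(fn, warmup=3, runs=20):
    """Return the mean wall-clock latency of fn() in milliseconds, after warm-up calls."""
    for _ in range(warmup):
        fn()  # not timed
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) * 1000.0 / runs
```

In the main program it could be called as benchmark(lambda: model.session.run([model.output_name], {model.input_name: input_tensor})). This times only the session.run call, not preprocessing or post-processing.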

8. Complete Code

```python
import cv2
import numpy as np
import onnxruntime as ort


class YOLOv8Pose:
    def __init__(self, model_path, conf_thres=0.1, iou_thres=0.45, providers=None):
        self.conf_thres = conf_thres
        self.iou_thres = iou_thres
        # Initialize ONNX Runtime (CPU by default; put "CUDAExecutionProvider"
        # first in providers to run on a GPU with onnxruntime-gpu)
        self.session = ort.InferenceSession(
            model_path, providers=providers or ["CPUExecutionProvider"]
        )
        self.input_name = self.session.get_inputs()[0].name
        self.output_name = self.session.get_outputs()[0].name
        self.input_shape = self.session.get_inputs()[0].shape[2:]  # (h, w)

    def preprocess(self, img):
        # Letterbox (keep the aspect ratio)
        self.orig_h, self.orig_w = img.shape[:2]
        scale = min(self.input_shape[0] / self.orig_h, self.input_shape[1] / self.orig_w)
        # New size and padding
        self.new_unpad = (int(self.orig_w * scale), int(self.orig_h * scale))
        self.dw = (self.input_shape[1] - self.new_unpad[0]) / 2  # horizontal padding
        self.dh = (self.input_shape[0] - self.new_unpad[1]) / 2  # vertical padding
        # Resize and pad
        if (self.new_unpad[0], self.new_unpad[1]) != (self.orig_w, self.orig_h):
            img = cv2.resize(img, self.new_unpad, interpolation=cv2.INTER_LINEAR)
        top, bottom = int(round(self.dh - 0.1)), int(round(self.dh + 0.1))
        left, right = int(round(self.dw - 0.1)), int(round(self.dw + 0.1))
        img = cv2.copyMakeBorder(img, top, bottom, left, right,
                                 cv2.BORDER_CONSTANT, value=(114, 114, 114))
        # Convert color channels and layout
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img = img.transpose(2, 0, 1)  # HWC -> CHW
        img = np.ascontiguousarray(img, dtype=np.float32) / 255.0
        return np.expand_dims(img, axis=0)

    def postprocess(self, outputs):
        # Output shape [1, 11, 8400] -> [8400, 11]
        predictions = outputs[0][0].T
        # Drop low-confidence predictions
        conf_mask = predictions[:, 4] > self.conf_thres
        predictions = predictions[conf_mask]
        if predictions.shape[0] == 0:
            return [], [], []
        # Box conversion (cx, cy, w, h) -> (x1, y1, x2, y2)
        boxes = predictions[:, :4].copy()
        boxes[:, 0] = boxes[:, 0] - boxes[:, 2] / 2  # x1
        boxes[:, 1] = boxes[:, 1] - boxes[:, 3] / 2  # y1
        boxes[:, 2] += boxes[:, 0]  # x2 = x1 + w
        boxes[:, 3] += boxes[:, 1]  # y2 = y1 + h
        # Keypoints (two per object, each with x, y, score)
        keypoints = predictions[:, 5:].reshape(-1, 2, 3).copy()  # [N, 2, 3]
        # Map coordinates back to the original image
        scale = min(self.input_shape[0] / self.orig_h, self.input_shape[1] / self.orig_w)
        # Adjust bounding boxes
        boxes[:, [0, 2]] -= self.dw  # remove horizontal padding
        boxes[:, [1, 3]] -= self.dh  # remove vertical padding
        boxes /= scale
        boxes = boxes.round().astype(int)
        # Adjust keypoints (x, y only; scores stay as floats)
        keypoints[:, :, 0] -= self.dw
        keypoints[:, :, 1] -= self.dh
        keypoints[:, :, :2] /= scale
        keypoints[:, :, :2] = keypoints[:, :, :2].round()
        # Apply NMS
        scores = predictions[:, 4]
        indices = self.nms(boxes, scores)
        if indices.size == 0:
            return [], [], []
        return boxes[indices], scores[indices], keypoints[indices]

    def nms(self, boxes, scores):
        # OpenCV's NMS expects [x, y, w, h] boxes, not [x1, y1, x2, y2]
        xywh = boxes.copy()
        xywh[:, 2:] -= xywh[:, :2]
        indices = cv2.dnn.NMSBoxes(xywh.tolist(), scores.tolist(),
                                   self.conf_thres, self.iou_thres)
        # Normalize the return value ([N, 1], 1-D or empty tuple) to a flat array
        return np.array(indices, dtype=int).reshape(-1)

    def visualize(self, image, boxes, keypoints):
        # Draw bounding boxes
        for box in boxes:
            x1, y1, x2, y2 = (int(v) for v in box)
            cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)
        # Draw keypoints and the line between them
        for kpts in keypoints:
            for i, (x, y, score) in enumerate(kpts):
                if score > 0.5:
                    color = (0, 0, 255) if i == 0 else (255, 0, 0)
                    cv2.circle(image, (int(x), int(y)), 5, color, -1)
            # Line between the two keypoints
            if len(kpts) == 2 and all(kpts[:, 2] > 0.5):
                x1, y1 = int(kpts[0][0]), int(kpts[0][1])
                x2, y2 = int(kpts[1][0]), int(kpts[1][1])
                cv2.line(image, (x1, y1), (x2, y2), (0, 255, 255), 2)
        return image


if __name__ == "__main__":
    model_path = "./runs/pose/train16/weights/best.onnx"
    image_path = "./input/test.png"
    # Read the image
    img = cv2.imread(image_path)
    if img is None:
        raise ValueError(f"Error: Unable to read image from {image_path}")
    # Create the YOLOv8Pose instance
    model = YOLOv8Pose(model_path)
    # Preprocess
    input_tensor = model.preprocess(img)
    # Inference
    outputs = model.session.run([model.output_name], {model.input_name: input_tensor})
    # Post-process
    boxes, scores, keypoints = model.postprocess(outputs)
    # Visualize
    result = model.visualize(img.copy(), boxes, keypoints)
    cv2.imshow("Result", result)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
```