Deep Learning Series: Analysis of the 松科 TPU Deployment Code

news 2025/11/4 12:04:00

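Interface Code: Functional Overview

The following code provides a Python interface to a deep learning inference engine. It has two core components: the image loading function load_image and the inference engine class InferEngine. Together they load image data and run inference with a specified model.

Key Components

1. Image loading function `load_image`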

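import numpy as np
from PIL import Image


def load_image(file, resize=None, rgb_en=True, gray_en=False):
    """Load an image file and return it as a 1 x C x H x W (BCHW) NumPy array."""

    # Inner helper: open the file with PIL and apply the optional preprocessing.
    def load_image_(file, resize=None, rgb_en=True, gray_en=False):
        img = Image.open(file)
        if resize is not None:
            w, h = resize
            img = img.resize((w, h))
        if gray_en:
            img = img.convert('L')
        return img

    # Load the image and remember its PIL mode.
    img = load_image_(file, resize, rgb_en, gray_en)
    mode = img.mode

    # Convert to a NumPy array; give single-channel images an explicit channel axis.
    img = np.array(img)
    if len(img.shape) == 2:
        img = np.expand_dims(img, -1)

    # Color conversion (BGR -> RGB). This only runs when PIL reports mode 'BGR';
    # common files typically open as 'RGB', 'RGBA' or 'L', so the branch is rarely taken.
    if not gray_en and rgb_en and mode == 'BGR':
        img = img[:, :, ::-1]

    # Add the batch axis, then reorder BHWC -> BCHW.
    img = img[np.newaxis, :, :, :]
    img = np.swapaxes(img, 1, 3)  # B, C, W, H
    img = np.swapaxes(img, 2, 3)  # B, C, H, W
    return img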

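Features:

Supports preprocessing such as resizing and grayscale conversion
Automatically adds a channel axis to single-channel images
Supports BGR-to-RGB color conversion
Returns a tensor in BCHW layout, the input layout the model expects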
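
The layout change at the end of `load_image` can be checked on synthetic arrays, without any image file. The sketch below is only an illustration: the helper name `hwc_to_bchw` is hypothetical (it does not appear in the original code), and it uses a single `np.transpose` call, which should give the same result as the two `swapaxes` calls.

```python
import numpy as np


def hwc_to_bchw(img):
    """Convert an H x W or H x W x C array to 1 x C x H x W (hypothetical helper)."""
    if img.ndim == 2:
        # Grayscale input: add an explicit channel axis first.
        img = np.expand_dims(img, -1)
    batched = img[np.newaxis, ...]            # 1 x H x W x C
    return np.transpose(batched, (0, 3, 1, 2))  # 1 x C x H x W


rgb = np.zeros((480, 640, 3), dtype=np.uint8)
gray = np.zeros((480, 640), dtype=np.uint8)
print(hwc_to_bchw(rgb).shape)   # (1, 3, 480, 640)
print(hwc_to_bchw(gray).shape)  # (1, 1, 480, 640)
```

Note that the height axis comes before the width axis in the result, while `resize` in `load_image` takes its target size as (width, height).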

2. Inference engine class `InferEngine`

Initialization method:

# `datetime` is the standard-library module and `de` is the SDK module used by
# the original code; both are assumed to be imported at the top of the file.
def __init__(self, net_bin_path, model_bin_path, dev_id=0, input_num=1, max_batch=1):
    self.dev_id_ = dev_id
    self.input_num_ = input_num
    self.t_ = [0 for i in range(7)]  # timestamps used for profiling

    self.t_[0] = datetime.datetime.now()
    # self.handle_ = de.get_global_func("de.load_library")(dev_id, "host:/DEngine/desdk/platform/dev_linux-dp1000/lib/libinfer_engine.so")
    self.t_[1] = datetime.datetime.now()

    # Create the inference engine instance.
    self.engine_ = de.get_global_func("de.infer_engine.create")(net_bin_path, model_bin_path, dev_id, max_batch)
    self.t_[2] = datetime.datetime.now()

Prediction method:

def predict(self, *args):
    self.t_[3] = datetime.datetime.now()
    nd_in = []
    self.batch_num_ = 0

    # Each positional argument is one batch item: a sequence of
    # (format, shape, data) tuples, one per input tensor.
    for inputs in args:
        input_num = 0
        for item in inputs:  # renamed from `np` so the NumPy alias is not shadowed
            nd = self.to_ndarray(item[0], item[1], item[2])
            nd_in.append(nd)
            input_num += 1
        if input_num != self.input_num_:
            print(f"[ERROR] input num must be {self.input_num_}, current is {input_num}!")
        self.batch_num_ += 1
    self.t_[4] = datetime.datetime.now()

    # Run inference.
    tensor_num = de.get_global_func("de.infer_engine.predict")(self.engine_, self.input_num_, self.batch_num_, *nd_in)
    self.t_[5] = datetime.datetime.now()

    # Fetch every output tensor as a NumPy array.
    data_out = []
    for i in range(tensor_num):
        data_out.append(de.get_global_func("de.infer_engine.get_output")(self.engine_, i).asnumpy())
    self.t_[6] = datetime.datetime.now()
    return data_out

Profiling method:

def profile(self):
    # Elapsed milliseconds between two recorded timestamps.
    def cost_ms(start, end):
        delta = end - start
        return delta.seconds * 1000 + delta.microseconds / 1000

    # t_[3] to t_[6] are only set by predict(), so call predict() first.
    print("")
    print("profile:")
    print(f"[{self.t_[1]}] load library cost {cost_ms(self.t_[0], self.t_[1])} ms")
    print(f"[{self.t_[2]}] engine create cost {cost_ms(self.t_[1], self.t_[2])} ms")
    print(f"[{self.t_[4]}] data to ndarray cost {cost_ms(self.t_[3], self.t_[4])} ms")
    print(f"[{self.t_[5]}] predict batch={self.batch_num_} cost {cost_ms(self.t_[4], self.t_[5])} ms")
    print(f"[{self.t_[6]}] get output cost {cost_ms(self.t_[5], self.t_[6])} ms")
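
The profiling above records wall-clock timestamps with `datetime.datetime.now()`, which can jump if the system clock is adjusted. As an alternative, the sketch below times stages with the monotonic `time.perf_counter()`. The `StageTimer` class and its stage names are hypothetical and not part of the original interface.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Collect per-stage durations in milliseconds (hypothetical helper)."""

    def __init__(self):
        self.durations_ms = {}

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the timed block raises.
            self.durations_ms[name] = (time.perf_counter() - start) * 1000.0


timer = StageTimer()
with timer.stage("sleep"):
    time.sleep(0.01)  # stand-in for a real stage such as engine creation

for name, ms in timer.durations_ms.items():
    print(f"{name} cost {ms:.3f} ms")
```

Each stage of `__init__` and `predict` could be wrapped in `timer.stage(...)` in the same way, replacing the fixed-size `t_` list.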

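Optimization Suggestions

Add exception handling: wrap key operations in try-except blocks to make the code more robust
Improve logging: use Python's logging module instead of bare print statements
Optimize memory use: for large data volumes, consider generators or memory-mapped files
Strengthen input validation: add more thorough checks on input data
Improve performance: consider multithreading or asynchronous processing to raise throughput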
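
Several of these suggestions can be combined in a few lines. The sketch below is a minimal illustration, not the original author's code: it replaces the `[ERROR]` print in `predict` with a raised `ValueError` and reports failures through the `logging` module. The names `check_input_count` and `safe_check` and the logger name are hypothetical.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("infer_engine")  # hypothetical logger name


def check_input_count(inputs, expected):
    """Raise ValueError when a batch item does not carry `expected` input tensors."""
    actual = len(inputs)
    if actual != expected:
        raise ValueError(f"input num must be {expected}, current is {actual}")
    return actual


def safe_check(inputs, expected):
    """Return True if the inputs are valid; log the error and return False otherwise."""
    try:
        check_input_count(inputs, expected)
        return True
    except ValueError:
        logger.exception("input validation failed")
        return False


print(safe_check(["tensor_a"], 1))              # True
print(safe_check(["tensor_a", "tensor_b"], 1))  # False, with a logged traceback
```

Raising an exception stops inference on malformed input, whereas the original print lets `predict` continue with a mismatched tensor count.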

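Summary

This code covers a complete deep learning inference flow in two parts: image preprocessing and model inference. With the optimizations listed above, its performance and stability can be improved further for use in real deployments.

Inference Code: Functional Overview

The code below implements a real-time face mask detection system based on a YOLOv5 model. It captures a video stream from a camera, processes each frame to detect faces and masks, and draws the detection results on the frame. The system runs the model through the custom inference engine InferEngine and post-processes the output with computer vision techniques.

Core Components and Functions

1. Coordinate conversion and the NMS algorithm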

import numpy as np


def xywh2xyxy(x):
    # Convert boxes from [x, y, w, h] (center, size) to [x1, y1, x2, y2] (corners).
    y = np.copy(x)
    y[:, 0] = x[:, 0] - x[:, 2] / 2  # x1 = x - w/2
    y[:, 1] = x[:, 1] - x[:, 3] / 2  # y1 = y - h/2
    y[:, 2] = x[:, 0] + x[:, 2] / 2  # x2 = x + w/2
    y[:, 3] = x[:, 1] + x[:, 3] / 2  # y2 = y + h/2
    return y


def py_cpu_nms(dets, thresh):
    # Non-maximum suppression: drop boxes that overlap a higher-scoring box.
    # filter_box passes boxes already converted by xywh2xyxy, so the first four
    # columns are read as corners here. The original read them as center and
    # size and also computed y2 from the width, which gave wrong areas.
    x1 = dets[:, 0]
    y1 = dets[:, 1]
    x2 = dets[:, 2]
    y2 = dets[:, 3]
    scores = dets[:, 4]

    # Box areas (inclusive-pixel convention).
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)

    # Indices sorted by descending confidence.
    index = scores.argsort()[::-1]
    res = []

    while index.size > 0:
        i = index[0]
        res.append(i)

        # IoU between the current best box and all remaining boxes.
        x11 = np.maximum(x1[i], x1[index[1:]])
        y11 = np.maximum(y1[i], y1[index[1:]])
        x22 = np.minimum(x2[i], x2[index[1:]])
        y22 = np.minimum(y2[i], y2[index[1:]])
        w = np.maximum(0, x22 - x11 + 1)
        h = np.maximum(0, y22 - y11 + 1)
        overlaps = w * h
        iou = overlaps / (areas[i] + areas[index[1:]] - overlaps)

        # Keep only the boxes whose IoU does not exceed the threshold.
        idx = np.where(iou <= thresh)[0]
        index = index[idx + 1]
    return np.array(res, dtype=np.int32)
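
The IoU test at the heart of NMS is easier to follow on a single pair of boxes. The sketch below assumes corner-format boxes and the same inclusive-pixel (+1) convention as the area computation in `py_cpu_nms`. The function name `iou_xyxy` is hypothetical.

```python
def iou_xyxy(box_a, box_b):
    """IoU of two [x1, y1, x2, y2] boxes with the inclusive-pixel (+1) convention."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the intersection, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1) + 1)
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1) + 1)
    inter = inter_w * inter_h
    area_a = (ax2 - ax1 + 1) * (ay2 - ay1 + 1)
    area_b = (bx2 - bx1 + 1) * (by2 - by1 + 1)
    return inter / (area_a + area_b - inter)


a = [0, 0, 9, 9]      # 10 x 10 pixels, area 100
b = [5, 5, 14, 14]    # 10 x 10 pixels, area 100, overlapping a in a 5 x 5 patch
print(f"{iou_xyxy(a, b):.4f}")  # 25 / (100 + 100 - 25) = 0.1429
```

With the IoU threshold of 0.45 used in the main program, these two boxes would both survive NMS because their IoU is well below the threshold.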

2. Bounding-box filtering function

def filter_box(org_box, conf_thres, iou_thres):
    # Drop low-confidence boxes, then remove overlapping boxes class by class.
    org_box = np.squeeze(org_box)  # remove axes of length 1

    # Keep boxes whose confidence exceeds the threshold.
    conf = org_box[..., 4] > conf_thres
    box = org_box[conf]

    # The class of each box is the index of its highest class score.
    cls_cinf = box[..., 5:]
    cls = []
    for i in range(len(cls_cinf)):
        cls.append(int(np.argmax(cls_cinf[i])))
    all_cls = list(set(cls))

    # Process each class separately.
    output = []
    for i in range(len(all_cls)):
        curr_cls = all_cls[i]
        curr_cls_box = []

        # Collect the boxes of the current class; column 5 is overwritten
        # with the class index, giving rows of [x, y, w, h, conf, cls].
        for j in range(len(cls)):
            if cls[j] == curr_cls:
                box[j][5] = curr_cls
                curr_cls_box.append(box[j][:6])
        curr_cls_box = np.array(curr_cls_box)
        curr_cls_box = xywh2xyxy(curr_cls_box)  # convert to corner format

        # Apply non-maximum suppression.
        curr_out_box = py_cpu_nms(curr_cls_box, iou_thres)
        for k in curr_out_box:
            output.append(curr_cls_box[k])
    return np.array(output)
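
The first two steps of `filter_box` (the confidence threshold and the per-box class lookup) can also be written without Python loops. The sketch below runs on made-up predictions laid out as [x, y, w, h, conf, score_face, score_mask]. The values and the function name `select_confident` are assumptions for illustration only.

```python
import numpy as np

# Synthetic predictions: x, y, w, h, confidence, then one score per class.
preds = np.array([
    [100, 100, 40, 40, 0.90, 0.80, 0.10],
    [102, 101, 40, 40, 0.60, 0.70, 0.20],
    [300, 200, 50, 60, 0.30, 0.10, 0.90],
    [250, 220, 50, 60, 0.85, 0.05, 0.95],
], dtype=np.float32)


def select_confident(preds, conf_thres):
    """Return the rows above the confidence threshold and their class indices."""
    kept = preds[preds[:, 4] > conf_thres]
    class_ids = np.argmax(kept[:, 5:], axis=1)  # one argmax call for all rows
    return kept, class_ids


kept, class_ids = select_confident(preds, 0.5)
print(kept.shape)          # (3, 7)
print(class_ids.tolist())  # [0, 0, 1]
```

The third row is dropped because its confidence of 0.30 is below the threshold. The remaining rows would then be grouped by class and passed to NMS as in `filter_box`.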

3. Main program flow

import time

import cv2
import numpy as np

# `de` (the SDK module) and `InferEngine` (from the interface code above)
# are assumed to be importable here.

if __name__ == "__main__":
    # Initialize the inference engine.
    print("InferEngine example start...")
    net_file = "/DEngine/model/mask/net.bin"
    model_file = "/DEngine/model/mask/model.bin"
    engine = InferEngine(net_file, model_file, max_batch=1)

    # Open the camera.
    cap = cv2.VideoCapture(0)
    pixel_format = de.PixelFormat.DE_PIX_FMT_RGB888_PLANE  # renamed from `format`, which shadows a builtin
    CLASSES = ('face', 'mask')
    fps = 0
    l_used_time = []

    # Main loop: capture and process frames until 'q' is pressed.
    while True:
        s = time.time()
        ret, frame = cap.read()
        if not ret:  # no frame available (camera missing or stream ended)
            break

        # Preprocessing.
        frame = cv2.resize(frame, (416, 416))
        img = frame[:, :, ::-1].transpose(2, 0, 1)  # BGR -> RGB and HWC -> CHW
        img = np.expand_dims(img, axis=0)  # add the batch axis

        # Model inference.
        shape = (1, 3, 416, 416)
        data = [(pixel_format, shape, img)]
        output = engine.predict(data)

        # Post-processing: filter the bounding boxes.
        output = output[0]
        outbox = filter_box(output, 0.5, 0.45)

        # Draw boxes and labels on the (resized) frame.
        for o in outbox:
            x1 = int(o[0])
            y1 = int(o[1])
            x2 = int(o[2])
            y2 = int(o[3])
            score = o[4]
            classes = int(o[5])
            cv2.rectangle(frame, (x1, y1), (x2, y2), (255, 0, 0), 2)
            text = "%s:%.2f" % (CLASSES[classes], score)
            cv2.putText(frame, text, (x1, y1), cv2.FONT_HERSHEY_SIMPLEX, 0.75, (0, 0, 255), 2)

        # Show the FPS measured over the previous frames.
        cv2.putText(frame, text='FPS: {}'.format(fps), org=(3, 15), fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=0.50, color=(255, 0, 0), thickness=2)
        cv2.imshow("frame", frame)

        # Update the FPS from the mean time of the last 20 frames.
        used_time = time.time() - s
        l_used_time.append(used_time)
        if len(l_used_time) > 20:
            l_used_time.pop(0)
        fps = int(1 / np.mean(l_used_time))

        # Press 'q' to quit.
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    # Release resources.
    cap.release()
    cv2.destroyAllWindows()
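
The FPS figure in the main loop is a moving average over the most recent frame times. The sketch below isolates that logic in a hypothetical `FpsMeter` class, using `collections.deque` with `maxlen` so that old samples are discarded automatically instead of through `pop(0)`.

```python
from collections import deque


class FpsMeter:
    """Moving-average FPS over the last `window` frame times (hypothetical helper)."""

    def __init__(self, window=20):
        self.times = deque(maxlen=window)  # oldest sample is dropped when full

    def update(self, frame_time):
        """Record one frame time in seconds and return the current FPS estimate."""
        self.times.append(frame_time)
        mean = sum(self.times) / len(self.times)
        return 1.0 / mean if mean > 0 else 0.0


meter = FpsMeter(window=3)
for t in (0.1, 0.1, 0.1, 0.05):
    fps = meter.update(t)
print(f"{fps:.1f}")  # 12.0: mean of the last three samples (0.1, 0.1, 0.05)
```

In the main loop, `fps = int(meter.update(time.time() - s))` would take the place of the list bookkeeping.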