
Faster RCNN - How the RPN Works

news 2025/10/6 6:10:48

Overview

Using the implementation in mmdet's mmdet/configs/_base_/models/faster_rcnn_r50_fpn.py as a reference, this post explains how the RPN works.

RPN network structure: one shared conv followed by two conv heads, which amounts to making a cls prediction and a bbox prediction at every position of the feat map.
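To make the head's output sizes concrete, here is a minimal sketch in plain Python that only does the shape arithmetic. The function name `rpn_output_shapes`, the 800x1344 input size, and the rule that each level's feature map is ceil(image size / stride) are assumptions for illustration, not taken from mmdet.

```python
import math

def rpn_output_shapes(batch, img_h, img_w, strides=(4, 8, 16, 32, 64),
                      num_ratios=3, num_scales=1):
    """Return per-level (cls_shape, bbox_shape) and the total anchor count."""
    num_anchors = num_ratios * num_scales  # anchors per feature-map position
    shapes, total = [], 0
    for stride in strides:
        # Assumption: each level's feature map is ceil(image size / stride).
        h, w = math.ceil(img_h / stride), math.ceil(img_w / stride)
        cls_shape = (batch, num_anchors, h, w)       # one score per anchor
        bbox_shape = (batch, num_anchors * 4, h, w)  # four box deltas per anchor
        shapes.append((cls_shape, bbox_shape))
        total += num_anchors * h * w
    return shapes, total

shapes, total = rpn_output_shapes(batch=2, img_h=800, img_w=1344)
for cls_shape, bbox_shape in shapes:
    print(cls_shape, bbox_shape)
print("anchors per image:", total)
```

With 3 anchors per position, the cls head has 3 output channels and the bbox head has 12, regardless of the level.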

RPN forward flow:

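First compute the cls and bbox outs for each lvl/stage separately, with shapes (B, 3, H, W) and (B, 12, H, W). Each position has 3 anchors, one per ratio, so the cls output has 3 channels and the bbox output has 4*3 = 12;
Generate the anchors for each stage;
Assign a positive or negative label to each anchor, then sample positive and negative samples for computing the RPN loss;
Select the top-k entries from the RPN outs and post-process them with NMS to obtain the proposals that are fed to the ROIHead. The proposals have requires_grad=False.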
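The last step of the flow above (top-k filtering followed by NMS) can be sketched in plain Python. This is a simplified illustration, not mmdet's code: it handles a single level, assumes the boxes are already decoded to XYXY, and the names `box_iou` and `select_proposals` as well as the threshold values are hypothetical.

```python
def box_iou(a, b):
    """IoU of two XYXY boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def select_proposals(boxes, scores, top_k, iou_thr):
    """Keep the top_k highest-scoring boxes, then apply greedy NMS."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    order = order[:top_k]
    keep = []
    for i in order:
        # A box survives only if it does not overlap a kept box too much.
        if all(box_iou(boxes[i], boxes[j]) <= iou_thr for j in keep):
            keep.append(i)
    return [boxes[i] for i in keep]

boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60], [0, 0, 10, 9]]
scores = [0.9, 0.8, 0.7, 0.1]
proposals = select_proposals(boxes, scores, top_k=3, iou_thr=0.5)
print(proposals)
```

Here the fourth box is dropped by the top-k step, and the second box is suppressed by NMS because it overlaps the highest-scoring box.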

How anchors are generated

Corresponding class: mmdet.models.task_modules.prior_generators.anchor_generator.AnchorGenerator.

Inputs: (1) featmap_sizes: a list of [H, W] for the 5 lvls; (2) batch_img_metas.

Base anchors are generated ahead of forward:

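Before forward, a set of base_anchors is predefined for each lvl; afterwards the base_anchors are simply shifted to every position of the featmap.
Parameters involved in generating base_anchors: (1) base_sizes (e.g. [4, 8, 16, 32, 64]), the initial width and height of the base anchor at each lvl, usually equal to the strides; (2) scales ([8]), the scaling factor applied to the base anchor; (3) ratios ([0.5, 1.0, 2.0]), the aspect ratios of the base anchor.
How base_anchors are computed: the width is base_w * w_ratios * scales, and the height is computed analogously as base_h * h_ratios * scales.
base_anchors is {list:n_lvl} - [3, 4]: 3 ratios per lvl, in XYXY format.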
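The base anchor computation can be sketched as follows. This plain-Python version assumes that a ratio means height / width, that h_ratios = sqrt(ratios) and w_ratios = 1 / h_ratios, and that the anchors are centered at the origin. The function name `gen_base_anchors` is hypothetical; the actual class works on tensors.

```python
import math

def gen_base_anchors(base_size, scales=(8,), ratios=(0.5, 1.0, 2.0)):
    """Return one XYXY base anchor per (ratio, scale) pair, centered at (0, 0)."""
    anchors = []
    for ratio in ratios:
        h_ratio = math.sqrt(ratio)  # assumption: ratio = height / width
        w_ratio = 1.0 / h_ratio
        for scale in scales:
            w = base_size * w_ratio * scale
            h = base_size * h_ratio * scale
            anchors.append([-0.5 * w, -0.5 * h, 0.5 * w, 0.5 * h])
    return anchors

base_anchors = gen_base_anchors(base_size=4)
for anchor in base_anchors:
    print([round(v, 2) for v in anchor])
```

Because the two ratios are reciprocal square roots, every anchor at a given lvl keeps the same area (here 32 * 32) while its shape changes.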

Once the base_anchors are available, the anchor grid is generated by shifting them:

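First generate the shift distance of every position along x and y (in pixels of the original image): torch.arange(0, feat_w) * stride_w.
Build the shift grid with _meshgrid.
Final grid: base_anchors + grid_shifts, with shape [n_anchors, 4].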
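The shifting step can be sketched with nested loops instead of tensors. The function name `grid_anchors` and the row-major ordering (x varies fastest, with all base anchors of one position kept together) are assumptions made for this illustration.

```python
def grid_anchors(base_anchors, feat_h, feat_w, stride):
    """Shift every base anchor to every feature-map position (XYXY output)."""
    shifts_x = [x * stride for x in range(feat_w)]  # pixel offsets along x
    shifts_y = [y * stride for y in range(feat_h)]  # pixel offsets along y
    anchors = []
    for sy in shifts_y:          # rows
        for sx in shifts_x:      # columns, x varies fastest
            for x1, y1, x2, y2 in base_anchors:
                anchors.append([x1 + sx, y1 + sy, x2 + sx, y2 + sy])
    return anchors

base = [[-16.0, -16.0, 16.0, 16.0]]
grid = grid_anchors(base, feat_h=2, feat_w=3, stride=4)
print(len(grid))
print(grid[1], grid[3])
```

The result has feat_h * feat_w * len(base_anchors) rows, which matches the [n_anchors, 4] layout described above.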

Compute the valid_flag of every anchor:

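Compute valid_feat_h and valid_feat_w, the height and width of the featmap region that actually contains image content after padding.
Apply meshgrid, then repeat 3 times to get the flag for each ratio.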
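A minimal sketch of the valid-flag computation, assuming the valid featmap extent is ceil(padded image size / stride) clipped to the featmap size. The function name `valid_flags` mirrors the idea only; `pad_h` and `pad_w` are hypothetical parameter names.

```python
import math

def valid_flags(feat_h, feat_w, pad_h, pad_w, stride, num_base_anchors=3):
    """Return one boolean per anchor, ordered like the anchor grid."""
    # Assumption: positions beyond ceil(size / stride) hold no image content.
    valid_feat_h = min(math.ceil(pad_h / stride), feat_h)
    valid_feat_w = min(math.ceil(pad_w / stride), feat_w)
    flags = []
    for y in range(feat_h):
        for x in range(feat_w):
            inside = y < valid_feat_h and x < valid_feat_w
            # Every base anchor at this position shares the same flag.
            flags.extend([inside] * num_base_anchors)
    return flags

flags = valid_flags(feat_h=4, feat_w=4, pad_h=48, pad_w=32, stride=16)
print(sum(flags), "of", len(flags), "anchors are valid")
```

In this example only a 3 x 2 region of the 4 x 4 featmap is covered, so 3 * 2 * 3 anchors are flagged valid.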

Computing the RPN loss - assigning positive and negative labels to anchors

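When computing the RPN loss, each anchor first needs a positive or negative sample label based on its IoU with the GT. Corresponding class: MaxIoUAssigner. Inputs: (1) anchor_list {list:bs} - {list:lvl} - (n_anchors, 4); (2) valid_flag_list; (3) batch_gt_instances.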

Concat the anchors of all lvls to get {list:bs} - (n_anchors, 4);
MaxIoUAssigner: negative samples are assigned cls_idx=-1 and positive samples cls_idx=0; the assignment proceeds in the following order:
assign every bbox to the background;
assign proposals whose iou with all gts < neg_iou_thr to gt_id=0;
for each bbox, if the iou with its nearest gt >= pos_iou_thr, assign it to that gt;
for each gt bbox, assign its nearest proposals (may be more than one) to itself. This keeps a gt from ending up with no matching bbox.
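The four assignment steps above can be sketched in plain Python. This is a simplified illustration rather than mmdet's implementation: it returns a gt index per anchor (-1 = not assigned, 0 = negative, k >= 1 = matched to gt k-1) instead of class labels, and the names `box_iou`, `max_iou_assign`, and `min_pos_iou` together with the threshold values are assumptions.

```python
def box_iou(a, b):
    """IoU of two XYXY boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def max_iou_assign(anchors, gts, pos_iou_thr=0.7, neg_iou_thr=0.3,
                   min_pos_iou=0.3):
    if not gts:
        return [0] * len(anchors)          # no gt: everything is negative
    assigned = [-1] * len(anchors)         # step 1: default value
    overlaps = [[box_iou(g, a) for a in anchors] for g in gts]
    for j in range(len(anchors)):
        column = [overlaps[i][j] for i in range(len(gts))]
        best = max(column)
        if best < neg_iou_thr:             # step 2: clear negatives
            assigned[j] = 0
        if best >= pos_iou_thr:            # step 3: anchor -> its best gt
            assigned[j] = column.index(best) + 1
    for i in range(len(gts)):              # step 4: gt -> its best anchor(s)
        gt_best = max(overlaps[i])
        if gt_best >= min_pos_iou:
            for j in range(len(anchors)):
                if overlaps[i][j] == gt_best:
                    assigned[j] = i + 1
    return assigned

gts = [[0, 0, 10, 10]]
print(max_iou_assign([[0, 0, 10, 10], [0, 0, 10, 8],
                      [50, 50, 60, 60], [0, 0, 10, 5]], gts))
print(max_iou_assign([[0, 0, 10, 5], [50, 50, 60, 60]], gts))
```

In the second call no anchor reaches pos_iou_thr, yet the anchor with IoU 0.5 is still matched in step 4, which is how a gt avoids being left without any positive anchor.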