Notes on map-related methods
MapQR
MapQR's innovations:
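- Proposes a scatter-and-gather query mechanism that reduces the number of queries and speeds up convergence.
In MapTR, every query is point-level. For example, when predicting 20 lines with 30 points each, n_vec = 20 and n_pt = 30, so the total number of queries is n_vec * n_pt = 600.
Passing these 600 queries through the self-attention and deformable attention of every decoder layer is computationally expensive.
MapQR works as follows:
- Before self-attention, MapTR computes query_ins + query_pt, and the broadcast produces n_vec * n_pt queries. MapQR keeps using query_ins alone and defers the broadcast, so at this stage there are only n_vec queries.
- Self-attention processes the n_vec queries.
- In deformable attention (InstancePointattention), the scatter step of scatter-and-gather expands the n_vec queries into n_vec * n_pt queries for the deformable attention computation. When attention finishes, the gather step compresses the n_vec * n_pt queries back into n_vec queries. This greatly reduces the self-attention cost and speeds up convergence.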
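To make the saving concrete, here is a rough count of the query-key pairs that one self-attention layer has to score, using the n_vec = 20, n_pt = 30 example above. The helper `self_attn_pairs` is a hypothetical name used only for illustration, and the count ignores heads, channels, and the deformable attention cost.

```python
def self_attn_pairs(num_queries: int) -> int:
    """Number of query-key pairs scored by one self-attention layer."""
    return num_queries * num_queries


n_vec, n_pt = 20, 30

# MapTR-style: self-attention over point-level queries.
point_level = self_attn_pairs(n_vec * n_pt)

# MapQR-style: self-attention over instance-level queries only.
instance_level = self_attn_pairs(n_vec)

ratio = point_level // instance_level
print(point_level, instance_level, ratio)  # 360000 400 900
```

Under this simple count the pairwise work shrinks by a factor of n_pt squared.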
In the decoder, before each layer runs, the ref points are sine-encoded and then passed through a linear embedding to produce an embedding for each point.
In InstancePointattention:
# in InstancePointattention::__init__
self.output_proj = nn.Sequential(
    nn.Linear(embed_dims * num_pts_per_vec, embed_dims * num_pts_per_vec // 2),
    nn.ReLU(),
    nn.Linear(embed_dims * num_pts_per_vec // 2, embed_dims),
    nn.ReLU(),
    nn.Linear(embed_dims, embed_dims),
)
# in forward: the scatter step. Broadcasting adds each instance query to the
# positional embeddings of its points, giving one query per point.
if pt_query_pos is not None:
    bs, num_iq, num_pt, _ = pt_query_pos.shape
    query = query.unsqueeze(2) + pt_query_pos       # (bs, num_iq, num_pt, embed_dims)
    query = query.reshape(bs, num_iq * num_pt, -1)  # (bs, num_iq * num_pt, embed_dims)
# ... dfa, then the gather step
output = self.output_proj(output)  # --> (bs, num_iq, embed_dims)
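The shape bookkeeping of scatter and gather can be checked with a small NumPy sketch. This is not the MapQR implementation: the deformable attention is replaced by an identity placeholder, the MLP weights are random, and the helper `mlp` and all sizes are assumptions chosen for illustration. Only the layer widths follow the `output_proj` definition above.

```python
import numpy as np

rng = np.random.default_rng(0)
bs, n_vec, n_pt, C = 2, 20, 30, 8  # small C keeps the demo light (assumed sizes)

query_ins = rng.standard_normal((bs, n_vec, C))           # instance queries
pt_query_pos = rng.standard_normal((bs, n_vec, n_pt, C))  # per-point positional embeddings

# Scatter: broadcast each instance query over its n_pt points.
scattered = query_ins[:, :, None, :] + pt_query_pos       # (bs, n_vec, n_pt, C)
scattered = scattered.reshape(bs, n_vec * n_pt, C)        # (bs, n_vec * n_pt, C)

# Placeholder for deformable attention (identity in this sketch).
attended = scattered


def mlp(x, weights):
    """Linear layers with ReLU between them, none after the last layer."""
    for i, (w, b) in enumerate(weights):
        x = x @ w + b
        if i < len(weights) - 1:
            x = np.maximum(x, 0.0)
    return x


# Gather: concatenate the n_pt point features of each instance, then project.
dims = [C * n_pt, C * n_pt // 2, C, C]
weights = [
    (rng.standard_normal((d_in, d_out)) * 0.01, np.zeros(d_out))
    for d_in, d_out in zip(dims[:-1], dims[1:])
]
gathered = mlp(attended.reshape(bs, n_vec, n_pt * C), weights)

print(scattered.shape, gathered.shape)  # (2, 600, 8) (2, 20, 8)
```

The query count is n_vec * n_pt only between the scatter and the gather, so self-attention before and after always sees n_vec queries.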
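- A new positional encoding (following DAB-DETR)
A query consists of a content part and a position part. In MapTR the position part is learnable: it generates the ref points and is given to k and q in self-attention. MapQR instead sine-encodes the ref points and passes them through a linear embedding to produce an embedding for each point,
which is then passed into the DFA (deformable attention).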
for lid, layer in enumerate(self.layers):
    if self.query_pos_embedding == 'instance':
        reference_points_reshape = reference_points.view(bs, -1, self.num_pts_per_vec, self.pt_dim)
        reference_points_reshape = reference_points_reshape.view(bs, -1, self.pt_dim)
        query_sine_embed = gen_sineembed_for_position(reference_points_reshape[..., :2])
        query_sine_embed = query_sine_embed.view(bs, -1, self.num_pts_per_vec, self.embed_dims)
        point_query_pos = self.pt_pos_query_projs[lid](query_sine_embed)  # (bs, n_vec, n_pt, embed_dims)
        query_pos_lid = None
        reference_points_input = reference_points_reshape[..., :2].unsqueeze(2)
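For intuition about the sine-encoding step, below is a minimal NumPy sketch of a DAB-DETR-style 2D sinusoidal embedding. It is an approximation for illustration, not the actual `gen_sineembed_for_position`. The function name `sine_embed_2d`, the 128 features per axis, and the temperature of 10000 are assumptions.

```python
import numpy as np


def sine_embed_2d(points, num_feats=128, temperature=10000.0):
    """Map (..., 2) points in [0, 1] to (..., 2 * num_feats) sinusoidal features."""
    scale = 2 * np.pi
    dim_t = np.arange(num_feats, dtype=np.float64)
    dim_t = temperature ** (2 * (dim_t // 2) / num_feats)

    x = points[..., 0:1] * scale / dim_t  # (..., num_feats)
    y = points[..., 1:2] * scale / dim_t

    def interleave(p):
        # sin on even channels, cos on odd channels
        out = np.empty_like(p)
        out[..., 0::2] = np.sin(p[..., 0::2])
        out[..., 1::2] = np.cos(p[..., 1::2])
        return out

    return np.concatenate([interleave(y), interleave(x)], axis=-1)


bs, n_vec, n_pt = 2, 20, 30
ref_points = np.random.default_rng(0).random((bs, n_vec * n_pt, 2))
embed = sine_embed_2d(ref_points).reshape(bs, n_vec, n_pt, -1)
print(embed.shape)  # (2, 20, 30, 256)
```

In the excerpt above, a per-layer linear projection (`pt_pos_query_projs[lid]`) then turns these features into the point-level positional queries.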
- GKT with flexible height in the BEV encoder.
In the encoder, before GKT projects each grid cell onto every camera, an extra height offset is predicted for each pillar and added to the z coordinate of the ref points.
In HeightKernelAttention:
# __init__
self.height_offsets = nn.Linear(self.embed_dims, num_points_in_pillar)
# forward:
sampled_heights = self.height_offsets(query) #linear
sampled_heights = sampled_heights / self.height_range
sampled_heights = sampled_heights.permute(0, 2, 1)
reference_points[:, :, :, 2] += sampled_heights
reference_points_cam, bev_mask = self.point_sampling(reference_points, self.pc_range, kwargs['img_metas'])
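The tensor shapes in the height-offset step can be traced with a small NumPy sketch. The linear layer is emulated with a random weight matrix, and all sizes, the `height_range` value, and the (bs, num_points_in_pillar, num_query, 3) layout of `reference_points` are assumptions inferred from the indexing in the excerpt above, not values taken from MapQR.

```python
import numpy as np

rng = np.random.default_rng(0)
bs, num_query, embed_dims, num_points_in_pillar = 1, 6, 16, 4  # assumed sizes
height_range = 8.0  # assumed normalization constant

query = rng.standard_normal((bs, num_query, embed_dims))

# Stand-in for nn.Linear(embed_dims, num_points_in_pillar).
weight = rng.standard_normal((embed_dims, num_points_in_pillar)) * 0.1
bias = np.zeros(num_points_in_pillar)

# Pillar reference points: (bs, num_points_in_pillar, num_query, xyz).
reference_points = rng.random((bs, num_points_in_pillar, num_query, 3))
xy_before = reference_points[..., :2].copy()
z_before = reference_points[..., 2].copy()

sampled_heights = query @ weight + bias                # (bs, num_query, num_points_in_pillar)
sampled_heights = sampled_heights / height_range       # scale the predicted offsets
sampled_heights = sampled_heights.transpose(0, 2, 1)   # (bs, num_points_in_pillar, num_query)
reference_points[..., 2] += sampled_heights            # shift only the z coordinate

print(sampled_heights.shape, reference_points.shape)   # (1, 4, 6) (1, 4, 6, 3)
```

Only the z channel changes, so the x and y positions of each BEV grid cell stay fixed while the sampling height along the pillar becomes query dependent.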