
Notes on map-related methods

news 2025/10/21 11:58:48

MapQR

MapQR contributions:

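It proposes a scatter-and-gather query mechanism that reduces the number of queries and speeds up convergence.
In MapTR, every query is point-level. For example, to predict 20 lines with 30 points each, n_vec = 20 and n_pt = 30, so the total number of queries is n_vec * n_pt = 600.
Passing these 600 queries through the self-attention and deformable attention of every decoder layer is computationally expensive.
MapQR works as follows:
Before self-attention, MapTR computes query_ins + query_pt, and the broadcast produces n_vec * n_pt queries. MapQR keeps using query_ins alone and postpones the broadcast, so at this stage there are only n_vec queries.
Self-attention processes the n_vec queries.
In deformable attention (InstancePointattention), the scatter step of scatter-and-gather expands the n_vec queries into n_vec * n_pt queries for the deformable attention computation. When attention finishes, the gather step compresses the n_vec * n_pt queries back into n_vec queries.
This greatly reduces the cost of self-attention, which grows quadratically with the number of queries, and speeds up convergence.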
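To make the saving concrete, here is a small back-of-the-envelope sketch using the example sizes from these notes (20 lines, 30 points each). It only counts query-key pairs in one self-attention layer; the helper name `self_attention_pairs` is made up for illustration, and real cost also depends on embedding width, heads, and the deformable attention that still runs on all points.

```python
def self_attention_pairs(num_queries: int) -> int:
    """Number of query-key pairs scored by one self-attention layer."""
    return num_queries * num_queries


n_vec, n_pt = 20, 30  # example values from the notes

point_level_queries = n_vec * n_pt  # point-level queries: one per point
instance_level_queries = n_vec      # instance-level queries: one per line

point_pairs = self_attention_pairs(point_level_queries)
instance_pairs = self_attention_pairs(instance_level_queries)

print(point_level_queries, point_pairs)        # 600 360000
print(instance_level_queries, instance_pairs)  # 20 400
print(point_pairs // instance_pairs)           # 900
```

Under this simple count, the reduction factor equals n_pt squared, because the number of queries shrinks by a factor of n_pt.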
In the decoder, before each layer runs, the reference points are sine-encoded and then passed through a linear embedding to produce an embedding for each point.
In InstancePointattention:

# in InstancePointattention::__init__
self.output_proj = nn.Sequential(
    nn.Linear(embed_dims * num_pts_per_vec, embed_dims * num_pts_per_vec // 2),
    nn.ReLU(),
    nn.Linear(embed_dims * num_pts_per_vec // 2, embed_dims),
    nn.ReLU(),
    nn.Linear(embed_dims, embed_dims),
)

# in forward, scatter: broadcasting adds each instance query to the positional embedding of each of its points
if pt_query_pos is not None:
    bs, num_iq, num_pt, _ = pt_query_pos.shape
    query = query.unsqueeze(2) + pt_query_pos
    query = query.reshape(bs, num_iq * num_pt, -1)

# ... dfa, then gather
output = self.output_proj(output)  # --> bs, num_iq, embed_dims
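The shape bookkeeping of scatter and gather can be checked in isolation. The following is a minimal NumPy sketch, not the MapQR implementation: the deformable attention is replaced by a pass-through placeholder, and a single random matrix `proj_weight` stands in for the three-layer `output_proj`. The function names `scatter` and `gather` and the small sizes are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
bs, n_vec, n_pt, embed_dims = 2, 20, 30, 8  # small illustrative sizes


def scatter(query_ins, pt_query_pos):
    """(bs, n_vec, C) + (bs, n_vec, n_pt, C) -> (bs, n_vec * n_pt, C)."""
    # Broadcasting copies each instance query onto every point of that instance.
    point_queries = query_ins[:, :, None, :] + pt_query_pos
    return point_queries.reshape(query_ins.shape[0], -1, query_ins.shape[-1])


def gather(point_out, n_vec, n_pt, proj_weight):
    """(bs, n_vec * n_pt, C) -> (bs, n_vec, C)."""
    b, _, c = point_out.shape
    # Concatenate the features of all points that belong to the same instance.
    flat = point_out.reshape(b, n_vec, n_pt * c)
    # Hypothetical single linear map standing in for the MLP output_proj.
    return flat @ proj_weight


query_ins = rng.standard_normal((bs, n_vec, embed_dims))
pt_query_pos = rng.standard_normal((bs, n_vec, n_pt, embed_dims))
proj_weight = rng.standard_normal((n_pt * embed_dims, embed_dims))

point_queries = scatter(query_ins, pt_query_pos)
point_out = point_queries  # placeholder: deformable attention would run here
gathered = gather(point_out, n_vec, n_pt, proj_weight)

print(point_queries.shape)  # (2, 600, 8)
print(gathered.shape)       # (2, 20, 8)
```

Flattened index `i * n_pt + j` holds point j of instance i, which is why a plain reshape is enough to regroup the points per instance in the gather step.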

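Novel positional encoding (following DAB-DETR)
A query is split into a content part and a position part. In MapTR the position part is learnable: it generates the reference points and is given to q and k in self-attention. MapQR instead sine-encodes the reference points and passes the result through a linear embedding to produce an embedding for each point,
which is then passed to the dfa.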

for lid, layer in enumerate(self.layers):
    if self.query_pos_embedding == 'instance':
        reference_points_reshape = reference_points.view(bs, -1, self.num_pts_per_vec, self.pt_dim)
        reference_points_reshape = reference_points_reshape.view(bs, -1, self.pt_dim)
        query_sine_embed = gen_sineembed_for_position(reference_points_reshape[..., :2])
        query_sine_embed = query_sine_embed.view(bs, -1, self.num_pts_per_vec, self.embed_dims)
        point_query_pos = self.pt_pos_query_projs[lid](query_sine_embed)  # (bs, n_vec, n_pt, embed_dims)
        query_pos_lid = None
        reference_points_input = reference_points_reshape[..., :2].unsqueeze(2)
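For readers unfamiliar with sine-encoding a reference point, here is a minimal NumPy sketch in the spirit of the DAB-DETR encoding. It is an assumption-laden illustration, not the body of `gen_sineembed_for_position`: the function name `sine_embed_2d`, the temperature of 10000, the x-then-y channel order, and the single random matrix `proj` standing in for the per-layer projection are all choices made here.

```python
import numpy as np


def sine_embed_2d(points, num_feats=4, temperature=10000.0):
    """Map normalized (x, y) to a sinusoidal vector of size 2 * num_feats.

    points: array of shape (..., 2); num_feats should be even.
    """
    idx = np.arange(num_feats)
    # Channels 2k and 2k+1 share one frequency.
    dim_t = temperature ** (2 * (idx // 2) / num_feats)
    scaled = points[..., None] * 2 * np.pi / dim_t  # (..., 2, num_feats)
    emb = np.empty_like(scaled)
    emb[..., 0::2] = np.sin(scaled[..., 0::2])      # even channels use sin
    emb[..., 1::2] = np.cos(scaled[..., 1::2])      # odd channels use cos
    # x features first, then y features (ordering assumed for this sketch).
    return emb.reshape(*points.shape[:-1], 2 * num_feats)


bs, n_vec, n_pt, embed_dims = 1, 2, 3, 8
rng = np.random.default_rng(0)

reference_points = rng.random((bs, n_vec * n_pt, 2))  # normalized (x, y)
sine = sine_embed_2d(reference_points, num_feats=embed_dims // 2)
sine = sine.reshape(bs, n_vec, n_pt, embed_dims)

proj = rng.standard_normal((embed_dims, embed_dims))  # stand-in linear embedding
point_query_pos = sine @ proj

print(point_query_pos.shape)  # (1, 2, 3, 8)
```

Because the embedding is computed from the current reference points, it changes whenever the points are refined between decoder layers, unlike a learnable position embedding that stays fixed per query.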

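In the BEV encoder: GKT with flexible height.
In the encoder, before GKT projects each grid cell into each camera, an additional height offset is predicted for each pillar and added to the z coordinate of the reference points.
In HeightKernelAttention: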

# __init__
self.height_offsets = nn.Linear(self.embed_dims, num_points_in_pillar)
# forward:
sampled_heights = self.height_offsets(query) #linear
sampled_heights = sampled_heights / self.height_range
sampled_heights = sampled_heights.permute(0, 2, 1)
reference_points[:, :, :, 2] += sampled_heights
reference_points_cam, bev_mask = self.point_sampling(reference_points, self.pc_range, kwargs['img_metas'])
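The tensor shapes in the height-offset step are easy to get wrong, so here is a minimal NumPy sketch of just that part. It assumes `query` has shape (bs, num_query, embed_dims) and `reference_points` has shape (bs, num_points_in_pillar, num_query, 3), which is what the permute in the snippet above implies. The values of `height_range`, `weight`, and `bias` are hypothetical stand-ins, and the camera projection done by `point_sampling` is left out.

```python
import numpy as np

rng = np.random.default_rng(0)
bs, num_query, embed_dims, num_points_in_pillar = 1, 6, 8, 4
height_range = 8.0  # hypothetical value used to scale the predicted offsets

query = rng.standard_normal((bs, num_query, embed_dims))
# Stand-in for nn.Linear(embed_dims, num_points_in_pillar).
weight = rng.standard_normal((embed_dims, num_points_in_pillar)) * 0.1
bias = np.zeros(num_points_in_pillar)

# Pillar anchors: (x, y, z) for every BEV cell and every height bin.
reference_points = rng.random((bs, num_points_in_pillar, num_query, 3))
z_before = reference_points[..., 2].copy()

offsets = query @ weight + bias       # (bs, num_query, num_points_in_pillar)
offsets = offsets / height_range
offsets = offsets.transpose(0, 2, 1)  # (bs, num_points_in_pillar, num_query)
reference_points[..., 2] += offsets   # only the z coordinate is shifted

print(offsets.shape)  # (1, 4, 6)
```

Only z changes, so each pillar keeps its (x, y) location on the BEV grid while the sampled heights become a function of the query for that cell.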