BEV: Implicit Camera View Transformation with BEVFormer

I. Background

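Camera view transformation based on explicit geometric projection (such as IPM) depends heavily on the camera intrinsics and extrinsics. It also fuses image features into the BEV grid in a fixed way, so it can be insensitive to small objects. A transformer-based view transformation is an alternative. BEV queries adaptively attend to the relevant image regions, which helps small-object detection, and the attention mechanism lets the model sample features flexibly. The BEVFormer-style demo code below illustrates this principle. The demo is heavily simplified. It replaces BEVFormer's temporal self-attention and its deformable spatial cross-attention with a single global multi-head attention layer. In the spatial cross-attention, each BEV query samples image features around reference points that are projected into the cameras using the calibration.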
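The dependence on calibration comes from projecting 3D points into the image. The sketch below uses a standard pinhole model to map BEV grid cell centers on the ground plane into one camera's pixel coordinates. It is a minimal illustration under stated assumptions. The function name `project_bev_points`, the grid extent, the camera pose (`R`, `t`), the intrinsics `K` and the 640 x 480 image size are all hypothetical values chosen for the example. None of them come from BEVFormer or from the demo.

```python
import torch


def project_bev_points(points_ego: torch.Tensor, K: torch.Tensor,
                       R: torch.Tensor, t: torch.Tensor):
    """Hypothetical helper: project N x 3 ego-frame points into pixels.

    K: 3 x 3 intrinsics; R (3 x 3), t (3,): ego -> camera extrinsics,
    i.e. p_cam = R @ p_ego + t (assumed convention).
    Returns uv (N x 2 pixel coords) and valid (N, point in front of camera).
    """
    cam = points_ego @ R.T + t              # ego frame -> camera frame
    depth = cam[:, 2]
    valid = depth > 1e-3                     # keep points in front of the camera
    pix = cam @ K.T                          # camera frame -> homogeneous pixels
    uv = pix[:, :2] / depth.clamp(min=1e-3).unsqueeze(1)
    return uv, valid


# Hypothetical 8 x 8 BEV grid: x in [2, 16] m ahead, y in [-7, 7] m, on the ground (z = 0)
xs = torch.linspace(2.0, 16.0, 8)
ys = torch.linspace(-7.0, 7.0, 8)
gx, gy = torch.meshgrid(xs, ys, indexing="ij")
points = torch.stack([gx, gy, torch.zeros_like(gx)], dim=-1).reshape(-1, 3)

# Hypothetical front camera: ego x (forward) -> cam z, ego y (left) -> cam -x, ego z (up) -> cam -y
R = torch.tensor([[0.0, -1.0, 0.0],
                  [0.0, 0.0, -1.0],
                  [1.0, 0.0, 0.0]])
t = torch.tensor([0.0, 1.5, 0.0])        # camera mounted 1.5 m above the ground
K = torch.tensor([[500.0, 0.0, 320.0],
                  [0.0, 500.0, 240.0],
                  [0.0, 0.0, 1.0]])

uv, valid = project_bev_points(points, K, R, t)
# Assumed 640 x 480 image: which grid cells actually land inside it
in_image = valid & (uv[:, 0] >= 0) & (uv[:, 0] < 640) & (uv[:, 1] >= 0) & (uv[:, 1] < 480)
print(uv.shape, valid.all().item())  # torch.Size([64, 2]) True
```

If `K`, `R` or `t` is wrong, every projected location shifts. In this setup some near, lateral cells fall outside the assumed image (the `in_image` mask). For that reason, projection-based methods only read features from the cameras a point actually lands in.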

II. Code

import torch
import torch.nn as nn

# -------------------------
# Parameters
# -------------------------
B, C, H, W = 2, 64, 16, 16        # camera feature maps: batch, channels, height, width
bev_H, bev_W = 8, 8                # BEV grid size
num_cameras = 7
num_classes = 10
num_det_queries = 32                # number of detection queries

# -------------------------
# 1. Multi-camera features
# -------------------------
# Random tensors stand in for backbone outputs: B x C x N per camera (N = H*W)
camera_feats = [torch.randn(B, C, H * W) for _ in range(num_cameras)]
for i in range(num_cameras):
    camera_feats[i] = camera_feats[i].permute(0, 2, 1)  # B x N x C

# -------------------------
# 2. BEV queries + transformer projection
# -------------------------
num_bev_queries = bev_H * bev_W


class BEVProjectionTransformer(nn.Module):
    def __init__(self, C, num_queries, num_heads=8):
        super().__init__()
        # One learnable query per BEV cell, registered in the module and shared across the batch
        self.bev_queries = nn.Parameter(torch.randn(num_queries, 1, C))
        self.attn = nn.MultiheadAttention(embed_dim=C, num_heads=num_heads)

    def forward(self, camera_feats):
        """
        camera_feats: list of B x N x C tensors, one per camera
        returns: num_bev_queries x B x C
        """
        # Concatenate the tokens of all cameras: B x (num_cameras*N) x C
        feats = torch.cat(camera_feats, dim=1)
        # nn.MultiheadAttention is sequence-first by default: (num_cameras*N) x B x C
        feats = feats.permute(1, 0, 2)
        queries = self.bev_queries.expand(-1, feats.shape[1], -1)  # num_bev_queries x B x C
        # Global attention: every BEV query attends to every pixel of every camera
        bev_out, _ = self.attn(queries, feats, feats)
        return bev_out


bev_proj_transformer = BEVProjectionTransformer(C, num_bev_queries)
bev_features = bev_proj_transformer(camera_feats)                               # num_bev_queries x B x C
bev_features_grid = bev_features.permute(1, 0, 2).reshape(B, bev_H, bev_W, C)  # B x bev_H x bev_W x C

# -------------------------
# 3. Detection queries + transformer
# -------------------------
class DetectionDecoderTransformer(nn.Module):
    def __init__(self, C, num_queries, num_heads=8):
        super().__init__()
        # Learnable object queries, shared across the batch
        self.det_queries = nn.Parameter(torch.randn(num_queries, 1, C))
        self.attn = nn.MultiheadAttention(embed_dim=C, num_heads=num_heads)

    def forward(self, bev_features_grid):
        B, H, W, C = bev_features_grid.shape
        # Flatten the BEV grid into a token sequence: (H*W) x B x C
        bev_flat = bev_features_grid.reshape(B, H * W, C).permute(1, 0, 2)
        queries = self.det_queries.expand(-1, B, -1)  # num_det_queries x B x C
        # Each detection query gathers evidence from the BEV feature map
        out, _ = self.attn(queries, bev_flat, bev_flat)
        return out  # num_det_queries x B x C


decoder = DetectionDecoderTransformer(C, num_det_queries)
det_features = decoder(bev_features_grid)

# -------------------------
# 4. Detection head
# -------------------------
class SimpleDetectionHead(nn.Module):
    def __init__(self, C, num_classes):
        super().__init__()
        self.cls_head = nn.Linear(C, num_classes)
        # 7 values per 3D box, typically center (x, y, z), size (w, l, h) and yaw
        self.bbox_head = nn.Linear(C, 7)

    def forward(self, det_features):
        cls_logits = self.cls_head(det_features)
        bbox_preds = self.bbox_head(det_features)
        return cls_logits, bbox_preds


detection_head = SimpleDetectionHead(C, num_classes)
cls_logits, bbox_preds = detection_head(det_features)
print("class logits shape:", cls_logits.shape)    # num_det_queries x B x num_classes
print("3D bbox preds shape:", bbox_preds.shape)   # num_det_queries x B x 7
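In the demo, global attention lets every BEV query look at all 7 x 256 camera tokens. BEVFormer's spatial cross-attention works differently. It uses deformable attention, which samples only a few points around each query's projected reference point, with offsets and weights predicted from the query itself. The sketch below shows a heavily simplified version of that sampling step, built on `torch.nn.functional.grid_sample`. It is an illustration under assumptions, not BEVFormer's implementation. The class name `SimpleDeformableSampling`, the number of sampling points and the tanh-bounded offset scale are hypothetical choices. Multi-head splitting, multi-level features and multi-camera handling are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleDeformableSampling(nn.Module):
    """Hypothetical, simplified deformable sampling: single head, single camera,
    single feature level. Each query samples num_points locations around its
    reference point and mixes them with query-predicted weights."""

    def __init__(self, C: int, num_points: int = 4, offset_scale: float = 0.1):
        super().__init__()
        self.num_points = num_points
        self.offset_scale = offset_scale                   # max offset in normalized coords (assumption)
        self.offset_proj = nn.Linear(C, num_points * 2)    # (dx, dy) per sampling point
        self.weight_proj = nn.Linear(C, num_points)        # one weight per sampling point
        self.value_proj = nn.Linear(C, C)
        self.out_proj = nn.Linear(C, C)

    def forward(self, queries, ref_points, feat_map):
        """
        queries:    B x Q x C      e.g. one query per BEV cell
        ref_points: B x Q x 2      reference points as normalized (x, y) in [-1, 1]
        feat_map:   B x C x H x W  one camera's feature map
        returns:    B x Q x C
        """
        B, Q, _ = queries.shape
        # Per-pixel value projection, back to B x C x H x W for grid_sample
        value = self.value_proj(feat_map.permute(0, 2, 3, 1)).permute(0, 3, 1, 2).contiguous()
        # Query-dependent offsets, bounded with tanh: B x Q x P x 2
        offsets = self.offset_proj(queries).view(B, Q, self.num_points, 2).tanh() * self.offset_scale
        locs = ref_points.unsqueeze(2) + offsets
        # Weights normalized over the P sampling points: B x Q x P
        weights = self.weight_proj(queries).softmax(dim=-1)
        # Bilinear sampling at all Q x P locations: B x C x Q x P
        sampled = F.grid_sample(value, locs, mode="bilinear",
                                padding_mode="zeros", align_corners=False)
        # Weighted sum over sampling points: B x C x Q -> B x Q x C
        out = (sampled * weights.unsqueeze(1)).sum(dim=-1).transpose(1, 2)
        return self.out_proj(out)


torch.manual_seed(0)
B, C, H, W, Q = 2, 64, 16, 16, 8 * 8
sampler = SimpleDeformableSampling(C)
bev_queries = torch.randn(B, Q, C)
ref_points = torch.rand(B, Q, 2) * 2 - 1   # random stand-in for projected reference points
feat_map = torch.randn(B, C, H, W)
out = sampler(bev_queries, ref_points, feat_map)
print(out.shape)  # torch.Size([2, 64, 64])
```

Each query here reads only `num_points` bilinear samples near its reference point. This is what lets it focus on a local region instead of the whole image. In BEVFormer, the reference points come from projecting several heights along each BEV cell's pillar into the cameras using the calibration. The per-camera results are then averaged over the cameras those points actually hit.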