Senna Code Walkthrough
An outline of the logical structure of the script senna_nusc_data_converter.py:
nuscenes_data_prep -> create_nuscenes_infos -> _fill_trainval_infos -> generate_drive_qa
Table of Contents
- nuscenes_data_prep
- create_nuscenes_infos
- _fill_trainval_infos
- generate_drive_qa
- get_vru_qa (QA about VRUs, i.e. vulnerable road users such as pedestrians and cyclists)
- get_mot_pred_qa (motion-prediction QA across camera views)
- get_traffic_light_qa (traffic-light QA)
- get_img_description_qa (per-camera image-description QA)
- get_plan_qa (planning QA)
- get_plan_explaination_qa (plan-explanation QA)
- eval_llava_34b_wo_init
- eval_multi_img_model_wo_init
- Training and testing commands
- Dataset
nuscenes_data_prep
create_nuscenes_infos
_fill_trainval_infos
_fill_trainval_infos is the function that generates the training and validation info records for the nuScenes dataset. It is long and integrates a large amount of logic for sensor metadata, coordinate transforms, ego-vehicle state, future-trajectory ground truth, and more.
It extracts and organizes the structured information needed for training/validation from the raw nuScenes data, producing the training samples.
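For orientation, the fields of one info record that the QA generators below rely on look roughly like this (a sketch: the field names come from the code excerpts later in this post, while the surrounding layout is an assumption):

```python
# Sketch of one per-sample info record; only the fields used by the QA
# generators in this post are shown -- the real dict has many more keys.
info = {
    'images': [...],            # per-camera image paths; index 0 is the front camera
    'gt_ego_fut_trajs': ...,    # per-step future ego displacements (np.cumsum gives the trajectory)
    'gt_ego_fut_masks': ...,    # validity mask for the future trajectory steps
    'gt_ego_lcf_feat': ...,     # ego state features; index 7 is the current speed (m/s)
    'ego_navi_cmd': '...',      # navigation command string, e.g. 'go straight'
}
```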
generate_drive_qa
generate_drive_qa builds a set of driving-related multimodal question-answer (QA) pairs from the current scene's info record, to help a multimodal model understand the driving environment.
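A minimal sketch of the control flow, assuming generate_drive_qa simply collects the outputs of the per-task generators described in the subsections below (the exact signatures, and which generators receive the VLM handles, follow the repo):

```python
# Illustrative control flow only; signatures are assumptions.
def generate_drive_qa(info, tokenizer, model, image_processor):
    candidates = [
        get_vru_qa(info),
        get_mot_pred_qa(info),
        get_traffic_light_qa(info),
        get_img_description_qa(info, tokenizer, model, image_processor),
        get_plan_qa(info),
        get_plan_explaination_qa(info, tokenizer, model, image_processor),
    ]
    # each generator returns None when its inputs are incomplete (see get_plan_qa below)
    return [qa for qa in candidates if qa is not None]
```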
get_vru_qa (QA about VRUs, i.e. vulnerable road users such as pedestrians and cyclists)
get_mot_pred_qa (motion-prediction QA across camera views)
get_traffic_light_qa (traffic-light QA)
get_img_description_qa (per-camera image-description QA)
get_plan_qa (planning QA)
Its job is to generate, from the ego vehicle's future-trajectory information, a QA pair about the driving plan for the next three seconds.
The assembled question prompt reads (speed and navigation command filled in at runtime):

"Your current speed is {ego_cur_vel} m/s, the navigation command is '{ego_navi_cmd}', based on the understanding of the driving scene and the navigation information, what is your plan for the next three seconds? Please answer your SPEED plan and your PATH plan. SPEED includes KEEP, ACCELERATE, DECELERATE and STOP, PATH includes STRAIGHT, RIGHT_CHANGE, LEFT_CHANGE, RIGHT_TURN, LEFT_TURN. For example, a correct answer format is like 'KEEP, LEFT_CHANGE'."
```python
def get_plan_qa(info):
    # skip samples whose future trajectory is incomplete
    if np.any(info['gt_ego_fut_masks'] == 0):
        return None

    ego_cur_vel = info['gt_ego_lcf_feat'][7]
    ego_navi_cmd = info['ego_navi_cmd']

    question = f"Your current speed is {int(ego_cur_vel)} m/s, " \
        f"the navigation command is '{ego_navi_cmd}', " \
        "based on the understanding of the driving scene and the navigation information, " \
        "what is your plan for the next three seconds? " \
        "Please answer your SPEED plan and your PATH plan. " \
        "SPEED includes KEEP, ACCELERATE and DECELERATE, and STOP, " \
        "PATH includes STRAIGHT, RIGHT_CHANGE, LEFT_CHANGE, RIGHT_TURN, LEFT_TURN. " \
        "For example, a correct answer format is like 'KEEP, LEFT_CHANGE'."

    # accumulate per-step displacements into an absolute future trajectory
    ego_fut_traj = np.cumsum(info['gt_ego_fut_trajs'], axis=-2)
    ego_pedal_status = get_obj_acc_or_dec(ego_fut_traj)
    ego_speed_plan = pedal_status[ego_pedal_status]
    ego_path_plan = get_obj_turn_or_lane_change(ego_fut_traj)
    ego_path_plan = path_status[ego_path_plan]

    answer = ego_speed_plan + ', ' + ego_path_plan + '\n'
    qa = format_qa(question, answer)
    return qa
```
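The helpers referenced above live elsewhere in the converter/utils: get_obj_acc_or_dec and get_obj_turn_or_lane_change classify the accumulated trajectory into integer classes, and pedal_status / path_status map those classes onto the answer vocabulary. A plausible sketch of the lookup tables and of format_qa (the exact index order and QA format are assumptions, not copied from the repo):

```python
# Assumed class-index-to-label tables; the actual ordering in the Senna
# repo may differ -- treat these as illustrative only.
pedal_status = {0: 'KEEP', 1: 'ACCELERATE', 2: 'DECELERATE', 3: 'STOP'}
path_status = {0: 'RIGHT_TURN', 1: 'RIGHT_CHANGE', 2: 'LEFT_TURN',
               3: 'LEFT_CHANGE', 4: 'STRAIGHT'}

def format_qa(question, answer):
    # Hypothetical minimal version: pack a QA pair into a dict.
    return {'question': question, 'answer': answer}
```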
get_plan_explaination_qa (plan-explanation QA)
get_plan_explaination_qa generates explanation-style QA pairs for driving decisions. It combines the current speed, the navigation command, and the future trajectory, then calls a vision-language model on the current front-camera image to produce an explanation of the reason behind the decision.
The question states the current speed, the navigation command, and the driving decision.
The model is then asked to explain, based on the provided environment image, the most likely reason for that decision.
```python
def get_plan_explaination_qa(info, tokenizer, model, image_processor):
    cf_img_name = info['images'][0]  # front-camera image
    ego_cur_vel = info['gt_ego_lcf_feat'][7]
    ego_navi_cmd = info['ego_navi_cmd']

    ego_fut_traj = np.cumsum(info['gt_ego_fut_trajs'], axis=-2)
    ego_pedal_status = get_obj_acc_or_dec(ego_fut_traj)
    ego_speed_plan = pedal_status[ego_pedal_status]
    ego_path_plan = get_obj_turn_or_lane_change(ego_fut_traj)
    ego_path_plan = path_status[ego_path_plan]

    # map the plan labels to natural-language phrases
    pedal_decision = {
        'KEEP': 'maintain the current speed',
        'ACCELERATE': 'accelerate',
        'DECELERATE': 'decelerate',
        'STOP': 'stop the car'}
    path_decision = {
        'RIGHT_TURN': 'turn right',
        'RIGHT_CHANGE': 'change to the right lane',
        'LEFT_TURN': 'turn left',
        'LEFT_CHANGE': 'change to the left lane',
        'STRAIGHT': 'go straight'}

    if ego_speed_plan == 'STOP':
        decision = pedal_decision[ego_speed_plan]
    else:
        decision = pedal_decision[ego_speed_plan] + ' and ' + path_decision[ego_path_plan]

    question = "You are driving, " \
        f"your current speed is {int(ego_cur_vel)} m/s, " \
        f"and the navigation command is '{ego_navi_cmd}', " \
        "your driving decision for the next three seconds is to " \
        f"{decision}. " \
        "Based on the provided image of the driving environment, " \
        "explain the most likely reason for this decision in one or two concise sentence."

    # throwaway namespace so the eval helper can read .query and .img_file
    args = type('Args', (), {
        "query": question,
        "img_file": cf_img_name,
    })()
    answer = eval_llava_34b_wo_init(args, tokenizer, model, image_processor)
    qa = format_qa(question, answer)
    return qa
```
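The type('Args', (), {...})() idiom builds a throwaway namespace object so the LLaVA-style eval helper can read args.query and args.img_file as attributes. An equivalent and arguably clearer form (a sketch, not the repo's code):

```python
from types import SimpleNamespace

# Equivalent to the type('Args', (), {...})() idiom used above.
args = SimpleNamespace(query=question, img_file=cf_img_name)
```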
eval_llava_34b_wo_init
Mainly used for image-plus-text generation tasks such as visual question answering (VQA) and image captioning.
eval_llava_34b_wo_init appears at line 50 of senna_nusc_data_converter.py.
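An illustrative outline of what this function most likely does, assuming it mirrors LLaVA's standard single-image eval flow (the imported helpers are real LLaVA APIs; the 'chatml_direct' template choice and the body itself are assumptions, not the repo's code):

```python
import torch
from PIL import Image
from llava.constants import IMAGE_TOKEN_INDEX, DEFAULT_IMAGE_TOKEN
from llava.conversation import conv_templates
from llava.mm_utils import process_images, tokenizer_image_token

def eval_llava_34b_wo_init_sketch(args, tokenizer, model, image_processor):
    # Build a chat prompt with an <image> placeholder; 'chatml_direct' is the
    # template the LLaVA 34B checkpoints use (assumption).
    conv = conv_templates['chatml_direct'].copy()
    conv.append_message(conv.roles[0], DEFAULT_IMAGE_TOKEN + '\n' + args.query)
    conv.append_message(conv.roles[1], None)
    prompt = conv.get_prompt()

    image = Image.open(args.img_file).convert('RGB')
    images_tensor = process_images([image], image_processor, model.config) \
        .to(model.device, dtype=torch.float16)
    input_ids = tokenizer_image_token(
        prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors='pt').unsqueeze(0).cuda()

    with torch.inference_mode():
        output_ids = model.generate(
            input_ids, images=images_tensor, image_sizes=[image.size],
            do_sample=False, max_new_tokens=128, use_cache=True)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0].strip()
```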
eval_multi_img_model_wo_init
Located in data_tools/senna_qa_utils.py, it is invoked as:

```python
answer = eval_multi_img_model_wo_init(args, tokenizer, model, image_processor)
```

It handles multiple image inputs plus one query and generates a text answer grounded in the images: it first preprocesses the input images and the query, builds the conversation input, passes it to the generative model, and finally decodes the model output and returns the generated text. Like the single-image variant above, it is mainly used for image-plus-text generation tasks such as VQA and image captioning.
Building the input:
```python
images = load_images(args.image_file)  # discard fisheye image
image_sizes = [x.size for x in images]
images_tensor = process_images(
    images,
    image_processor,
    model.config
).to(model.device, dtype=torch.float16)
images_tensor = images_tensor.unsqueeze(0)
input_ids = (
    tokenizer_image_token(prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt")
    .unsqueeze(0)
    .cuda()
)
```
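For context, prompt here is the full conversation string containing one `<image>` placeholder per input image; tokenizer_image_token tokenizes the text around the placeholders and splices the special IMAGE_TOKEN_INDEX id into their positions, which is how LLaVA marks where the image features are injected.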
Generating the answer:
```python
output_ids = model.generate(
    input_ids,
    images=images_tensor,
    image_sizes=image_sizes,
    do_sample=True if args.temperature > 0 else False,
    temperature=args.temperature,
    top_p=args.top_p,
    num_beams=args.num_beams,
    max_new_tokens=args.max_new_tokens,
    use_cache=True,
)
```
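The generated ids are then decoded back to text; assuming the function follows LLaVA's stock eval scripts, the final step looks like:

```python
# Assumed decoding step, mirroring LLaVA's standard eval scripts.
outputs = tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0].strip()
return outputs
```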
Training and testing commands
In class MLN inside SSR_Head, the value is 72 at training time and 12 at test time.
Training

```bash
/home/gaojing/anaconda3/envs/ssr/bin/python -m torch.distributed.run --nproc_per_node=3 --master_port=2333 tools/train.py projects/configs/SSR/SSR_e2e.py --launcher pytorch --deterministic --work-dir outputs --resume-from outputs/latest.pth
```
Testing

```bash
CUDA_VISIBLE_DEVICES=0 python tools/test.py projects/configs/SSR/SSR_e2e.py /data/Jozky/SSR/outputs_lora/latest.pth --launcher none --eval bbox --tmpdir tmp
```
Dataset
An introduction to nuscenes_infos_train.pkl
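To get a feel for the file, a quick inspection snippet (a sketch: the path is hypothetical, and the 'infos'/'metadata' layout assumes the mmdet3d-style convention that create_nuscenes_infos converters typically follow):

```python
import pickle

# Hypothetical path -- adjust to wherever the converter wrote the file.
with open('data/nuscenes/nuscenes_infos_train.pkl', 'rb') as f:
    data = pickle.load(f)

# mmdet3d-style converters typically store {'infos': [...], 'metadata': {...}}.
print(data.keys())
print(len(data['infos']))       # number of training samples
print(data['infos'][0].keys())  # per-sample fields, e.g. gt_ego_fut_trajs, ego_navi_cmd
```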