Senna Code Walkthrough
An outline of the logical structure of the script senna_nusc_data_converter.py:
nuscenes_data_prep -> create_nuscenes_infos -> _fill_trainval_infos -> generate_drive_qa
Table of Contents
- nuscenes_data_prep
- create_nuscenes_infos
- _fill_trainval_infos
- generate_drive_qa
- get_vru_qa (QA about VRUs, i.e. vulnerable road users such as pedestrians and cyclists)
- get_mot_pred_qa (motion-prediction QA across camera views)
- get_traffic_light_qa (traffic-light QA)
- get_img_description_qa (per-camera image-description QA)
- get_plan_qa (planning QA)
- get_plan_explaination_qa (plan-explanation QA)
- eval_llava_34b_wo_init
- eval_multi_img_model_wo_init
- Training and testing commands
- Dataset
nuscenes_data_prep
create_nuscenes_infos
_fill_trainval_infos
_fill_trainval_infos is the function that generates the training and validation info records for the nuScenes dataset. It is long and integrates a large amount of logic for sensor metadata, coordinate transforms, ego-vehicle state, future-trajectory ground truth, and more.
It extracts and organizes the structured information needed for training/validation from the raw nuScenes data, producing the training samples.
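For orientation, the fields of one info record that the QA generators below rely on look roughly like this (a sketch: the field names come from the code excerpts later in this post, while the surrounding layout is an assumption):

```python
# Sketch of one per-sample info record; only the fields used by the QA
# generators in this post are shown -- the real dict has many more keys.
info = {
    'images': [...],            # per-camera image paths; index 0 is the front camera
    'gt_ego_fut_trajs': ...,    # per-step future ego displacements (np.cumsum gives the trajectory)
    'gt_ego_fut_masks': ...,    # validity mask for the future trajectory steps
    'gt_ego_lcf_feat': ...,     # ego state features; index 7 is the current speed (m/s)
    'ego_navi_cmd': '...',      # navigation command string, e.g. 'go straight'
}
```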
generate_drive_qa
generate_drive_qa builds a set of driving-related multimodal question-answer (QA) pairs from the current scene's info record, to help a multimodal model understand the driving environment.
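A minimal sketch of the control flow, assuming generate_drive_qa simply collects the outputs of the per-task generators described in the subsections below (the exact signatures, and which generators receive the VLM handles, follow the repo):

```python
# Illustrative control flow only; signatures are assumptions.
def generate_drive_qa(info, tokenizer, model, image_processor):
    candidates = [
        get_vru_qa(info),
        get_mot_pred_qa(info),
        get_traffic_light_qa(info),
        get_img_description_qa(info, tokenizer, model, image_processor),
        get_plan_qa(info),
        get_plan_explaination_qa(info, tokenizer, model, image_processor),
    ]
    # each generator returns None when its inputs are incomplete (see get_plan_qa below)
    return [qa for qa in candidates if qa is not None]
```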
get_vru_qa (QA about VRUs, i.e. vulnerable road users such as pedestrians and cyclists)
get_mot_pred_qa (motion-prediction QA across camera views)
get_traffic_light_qa (traffic-light QA)
get_img_description_qa (per-camera image-description QA)
get_plan_qa (planning QA)
Its job is to generate, from the ego vehicle's future-trajectory information, a QA pair about the driving plan for the next three seconds.
The assembled question prompt reads (speed and navigation command filled in at runtime):

"Your current speed is {ego_cur_vel} m/s, the navigation command is '{ego_navi_cmd}', based on the understanding of the driving scene and the navigation information, what is your plan for the next three seconds? Please answer your SPEED plan and your PATH plan. SPEED includes KEEP, ACCELERATE, DECELERATE and STOP, PATH includes STRAIGHT, RIGHT_CHANGE, LEFT_CHANGE, RIGHT_TURN, LEFT_TURN. For example, a correct answer format is like 'KEEP, LEFT_CHANGE'."
```python
def get_plan_qa(info):
    # skip samples whose future trajectory is incomplete
    if np.any(info['gt_ego_fut_masks'] == 0):
        return None

    ego_cur_vel = info['gt_ego_lcf_feat'][7]
    ego_navi_cmd = info['ego_navi_cmd']

    question = f"Your current speed is {int(ego_cur_vel)} m/s, " \
        f"the navigation command is '{ego_navi_cmd}', " \
        "based on the understanding of the driving scene and the navigation information, " \
        "what is your plan for the next three seconds? " \
        "Please answer your SPEED plan and your PATH plan. " \
        "SPEED includes KEEP, ACCELERATE and DECELERATE, and STOP, " \
        "PATH includes STRAIGHT, RIGHT_CHANGE, LEFT_CHANGE, RIGHT_TURN, LEFT_TURN. " \
        "For example, a correct answer format is like 'KEEP, LEFT_CHANGE'."

    # accumulate per-step displacements into an absolute future trajectory
    ego_fut_traj = np.cumsum(info['gt_ego_fut_trajs'], axis=-2)
    ego_pedal_status = get_obj_acc_or_dec(ego_fut_traj)
    ego_speed_plan = pedal_status[ego_pedal_status]
    ego_path_plan = get_obj_turn_or_lane_change(ego_fut_traj)
    ego_path_plan = path_status[ego_path_plan]

    answer = ego_speed_plan + ', ' + ego_path_plan + '\n'
    qa = format_qa(question, answer)
    return qa
```
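The helpers referenced above live elsewhere in the converter/utils: get_obj_acc_or_dec and get_obj_turn_or_lane_change classify the accumulated trajectory into integer classes, and pedal_status / path_status map those classes onto the answer vocabulary. A plausible sketch of the lookup tables and of format_qa (the exact index order and QA format are assumptions, not copied from the repo):

```python
# Assumed class-index-to-label tables; the actual ordering in the Senna
# repo may differ -- treat these as illustrative only.
pedal_status = {0: 'KEEP', 1: 'ACCELERATE', 2: 'DECELERATE', 3: 'STOP'}
path_status = {0: 'RIGHT_TURN', 1: 'RIGHT_CHANGE', 2: 'LEFT_TURN',
               3: 'LEFT_CHANGE', 4: 'STRAIGHT'}

def format_qa(question, answer):
    # Hypothetical minimal version: pack a QA pair into a dict.
    return {'question': question, 'answer': answer}
```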
get_plan_explaination_qa (plan-explanation QA)
get_plan_explaination_qa generates explanation-style QA pairs for driving decisions. It combines the current speed, the navigation command, and the future trajectory, then calls a vision-language model on the current front-camera image to produce an explanation of the reason behind the decision.
The question states the current speed, the navigation command, and the driving decision.
The model is then asked to explain, based on the provided environment image, the most likely reason for that decision.
```python
def get_plan_explaination_qa(info, tokenizer, model, image_processor):
    cf_img_name = info['images'][0]  # front-camera image
    ego_cur_vel = info['gt_ego_lcf_feat'][7]
    ego_navi_cmd = info['ego_navi_cmd']

    ego_fut_traj = np.cumsum(info['gt_ego_fut_trajs'], axis=-2)
    ego_pedal_status = get_obj_acc_or_dec(ego_fut_traj)
    ego_speed_plan = pedal_status[ego_pedal_status]
    ego_path_plan = get_obj_turn_or_lane_change(ego_fut_traj)
    ego_path_plan = path_status[ego_path_plan]

    # map the plan labels to natural-language phrases
    pedal_decision = {
        'KEEP': 'maintain the current speed',
        'ACCELERATE': 'accelerate',
        'DECELERATE': 'decelerate',
        'STOP': 'stop the car'}
    path_decision = {
        'RIGHT_TURN': 'turn right',
        'RIGHT_CHANGE': 'change to the right lane',
        'LEFT_TURN': 'turn left',
        'LEFT_CHANGE': 'change to the left lane',
        'STRAIGHT': 'go straight'}

    if ego_speed_plan == 'STOP':
        decision = pedal_decision[ego_speed_plan]
    else:
        decision = pedal_decision[ego_speed_plan] + ' and ' + path_decision[ego_path_plan]

    question = "You are driving, " \
        f"your current speed is {int(ego_cur_vel)} m/s, " \
        f"and the navigation command is '{ego_navi_cmd}', " \
        "your driving decision for the next three seconds is to " \
        f"{decision}. " \
        "Based on the provided image of the driving environment, " \
        "explain the most likely reason for this decision in one or two concise sentence."

    # throwaway namespace so the eval helper can read .query and .img_file
    args = type('Args', (), {
        "query": question,
        "img_file": cf_img_name,
    })()
    answer = eval_llava_34b_wo_init(args, tokenizer, model, image_processor)
    qa = format_qa(question, answer)
    return qa
```
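The type('Args', (), {...})() idiom builds a throwaway namespace object so the LLaVA-style eval helper can read args.query and args.img_file as attributes. An equivalent and arguably clearer form (a sketch, not the repo's code):

```python
from types import SimpleNamespace

# Equivalent to the type('Args', (), {...})() idiom used above.
args = SimpleNamespace(query=question, img_file=cf_img_name)
```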
eval_llava_34b_wo_init
Mainly used for image-plus-text generation tasks such as visual question answering (VQA) and image captioning.
eval_llava_34b_wo_init appears at line 50 of senna_nusc_data_converter.py.
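An illustrative outline of what this function most likely does, assuming it mirrors LLaVA's standard single-image eval flow (the imported helpers are real LLaVA APIs; the 'chatml_direct' template choice and the body itself are assumptions, not the repo's code):

```python
import torch
from PIL import Image
from llava.constants import IMAGE_TOKEN_INDEX, DEFAULT_IMAGE_TOKEN
from llava.conversation import conv_templates
from llava.mm_utils import process_images, tokenizer_image_token

def eval_llava_34b_wo_init_sketch(args, tokenizer, model, image_processor):
    # Build a chat prompt with an <image> placeholder; 'chatml_direct' is the
    # template the LLaVA 34B checkpoints use (assumption).
    conv = conv_templates['chatml_direct'].copy()
    conv.append_message(conv.roles[0], DEFAULT_IMAGE_TOKEN + '\n' + args.query)
    conv.append_message(conv.roles[1], None)
    prompt = conv.get_prompt()

    image = Image.open(args.img_file).convert('RGB')
    images_tensor = process_images([image], image_processor, model.config) \
        .to(model.device, dtype=torch.float16)
    input_ids = tokenizer_image_token(
        prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors='pt').unsqueeze(0).cuda()

    with torch.inference_mode():
        output_ids = model.generate(
            input_ids, images=images_tensor, image_sizes=[image.size],
            do_sample=False, max_new_tokens=128, use_cache=True)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0].strip()
```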
eval_multi_img_model_wo_init
Located in data_tools/senna_qa_utils.py, it is invoked as:

```python
answer = eval_multi_img_model_wo_init(args, tokenizer, model, image_processor)
```

It handles multiple image inputs plus one query and generates a text answer grounded in the images: it first preprocesses the input images and the query, builds the conversation input, passes it to the generative model, and finally decodes the model output and returns the generated text. Like the single-image variant above, it is mainly used for image-plus-text generation tasks such as VQA and image captioning.
Building the input:
```python
images = load_images(args.image_file)  # discard fisheye image
image_sizes = [x.size for x in images]
images_tensor = process_images(
    images,
    image_processor,
    model.config
).to(model.device, dtype=torch.float16)
images_tensor = images_tensor.unsqueeze(0)
input_ids = (
    tokenizer_image_token(prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt")
    .unsqueeze(0)
    .cuda()
)
```
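For context, prompt here is the full conversation string containing one `<image>` placeholder per input image; tokenizer_image_token tokenizes the text around the placeholders and splices the special IMAGE_TOKEN_INDEX id into their positions, which is how LLaVA marks where the image features are injected.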
Generating the answer:
```python
output_ids = model.generate(
    input_ids,
    images=images_tensor,
    image_sizes=image_sizes,
    do_sample=True if args.temperature > 0 else False,
    temperature=args.temperature,
    top_p=args.top_p,
    num_beams=args.num_beams,
    max_new_tokens=args.max_new_tokens,
    use_cache=True,
)
```
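The generated ids are then decoded back to text; assuming the function follows LLaVA's stock eval scripts, the final step looks like:

```python
# Assumed decoding step, mirroring LLaVA's standard eval scripts.
outputs = tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0].strip()
return outputs
```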
Training and testing commands
In class MLN inside SSR_Head, the value is 72 at training time and 12 at test time.
Training

```bash
/home/gaojing/anaconda3/envs/ssr/bin/python -m torch.distributed.run --nproc_per_node=3 --master_port=2333 tools/train.py projects/configs/SSR/SSR_e2e.py --launcher pytorch --deterministic --work-dir outputs --resume-from outputs/latest.pth
```
Testing

```bash
CUDA_VISIBLE_DEVICES=0 python tools/test.py projects/configs/SSR/SSR_e2e.py /data/Jozky/SSR/outputs_lora/latest.pth --launcher none --eval bbox --tmpdir tmp
```
Dataset
An introduction to nuscenes_infos_train.pkl
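To get a feel for the file, a quick inspection snippet (a sketch: the path is hypothetical, and the 'infos'/'metadata' layout assumes the mmdet3d-style convention that create_nuscenes_infos converters typically follow):

```python
import pickle

# Hypothetical path -- adjust to wherever the converter wrote the file.
with open('data/nuscenes/nuscenes_infos_train.pkl', 'rb') as f:
    data = pickle.load(f)

# mmdet3d-style converters typically store {'infos': [...], 'metadata': {...}}.
print(data.keys())
print(len(data['infos']))       # number of training samples
print(data['infos'][0].keys())  # per-sample fields, e.g. gt_ego_fut_trajs, ego_navi_cmd
```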