当前位置: 首页 > news >正文

SmolVLM2: The Smollest Video Model Ever(七)

编写测试代码与评价指标

现在的数据集里面只涉及tool的分类和手术phase的分类,所以编写的评价指标还是那些通用的,但是:

predicted_labels:['The current surgical phase is CalotTriangleDissection, Grasper, Hook tool exists.', 'The current surgical phase is GallbladderDissection, Grasper, Hook tool exists.', 'The current surgical phase is GallbladderDissection, Hook tool exists.', 'The current surgical phase is Endostomy, Grasper tool exists.']


true_labels:['The current surgical phase is CalotTriangleDissection, Grasper, Hook tool exists.', 'The current surgical phase is GallbladderDissection, Grasper, Hook tool exists.', 'The current surgical phase is GallbladderDissection, Hook tool exists.', 'The current surgical phase is CleaningCoagulation, Bipolar, Grasper tool exists.', 'The current surgical phase is Preparation, no tool exists.', 'The current surgical phase is CalotTriangleDissection, Grasper, Hook tool exists.']

模型加载问题

Traceback (most recent call last):File "/mnt/share/toky/Projects/SmolVLM2_Test/Smol_VLM_Video_Test.py", line 14, in <module>processor = AutoProcessor.from_pretrained(model_path)File "/mnt/share/toky/CondaEnvs/LM/lib/python3.10/site-packages/transformers/models/auto/processing_auto.py", line 375, in from_pretrainedraise ValueError(
ValueError: Unrecognized processing class in /mnt/share/toky/LLMs/Toky-Generate/SmolVLM2-2.2B-Instruct-video-feedback-Cholec80/. Can't instantiate a processor, a tokenizer, an image processor or a feature extractor for this model. Make sure the repository contains the files of at least one of those processing classes.

保存到本地的时候记得两个都保存:

# 保存模型到本地指定目录
trainer.save_model(save_directory)
processor.save_pretrained(save_directory)

造数据第二弹

接入Kimi:

/mnt/share/toky/CondaEnvs/GenSurgery/bin/python /mnt/share/toky/Projects/GenSurgery/toky_process_dataset/process_cholec80.py 
{'video_found': True, 'audio_found': False, 'metadata': {'major_brand': 'mp42', 'minor_version': '0', 'compatible_brands': 'mp42isomavc1', 'creation_time': '2017-01-13T18:33:13.000000Z', 'title': '8307HD Bin.01-H264-480', 'encoder': 'HandBrake rev0 2013051899'}, 'inputs': [{'streams': [{'input_number': 0, 'stream_number': 0, 'stream_type': 'video', 'language': None, 'default': True, 'size': [854, 480], 'bitrate': 2606, 'fps': 25.0, 'codec_name': 'h264', 'profile': '(Main)', 'metadata': {'Metadata': '', 'creation_time': '2017-01-13T18:33:13.000000Z', 'vendor_id': '[0][0][0][0]', 'encoder': 'JVT/AVC Coding'}}], 'input_number': 0}], 'duration': 2129.04, 'bitrate': 2609, 'start': 0.0, 'default_video_input_number': 0, 'default_video_stream_number': 0, 'video_codec_name': 'h264', 'video_profile': '(Main)', 'video_size': [854, 480], 'video_bitrate': 2606, 'video_fps': 25.0, 'video_duration': 2129.04, 'video_n_frames': 53226}
/mnt/share/toky/CondaEnvs/GenSurgery/lib/python3.10/site-packages/imageio_ffmpeg/binaries/ffmpeg-linux-x86_64-v7.0.2 -i /mnt/share/toky/Datasets/Cholec80/videos/video25.mp4 -loglevel error -f image2pipe -vf scale=854:480 -sws_flags bicubic -pix_fmt rgb24 -vcodec rawvideo -
{'video_found': True, 'audio_found': False, 'metadata': {'major_brand': 'mp42', 'minor_version': '0', 'compatible_brands': 'mp42isomavc1', 'creation_time': '2017-01-13T18:33:13.000000Z', 'title': '8307HD Bin.01-H264-480', 'encoder': 'HandBrake rev0 2013051899'}, 'inputs': [{'streams': [{'input_number': 0, 'stream_number': 0, 'stream_type': 'video', 'language': None, 'default': True, 'size': [854, 480], 'bitrate': 2606, 'fps': 25.0, 'codec_name': 'h264', 'profile': '(Main)', 'metadata': {'Metadata': '', 'creation_time': '2017-01-13T18:33:13.000000Z', 'vendor_id': '[0][0][0][0]', 'encoder': 'JVT/AVC Coding'}}], 'input_number': 0}], 'duration': 2129.04, 'bitrate': 2609, 'start': 0.0, 'default_video_input_number': 0, 'default_video_stream_number': 0, 'video_codec_name': 'h264', 'video_profile': '(Main)', 'video_size': [854, 480], 'video_bitrate': 2606, 'video_fps': 25.0, 'video_duration': 2129.04, 'video_n_frames': 53226}
/mnt/share/toky/CondaEnvs/GenSurgery/lib/python3.10/site-packages/imageio_ffmpeg/binaries/ffmpeg-linux-x86_64-v7.0.2 -i /mnt/share/toky/Datasets/Cholec80/videos/video25.mp4 -loglevel error -f image2pipe -vf scale=854:480 -sws_flags bicubic -pix_fmt rgb24 -vcodec rawvideo -
{'video_found': True, 'audio_found': False, 'metadata': {'major_brand': 'mp42', 'minor_version': '0', 'compatible_brands': 'mp42isomavc1', 'creation_time': '2017-01-13T18:33:13.000000Z', 'title': '8307HD Bin.01-H264-480', 'encoder': 'HandBrake rev0 2013051899'}, 'inputs': [{'streams': [{'input_number': 0, 'stream_number': 0, 'stream_type': 'video', 'language': None, 'default': True, 'size': [854, 480], 'bitrate': 2606, 'fps': 25.0, 'codec_name': 'h264', 'profile': '(Main)', 'metadata': {'Metadata': '', 'creation_time': '2017-01-13T18:33:13.000000Z', 'vendor_id': '[0][0][0][0]', 'encoder': 'JVT/AVC Coding'}}], 'input_number': 0}], 'duration': 2129.04, 'bitrate': 2609, 'start': 0.0, 'default_video_input_number': 0, 'default_video_stream_number': 0, 'video_codec_name': 'h264', 'video_profile': '(Main)', 'video_size': [854, 480], 'video_bitrate': 2606, 'video_fps': 25.0, 'video_duration': 2129.04, 'video_n_frames': 53226}
/mnt/share/toky/CondaEnvs/GenSurgery/lib/python3.10/site-packages/imageio_ffmpeg/binaries/ffmpeg-linux-x86_64-v7.0.2 -i /mnt/share/toky/Datasets/Cholec80/videos/video25.mp4 -loglevel error -f image2pipe -vf scale=854:480 -sws_flags bicubic -pix_fmt rgb24 -vcodec rawvideo -
{'video_found': True, 'audio_found': False, 'metadata': {'major_brand': 'mp42', 'minor_version': '0', 'compatible_brands': 'mp42isomavc1', 'creation_time': '2017-01-13T18:33:13.000000Z', 'title': '8307HD Bin.01-H264-480', 'encoder': 'HandBrake rev0 2013051899'}, 'inputs': [{'streams': [{'input_number': 0, 'stream_number': 0, 'stream_type': 'video', 'language': None, 'default': True, 'size': [854, 480], 'bitrate': 2606, 'fps': 25.0, 'codec_name': 'h264', 'profile': '(Main)', 'metadata': {'Metadata': '', 'creation_time': '2017-01-13T18:33:13.000000Z', 'vendor_id': '[0][0][0][0]', 'encoder': 'JVT/AVC Coding'}}], 'input_number': 0}], 'duration': 2129.04, 'bitrate': 2609, 'start': 0.0, 'default_video_input_number': 0, 'default_video_stream_number': 0, 'video_codec_name': 'h264', 'video_profile': '(Main)', 'video_size': [854, 480], 'video_bitrate': 2606, 'video_fps': 25.0, 'video_duration': 2129.04, 'video_n_frames': 53226}
/mnt/share/toky/CondaEnvs/GenSurgery/lib/python3.10/site-packages/imageio_ffmpeg/binaries/ffmpeg-linux-x86_64-v7.0.2 -i /mnt/share/toky/Datasets/Cholec80/videos/video25.mp4 -loglevel error -f image2pipe -vf scale=854:480 -sws_flags bicubic -pix_fmt rgb24 -vcodec rawvideo -
{'video_found': True, 'audio_found': False, 'metadata': {'major_brand': 'mp42', 'minor_version': '0', 'compatible_brands': 'mp42isomavc1', 'creation_time': '2017-01-13T18:33:13.000000Z', 'title': '8307HD Bin.01-H264-480', 'encoder': 'HandBrake rev0 2013051899'}, 'inputs': [{'streams': [{'input_number': 0, 'stream_number': 0, 'stream_type': 'video', 'language': None, 'default': True, 'size': [854, 480], 'bitrate': 2606, 'fps': 25.0, 'codec_name': 'h264', 'profile': '(Main)', 'metadata': {'Metadata': '', 'creation_time': '2017-01-13T18:33:13.000000Z', 'vendor_id': '[0][0][0][0]', 'encoder': 'JVT/AVC Coding'}}], 'input_number': 0}], 'duration': 2129.04, 'bitrate': 2609, 'start': 0.0, 'default_video_input_number': 0, 'default_video_stream_number': 0, 'video_codec_name': 'h264', 'video_profile': '(Main)', 'video_size': [854, 480], 'video_bitrate': 2606, 'video_fps': 25.0, 'video_duration': 2129.04, 'video_n_frames': 53226}
/mnt/share/toky/CondaEnvs/GenSurgery/lib/python3.10/site-packages/imageio_ffmpeg/binaries/ffmpeg-linux-x86_64-v7.0.2 -i /mnt/share/toky/Datasets/Cholec80/videos/video25.mp4 -loglevel error -f image2pipe -vf scale=854:480 -sws_flags bicubic -pix_fmt rgb24 -vcodec rawvideo -
{'video_found': True, 'audio_found': False, 'metadata': {'major_brand': 'mp42', 'minor_version': '0', 'compatible_brands': 'mp42isomavc1', 'creation_time': '2017-01-13T18:33:13.000000Z', 'title': '8307HD Bin.01-H264-480', 'encoder': 'HandBrake rev0 2013051899'}, 'inputs': [{'streams': [{'input_number': 0, 'stream_number': 0, 'stream_type': 'video', 'language': None, 'default': True, 'size': [854, 480], 'bitrate': 2606, 'fps': 25.0, 'codec_name': 'h264', 'profile': '(Main)', 'metadata': {'Metadata': '', 'creation_time': '2017-01-13T18:33:13.000000Z', 'vendor_id': '[0][0][0][0]', 'encoder': 'JVT/AVC Coding'}}], 'input_number': 0}], 'duration': 2129.04, 'bitrate': 2609, 'start': 0.0, 'default_video_input_number': 0, 'default_video_stream_number': 0, 'video_codec_name': 'h264', 'video_profile': '(Main)', 'video_size': [854, 480], 'video_bitrate': 2606, 'video_fps': 25.0, 'video_duration': 2129.04, 'video_n_frames': 53226}
/mnt/share/toky/CondaEnvs/GenSurgery/lib/python3.10/site-packages/imageio_ffmpeg/binaries/ffmpeg-linux-x86_64-v7.0.2 -i /mnt/share/toky/Datasets/Cholec80/videos/video25.mp4 -loglevel error -f image2pipe -vf scale=854:480 -sws_flags bicubic -pix_fmt rgb24 -vcodec rawvideo -
{'video_found': True, 'audio_found': False, 'metadata': {'major_brand': 'mp42', 'minor_version': '0', 'compatible_brands': 'mp42isomavc1', 'creation_time': '2017-01-13T18:33:13.000000Z', 'title': '8307HD Bin.01-H264-480', 'encoder': 'HandBrake rev0 2013051899'}, 'inputs': [{'streams': [{'input_number': 0, 'stream_number': 0, 'stream_type': 'video', 'language': None, 'default': True, 'size': [854, 480], 'bitrate': 2606, 'fps': 25.0, 'codec_name': 'h264', 'profile': '(Main)', 'metadata': {'Metadata': '', 'creation_time': '2017-01-13T18:33:13.000000Z', 'vendor_id': '[0][0][0][0]', 'encoder': 'JVT/AVC Coding'}}], 'input_number': 0}], 'duration': 2129.04, 'bitrate': 2609, 'start': 0.0, 'default_video_input_number': 0, 'default_video_stream_number': 0, 'video_codec_name': 'h264', 'video_profile': '(Main)', 'video_size': [854, 480], 'video_bitrate': 2606, 'video_fps': 25.0, 'video_duration': 2129.04, 'video_n_frames': 53226}
/mnt/share/toky/CondaEnvs/GenSurgery/lib/python3.10/site-packages/imageio_ffmpeg/binaries/ffmpeg-linux-x86_64-v7.0.2 -i /mnt/share/toky/Datasets/Cholec80/videos/video25.mp4 -loglevel error -f image2pipe -vf scale=854:480 -sws_flags bicubic -pix_fmt rgb24 -vcodec rawvideo -
{'video_found': True, 'audio_found': False, 'metadata': {'major_brand': 'mp42', 'minor_version': '0', 'compatible_brands': 'mp42isomavc1', 'creation_time': '2017-01-13T18:33:13.000000Z', 'title': '8307HD Bin.01-H264-480', 'encoder': 'HandBrake rev0 2013051899'}, 'inputs': [{'streams': [{'input_number': 0, 'stream_number': 0, 'stream_type': 'video', 'language': None, 'default': True, 'size': [854, 480], 'bitrate': 2606, 'fps': 25.0, 'codec_name': 'h264', 'profile': '(Main)', 'metadata': {'Metadata': '', 'creation_time': '2017-01-13T18:33:13.000000Z', 'vendor_id': '[0][0][0][0]', 'encoder': 'JVT/AVC Coding'}}], 'input_number': 0}], 'duration': 2129.04, 'bitrate': 2609, 'start': 0.0, 'default_video_input_number': 0, 'default_video_stream_number': 0, 'video_codec_name': 'h264', 'video_profile': '(Main)', 'video_size': [854, 480], 'video_bitrate': 2606, 'video_fps': 25.0, 'video_duration': 2129.04, 'video_n_frames': 53226}
/mnt/share/toky/CondaEnvs/GenSurgery/lib/python3.10/site-packages/imageio_ffmpeg/binaries/ffmpeg-linux-x86_64-v7.0.2 -i /mnt/share/toky/Datasets/Cholec80/videos/video25.mp4 -loglevel error -f image2pipe -vf scale=854:480 -sws_flags bicubic -pix_fmt rgb24 -vcodec rawvideo -
{'video_found': True, 'audio_found': False, 'metadata': {'major_brand': 'mp42', 'minor_version': '0', 'compatible_brands': 'mp42isomavc1', 'creation_time': '2017-01-13T18:33:13.000000Z', 'title': '8307HD Bin.01-H264-480', 'encoder': 'HandBrake rev0 2013051899'}, 'inputs': [{'streams': [{'input_number': 0, 'stream_number': 0, 'stream_type': 'video', 'language': None, 'default': True, 'size': [854, 480], 'bitrate': 2606, 'fps': 25.0, 'codec_name': 'h264', 'profile': '(Main)', 'metadata': {'Metadata': '', 'creation_time': '2017-01-13T18:33:13.000000Z', 'vendor_id': '[0][0][0][0]', 'encoder': 'JVT/AVC Coding'}}], 'input_number': 0}], 'duration': 2129.04, 'bitrate': 2609, 'start': 0.0, 'default_video_input_number': 0, 'default_video_stream_number': 0, 'video_codec_name': 'h264', 'video_profile': '(Main)', 'video_size': [854, 480], 'video_bitrate': 2606, 'video_fps': 25.0, 'video_duration': 2129.04, 'video_n_frames': 53226}
/mnt/share/toky/CondaEnvs/GenSurgery/lib/python3.10/site-packages/imageio_ffmpeg/binaries/ffmpeg-linux-x86_64-v7.0.2 -i /mnt/share/toky/Datasets/Cholec80/videos/video25.mp4 -loglevel error -f image2pipe -vf scale=854:480 -sws_flags bicubic -pix_fmt rgb24 -vcodec rawvideo -
{'video_found': True, 'audio_found': False, 'metadata': {'major_brand': 'mp42', 'minor_version': '0', 'compatible_brands': 'mp42isomavc1', 'creation_time': '2017-01-13T17:02:50.000000Z', 'title': '7301HD Bin.01-H264-480', 'encoder': 'HandBrake rev0 2013051899'}, 'inputs': [{'streams': [{'input_number': 0, 'stream_number': 0, 'stream_type': 'video', 'language': None, 'default': True, 'size': [854, 480], 'bitrate': 2902, 'fps': 25.0, 'codec_name': 'h264', 'profile': '(Main)', 'metadata': {'Metadata': '', 'creation_time': '2017-01-13T17:02:50.000000Z', 'vendor_id': '[0][0][0][0]', 'encoder': 'JVT/AVC Coding'}}], 'input_number': 0}], 'duration': 1522.04, 'bitrate': 2905, 'start': 0.0, 'default_video_input_number': 0, 'default_video_stream_number': 0, 'video_codec_name': 'h264', 'video_profile': '(Main)', 'video_size': [854, 480], 'video_bitrate': 2902, 'video_fps': 25.0, 'video_duration': 1522.04, 'video_n_frames': 38051}
/mnt/share/toky/CondaEnvs/GenSurgery/lib/python3.10/site-packages/imageio_ffmpeg/binaries/ffmpeg-linux-x86_64-v7.0.2 -i /mnt/share/toky/Datasets/Cholec80/videos/video04.mp4 -loglevel error -f image2pipe -vf scale=854:480 -sws_flags bicubic -pix_fmt rgb24 -vcodec rawvideo -
{'video_found': True, 'audio_found': False, 'metadata': {'major_brand': 'mp42', 'minor_version': '0', 'compatible_brands': 'mp42isomavc1', 'creation_time': '2017-01-13T18:33:13.000000Z', 'title': '8307HD Bin.01-H264-480', 'encoder': 'HandBrake rev0 2013051899'}, 'inputs': [{'streams': [{'input_number': 0, 'stream_number': 0, 'stream_type': 'video', 'language': None, 'default': True, 'size': [854, 480], 'bitrate': 2606, 'fps': 25.0, 'codec_name': 'h264', 'profile': '(Main)', 'metadata': {'Metadata': '', 'creation_time': '2017-01-13T18:33:13.000000Z', 'vendor_id': '[0][0][0][0]', 'encoder': 'JVT/AVC Coding'}}], 'input_number': 0}], 'duration': 2129.04, 'bitrate': 2609, 'start': 0.0, 'default_video_input_number': 0, 'default_video_stream_number': 0, 'video_codec_name': 'h264', 'video_profile': '(Main)', 'video_size': [854, 480], 'video_bitrate': 2606, 'video_fps': 25.0, 'video_duration': 2129.04, 'video_n_frames': 53226}
/mnt/share/toky/CondaEnvs/GenSurgery/lib/python3.10/site-packages/imageio_ffmpeg/binaries/ffmpeg-linux-x86_64-v7.0.2 -i /mnt/share/toky/Datasets/Cholec80/videos/video25.mp4 -loglevel error -f image2pipe -vf scale=854:480 -sws_flags bicubic -pix_fmt rgb24 -vcodec rawvideo -
{'video_found': True, 'audio_found': False, 'metadata': {'major_brand': 'mp42', 'minor_version': '0', 'compatible_brands': 'mp42isomavc1', 'creation_time': '2017-01-13T18:33:13.000000Z', 'title': '8307HD Bin.01-H264-480', 'encoder': 'HandBrake rev0 2013051899'}, 'inputs': [{'streams': [{'input_number': 0, 'stream_number': 0, 'stream_type': 'video', 'language': None, 'default': True, 'size': [854, 480], 'bitrate': 2606, 'fps': 25.0, 'codec_name': 'h264', 'profile': '(Main)', 'metadata': {'Metadata': '', 'creation_time': '2017-01-13T18:33:13.000000Z', 'vendor_id': '[0][0][0][0]', 'encoder': 'JVT/AVC Coding'}}], 'input_number': 0}], 'duration': 2129.04, 'bitrate': 2609, 'start': 0.0, 'default_video_input_number': 0, 'default_video_stream_number': 0, 'video_codec_name': 'h264', 'video_profile': '(Main)', 'video_size': [854, 480], 'video_bitrate': 2606, 'video_fps': 25.0, 'video_duration': 2129.04, 'video_n_frames': 53226}
/mnt/share/toky/CondaEnvs/GenSurgery/lib/python3.10/site-packages/imageio_ffmpeg/binaries/ffmpeg-linux-x86_64-v7.0.2 -ss 1679.520000 -i /mnt/share/toky/Datasets/Cholec80/videos/video25.mp4 -ss 1.000000 -loglevel error -f image2pipe -vf scale=854:480 -sws_flags bicubic -pix_fmt rgb24 -vcodec rawvideo -
{'video_found': True, 'audio_found': False, 'metadata': {'major_brand': 'mp42', 'minor_version': '0', 'compatible_brands': 'mp42isomavc1', 'creation_time': '2017-01-13T18:33:13.000000Z', 'title': '8307HD Bin.01-H264-480', 'encoder': 'HandBrake rev0 2013051899'}, 'inputs': [{'streams': [{'input_number': 0, 'stream_number': 0, 'stream_type': 'video', 'language': None, 'default': True, 'size': [854, 480], 'bitrate': 2606, 'fps': 25.0, 'codec_name': 'h264', 'profile': '(Main)', 'metadata': {'Metadata': '', 'creation_time': '2017-01-13T18:33:13.000000Z', 'vendor_id': '[0][0][0][0]', 'encoder': 'JVT/AVC Coding'}}], 'input_number': 0}], 'duration': 2129.04, 'bitrate': 2609, 'start': 0.0, 'default_video_input_number': 0, 'default_video_stream_number': 0, 'video_codec_name': 'h264', 'video_profile': '(Main)', 'video_size': [854, 480], 'video_bitrate': 2606, 'video_fps': 25.0, 'video_duration': 2129.04, 'video_n_frames': 53226}
/mnt/share/toky/CondaEnvs/GenSurgery/lib/python3.10/site-packages/imageio_ffmpeg/binaries/ffmpeg-linux-x86_64-v7.0.2 -ss 1679.645000 -i /mnt/share/toky/Datasets/Cholec80/videos/video25.mp4 -ss 1.000000 -loglevel error -f image2pipe -vf scale=854:480 -sws_flags bicubic -pix_fmt rgb24 -vcodec rawvideo -
<phase>GallbladderPackaging</phase>. Tools: <tool>Grasper, SpecimenBag</tool>. Next step: Carefully grasp the gallbladder and insert it into the specimen bag. Ensure the bag is fully opened and no tissue is torn. Risks: Bleeding from the liver bed or cystic duct stump. Maintain hemostasis and monitor for any signs of bleeding.
{'video_found': True, 'audio_found': False, 'metadata': {'major_brand': 'mp42', 'minor_version': '0', 'compatible_brands': 'mp42isomavc1', 'creation_time': '2017-01-13T18:33:13.000000Z', 'title': '8307HD Bin.01-H264-480', 'encoder': 'HandBrake rev0 2013051899'}, 'inputs': [{'streams': [{'input_number': 0, 'stream_number': 0, 'stream_type': 'video', 'language': None, 'default': True, 'size': [854, 480], 'bitrate': 2606, 'fps': 25.0, 'codec_name': 'h264', 'profile': '(Main)', 'metadata': {'Metadata': '', 'creation_time': '2017-01-13T18:33:13.000000Z', 'vendor_id': '[0][0][0][0]', 'encoder': 'JVT/AVC Coding'}}], 'input_number': 0}], 'duration': 2129.04, 'bitrate': 2609, 'start': 0.0, 'default_video_input_number': 0, 'default_video_stream_number': 0, 'video_codec_name': 'h264', 'video_profile': '(Main)', 'video_size': [854, 480], 'video_bitrate': 2606, 'video_fps': 25.0, 'video_duration': 2129.04, 'video_n_frames': 53226}
/mnt/share/toky/CondaEnvs/GenSurgery/lib/python3.10/site-packages/imageio_ffmpeg/binaries/ffmpeg-linux-x86_64-v7.0.2 -i /mnt/share/toky/Datasets/Cholec80/videos/video25.mp4 -loglevel error -f image2pipe -vf scale=854:480 -sws_flags bicubic -pix_fmt rgb24 -vcodec rawvideo -
{'video_found': True, 'audio_found': False, 'metadata': {'major_brand': 'mp42', 'minor_version': '0', 'compatible_brands': 'mp42isomavc1', 'creation_time': '2017-01-13T18:33:13.000000Z', 'title': '8307HD Bin.01-H264-480', 'encoder': 'HandBrake rev0 2013051899'}, 'inputs': [{'streams': [{'input_number': 0, 'stream_number': 0, 'stream_type': 'video', 'language': None, 'default': True, 'size': [854, 480], 'bitrate': 2606, 'fps': 25.0, 'codec_name': 'h264', 'profile': '(Main)', 'metadata': {'Metadata': '', 'creation_time': '2017-01-13T18:33:13.000000Z', 'vendor_id': '[0][0][0][0]', 'encoder': 'JVT/AVC Coding'}}], 'input_number': 0}], 'duration': 2129.04, 'bitrate': 2609, 'start': 0.0, 'default_video_input_number': 0, 'default_video_stream_number': 0, 'video_codec_name': 'h264', 'video_profile': '(Main)', 'video_size': [854, 480], 'video_bitrate': 2606, 'video_fps': 25.0, 'video_duration': 2129.04, 'video_n_frames': 53226}
/mnt/share/toky/CondaEnvs/GenSurgery/lib/python3.10/site-packages/imageio_ffmpeg/binaries/ffmpeg-linux-x86_64-v7.0.2 -ss 994.520000 -i /mnt/share/toky/Datasets/Cholec80/videos/video25.mp4 -ss 1.000000 -loglevel error -f image2pipe -vf scale=854:480 -sws_flags bicubic -pix_fmt rgb24 -vcodec rawvideo -
{'video_found': True, 'audio_found': False, 'metadata': {'major_brand': 'mp42', 'minor_version': '0', 'compatible_brands': 'mp42isomavc1', 'creation_time': '2017-01-13T18:33:13.000000Z', 'title': '8307HD Bin.01-H264-480', 'encoder': 'HandBrake rev0 2013051899'}, 'inputs': [{'streams': [{'input_number': 0, 'stream_number': 0, 'stream_type': 'video', 'language': None, 'default': True, 'size': [854, 480], 'bitrate': 2606, 'fps': 25.0, 'codec_name': 'h264', 'profile': '(Main)', 'metadata': {'Metadata': '', 'creation_time': '2017-01-13T18:33:13.000000Z', 'vendor_id': '[0][0][0][0]', 'encoder': 'JVT/AVC Coding'}}], 'input_number': 0}], 'duration': 2129.04, 'bitrate': 2609, 'start': 0.0, 'default_video_input_number': 0, 'default_video_stream_number': 0, 'video_codec_name': 'h264', 'video_profile': '(Main)', 'video_size': [854, 480], 'video_bitrate': 2606, 'video_fps': 25.0, 'video_duration': 2129.04, 'video_n_frames': 53226}
/mnt/share/toky/CondaEnvs/GenSurgery/lib/python3.10/site-packages/imageio_ffmpeg/binaries/ffmpeg-linux-x86_64-v7.0.2 -ss 994.645000 -i /mnt/share/toky/Datasets/Cholec80/videos/video25.mp4 -ss 1.000000 -loglevel error -f image2pipe -vf scale=854:480 -sws_flags bicubic -pix_fmt rgb24 -vcodec rawvideo -
<phase>Clipping</phase>, <tool>Grasper</tool>. Next, secure clip on cystic artery. Risk: Bleeding. Ensure clip is properly placed to avoid slippage.
{'video_found': True, 'audio_found': False, 'metadata': {'major_brand': 'mp42', 'minor_version': '0', 'compatible_brands': 'mp42isomavc1', 'creation_time': '2017-01-13T18:33:13.000000Z', 'title': '8307HD Bin.01-H264-480', 'encoder': 'HandBrake rev0 2013051899'}, 'inputs': [{'streams': [{'input_number': 0, 'stream_number': 0, 'stream_type': 'video', 'language': None, 'default': True, 'size': [854, 480], 'bitrate': 2606, 'fps': 25.0, 'codec_name': 'h264', 'profile': '(Main)', 'metadata': {'Metadata': '', 'creation_time': '2017-01-13T18:33:13.000000Z', 'vendor_id': '[0][0][0][0]', 'encoder': 'JVT/AVC Coding'}}], 'input_number': 0}], 'duration': 2129.04, 'bitrate': 2609, 'start': 0.0, 'default_video_input_number': 0, 'default_video_stream_number': 0, 'video_codec_name': 'h264', 'video_profile': '(Main)', 'video_size': [854, 480], 'video_bitrate': 2606, 'video_fps': 25.0, 'video_duration': 2129.04, 'video_n_frames': 53226}
/mnt/share/toky/CondaEnvs/GenSurgery/lib/python3.10/site-packages/imageio_ffmpeg/binaries/ffmpeg-linux-x86_64-v7.0.2 -i /mnt/share/toky/Datasets/Cholec80/videos/video25.mp4 -loglevel error -f image2pipe -vf scale=854:480 -sws_flags bicubic -pix_fmt rgb24 -vcodec rawvideo -
{'video_found': True, 'audio_found': False, 'metadata': {'major_brand': 'mp42', 'minor_version': '0', 'compatible_brands': 'mp42isomavc1', 'creation_time': '2017-01-13T18:33:13.000000Z', 'title': '8307HD Bin.01-H264-480', 'encoder': 'HandBrake rev0 2013051899'}, 'inputs': [{'streams': [{'input_number': 0, 'stream_number': 0, 'stream_type': 'video', 'language': None, 'default': True, 'size': [854, 480], 'bitrate': 2606, 'fps': 25.0, 'codec_name': 'h264', 'profile': '(Main)', 'metadata': {'Metadata': '', 'creation_time': '2017-01-13T18:33:13.000000Z', 'vendor_id': '[0][0][0][0]', 'encoder': 'JVT/AVC Coding'}}], 'input_number': 0}], 'duration': 2129.04, 'bitrate': 2609, 'start': 0.0, 'default_video_input_number': 0, 'default_video_stream_number': 0, 'video_codec_name': 'h264', 'video_profile': '(Main)', 'video_size': [854, 480], 'video_bitrate': 2606, 'video_fps': 25.0, 'video_duration': 2129.04, 'video_n_frames': 53226}
/mnt/share/toky/CondaEnvs/GenSurgery/lib/python3.10/site-packages/imageio_ffmpeg/binaries/ffmpeg-linux-x86_64-v7.0.2 -ss 683.520000 -i /mnt/share/toky/Datasets/Cholec80/videos/video25.mp4 -ss 1.000000 -loglevel error -f image2pipe -vf scale=854:480 -sws_flags bicubic -pix_fmt rgb24 -vcodec rawvideo -
{'video_found': True, 'audio_found': False, 'metadata': {'major_brand': 'mp42', 'minor_version': '0', 'compatible_brands': 'mp42isomavc1', 'creation_time': '2017-01-13T18:33:13.000000Z', 'title': '8307HD Bin.01-H264-480', 'encoder': 'HandBrake rev0 2013051899'}, 'inputs': [{'streams': [{'input_number': 0, 'stream_number': 0, 'stream_type': 'video', 'language': None, 'default': True, 'size': [854, 480], 'bitrate': 2606, 'fps': 25.0, 'codec_name': 'h264', 'profile': '(Main)', 'metadata': {'Metadata': '', 'creation_time': '2017-01-13T18:33:13.000000Z', 'vendor_id': '[0][0][0][0]', 'encoder': 'JVT/AVC Coding'}}], 'input_number': 0}], 'duration': 2129.04, 'bitrate': 2609, 'start': 0.0, 'default_video_input_number': 0, 'default_video_stream_number': 0, 'video_codec_name': 'h264', 'video_profile': '(Main)', 'video_size': [854, 480], 'video_bitrate': 2606, 'video_fps': 25.0, 'video_duration': 2129.04, 'video_n_frames': 53226}
/mnt/share/toky/CondaEnvs/GenSurgery/lib/python3.10/site-packages/imageio_ffmpeg/binaries/ffmpeg-linux-x86_64-v7.0.2 -ss 683.645000 -i /mnt/share/toky/Datasets/Cholec80/videos/video25.mp4 -ss 1.000000 -loglevel error -f image2pipe -vf scale=854:480 -sws_flags bicubic -pix_fmt rgb24 -vcodec rawvideo -
Next step: Secure cystic artery and duct. Use grasper to expose, hook for dissection. Be cautious of bleeding from Calot triangle vessels. Risk: Hemorrhage. Ensure clear visualization and gentle tissue handling to prevent bile duct injury.
{'video_found': True, 'audio_found': False, 'metadata': {'major_brand': 'mp42', 'minor_version': '0', 'compatible_brands': 'mp42isomavc1', 'creation_time': '2017-01-13T18:33:13.000000Z', 'title': '8307HD Bin.01-H264-480', 'encoder': 'HandBrake rev0 2013051899'}, 'inputs': [{'streams': [{'input_number': 0, 'stream_number': 0, 'stream_type': 'video', 'language': None, 'default': True, 'size': [854, 480], 'bitrate': 2606, 'fps': 25.0, 'codec_name': 'h264', 'profile': '(Main)', 'metadata': {'Metadata': '', 'creation_time': '2017-01-13T18:33:13.000000Z', 'vendor_id': '[0][0][0][0]', 'encoder': 'JVT/AVC Coding'}}], 'input_number': 0}], 'duration': 2129.04, 'bitrate': 2609, 'start': 0.0, 'default_video_input_number': 0, 'default_video_stream_number': 0, 'video_codec_name': 'h264', 'video_profile': '(Main)', 'video_size': [854, 480], 'video_bitrate': 2606, 'video_fps': 25.0, 'video_duration': 2129.04, 'video_n_frames': 53226}
/mnt/share/toky/CondaEnvs/GenSurgery/lib/python3.10/site-packages/imageio_ffmpeg/binaries/ffmpeg-linux-x86_64-v7.0.2 -i /mnt/share/toky/Datasets/Cholec80/videos/video25.mp4 -loglevel error -f image2pipe -vf scale=854:480 -sws_flags bicubic -pix_fmt rgb24 -vcodec rawvideo -
{'video_found': True, 'audio_found': False, 'metadata': {'major_brand': 'mp42', 'minor_version': '0', 'compatible_brands': 'mp42isomavc1', 'creation_time': '2017-01-13T18:33:13.000000Z', 'title': '8307HD Bin.01-H264-480', 'encoder': 'HandBrake rev0 2013051899'}, 'inputs': [{'streams': [{'input_number': 0, 'stream_number': 0, 'stream_type': 'video', 'language': None, 'default': True, 'size': [854, 480], 'bitrate': 2606, 'fps': 25.0, 'codec_name': 'h264', 'profile': '(Main)', 'metadata': {'Metadata': '', 'creation_time': '2017-01-13T18:33:13.000000Z', 'vendor_id': '[0][0][0][0]', 'encoder': 'JVT/AVC Coding'}}], 'input_number': 0}], 'duration': 2129.04, 'bitrate': 2609, 'start': 0.0, 'default_video_input_number': 0, 'default_video_stream_number': 0, 'video_codec_name': 'h264', 'video_profile': '(Main)', 'video_size': [854, 480], 'video_bitrate': 2606, 'video_fps': 25.0, 'video_duration': 2129.04, 'video_n_frames': 53226}
/mnt/share/toky/CondaEnvs/GenSurgery/lib/python3.10/site-packages/imageio_ffmpeg/binaries/ffmpeg-linux-x86_64-v7.0.2 -ss 1801.520000 -i /mnt/share/toky/Datasets/Cholec80/videos/video25.mp4 -ss 1.000000 -loglevel error -f image2pipe -vf scale=854:480 -sws_flags bicubic -pix_fmt rgb24 -vcodec rawvideo -
{'video_found': True, 'audio_found': False, 'metadata': {'major_brand': 'mp42', 'minor_version': '0', 'compatible_brands': 'mp42isomavc1', 'creation_time': '2017-01-13T18:33:13.000000Z', 'title': '8307HD Bin.01-H264-480', 'encoder': 'HandBrake rev0 2013051899'}, 'inputs': [{'streams': [{'input_number': 0, 'stream_number': 0, 'stream_type': 'video', 'language': None, 'default': True, 'size': [854, 480], 'bitrate': 2606, 'fps': 25.0, 'codec_name': 'h264', 'profile': '(Main)', 'metadata': {'Metadata': '', 'creation_time': '2017-01-13T18:33:13.000000Z', 'vendor_id': '[0][0][0][0]', 'encoder': 'JVT/AVC Coding'}}], 'input_number': 0}], 'duration': 2129.04, 'bitrate': 2609, 'start': 0.0, 'default_video_input_number': 0, 'default_video_stream_number': 0, 'video_codec_name': 'h264', 'video_profile': '(Main)', 'video_size': [854, 480], 'video_bitrate': 2606, 'video_fps': 25.0, 'video_duration': 2129.04, 'video_n_frames': 53226}
/mnt/share/toky/CondaEnvs/GenSurgery/lib/python3.10/site-packages/imageio_ffmpeg/binaries/ffmpeg-linux-x86_64-v7.0.2 -ss 1801.645000 -i /mnt/share/toky/Datasets/Cholec80/videos/video25.mp4 -ss 1.000000 -loglevel error -f image2pipe -vf scale=854:480 -sws_flags bicubic -pix_fmt rgb24 -vcodec rawvideo -
Traceback (most recent call last):File "/mnt/share/toky/Projects/GenSurgery/toky_process_dataset/process_cholec80.py", line 315, in <module>process_samples()File "/mnt/share/toky/Projects/GenSurgery/toky_process_dataset/process_cholec80.py", line 114, in process_samplesprocess_mode('train', train_samples)File "/mnt/share/toky/Projects/GenSurgery/toky_process_dataset/process_cholec80.py", line 144, in process_modedata = build_metadata(File "/mnt/share/toky/Projects/GenSurgery/toky_process_dataset/process_cholec80.py", line 274, in build_metadatatext_prompt = call_LLM_generate_text_prompt(phase_and_tool_info,frame_save_dir)File "/mnt/share/toky/Projects/GenSurgery/toky_process_dataset/process_cholec80.py", line 269, in call_LLM_generate_text_promptresult = kimi_AI(phase_and_tool_info,frame_save_dir)File "/mnt/share/toky/Projects/GenSurgery/toky_process_dataset/process_cholec80.py", line 259, in kimi_AIresponse = client.chat.completions.create(File "/mnt/share/toky/CondaEnvs/GenSurgery/lib/python3.10/site-packages/openai/_utils/_utils.py", line 287, in wrapperreturn func(*args, **kwargs)File "/mnt/share/toky/CondaEnvs/GenSurgery/lib/python3.10/site-packages/openai/resources/chat/completions/completions.py", line 925, in createreturn self._post(File "/mnt/share/toky/CondaEnvs/GenSurgery/lib/python3.10/site-packages/openai/_base_client.py", line 1239, in postreturn cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))File "/mnt/share/toky/CondaEnvs/GenSurgery/lib/python3.10/site-packages/openai/_base_client.py", line 1034, in requestraise self._make_status_error_from_response(err.response) from None
openai.RateLimitError: Error code: 429 - {'error': {'message': 'Your account org-375e9bdee55043cabb13ba9fd1e99161<ak-f194u38t148111bw43ti> request reached organization max RPM: 3, please try again after 1 seconds', 'type': 'rate_limit_reached_error'}}

    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
  File "/mnt/share/toky/CondaEnvs/GenSurgery/lib/python3.10/site-packages/openai/_base_client.py", line 1034, in request
    raise self._make_status_error_from_response(err.response) from None
openai.RateLimitError: Error code: 429 - {'error': {'message': 'Your account org-375e9bdee55043cabb13ba9fd1e99161<ak-f194u38t148111bw43ti> request reached organization max RPM: 3, please try again after 1 seconds', 'type': 'rate_limit_reached_error'}}

说明:

这个错误 openai.RateLimitError 及错误码 429 表示你调用 OpenAI API 时达到了组织(organization)的请求速率限制(RPM,Requests Per Minute,每分钟请求数)。

错误信息中明确指出你的账号所在组织 org-375e9bdee55043cabb13ba9fd1e99161 的请求速率达到了上限(最大 RPM 为 3),需要等待 1 秒钟后再尝试。

解决:在调用的时候加一个计时器去限制一下。

豆包的输出情况

[{"id": "t25_0000000","images": ["t25_0000000_06.jpg","t25_0000000_01.jpg","t25_0000000_21.jpg","t25_0000000_18.jpg","t25_0000000_14.jpg","t25_0000000_05.jpg","t25_0000000_00.jpg","t25_0000000_07.jpg","t25_0000000_15.jpg","t25_0000000_03.jpg","t25_0000000_12.jpg","t25_0000000_23.jpg","t25_0000000_19.jpg","t25_0000000_08.jpg","t25_0000000_16.jpg","t25_0000000_17.jpg","t25_0000000_13.jpg","t25_0000000_10.jpg","t25_0000000_20.jpg","t25_0000000_11.jpg","t25_0000000_02.jpg","t25_0000000_04.jpg","t25_0000000_22.jpg","t25_0000000_09.jpg"],"text prompt": "<phase>CalotTriangleDissection</phase>\n<tool>Hook, Grasper</tool>\nNext step: Continue dissecting Calot's Triangle to expose cystic duct and artery.\n<risk> Bleeding from nearby blood vessels. </risk>\n<risk level> 2 </risk level>","video link": "/mnt/share/toky/Datasets/Toky_Generate/Cholec80VideoFeedback/test/videos_annotated/t25_0000000.mp4","visual quality": 4,"temporal consistency": 4,"dynamic degree": 4,"text-to-video alignment": 4,"factual consistency": 4,"risk level": 2,"conversations": [{"from": "human","value": "You're an experienced gallbladder surgery expert. Based on the provided surgical video frames:\n    First, explicitly repeat the current surgical phase <phase>CalotTriangleDissection</phase> and the used tools <tool>Hook, Grasper</tool>.\n    Next, concisely present the next-step surgical prompt and navigation information.\n    Additionally, highlight potential risks such as bleeding.    Enclose the description of potential risks within <risk> </risk> tags.   Moreover, infer the current risk level from the images.    Represent the risk level as <risk level> [0 - 4] </risk level>, where 0 indicates no risk and 4 represents the highest risk level.\n    Your entire response must not exceed 100 words and should include <phase></phase>, <tool> </tool>, and <risk> </risk> <risk level> </risk level> tags.\n    "},{"from": "gpt","value": "<phase>CalotTriangleDissection</phase>\n<tool>Hook, Grasper</tool>\nNext step: Continue dissecting Calot's Triangle to expose cystic duct and artery.\n<risk> Bleeding from nearby blood vessels. </risk>\n<risk level> 2 </risk level>"}]},{"id": "t04_0000001","images": ["t04_0000001_16.jpg","t04_0000001_11.jpg","t04_0000001_00.jpg","t04_0000001_10.jpg","t04_0000001_04.jpg","t04_0000001_05.jpg","t04_0000001_22.jpg","t04_0000001_01.jpg","t04_0000001_15.jpg","t04_0000001_08.jpg","t04_0000001_20.jpg","t04_0000001_06.jpg","t04_0000001_09.jpg","t04_0000001_13.jpg","t04_0000001_14.jpg","t04_0000001_03.jpg","t04_0000001_19.jpg","t04_0000001_23.jpg","t04_0000001_18.jpg","t04_0000001_21.jpg","t04_0000001_07.jpg","t04_0000001_17.jpg","t04_0000001_02.jpg","t04_0000001_12.jpg"],"text prompt": "<phase>CalotTriangleDissection</phase>\n<tool>Hook, Grasper</tool>\nNext step: Continue dissecting Calot's triangle to identify cystic duct and artery.\n<risk> Bleeding from vascular structures in Calot's triangle. </risk>\n<risk level> 2 </risk level>","video link": "/mnt/share/toky/Datasets/Toky_Generate/Cholec80VideoFeedback/test/videos_annotated/t04_0000001.mp4","visual quality": 4,"temporal consistency": 4,"dynamic degree": 4,"text-to-video alignment": 4,"factual consistency": 4,"risk level": 2,"conversations": [{"from": "human","value": "You're an experienced gallbladder surgery expert. Based on the provided surgical video frames:\n    First, explicitly repeat the current surgical phase <phase>CalotTriangleDissection</phase> and the used tools <tool>Hook, Grasper</tool>.\n    Next, concisely present the next-step surgical prompt and navigation information.\n    Additionally, highlight potential risks such as bleeding.    Enclose the description of potential risks within <risk> </risk> tags.   Moreover, infer the current risk level from the images.    Represent the risk level as <risk level> [0 - 4] </risk level>, where 0 indicates no risk and 4 represents the highest risk level.\n    Your entire response must not exceed 100 words and should include <phase></phase>, <tool> </tool>, and <risk> </risk> <risk level> </risk level> tags.\n    "},{"from": "gpt","value": "<phase>CalotTriangleDissection</phase>\n<tool>Hook, Grasper</tool>\nNext step: Continue dissecting Calot's triangle to identify cystic duct and artery.\n<risk> Bleeding from vascular structures in Calot's triangle. </risk>\n<risk level> 2 </risk level>"}]},{"id": "t25_0000002","images": ["t25_0000002_08.jpg","t25_0000002_15.jpg","t25_0000002_02.jpg","t25_0000002_16.jpg","t25_0000002_21.jpg","t25_0000002_12.jpg","t25_0000002_14.jpg","t25_0000002_19.jpg","t25_0000002_22.jpg","t25_0000002_04.jpg","t25_0000002_20.jpg","t25_0000002_06.jpg","t25_0000002_01.jpg","t25_0000002_10.jpg","t25_0000002_17.jpg","t25_0000002_18.jpg","t25_0000002_00.jpg","t25_0000002_13.jpg","t25_0000002_03.jpg","t25_0000002_07.jpg","t25_0000002_23.jpg","t25_0000002_05.jpg","t25_0000002_09.jpg","t25_0000002_11.jpg"],"text prompt": "<phase>GallbladderDissection</phase>\n<tool>Grasper</tool>\nNext step: Continue dissecting the gallbladder. Navigate carefully around adjacent structures.\n<risk> Bleeding from nearby blood vessels. </risk>\n<risk level> 2 </risk level>","video link": "/mnt/share/toky/Datasets/Toky_Generate/Cholec80VideoFeedback/test/videos_annotated/t25_0000002.mp4","visual quality": 4,"temporal consistency": 4,"dynamic degree": 4,"text-to-video alignment": 4,"factual consistency": 4,"risk level": 2,"conversations": [{"from": "human","value": "You're an experienced gallbladder surgery expert. Based on the provided surgical video frames:\n    First, explicitly repeat the current surgical phase <phase>GallbladderDissection</phase> and the used tools <tool>Grasper</tool>.\n    Next, concisely present the next-step surgical prompt and navigation information.\n    Additionally, highlight potential risks such as bleeding.    Enclose the description of potential risks within <risk> </risk> tags.   Moreover, infer the current risk level from the images.    Represent the risk level as <risk level> [0 - 4] </risk level>, where 0 indicates no risk and 4 represents the highest risk level.\n    Your entire response must not exceed 100 words and should include <phase></phase>, <tool> </tool>, and <risk> </risk> <risk level> </risk level> tags.\n    "},{"from": "gpt","value": "<phase>GallbladderDissection</phase>\n<tool>Grasper</tool>\nNext step: Continue dissecting the gallbladder. Navigate carefully around adjacent structures.\n<risk> Bleeding from nearby blood vessels. </risk>\n<risk level> 2 </risk level>"}]}
]

Kimi生成的数据样例

[{"id": "t25_0000000","images": ["t25_0000000_06.jpg","t25_0000000_01.jpg","t25_0000000_21.jpg","t25_0000000_18.jpg","t25_0000000_14.jpg","t25_0000000_05.jpg","t25_0000000_00.jpg","t25_0000000_07.jpg","t25_0000000_15.jpg","t25_0000000_03.jpg","t25_0000000_12.jpg","t25_0000000_23.jpg","t25_0000000_19.jpg","t25_0000000_08.jpg","t25_0000000_16.jpg","t25_0000000_17.jpg","t25_0000000_13.jpg","t25_0000000_10.jpg","t25_0000000_20.jpg","t25_0000000_11.jpg","t25_0000000_02.jpg","t25_0000000_04.jpg","t25_0000000_22.jpg","t25_0000000_09.jpg"],"text prompt": "<phase>CalotTriangleDissection</phase> is currently underway using a <tool>Hook</tool>. Next, carefully identify and clip the cystic artery and cystic duct. <risk>Risks include bleeding from the cystic artery or duct.</risk> <risk level>1</risk level>","video link": "/mnt/share/toky/Datasets/Toky_Generate/Cholec80VideoFeedback/train/videos_annotated/t25_0000000.mp4","visual quality": 4,"temporal consistency": 4,"dynamic degree": 4,"text-to-video alignment": 4,"factual consistency": 4,"risk level": 1,"conversations": [{"from": "human","value": "You're an experienced gallbladder surgery expert. Based on the provided surgical video frames:\n        First, explicitly repeat the current surgical phase <phase>CalotTriangleDissection</phase> and the used tools <tool>Hook</tool>.\n        Next, concisely present the next-step surgical prompt and navigation information.\n        Additionally, highlight potential risks such as bleeding. Enclose the description of potential risks within <risk> </risk> tags. Moreover, infer the current risk level from the images.    Represent the risk level as <risk level> [0 - 4] </risk level>, where 0 indicates no risk and 4 represents the highest risk level.\n        Your entire response must not exceed 100 words and should include <phase></phase>, <tool> </tool>, and <risk> </risk> <risk level> </risk level> tags.\n        "},{"from": "gpt","value": "<phase>CalotTriangleDissection</phase> is currently underway using a <tool>Hook</tool>. Next, carefully identify and clip the cystic artery and cystic duct. <risk>Risks include bleeding from the cystic artery or duct.</risk> <risk level>1</risk level>"}]},{"id": "t25_0000001","images": ["t25_0000001_13.jpg","t25_0000001_15.jpg","t25_0000001_12.jpg","t25_0000001_06.jpg","t25_0000001_21.jpg","t25_0000001_10.jpg","t25_0000001_20.jpg","t25_0000001_02.jpg","t25_0000001_19.jpg","t25_0000001_04.jpg","t25_0000001_07.jpg","t25_0000001_09.jpg","t25_0000001_16.jpg","t25_0000001_17.jpg","t25_0000001_23.jpg","t25_0000001_18.jpg","t25_0000001_14.jpg","t25_0000001_11.jpg","t25_0000001_00.jpg","t25_0000001_03.jpg","t25_0000001_22.jpg","t25_0000001_01.jpg","t25_0000001_08.jpg","t25_0000001_05.jpg"],"text prompt": "<phase>GallbladderDissection</phase> is currently underway using a <tool>Hook</tool>. Next, carefully retract the gallbladder to expose the cystic duct and artery for ligation. <risk>Bleeding</risk> is a potential risk, especially during dissection near the liver bed. <risk level>2</risk level> indicates moderate risk due to the proximity to major blood vessels.","video link": "/mnt/share/toky/Datasets/Toky_Generate/Cholec80VideoFeedback/train/videos_annotated/t25_0000001.mp4","visual quality": 4,"temporal consistency": 4,"dynamic degree": 4,"text-to-video alignment": 4,"factual consistency": 4,"risk level": 2,"conversations": [{"from": "human","value": "You're an experienced gallbladder surgery expert. Based on the provided surgical video frames:\n        First, explicitly repeat the current surgical phase <phase>GallbladderDissection</phase> and the used tools <tool>Hook</tool>.\n        Next, concisely present the next-step surgical prompt and navigation information.\n        Additionally, highlight potential risks such as bleeding. Enclose the description of potential risks within <risk> </risk> tags. Moreover, infer the current risk level from the images.    Represent the risk level as <risk level> [0 - 4] </risk level>, where 0 indicates no risk and 4 represents the highest risk level.\n        Your entire response must not exceed 100 words and should include <phase></phase>, <tool> </tool>, and <risk> </risk> <risk level> </risk level> tags.\n        "},{"from": "gpt","value": "<phase>GallbladderDissection</phase> is currently underway using a <tool>Hook</tool>. Next, carefully retract the gallbladder to expose the cystic duct and artery for ligation. <risk>Bleeding</risk> is a potential risk, especially during dissection near the liver bed. <risk level>2</risk level> indicates moderate risk due to the proximity to major blood vessels."}]},{"id": "t04_0000002","images": ["t04_0000002_02.jpg","t04_0000002_15.jpg","t04_0000002_12.jpg","t04_0000002_21.jpg","t04_0000002_10.jpg","t04_0000002_08.jpg","t04_0000002_14.jpg","t04_0000002_03.jpg","t04_0000002_09.jpg","t04_0000002_18.jpg","t04_0000002_13.jpg","t04_0000002_19.jpg","t04_0000002_01.jpg","t04_0000002_05.jpg","t04_0000002_06.jpg","t04_0000002_07.jpg","t04_0000002_04.jpg","t04_0000002_22.jpg","t04_0000002_00.jpg","t04_0000002_17.jpg","t04_0000002_16.jpg","t04_0000002_23.jpg","t04_0000002_11.jpg","t04_0000002_20.jpg"],"text prompt": "<phase>Calot Triangle Dissection</phase> is currently underway using a <tool>Grasper</tool>. Next, carefully identify and clip the cystic artery and cystic duct. <risk>Risks include bleeding from the cystic artery or duct.</risk> <risk level>1</risk level>","video link": "/mnt/share/toky/Datasets/Toky_Generate/Cholec80VideoFeedback/train/videos_annotated/t04_0000002.mp4","visual quality": 4,"temporal consistency": 4,"dynamic degree": 4,"text-to-video alignment": 4,"factual consistency": 4,"risk level": 1,"conversations": [{"from": "human","value": "You're an experienced gallbladder surgery expert. Based on the provided surgical video frames:\n        First, explicitly repeat the current surgical phase <phase>CalotTriangleDissection</phase> and the used tools <tool>Grasper</tool>.\n        Next, concisely present the next-step surgical prompt and navigation information.\n        Additionally, highlight potential risks such as bleeding. Enclose the description of potential risks within <risk> </risk> tags. Moreover, infer the current risk level from the images.    Represent the risk level as <risk level> [0 - 4] </risk level>, where 0 indicates no risk and 4 represents the highest risk level.\n        Your entire response must not exceed 100 words and should include <phase></phase>, <tool> </tool>, and <risk> </risk> <risk level> </risk level> tags.\n        "},{"from": "gpt","value": "<phase>Calot Triangle Dissection</phase> is currently underway using a <tool>Grasper</tool>. Next, carefully identify and clip the cystic artery and cystic duct. <risk>Risks include bleeding from the cystic artery or duct.</risk> <risk level>1</risk level>"}]},{"id": "t25_0000003","images": ["t25_0000003_07.jpg","t25_0000003_14.jpg","t25_0000003_20.jpg","t25_0000003_18.jpg","t25_0000003_03.jpg","t25_0000003_05.jpg","t25_0000003_12.jpg","t25_0000003_10.jpg","t25_0000003_06.jpg","t25_0000003_17.jpg","t25_0000003_01.jpg","t25_0000003_23.jpg","t25_0000003_15.jpg","t25_0000003_19.jpg","t25_0000003_09.jpg","t25_0000003_11.jpg","t25_0000003_13.jpg","t25_0000003_08.jpg","t25_0000003_16.jpg","t25_0000003_04.jpg","t25_0000003_21.jpg","t25_0000003_00.jpg","t25_0000003_02.jpg","t25_0000003_22.jpg"],"text prompt": "<phase>GallbladderPackaging</phase>. <tool>Grasper, SpecimenBag</tool>. Next, ensure the gallbladder is fully contained within the bag. Carefully exteriorize the specimen bag through the incision. <risk>Risks include bleeding from the liver bed or cystic duct stump. Monitor hemostasis closely.</risk> <risk level>1</risk level>.","video link": "/mnt/share/toky/Datasets/Toky_Generate/Cholec80VideoFeedback/train/videos_annotated/t25_0000003.mp4","visual quality": 4,"temporal consistency": 4,"dynamic degree": 4,"text-to-video alignment": 4,"factual consistency": 4,"risk level": 1,"conversations": [{"from": "human","value": "You're an experienced gallbladder surgery expert. Based on the provided surgical video frames:\n        First, explicitly repeat the current surgical phase <phase>GallbladderPackaging</phase> and the used tools <tool>Grasper, SpecimenBag</tool>.\n        Next, concisely present the next-step surgical prompt and navigation information.\n        Additionally, highlight potential risks such as bleeding. Enclose the description of potential risks within <risk> </risk> tags. Moreover, infer the current risk level from the images.    Represent the risk level as <risk level> [0 - 4] </risk level>, where 0 indicates no risk and 4 represents the highest risk level.\n        Your entire response must not exceed 100 words and should include <phase></phase>, <tool> </tool>, and <risk> </risk> <risk level> </risk level> tags.\n        "},{"from": "gpt","value": "<phase>GallbladderPackaging</phase>. <tool>Grasper, SpecimenBag</tool>. Next, ensure the gallbladder is fully contained within the bag. Carefully exteriorize the specimen bag through the incision. <risk>Risks include bleeding from the liver bed or cystic duct stump. Monitor hemostasis closely.</risk> <risk level>1</risk level>."}]},{"id": "t25_0000004","images": ["t25_0000004_01.jpg","t25_0000004_16.jpg","t25_0000004_09.jpg","t25_0000004_11.jpg","t25_0000004_20.jpg","t25_0000004_14.jpg","t25_0000004_22.jpg","t25_0000004_07.jpg","t25_0000004_02.jpg","t25_0000004_08.jpg","t25_0000004_05.jpg","t25_0000004_21.jpg","t25_0000004_12.jpg","t25_0000004_17.jpg","t25_0000004_06.jpg","t25_0000004_03.jpg","t25_0000004_10.jpg","t25_0000004_04.jpg","t25_0000004_18.jpg","t25_0000004_15.jpg","t25_0000004_19.jpg","t25_0000004_00.jpg","t25_0000004_13.jpg","t25_0000004_23.jpg"],"text prompt": "<phase>Calot Triangle Dissection</phase> is currently underway using <tool>Grasper and Hook</tool>. Next, carefully identify and clip the cystic artery and cystic duct. <risk>Risks include bleeding from the cystic artery or duct.</risk> <risk level>1</risk level>","video link": "/mnt/share/toky/Datasets/Toky_Generate/Cholec80VideoFeedback/train/videos_annotated/t25_0000004.mp4","visual quality": 4,"temporal consistency": 4,"dynamic degree": 4,"text-to-video alignment": 4,"factual consistency": 4,"risk level": 1,"conversations": [{"from": "human","value": "You're an experienced gallbladder surgery expert. Based on the provided surgical video frames:\n        First, explicitly repeat the current surgical phase <phase>CalotTriangleDissection</phase> and the used tools <tool>Grasper, Hook</tool>.\n        Next, concisely present the next-step surgical prompt and navigation information.\n        Additionally, highlight potential risks such as bleeding. Enclose the description of potential risks within <risk> </risk> tags. Moreover, infer the current risk level from the images.    Represent the risk level as <risk level> [0 - 4] </risk level>, where 0 indicates no risk and 4 represents the highest risk level.\n        Your entire response must not exceed 100 words and should include <phase></phase>, <tool> </tool>, and <risk> </risk> <risk level> </risk level> tags.\n        "},{"from": "gpt","value": "<phase>Calot Triangle Dissection</phase> is currently underway using <tool>Grasper and Hook</tool>. Next, carefully identify and clip the cystic artery and cystic duct. <risk>Risks include bleeding from the cystic artery or duct.</risk> <risk level>1</risk level>"}]},{"id": "t25_0000005","images": ["t25_0000005_02.jpg","t25_0000005_13.jpg","t25_0000005_14.jpg","t25_0000005_01.jpg","t25_0000005_04.jpg","t25_0000005_20.jpg","t25_0000005_12.jpg","t25_0000005_23.jpg","t25_0000005_18.jpg","t25_0000005_03.jpg","t25_0000005_16.jpg","t25_0000005_07.jpg","t25_0000005_21.jpg","t25_0000005_19.jpg","t25_0000005_17.jpg","t25_0000005_00.jpg","t25_0000005_05.jpg","t25_0000005_15.jpg","t25_0000005_09.jpg","t25_0000005_22.jpg","t25_0000005_06.jpg","t25_0000005_08.jpg","t25_0000005_11.jpg","t25_0000005_10.jpg"],"text prompt": "<phase>Calot Triangle Dissection</phase> is currently underway using a <tool>Hook</tool>. Next, carefully identify and clip the cystic artery and cystic duct. <risk>Risks include bleeding from the cystic artery or duct.</risk> <risk level>1</risk level>","video link": "/mnt/share/toky/Datasets/Toky_Generate/Cholec80VideoFeedback/train/videos_annotated/t25_0000005.mp4","visual quality": 4,"temporal consistency": 4,"dynamic degree": 4,"text-to-video alignment": 4,"factual consistency": 4,"risk level": 1,"conversations": [{"from": "human","value": "You're an experienced gallbladder surgery expert. Based on the provided surgical video frames:\n        First, explicitly repeat the current surgical phase <phase>CalotTriangleDissection</phase> and the used tools <tool>Hook</tool>.\n        Next, concisely present the next-step surgical prompt and navigation information.\n        Additionally, highlight potential risks such as bleeding. Enclose the description of potential risks within <risk> </risk> tags. Moreover, infer the current risk level from the images.    Represent the risk level as <risk level> [0 - 4] </risk level>, where 0 indicates no risk and 4 represents the highest risk level.\n        Your entire response must not exceed 100 words and should include <phase></phase>, <tool> </tool>, and <risk> </risk> <risk level> </risk level> tags.\n        "},{"from": "gpt","value": "<phase>Calot Triangle Dissection</phase> is currently underway using a <tool>Hook</tool>. Next, carefully identify and clip the cystic artery and cystic duct. <risk>Risks include bleeding from the cystic artery or duct.</risk> <risk level>1</risk level>"}]},{"id": "t25_0000006","images": ["t25_0000006_17.jpg","t25_0000006_16.jpg","t25_0000006_05.jpg","t25_0000006_08.jpg","t25_0000006_07.jpg","t25_0000006_11.jpg","t25_0000006_02.jpg","t25_0000006_06.jpg","t25_0000006_20.jpg","t25_0000006_04.jpg","t25_0000006_18.jpg","t25_0000006_15.jpg","t25_0000006_13.jpg","t25_0000006_03.jpg","t25_0000006_21.jpg","t25_0000006_19.jpg","t25_0000006_09.jpg","t25_0000006_22.jpg","t25_0000006_14.jpg","t25_0000006_10.jpg","t25_0000006_12.jpg","t25_0000006_01.jpg","t25_0000006_00.jpg","t25_0000006_23.jpg"],"text prompt": "<phase>GallbladderDissection</phase>. <tool>Grasper, Hook</tool> are being used. Next, carefully retract the gallbladder and continue dissection to free it from liver bed. <risk>Bleeding from liver bed or cystic artery</risk> <risk level>1</risk level>.","video link": "/mnt/share/toky/Datasets/Toky_Generate/Cholec80VideoFeedback/train/videos_annotated/t25_0000006.mp4","visual quality": 4,"temporal consistency": 4,"dynamic degree": 4,"text-to-video alignment": 4,"factual consistency": 4,"risk level": 1,"conversations": [{"from": "human","value": "You're an experienced gallbladder surgery expert. Based on the provided surgical video frames:\n        First, explicitly repeat the current surgical phase <phase>GallbladderDissection</phase> and the used tools <tool>Grasper, Hook</tool>.\n        Next, concisely present the next-step surgical prompt and navigation information.\n        Additionally, highlight potential risks such as bleeding. Enclose the description of potential risks within <risk> </risk> tags. Moreover, infer the current risk level from the images.    Represent the risk level as <risk level> [0 - 4] </risk level>, where 0 indicates no risk and 4 represents the highest risk level.\n        Your entire response must not exceed 100 words and should include <phase></phase>, <tool> </tool>, and <risk> </risk> <risk level> </risk level> tags.\n        "},{"from": "gpt","value": "<phase>GallbladderDissection</phase>. <tool>Grasper, Hook</tool> are being used. Next, carefully retract the gallbladder and continue dissection to free it from liver bed. <risk>Bleeding from liver bed or cystic artery</risk> <risk level>1</risk level>."}]}
]

总的来说,kimi输出的起码 risk level 还是不一样的。成本上来讲,kimi应该是更便宜一些,还免费送了15块钱。

重新测试加载数据集

发现重新加载时有奇怪的缓存,还沿用之前的数据集的text prompt,改代码重新加载数据集才行。

由于重新加了一个risk level 的数值,所以不匹配,报错。

Generating train split:   0%|          | 0/32901 [00:00<?, ? examples/s]Failed to read file '/mnt/share/toky/Datasets/Toky_Generate/Cholec80VideoFeedback/annotated/train-00000-of-00001.parquet' with error <class 'datasets.table.CastError'>: Couldn't cast
id: string
images: list<element: string>child 0, element: string
text prompt: string
video link: string
visual quality: int64
temporal consistency: int64
dynamic degree: int64
text-to-video alignment: int64
factual consistency: int64
risk level: int64
conversations: list<element: struct<from: string, value: string>>child 0, element: struct<from: string, value: string>child 0, from: stringchild 1, value: string
-- schema metadata --
pandas: '{"index_columns": [{"kind": "range", "name": null, "start": 0, "' + 1637
to
{'id': Value(dtype='string', id=None), 'images': Sequence(feature=Value(dtype='string', id=None), length=-1, id=None), 'text prompt': Value(dtype='string', id=None), 'video link': Value(dtype='string', id=None), 'visual quality': Value(dtype='int64', id=None), 'temporal consistency': Value(dtype='int64', id=None), 'dynamic degree': Value(dtype='int64', id=None), 'text-to-video alignment': Value(dtype='int64', id=None), 'factual consistency': Value(dtype='int64', id=None), 'conversations': [{'from': Value(dtype='string', id=None), 'value': Value(dtype='string', id=None)}]}
because column names don't match
Generating train split:   0%|          | 0/32901 [00:00<?, ? examples/s]
Traceback (most recent call last):File "/mnt/share/toky/CondaEnvs/LM/lib/python3.10/site-packages/datasets/builder.py", line 1855, in _prepare_split_singlefor _, table in generator:File "/mnt/share/toky/CondaEnvs/LM/lib/python3.10/site-packages/datasets/packaged_modules/parquet/parquet.py", line 106, in _generate_tablesyield f"{file_idx}_{batch_idx}", self._cast_table(pa_table)File "/mnt/share/toky/CondaEnvs/LM/lib/python3.10/site-packages/datasets/packaged_modules/parquet/parquet.py", line 73, in _cast_tablepa_table = table_cast(pa_table, self.info.features.arrow_schema)File "/mnt/share/toky/CondaEnvs/LM/lib/python3.10/site-packages/datasets/table.py", line 2293, in table_castreturn cast_table_to_schema(table, schema)File "/mnt/share/toky/CondaEnvs/LM/lib/python3.10/site-packages/datasets/table.py", line 2241, in cast_table_to_schemaraise CastError(
datasets.table.CastError: Couldn't cast
id: string
images: list<element: string>child 0, element: string
text prompt: string
video link: string
visual quality: int64
temporal consistency: int64
dynamic degree: int64
text-to-video alignment: int64
factual consistency: int64
risk level: int64
conversations: list<element: struct<from: string, value: string>>child 0, element: struct<from: string, value: string>child 0, from: stringchild 1, value: string
-- schema metadata --
pandas: '{"index_columns": [{"kind": "range", "name": null, "start": 0, "' + 1637
to
{'id': Value(dtype='string', id=None), 'images': Sequence(feature=Value(dtype='string', id=None), length=-1, id=None), 'text prompt': Value(dtype='string', id=None), 'video link': Value(dtype='string', id=None), 'visual quality': Value(dtype='int64', id=None), 'temporal consistency': Value(dtype='int64', id=None), 'dynamic degree': Value(dtype='int64', id=None), 'text-to-video alignment': Value(dtype='int64', id=None), 'factual consistency': Value(dtype='int64', id=None), 'conversations': [{'from': Value(dtype='string', id=None), 'value': Value(dtype='string', id=None)}]}
because column names don't matchThe above exception was the direct cause of the following exception:Traceback (most recent call last):File "/mnt/share/toky/Projects/SmolVLM2_Test/Smol_VLM_Video_Test.py", line 17, in <module>ds = load_dataset("/mnt/share/toky/Datasets/Toky_Generate/Cholec80VideoFeedback", name="annotated")File "/mnt/share/toky/CondaEnvs/LM/lib/python3.10/site-packages/datasets/load.py", line 2084, in load_datasetbuilder_instance.download_and_prepare(File "/mnt/share/toky/CondaEnvs/LM/lib/python3.10/site-packages/datasets/builder.py", line 925, in download_and_prepareself._download_and_prepare(File "/mnt/share/toky/CondaEnvs/LM/lib/python3.10/site-packages/datasets/builder.py", line 1001, in _download_and_prepareself._prepare_split(split_generator, **prepare_split_kwargs)File "/mnt/share/toky/CondaEnvs/LM/lib/python3.10/site-packages/datasets/builder.py", line 1742, in _prepare_splitfor job_id, done, content in self._prepare_split_single(File "/mnt/share/toky/CondaEnvs/LM/lib/python3.10/site-packages/datasets/builder.py", line 1898, in _prepare_split_singleraise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset

那就在readme里面加上:

还有就是老步骤

datasets.exceptions.NonMatchingSplitsSizesError: [{'expected': SplitInfo(name='test', num_bytes=1348268, num_examples=680, shard_lengths=None, dataset_name=None), 'recorded': SplitInfo(name='test', num_bytes=12342, num_examples=6, shard_lengths=None, dataset_name='cholec80_video_feedback')}, {'expected': SplitInfo(name='train', num_bytes=65281005, num_examples=32901, shard_lengths=None, dataset_name=None), 'recorded': SplitInfo(name='train', num_bytes=28919, num_examples=14, shard_lengths=None, dataset_name='cholec80_video_feedback')}]

相关文章:

  • C++创建对象过程
  • 什么是单片机?
  • k8s中kubeSphere的安装使用+阿里云私有镜像仓库配置完整步骤
  • 2023年6级第一套长篇阅读
  • STM32 单片机启动过程全解析:从上电到主函数的旅程
  • 给echarts地图添加纹理底图不显示问题
  • Apache Airflow
  • EfficMultiCoreMemoryPool项目
  • 09_降维、特征提取与流行学习
  • 深度解析UniApp盲盒系统开发:从源码架构到多端部署全流程
  • 五、web安全--XSS漏洞(2)--XSS相关payload
  • 深入理解C#泛型:提升代码复用与类型安全的利器
  • 联通专线加持!亿林网络 24 核 32G 裸金属服务器,千兆共享带宽适配中小型企业 IT 架构
  • ONLYOFFICE文档API:编辑器的品牌定制化
  • 在 Ubuntu 服务器上 下载 Clash 文件使用代理
  • 动态规划-152.乘积最大子数组-力扣(LeetCode)
  • AI模型升级与机器人产业落地同步推进
  • 09《从依赖管理到容器化部署:Maven 全链路实战笔记,解锁 Java 项目自动化构建的终极奥秘》
  • 51c视觉~3D~合集3
  • 深入理解 Maven 循环依赖问题及其解决方案
  • 海安网站建设/网站开发外包
  • 做网站的岗位叫什么问题/产品推广平台
  • 重庆网站建设价位/备案查询网
  • 电子商城网站开发 pdf/个人怎么做百度竞价
  • 女人做绿叶网站相亲拉人/seo舆情优化
  • wordpress的页面标题/广州seo优化外包服务