
[Volcano Engine Large Model Batch Inference Data Tutorial: A Detailed Walkthrough in One Article]

news Source: original 2025/6/3 18:18:16

0. Related documentation

Step one: register a Volcano Engine account first!

Batch inference documentation page
Object storage page
Batch job submission page
Cost API page

1. Prepare the JSONL dataset

Official site link
Sample code; adapt it to your own data.

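import json

# Your data; preprocess it yourself.
data = []  # must be a list[dict]

huoshan_data_jsonl = [
    {
        "custom_id": f"uuid-{i}",  # must be unique for every line
        "body": {
            "messages": [
                {"role": "system", "content": "your system prompt"},
                {"role": "user", "content": "your text content"},
            ],
            "temperature": 0.0,
            # other parameters
        },
    }
    for i, d in enumerate(data)
]
# Flatten with sum(huoshan_data_jsonl, []) only if each element above is itself
# a list of requests; on a plain list of dicts that call raises TypeError.

with open("hs_data.jsonl", "w", encoding="utf-8") as f:
    for d in huoshan_data_jsonl:
        f.write(json.dumps(d, ensure_ascii=False) + "\n")

# Quick look at the record count and the first record (if any).
print(len(huoshan_data_jsonl), huoshan_data_jsonl[:1])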
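Since every custom_id must be unique, one option is to generate it with Python's standard uuid module instead of writing it by hand. The following is a minimal sketch under that assumption: the helper names build_requests and to_jsonl and the sample texts are hypothetical, and only the custom_id / body / messages layout is taken from the sample above.

```python
import json
import uuid


def build_requests(texts, system_prompt):
    """Wrap each text in a request dict with a freshly generated custom_id.

    Hypothetical helper for illustration; the field layout mirrors the sample above.
    """
    return [
        {
            "custom_id": f"uuid-{uuid.uuid4()}",  # random UUID, unique per request
            "body": {
                "messages": [
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": text},
                ],
                "temperature": 0.0,
            },
        }
        for text in texts
    ]


def to_jsonl(reqs):
    """Serialize requests to JSONL text: one JSON object per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in reqs)


reqs = build_requests(["first text", "second text"], "your system prompt")
jsonl_text = to_jsonl(reqs)
print(jsonl_text)
```

Writing jsonl_text to hs_data.jsonl with encoding="utf-8" gives a file in the same shape as the one produced by the sample above.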

Check that the data meets the requirements

import json


def check_jsonl_file(file_path):
    """Validate a batch inference input file and return the number of valid records."""
    with open(file_path, "r", encoding="utf-8") as file:
        total = 0
        custom_id_set = set()
        for line in file:
            # Skip blank lines.
            if line.strip() == "":
                continue
            try:
                line_dict = json.loads(line)
            except json.decoder.JSONDecodeError:
                raise Exception(f"Invalid batch inference input file: line {total + 1} is not JSON")
            if not line_dict.get("custom_id"):
                raise Exception(f"Invalid batch inference input file: line {total + 1} has no custom_id")
            if not isinstance(line_dict.get("custom_id"), str):
                raise Exception(f"Invalid batch inference input file: custom_id on line {total + 1} is not a string")
            if line_dict.get("custom_id") in custom_id_set:
                raise Exception(f"Invalid batch inference input file: custom_id={line_dict.get('custom_id', '')} is duplicated")
            else:
                custom_id_set.add(line_dict.get("custom_id"))
            if not isinstance(line_dict.get("body", ""), dict):
                raise Exception(f"Invalid batch inference input file: body of custom_id={line_dict.get('custom_id', '')} is not a JSON object")
            total += 1
    return total


file_path = "hs_data.jsonl"
total_lines = check_jsonl_file(file_path)
print(f"Number of valid JSON records in the file: {total_lines}")
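The check above stops at the first problem it finds. For a large file it may be more convenient to list every problem in one pass. The following is a minimal sketch of that variation, not part of the official tooling: collect_jsonl_problems is a hypothetical name, it applies the same rules as the check above plus a test that each line is a JSON object, and it reports physical line numbers (blank lines included).

```python
import json


def collect_jsonl_problems(lines):
    """Return (line_number, message) for every line that breaks the rules.

    `lines` is any iterable of strings, e.g. an open file. Hypothetical helper.
    """
    problems = []
    seen_ids = set()
    for line_no, line in enumerate(lines, start=1):
        if line.strip() == "":
            continue  # blank lines are ignored, as in the check above
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            problems.append((line_no, "not valid JSON"))
            continue
        if not isinstance(record, dict):
            problems.append((line_no, "not a JSON object"))
            continue
        custom_id = record.get("custom_id")
        if not custom_id:
            problems.append((line_no, "custom_id is missing"))
        elif not isinstance(custom_id, str):
            problems.append((line_no, "custom_id is not a string"))
        elif custom_id in seen_ids:
            problems.append((line_no, f"duplicate custom_id={custom_id}"))
        else:
            seen_ids.add(custom_id)
        if not isinstance(record.get("body"), dict):
            problems.append((line_no, "body is not a JSON object"))
    return problems


# Small in-memory sample with a duplicate id, a bad body, and a non-JSON line.
sample = [
    '{"custom_id": "a", "body": {"messages": []}}\n',
    "\n",
    '{"custom_id": "a", "body": {"messages": []}}\n',
    '{"custom_id": "b", "body": "oops"}\n',
    "not json\n",
]
for line_no, message in collect_jsonl_problems(sample):
    print(f"line {line_no}: {message}")
```

To run it on the real file, pass an open file handle, for example the object returned by open("hs_data.jsonl", encoding="utf-8"), in place of sample.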