Deploying Qwen2.5-14B and Qwen2.5-32B on a server and exposing them as an API
Models:
Qwen/Qwen2.5-14B: ModelScope
Qwen/Qwen2.5-32B: ModelScope
Before downloading the weights, I wanted to check whether ModelScope's hosted inference API is stable enough. First test, against Qwen3-32B:
from openai import OpenAI

client = OpenAI(
    base_url='https://api-inference.modelscope.cn/v1/',
    api_key='<MODELSCOPE_SDK_TOKEN>',  # ModelScope token
)

# set extra_body for thinking control
extra_body = {
    # enable thinking; set to False to disable
    "enable_thinking": True,
    # use thinking_budget to control the number of tokens used for thinking
    # "thinking_budget": 4096
}

response = client.chat.completions.create(
    model='Qwen/Qwen3-32B',  # ModelScope model ID
    messages=[{'role': 'user', 'content': 'Which is larger, 9.9 or 9.11?'}],
    stream=True,
    extra_body=extra_body
)

done_thinking = False
for chunk in response:
    thinking_chunk = chunk.choices[0].delta.reasoning_content
    answer_chunk = chunk.choices[0].delta.content
    if thinking_chunk != '':
        print(thinking_chunk, end='', flush=True)
    elif answer_chunk != '':
        if not done_thinking:
            print('\n\n === Final Answer ===\n')
            done_thinking = True
        print(answer_chunk, end='', flush=True)
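One caveat on the read loop above: the != '' checks assume the endpoint always fills the inactive delta field with an empty string. Some OpenAI-compatible endpoints return None instead, which would make this loop print a literal None. A defensive variant of the loop, as my own sketch (the helper name iter_stream is made up, not from ModelScope's docs):

def iter_stream(response):
    # Yield ('thinking' | 'answer', text) pairs from a streamed chat
    # completion, treating missing or None delta fields as empty.
    for chunk in response:
        delta = chunk.choices[0].delta
        thinking = getattr(delta, 'reasoning_content', None) or ''
        answer = delta.content or ''
        if thinking:
            yield 'thinking', thinking
        if answer:
            yield 'answer', answer

The printing loop then only has to branch on the first element of each pair.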
The Qwen3-14B test is the same script with only the model ID changed:

response = client.chat.completions.create(
    model='Qwen/Qwen3-14B',  # ModelScope model ID
    messages=[{'role': 'user', 'content': 'Which is larger, 9.9 or 9.11?'}],
    stream=True,
    extra_body=extra_body
)
I expect to run roughly 3k records through it. The API is usable, but there is one catch: Qwen/Qwen3-14B only supports stream mode, so the stream parameter must be enabled to access the model.
🔁 Qwen14B's response: {'error': "Error: Error code: 400 - {'error': {'code': 'invalid_parameter_error', 'message': 'This model only support stream mode, please enable the stream parameter to access the model. ', 'param': None, 'type': 'invalid_request_error'}, 'request_id': 'e44ce9cc-2731-4956-86ff-0bce173ae68d'}"}
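For reference, the 400 above can be reproduced with a plain non-streaming call. This is a sketch of what my batch code was effectively doing, not the exact original:

from openai import OpenAI

client = OpenAI(
    base_url='https://api-inference.modelscope.cn/v1/',
    api_key='<MODELSCOPE_SDK_TOKEN>',
)

# stream defaults to False; leaving it off is what triggers the
# invalid_parameter_error for Qwen/Qwen3-14B on this endpoint.
response = client.chat.completions.create(
    model='Qwen/Qwen3-14B',
    messages=[{'role': 'user', 'content': 'Which is larger, 9.9 or 9.11?'}],
)
print(response.choices[0].message.content)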
I then reworked the code to consume the stream, and processing runs again. The next step is to push all 3k records through and check for errors and omissions, which is the real test of the API's stability. The reworked validation function is below, followed by a sketch of the batch driver.
import re

# zhipu_client is the OpenAI-compatible client for the ModelScope endpoint
# above; the zhipu/glm4 names appear to be leftovers from an earlier GLM-4
# version of this script.
def validate_explanation(term1, term2, prompt, score, explanation):
    print("🔎 Calling Qwen14B to validate the explanation...")
    try:
        # set extra_body for thinking control
        extra_body = {
            # enable thinking; set to False to disable
            "enable_thinking": True,
            # use thinking_budget to control the number of tokens used for thinking
            # "thinking_budget": 4096
        }
        glm4_response = zhipu_client.chat.completions.create(
            model="Qwen/Qwen3-14B",
            messages=[
                {
                    "role": "system",
                    "content": (
                        "Your role is to supervise how DeepSeek used the supplementary "
                        "material in the prompt: review the user input and output the "
                        "analysis result strictly in the following format:"
                    )
                },
                {
                    "role": "user",
                    "content": f"Term 1: {term1}, Term 2: {term2}, Prompt: {prompt}, "
                               f"Score: {score}, Explanation: {explanation}"
                }
            ],
            stream=True,
            extra_body=extra_body
        )
        full_output = ""
        done_thinking = False
        for chunk in glm4_response:
            thinking_chunk = chunk.choices[0].delta.reasoning_content
            output = chunk.choices[0].delta.content
            print(f"🧠 Qwen14B chunk: {output}")
            if thinking_chunk != '':
                print(thinking_chunk, end='', flush=True)
            elif output != '':
                if not done_thinking:
                    print('\n\n === Final Answer ===\n')
                    done_thinking = True
                full_output += output
        # strip a ```json fence if the model wrapped its answer in one
        if full_output.startswith("```json"):
            full_output = full_output[7:-3].strip()
        match = re.search(
            r'{\s*"a":\s*(true|false),\s*"b":\s*(true|false),\s*"c":\s*(true|false),'
            r'\s*"flag":\s*(true|false),\s*"reply":\s*"(.*?)"\s*}',
            full_output, re.DOTALL
        )
        if match:
            a, b, c, flag, reply = match.groups()
            return {
                "a": a == "true",
                "b": b == "true",
                "c": c == "true",
                "flag": flag == "true",
                "reply": reply.strip()
            }
        else:
            return {"error": "Invalid Qwen14B response format: unable to extract fields."}
    except Exception as e:
        return {"error": f"Error: {e}"}
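Finally, the driver for the 3k-record stability check. A minimal sketch under a few assumptions of mine: the records arrive as a list of dicts whose keys match validate_explanation's parameters, and the output file name and pause length are placeholders:

import json
import time

def run_batch(records, out_path="qwen14b_results.jsonl", pause=0.5):
    # Run validate_explanation over every record, writing one JSON line
    # per record so errors and omissions can be counted afterwards.
    ok = failed = 0
    with open(out_path, "w", encoding="utf-8") as f:
        for i, rec in enumerate(records):
            result = validate_explanation(
                rec["term1"], rec["term2"], rec["prompt"],
                rec["score"], rec["explanation"],
            )
            if "error" in result:
                failed += 1
            else:
                ok += 1
            f.write(json.dumps({"index": i, **result}, ensure_ascii=False) + "\n")
            time.sleep(pause)  # crude rate limiting; tune to the API quota
    print(f"Done: {ok} ok, {failed} failed out of {len(records)}")

Comparing the line count of the output file with len(records) catches omissions; the failed counter catches hard errors.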