
Deploying Qwen2.5-14B and 32B on a server and exposing them as an API

Models:

Qwen/Qwen2.5-14B: ModelScope community

Qwen/Qwen2.5-32B: ModelScope community

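Before downloading the models, I wanted to check whether ModelScope's hosted inference API is stable enough to use directly.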

```python
from openai import OpenAI

client = OpenAI(
    base_url='https://api-inference.modelscope.cn/v1/',
    api_key='<MODELSCOPE_SDK_TOKEN>',  # ModelScope Token
)

# set extra_body for thinking control
extra_body = {
    # enable thinking, set to False to disable
    "enable_thinking": True,
    # use thinking_budget to control the number of tokens used for thinking
    # "thinking_budget": 4096
}

response = client.chat.completions.create(
    model='Qwen/Qwen3-32B',  # ModelScope Model-Id
    messages=[{'role': 'user', 'content': 'Which is larger, 9.9 or 9.11?'}],
    stream=True,
    extra_body=extra_body,
)

done_thinking = False
for chunk in response:
    if not chunk.choices:  # skip chunks without choices (e.g. a trailing usage chunk)
        continue
    delta = chunk.choices[0].delta
    # reasoning_content is a provider extension; it may be missing or None
    thinking_chunk = getattr(delta, 'reasoning_content', None)
    answer_chunk = delta.content
    if thinking_chunk:
        print(thinking_chunk, end='', flush=True)
    elif answer_chunk:
        if not done_thinking:
            print('\n\n === Final Answer ===\n')
            done_thinking = True
        print(answer_chunk, end='', flush=True)
```
The same test against Qwen3-14B (only the model ID changes):

```python
from openai import OpenAI

client = OpenAI(
    base_url='https://api-inference.modelscope.cn/v1/',
    api_key='<MODELSCOPE_SDK_TOKEN>',  # ModelScope Token
)

# set extra_body for thinking control
extra_body = {
    # enable thinking, set to False to disable
    "enable_thinking": True,
    # use thinking_budget to control the number of tokens used for thinking
    # "thinking_budget": 4096
}

response = client.chat.completions.create(
    model='Qwen/Qwen3-14B',  # ModelScope Model-Id
    messages=[{'role': 'user', 'content': 'Which is larger, 9.9 or 9.11?'}],
    stream=True,
    extra_body=extra_body,
)

done_thinking = False
for chunk in response:
    if not chunk.choices:  # skip chunks without choices (e.g. a trailing usage chunk)
        continue
    delta = chunk.choices[0].delta
    # reasoning_content is a provider extension; it may be missing or None
    thinking_chunk = getattr(delta, 'reasoning_content', None)
    answer_chunk = delta.content
    if thinking_chunk:
        print(thinking_chunk, end='', flush=True)
    elif answer_chunk:
        if not done_thinking:
            print('\n\n === Final Answer ===\n')
            done_thinking = True
        print(answer_chunk, end='', flush=True)
```

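I expect to run about 3k records. The API works, with one catch: the Qwen/Qwen3-14B model only supports streaming mode, so the stream parameter must be enabled to access it. A non-streaming request fails with: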

🔁 Qwen14B response: {'error': "Error: Error code: 400 - {'error': {'code': 'invalid_parameter_error', 'message': 'This model only support stream mode, please enable the stream parameter to access the model. ', 'param': None, 'type': 'invalid_request_error'}, 'request_id': 'e44ce9cc-2731-4956-86ff-0bce173ae68d'}"}
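Because the model only accepts streaming requests, a batch job has to reassemble each reply from its chunks. Below is a minimal sketch of such a helper, assuming OpenAI-style chunk objects where `reasoning_content` is an optional, provider-specific field. `collect_stream` and `fake_chunk` are hypothetical names; the fake chunks only exist so the sketch runs offline.

```python
from types import SimpleNamespace
from typing import Iterable, Tuple


def collect_stream(chunks: Iterable) -> Tuple[str, str]:
    """Accumulate a streamed chat completion into (reasoning, answer) strings.

    Assumes each chunk looks like an OpenAI ChatCompletionChunk:
    chunk.choices[0].delta with an optional `content` and an optional,
    provider-specific `reasoning_content`. Missing fields and None count as empty.
    """
    reasoning_parts, answer_parts = [], []
    for chunk in chunks:
        if not chunk.choices:  # some providers send a final chunk with no choices
            continue
        delta = chunk.choices[0].delta
        reasoning = getattr(delta, "reasoning_content", None)
        content = getattr(delta, "content", None)
        if reasoning:
            reasoning_parts.append(reasoning)
        if content:
            answer_parts.append(content)
    return "".join(reasoning_parts), "".join(answer_parts)


def fake_chunk(reasoning=None, content=None):
    """Hypothetical stand-in for a streamed chunk, for offline testing only."""
    delta = SimpleNamespace(reasoning_content=reasoning, content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


if __name__ == "__main__":
    demo = [
        fake_chunk(reasoning="Compare 9.90 "),
        fake_chunk(reasoning="with 9.11."),
        fake_chunk(content="9.9 "),
        fake_chunk(content="is larger."),
        fake_chunk(),  # empty final delta
    ]
    thinking, answer = collect_stream(demo)
    print(answer)  # prints "9.9 is larger."
```

With a live client the helper would wrap the streaming call directly, e.g. `thinking, answer = collect_stream(client.chat.completions.create(..., stream=True))`, which keeps the per-record code free of chunk-handling details.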

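After modifying the code, streaming output is handled and processing continues. The next step is to run all ~3k records and check for errors or missing results, which should show how stable the API really is.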
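For the ~3k-record run, errors and omissions are easier to spot if every record ends up either with a result or on an explicit failure list. A rough sketch, assuming the per-record call behaves like `validate_explanation` below (returning a dict with an `"error"` key on failure); `run_batch`, the retry count and the backoff values are illustrative choices, not something the post specifies:

```python
import time
from typing import Callable, Dict, List, Tuple


def run_batch(records: List[dict], call: Callable[[dict], dict],
              max_retries: int = 3,
              backoff_s: float = 1.0) -> Tuple[Dict[int, dict], List[int]]:
    """Run `call` on every record with retries; return (results, failed_indices).

    A result dict containing an "error" key, or a raised exception, counts as a
    failed attempt. Retries wait backoff_s * 2**attempt seconds between tries.
    """
    results: Dict[int, dict] = {}
    failed: List[int] = []
    for i, record in enumerate(records):
        for attempt in range(max_retries):
            try:
                out = call(record)
            except Exception as exc:  # network errors, timeouts, etc.
                out = {"error": str(exc)}
            if "error" not in out:
                results[i] = out
                break
            if attempt < max_retries - 1:
                time.sleep(backoff_s * (2 ** attempt))
        else:  # no break: every attempt failed
            failed.append(i)
    # Omission check: each record is either in results or in failed, never neither.
    assert len(results) + len(failed) == len(records)
    return results, failed
```

Re-running only the indices in `failed` afterwards keeps a second pass cheap, and the sizes of `results` and `failed` give a direct measure of how often the API errored.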

```python
import re

# zhipu_client is defined elsewhere in the script; presumably an OpenAI client
# configured with the ModelScope base_url used in the test above.


def validate_explanation(term1, term2, prompt, score, explanation):
    print("🔎 Calling Qwen14B to validate the explanation...")
    try:
        # set extra_body for thinking control
        extra_body = {
            # enable thinking, set to False to disable
            "enable_thinking": True,
            # use thinking_budget to control the number of tokens used for thinking
            # "thinking_budget": 4096
        }
        glm4_response = zhipu_client.chat.completions.create(
            model="Qwen/Qwen3-14B",
            messages=[
                {
                    "role": "system",
                    "content": (
                        "Your role is to supervise how DeepSeek uses the supplementary "
                        "material in the \"prompt\". Review the user's input and output "
                        "your analysis strictly in the following format:"
                        # the format specification itself is not included in the original post
                    ),
                },
                {
                    "role": "user",
                    "content": f"Term 1: {term1}, Term 2: {term2}, Prompt: {prompt}, "
                               f"Score: {score}, Explanation: {explanation}",
                },
            ],
            stream=True,  # this model only accepts streaming requests
            extra_body=extra_body,
        )

        full_output = ""
        done_thinking = False
        for chunk in glm4_response:
            if not chunk.choices:
                continue
            delta = chunk.choices[0].delta
            # reasoning_content / content may be missing or None on some chunks
            thinking_chunk = getattr(delta, "reasoning_content", None)
            output = delta.content
            if thinking_chunk:
                print(thinking_chunk, end="", flush=True)
            elif output:
                if not done_thinking:
                    print("\n\n === Final Answer ===\n")
                    done_thinking = True
                print(f"🧠 Qwen14B returned: {output}")
                full_output += output

        # strip a ```json ... ``` fence if the model wrapped its answer in one
        full_output = full_output.strip()
        if full_output.startswith("```json"):
            full_output = full_output[7:-3].strip()
        match = re.search(
            r'\{\s*"a":\s*(true|false),\s*"b":\s*(true|false),\s*"c":\s*(true|false),'
            r'\s*"flag":\s*(true|false),\s*"reply":\s*"(.*?)"\s*\}',
            full_output,
            re.DOTALL,
        )
        if match:
            a, b, c, flag, reply = match.groups()
            return {
                "a": a == "true",
                "b": b == "true",
                "c": c == "true",
                "flag": flag == "true",
                "reply": reply.strip(),
            }
        return {"error": "Invalid Qwen14B response format: unable to extract fields."}
    except Exception as e:
        return {"error": f"Error: {e}"}
```
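The regex above only matches when the five keys appear in exactly that order, and it returns reply with JSON escapes such as `\n` still undecoded. One possible alternative, sketched here as an assumption rather than the author's approach, is to cut out the JSON object and hand it to `json.loads`; `parse_verdict` is a hypothetical helper name, and the expected keys (a, b, c, flag as booleans, reply as a string) are taken from the regex.

```python
import json
import re
from typing import Optional

BOOL_FIELDS = ("a", "b", "c", "flag")


def parse_verdict(text: str) -> Optional[dict]:
    """Parse the model's JSON verdict; return None if it is missing or malformed."""
    text = text.strip()
    # Remove an optional ```json ... ``` (or bare ```) fence around the answer.
    fence = re.match(r"^```(?:json)?\s*(.*?)\s*```$", text, re.DOTALL)
    if fence:
        text = fence.group(1)
    # Take the outermost {...} span so stray text before/after is ignored.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        data = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or "reply" not in data:
        return None
    # Every flag must be present and an actual JSON boolean.
    if not all(isinstance(data.get(k), bool) for k in BOOL_FIELDS):
        return None
    result = {k: data[k] for k in BOOL_FIELDS}
    result["reply"] = str(data["reply"]).strip()
    return result
```

Key order no longer matters and escapes are decoded by the JSON parser; returning None instead of an error dict leaves it to the caller to decide how to record the failure.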
