Running the Qwen3-Embedding-0.6B model locally with Python and exposing it as an API
- Download the model
- Configure the environment variable (optional)
- Install modelscope and download the model
- Install dependencies
- Load the model and start the web service
- Web service code
- Start the service in the virtual environment
- Verification
Download the model
Configure the environment variable (optional)
If you do not configure the environment variable, on Windows the model is cached to a directory on the C drive.
Configure the environment variable
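For example (a sketch assuming you want the cache root at D:\modelscope, which matches the model path used later in the web service code), MODELSCOPE_CACHE can be set persistently from a Windows command prompt:
setx MODELSCOPE_CACHE "D:\modelscope"
Open a new terminal afterwards; setx only affects processes started after it runs.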
Verify that the environment variable is configured correctly:
import os
print(os.getenv("MODELSCOPE_CACHE"))
If this prints the path you configured, the variable is set correctly.
Install modelscope and download the model
First install modelscope, then download the model from the command line:
pip install modelscope
modelscope download --model Qwen/Qwen3-Embedding-0.6B
After the download finishes, check that the model files are present under the configured cache path.
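A quick way to check from Python (a sketch that assumes MODELSCOPE_CACHE is set as above and that modelscope stored the model under models\Qwen\Qwen3-Embedding-0.6B inside that cache root, consistent with the path used in the web service code below):
import os

# Assumes MODELSCOPE_CACHE points at the cache root configured earlier
cache_root = os.getenv("MODELSCOPE_CACHE")
model_dir = os.path.join(cache_root, "models", "Qwen", "Qwen3-Embedding-0.6B")
print(model_dir, os.path.isdir(model_dir))
if os.path.isdir(model_dir):
    print(os.listdir(model_dir))  # typically includes config.json and the model weights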
Install dependencies
pip install sentence-transformers flask
Installed package versions:
(qwen3_embedding) F:\Python\local_embedding\src\embedding_model>pip list
Package Version
--------------------- ---------
blinker 1.9.0
certifi 2025.4.26
charset-normalizer 3.4.2
click 8.2.1
colorama 0.4.6
filelock 3.18.0
Flask 3.1.1
fsspec 2025.5.1
huggingface-hub 0.32.4
idna 3.10
itsdangerous 2.2.0
Jinja2 3.1.6
joblib 1.5.1
MarkupSafe 3.0.2
mpmath 1.3.0
networkx 3.5
numpy 2.3.0
packaging 25.0
pillow 11.2.1
pip 25.1
PyYAML 6.0.2
regex 2024.11.6
requests 2.32.4
safetensors 0.5.3
scikit-learn 1.7.0
scipy 1.15.3
sentence-transformers 4.1.0
setuptools 78.1.1
sympy 1.14.0
threadpoolctl 3.6.0
tokenizers 0.21.1
torch 2.7.1
tqdm 4.67.1
transformers 4.52.4
typing_extensions 4.14.0
urllib3 2.4.0
Werkzeug 3.1.3
wheel 0.45.1
Load the model and start the web service
Web service code
from flask import Flask, request, jsonify
from sentence_transformers import SentenceTransformer
import logging

logging.basicConfig(level=logging.DEBUG)

app = Flask(__name__)

# Load the model from the local modelscope cache
model = SentenceTransformer(model_name_or_path=r"D:\modelscope\models\Qwen\Qwen3-Embedding-0.6B")

@app.route('/embed', methods=['POST'])
def get_embedding():
    text = request.json['text']
    document_embeddings = model.encode(text)
    arr_list = document_embeddings.tolist()
    return jsonify({"embedding": arr_list})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
What the code does
The code implements a Flask-based HTTP API service that converts input text into an embedding vector, using the Qwen3-Embedding-0.6B model to generate the vector representation. The /embed endpoint expects a JSON body of the form {"text": "..."} and returns {"embedding": [...]}.
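For example, once the server is running, the endpoint can be exercised with curl (quoting shown for a Windows command prompt; the example text is arbitrary):
curl -X POST http://127.0.0.1:5000/embed -H "Content-Type: application/json" -d "{\"text\": \"hello world\"}"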
Code structure
Imports
from flask import Flask, request, jsonify
from sentence_transformers import SentenceTransformer
import logging
Flask app initialization and model loading
app = Flask(__name__)
model = SentenceTransformer(model_name_or_path=r"D:\modelscope\models\Qwen\Qwen3-Embedding-0.6B")
API route definition
@app.route('/embed', methods=['POST'])
def get_embedding():
    text = request.json['text']
    document_embeddings = model.encode(text)
    arr_list = document_embeddings.tolist()
    return jsonify({"embedding": arr_list})
Service startup
if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
Start the service in the virtual environment
(qwen3_embedding) F:\Python\local_embedding\src\embedding_model>python embedding_server.py
E:\Anaconda\envs\qwen3_embedding\Lib\site-packages\transformers\utils\hub.py:111: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
INFO:sentence_transformers.SentenceTransformer:Use pytorch device_name: cpu
INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: D:\modelscope\models\Qwen\Qwen3-Embedding-0.6B
INFO:sentence_transformers.SentenceTransformer:2 prompts are loaded, with the keys: ['query', 'document']
 * Serving Flask app 'embedding_server'
 * Debug mode: off
INFO:werkzeug:WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:5000
 * Running on http://192.168.8.104:5000
INFO:werkzeug:Press CTRL+C to quit
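As a quick check that the service is up, send a POST request to the /embed endpoint. A minimal sketch using the requests library (already present in the pip list above; the example text is arbitrary):
import requests

# Call the local embedding service started above
resp = requests.post("http://127.0.0.1:5000/embed", json={"text": "hello world"})
embedding = resp.json()["embedding"]
print(len(embedding))  # length of the returned embedding vector
If the request succeeds, the response contains the embedding as a JSON list of floats.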