Qwen3 Embedding Test
Contents
- Environment Setup
- Initialization
- Example
- Another Example
Environment Setup
uv init
uv venv
.venv/Scripts/activate
uv pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
uv pip install transformers
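A quick sanity check (my own optional addition, not part of the original steps) to confirm the install and CUDA availability:

import torch
import transformers

print(torch.__version__)           # expect 2.5.1+cu124
print(torch.cuda.is_available())   # True if the CUDA build sees a GPU
print(transformers.__version__)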
Initialization
import torch
import torch.nn.functional as F
from torch import Tensor
from transformers import AutoTokenizer, AutoModel
This section imports the PyTorch libraries and Hugging Face's Transformers library in preparation for using the Qwen3-Embedding model.
def last_token_pool(last_hidden_states: Tensor,
                    attention_mask: Tensor) -> Tensor:
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        print("left_padding")
        return last_hidden_states[:, -1]
    else:
        sequence_lengths = attention_mask.sum(dim=1) - 1
        batch_size = last_hidden_states.shape[0]
        return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]


def get_detailed_instruct(task_description: str, query: str) -> str:
    return f'Instruct: {task_description}\nQuery:{query}'
Two key functions are defined here:
- last_token_pool: extracts the representation of the last valid token from the model's hidden states. It handles two padding cases (see the toy sketch after this list):
  - Left padding: take the vector at the last position directly
  - Right padding: compute each sequence's actual length, then take the vector at that position
- get_detailed_instruct: builds an instruction-formatted query string by combining the task description with the query
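To make the two branches concrete, here is a small toy sketch (my own illustration with made-up tensors, not from the original) showing which position last_token_pool selects in each case:

# Toy batch: 2 sequences of length 4, hidden size 3.
hidden = torch.arange(2 * 4 * 3, dtype=torch.float32).reshape(2, 4, 3)

# Left padding: the last column of the mask is all ones,
# so the branch simply takes hidden[:, -1].
left_mask = torch.tensor([[0, 0, 1, 1],
                          [0, 1, 1, 1]])
print(last_token_pool(hidden, left_mask))   # rows hidden[0, 3] and hidden[1, 3]

# Right padding: real lengths are 3 and 4, so the vectors at
# indices (length - 1) are gathered per sequence.
right_mask = torch.tensor([[1, 1, 1, 0],
                           [1, 1, 1, 1]])
print(last_token_pool(hidden, right_mask))  # hidden[0, 2] and hidden[1, 3]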
Example
task = 'Given a web search query, retrieve relevant passages that answer the query'
queries = [
    get_detailed_instruct(task, 'What is the capital of China?'),
    get_detailed_instruct(task, 'Explain gravity')
]
# No need to add instruction for retrieval documents
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun."
]
input_texts = queries + documents
input_texts
['Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:What is the capital of China?',
 'Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:Explain gravity',
 'The capital of China is Beijing.',
 'Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.']
This section prepares the test data:
- Defines a retrieval task description
- Creates two instruction-formatted queries
- Prepares two documents as retrieval targets
- Merges the queries and documents into a single input list
tokenizer = AutoTokenizer.from_pretrained('Qwen/Qwen3-Embedding-0.6B', padding_side='left')
model = AutoModel.from_pretrained('Qwen/Qwen3-Embedding-0.6B')
This loads the Qwen3-Embedding-0.6B model and its tokenizer:
- padding_side='left' selects left padding, which matters later when extracting the last token's representation
- The model is loaded in CPU mode (append .cuda() for GPU acceleration)
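For GPU use, a hedged variant of the load step might look like the sketch below; the torch_dtype=torch.float16 choice is my own assumption, not part of the original:

# Assumes a CUDA GPU is available; half precision roughly halves memory use.
model = AutoModel.from_pretrained(
    'Qwen/Qwen3-Embedding-0.6B',
    torch_dtype=torch.float16,
).cuda()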
max_length = 8192

# Tokenize the input texts
batch_dict = tokenizer(
    input_texts,
    padding=True,
    truncation=True,
    max_length=max_length,
    return_tensors="pt",
)
batch_dict.keys()
dict_keys(['input_ids', 'attention_mask'])
This section tokenizes the input texts:
- Sets the maximum length to 8192 (the context length Qwen3-Embedding supports)
- Enables padding and truncation so all sequences in the batch share one length
- Returns PyTorch tensors
- The output shows that batch_dict contains two keys, 'input_ids' and 'attention_mask'
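As an optional check (my own addition), the attention mask makes the left padding visible: padded positions appear as leading zeros in each row:

# Each row of the mask starts with zeros (padding) and ends with ones (real tokens).
print(batch_dict['input_ids'].shape)         # (4, longest sequence in the batch)
print(batch_dict['attention_mask'][0][:10])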
batch_dict.to(model.device)
outputs = model(**batch_dict)
embeddings = last_token_pool(outputs.last_hidden_state, batch_dict['attention_mask'])
embeddings.shape
This section runs inference and extracts the embeddings:
- Moves the input data to the model's device
- Passes it through the model to get the outputs
- Uses last_token_pool to extract one representation vector per sequence
- The printed "left_padding" confirms that left padding was used
- The embedding shape is [4, 1024]: 4 input texts, each with a 1024-dimensional embedding
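Since only the embeddings are needed, the forward pass can optionally be wrapped in torch.no_grad() to skip autograd bookkeeping (my own tweak; the original runs without it, which is why the score tensor below carries a grad_fn):

# Optional: disable gradient tracking during inference to save memory.
with torch.no_grad():
    outputs = model(**batch_dict)
    embeddings = last_token_pool(outputs.last_hidden_state,
                                 batch_dict['attention_mask'])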
# normalize embeddings
embeddings = F.normalize(embeddings, p=2, dim=1)
scores = (embeddings[:2] @ embeddings[2:].T)
scores
tensor([[0.7646, 0.1414],
        [0.1355, 0.6000]], grad_fn=<MmBackward0>)
The final section computes similarity scores:
- L2-normalizes the embeddings so each has unit length
- Takes the dot product between the query embeddings (first 2) and the document embeddings (last 2) to get a similarity matrix
- Query 1 scores 0.7646 (high) against document 1 and 0.1414 (low) against document 2
- Query 2 scores 0.1355 (low) against document 1 and 0.6000 (high) against document 2
This shows the model matches each query to its relevant document: the "capital of China" query scores high against the document mentioning Beijing, and the "explain gravity" query scores high against the document describing gravity.
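Because the embeddings were L2-normalized, the dot product is exactly cosine similarity; a quick check (my own addition) confirms this:

# Dot product of unit vectors == cosine similarity.
cos = F.cosine_similarity(embeddings[0], embeddings[2], dim=0)
print(cos, scores[0, 0])  # the two values agree up to floating-point error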
Another Example
task = 'Given a web search query, retrieve relevant passages that answer the query'
queries = [
    get_detailed_instruct(task, 'How does photosynthesis work?'),
    get_detailed_instruct(task, 'What are the benefits of exercise?')
]
# No need to add instruction for retrieval documents
documents = [
    "Photosynthesis is the process by which green plants and some other organisms use sunlight to synthesize foods with the help of chlorophyll. During this process, plants convert light energy into chemical energy, absorb carbon dioxide and release oxygen.",
    "Regular exercise offers numerous benefits including improved cardiovascular health, stronger muscles and bones, better weight management, enhanced mental health, reduced risk of chronic diseases, improved sleep quality, and increased energy levels."
]
input_texts = queries + documents
max_length = 8192

# Tokenize the input texts
batch_dict = tokenizer(
    input_texts,
    padding=True,
    truncation=True,
    max_length=max_length,
    return_tensors="pt",
)
batch_dict.to(model.device)
outputs = model(**batch_dict)
embeddings = last_token_pool(outputs.last_hidden_state, batch_dict['attention_mask'])
embeddings = F.normalize(embeddings, p=2, dim=1)
scores = (embeddings[:2] @ embeddings[2:].T)
scores
left_padding
tensor([[0.6436, 0.1156],
        [0.2306, 0.7621]], grad_fn=<MmBackward0>)
As in the first example, each query scores highest against its matching document: the photosynthesis query against the photosynthesis passage (0.6436), and the exercise query against the exercise passage (0.7621).
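To avoid repeating these steps, the whole flow could be wrapped in a helper function; the sketch below is my own (the name embed_texts is hypothetical), reusing the tokenizer, model, and last_token_pool defined above:

def embed_texts(texts: list[str]) -> Tensor:
    # Tokenize, run the model, pool the last token, and L2-normalize.
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=8192, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model(**batch)
    emb = last_token_pool(out.last_hidden_state, batch['attention_mask'])
    return F.normalize(emb, p=2, dim=1)

# Usage: queries first, then documents, as in both examples above.
emb = embed_texts(input_texts)
print(emb[:2] @ emb[2:].T)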
Reference: https://github.com/QwenLM/Qwen3-Embedding?tab=readme-ov-file