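[Tencent Embraces Open Source] KaLM-Embedding-Gemma3-12B-2511: A Feature-Extraction Model Built on Google's Gemma
Introduction
KaLM-Embedding-Gemma3-12B-2511 is a versatile and compact embedding model that achieves state-of-the-art performance on the MMTEB benchmark (as of November 2025).
MMTEB Evaluation Results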
| Rank (Borda) | Model | Mean (Task) | Mean (TaskType) | Bitext Mining | Classification | Clustering | Instruction Reranking | Multilabel Classification | Pair Classification | Reranking | Retrieval | STS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | KaLM-Embedding-Gemma3-12B-2511 | 72.32 | 62.51 | 83.76 | 77.88 | 55.77 | 5.49 | 33.03 | 84.73 | 67.27 | 75.66 | 79.02 |
| 2 | llama-embed-nemotron-8b | 69.46 | 61.09 | 81.72 | 73.21 | 54.35 | 10.82 | 29.86 | 83.97 | 67.78 | 68.69 | 79.41 |
| 3 | Qwen3-Embedding-8B | 70.58 | 61.69 | 80.89 | 74.00 | 57.65 | 10.06 | 28.66 | 86.40 | 65.63 | 70.88 | 81.08 |
| 4 | gemini-embedding-001 | 68.37 | 59.59 | 79.28 | 71.82 | 54.59 | 5.18 | 29.16 | 83.63 | 65.58 | 67.71 | 79.40 |
| 5 | Qwen3-Embedding-4B | 69.45 | 60.86 | 79.36 | 72.33 | 57.15 | 11.56 | 26.77 | 85.05 | 65.08 | 69.60 | 80.86 |
| 6 | Qwen3-Embedding-0.6B | 64.34 | 56.01 | 72.23 | 66.83 | 52.33 | 5.09 | 24.59 | 80.83 | 61.41 | 64.65 | 76.17 |
| 7 | gte-Qwen2-7B-instruct | 62.51 | 55.93 | 73.92 | 61.55 | 52.77 | 4.94 | 25.48 | 85.13 | 65.55 | 60.08 | 73.98 |
| 8 | Linq-Embed-Mistral | 61.47 | 54.14 | 70.34 | 62.24 | 50.60 | 0.94 | 24.77 | 80.43 | 64.37 | 58.69 | 74.86 |
| 9 | multilingual-e5-large-instruct | 63.22 | 55.08 | 80.13 | 64.94 | 50.75 | -0.40 | 22.91 | 80.86 | 62.61 | 57.12 | 76.81 |
| 10 | embeddinggemma-300m | 61.15 | 54.31 | 64.40 | 60.90 | 51.17 | 5.61 | 24.82 | 81.40 | 63.25 | 62.49 | 74.73 |
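Model Details
- Model size: 11.76B parameters
- Embedding dimension: 3840
- Max input tokens: 32k
- Multi-resolution embedding dimensions: 3840, 2048, 1024, 512, 256, 128, and 64
- Pooling: last-token pooling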
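The last two items can be illustrated with a small, dependency-free sketch. It assumes that "last-token pooling" means taking the hidden state of the final non-padding token, and that the smaller dimensions are obtained by keeping the leading components of the full vector and re-normalizing. Both are common conventions, not details confirmed by this post. The function names `last_token_pool` and `truncate_and_normalize` are hypothetical, and the toy vectors are made up for illustration.

```python
import math


def last_token_pool(token_vectors, attention_mask):
    """Return the hidden state of the last non-padding token.

    Assumption: a mask value of 1 marks a real token, 0 marks padding.
    """
    # Index of the last position whose mask is 1 (works for left or right padding).
    last = max(i for i, m in enumerate(attention_mask) if m == 1)
    return token_vectors[last]


def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` components and rescale to unit L2 norm."""
    if not 0 < dim <= len(embedding):
        raise ValueError("dim must be between 1 and len(embedding)")
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head


# Toy example: three token positions, the last one is padding.
hidden = [
    [1.0, 0.0, 0.0, 0.0],
    [3.0, 4.0, 12.0, 0.0],
    [9.0, 9.0, 9.0, 9.0],
]
mask = [1, 1, 0]

pooled = last_token_pool(hidden, mask)     # second row is selected
small = truncate_and_normalize(pooled, 2)  # leading 2 components, unit length
```

With real embeddings, the same truncation would be applied to the 3840-dimensional output to obtain, for example, a 256-dimensional vector.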
Usage
sentence-transformers support
Once the sentence-transformers library is installed, the model is straightforward to call:
pip install -U sentence-transformers
You can then use the model like this:
from sentence_transformers import SentenceTransformer
import torch

model = SentenceTransformer(
    "tencent/KaLM-Embedding-Gemma3-12B-2511",
    trust_remote_code=True,
    model_kwargs={
        "torch_dtype": torch.bfloat16,
        "attn_implementation": "flash_attention_2",  # Optional
    },
)
model.max_seq_length = 512

sentences = ["This is an example sentence", "Each sentence is converted"]
prompt = "Instruct: Classifying the category of french news.\nQuery:"
embeddings = model.encode(
    sentences,
    prompt=prompt,
    normalize_embeddings=True,
    batch_size=256,
    show_progress_bar=True,
)
print(embeddings)
'''
[[-0.01867676  0.02319336  0.00280762 ... -0.02075195  0.00196838 -0.0703125 ]
 [-0.0067749   0.03491211  0.01434326 ... -0.0043335   0.00509644 -0.04174805]]
'''
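The prompt string in the example follows an "Instruct: ...\nQuery:" pattern. As a minimal sketch, assuming other tasks use the same template (the post shows only this one instance), a small helper can build the prompt from a task description. The name `build_prompt` is hypothetical and not part of sentence-transformers.

```python
def build_prompt(task_description):
    """Format a task description using the template seen in the example above.

    Assumption: every task uses the same "Instruct: ...\\nQuery:" layout.
    """
    return f"Instruct: {task_description}\nQuery:"


# Reproduces the prompt used in the example above.
prompt = build_prompt("Classifying the category of french news.")
```

The resulting string can be passed as the `prompt` argument of `model.encode`, as in the example above.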
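Alternatively, you can use encode_query and encode_document, which automatically add the default prompts for queries ("Instruct: Given a query, retrieve documents that answer the query\nQuery:") and documents (""), respectively.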
from sentence_transformers import SentenceTransformer
import torch

model = SentenceTransformer(
    "tencent/KaLM-Embedding-Gemma3-12B-2511",
    trust_remote_code=True,
    model_kwargs={
        "torch_dtype": torch.bfloat16,
        "attn_implementation": "flash_attention_2",  # Optional
    },
)
model.max_seq_length = 512

queries = [
    "What is the capital of China?",
    "Explain gravity",
]
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]

query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)

similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
'''
tensor([[0.9034, 0.2563],
        [0.3153, 0.7396]])
'''
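Each row of the similarity matrix corresponds to a query and each column to a document, so retrieval amounts to sorting every row by score. The sketch below does this on plain Python lists, using the scores printed above. `rank_documents` is a hypothetical helper, not a sentence-transformers function.

```python
def rank_documents(similarities):
    """For each query row, return document indices sorted by descending score."""
    return [
        sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        for row in similarities
    ]


# Scores copied from the printed tensor above (rows: queries, columns: documents).
scores = [
    [0.9034, 0.2563],
    [0.3153, 0.7396],
]
ranking = rank_documents(scores)
best_document = [row[0] for row in ranking]  # top document index per query
```

Here the first query ranks the Beijing sentence highest and the second query ranks the gravity passage highest.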
vLLM support
Note: Because vLLM supports only the Gemma3ForCausalLM model class and not Gemma3TextModel, you must load the model parameters from the CausalLM branch by specifying revision="CausalLM".
from vllm import LLM

sentences = ["This is an example sentence", "Each sentence is converted"]

# Create an LLM.
# You should pass task="embed" for embedding models
model = LLM(
    model="tencent/KaLM-Embedding-Gemma3-12B-2511",
    task="embed",
    enforce_eager=True,
    revision="CausalLM",  # specify the CausalLM branch for Gemma3ForCausalLM config
)

outputs = model.embed(sentences)
embeddings = [output.outputs.embedding for output in outputs]
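The `embeddings` list built above holds one vector per input sentence. As a rough sketch, assuming each vector is a plain list of floats, pairwise cosine similarity can be computed without extra dependencies. The helpers `cosine_similarity` and `similarity_matrix` are hypothetical, and the toy vectors below stand in for real model output.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def similarity_matrix(vectors):
    """All-pairs cosine similarity for a list of vectors."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Toy stand-ins for the vLLM embeddings (illustrative values only).
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
matrix = similarity_matrix(toy_embeddings)
```

If the vectors are already L2-normalized, the cosine reduces to a plain dot product.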
Citation
If you find this model useful, please consider giving it a star and citing it.
@misc{zhao2025kalmembeddingv2,
  title={KaLM-Embedding-V2: Superior Training Techniques and Data Inspire A Versatile Embedding Model},
  author={Xinping Zhao and Xinshuo Hu and Zifei Shan and Shouzheng Huang and Yao Zhou and Xin Zhang and Zetian Sun and Zhenyu Liu and Dongfang Li and Xinyuan Wei and Youcheng Pan and Yang Xiang and Meishan Zhang and Haofen Wang and Jun Yu and Baotian Hu and Min Zhang},
  year={2025},
  eprint={2506.20923},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2506.20923},
}

@misc{hu2025kalmembedding,
  title={KaLM-Embedding: Superior Training Data Brings A Stronger Embedding Model},
  author={Xinshuo Hu and Zifei Shan and Xinping Zhao and Zetian Sun and Zhenyu Liu and Dongfang Li and Shaolin Ye and Xinyuan Wei and Qian Chen and Baotian Hu and Haofen Wang and Jun Yu and Min Zhang},
  year={2025},
  eprint={2501.01028},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2501.01028},
}
