Official Milvus full-text search example

Link: https://milvus.io/docs/zh/full_text_search_with_milvus.md
full_text_demo:

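import os
from typing import List

from openai import OpenAI
from pymilvus import (
    MilvusClient,
    DataType,
    Function,
    FunctionType,
    AnnSearchRequest,
    RRFRanker,
)

# Set up the OpenAI client used for embeddings (reads OPENAI_API_KEY from the environment)
openai_client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

# Connect to Milvus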
uri = "http://ip:19530"
collection_name = "full_text_demo"
client = MilvusClient(uri=uri)
print("Connected successfully")

# Analyzer for the text field: standard tokenizer plus lowercasing
analyzer_params = {"tokenizer": "standard", "filter": ["lowercase"]}

schema = MilvusClient.create_schema()
schema.add_field(
    field_name="id",
    datatype=DataType.VARCHAR,
    is_primary=True,
    auto_id=True,
    max_length=100,
)
schema.add_field(
    field_name="content",
    datatype=DataType.VARCHAR,
    max_length=65535,
    analyzer_params=analyzer_params,
    enable_match=True,  # Enable text matching
    enable_analyzer=True,  # Enable text analysis
)
schema.add_field(field_name="sparse_vector", datatype=DataType.SPARSE_FLOAT_VECTOR)
schema.add_field(
    field_name="dense_vector",
    datatype=DataType.FLOAT_VECTOR,
    dim=1536,  # Dimension for text-embedding-3-small
)
schema.add_field(field_name="metadata", datatype=DataType.JSON)

# BM25 function: Milvus derives sparse_vector from content at insert time
bm25_function = Function(
    name="bm25",
    function_type=FunctionType.BM25,
    input_field_names=["content"],
    output_field_names="sparse_vector",
)
schema.add_function(bm25_function)

# Create indexes
index_params = MilvusClient.prepare_index_params()
index_params.add_index(
    field_name="sparse_vector",
    index_type="SPARSE_INVERTED_INDEX",
    metric_type="BM25",
)
index_params.add_index(field_name="dense_vector", index_type="FLAT", metric_type="IP")

# Drop the collection if it already exists
if client.has_collection(collection_name):
    client.drop_collection(collection_name)

# Create the collection
client.create_collection(
    collection_name=collection_name,
    schema=schema,
    index_params=index_params,
)
print(f"Collection '{collection_name}' created successfully")

# Embedding model; openai_client comes from the top of the script
model_name = "text-embedding-3-small"


# Define embedding generation function for reuse
def get_embeddings(texts: List[str]) -> List[List[float]]:
    if not texts:
        return []
    response = openai_client.embeddings.create(input=texts, model=model_name)
    return [embedding.embedding for embedding in response.data]


# Example documents to insert
documents = [
    {
        "content": "Milvus is a vector database built for embedding similarity search and AI applications.",
        "metadata": {"source": "documentation", "topic": "introduction"},
    },
    {
        "content": "Full-text search in Milvus allows you to search using keywords and phrases.",
        "metadata": {"source": "tutorial", "topic": "full-text search"},
    },
    {
        "content": "Hybrid search combines the power of sparse BM25 retrieval with dense vector search.",
        "metadata": {"source": "blog", "topic": "hybrid search"},
    },
]

# Prepare entities for insertion
entities = []
texts = [doc["content"] for doc in documents]
embeddings = get_embeddings(texts)

for i, doc in enumerate(documents):
    entities.append(
        {
            "content": doc["content"],
            "dense_vector": embeddings[i],
            "metadata": doc.get("metadata", {}),
        }
    )

# Insert data
client.insert(collection_name, entities)
print(f"Inserted {len(entities)} documents")

# Example query for semantic search
query = "How does Milvus help with similarity search?"

# Generate embedding for query
query_embedding = get_embeddings([query])[0]

# Semantic search using dense vectors
results = client.search(
    collection_name=collection_name,
    data=[query_embedding],
    anns_field="dense_vector",
    limit=5,
    output_fields=["content", "metadata"],
)
dense_results = results[0]

# Print results
print("\nDense Search (Semantic):")
for i, result in enumerate(dense_results):
    print(f"{i+1}. Score: {result['distance']:.4f}, Content: {result['entity']['content']}")
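
The script stops at dense (semantic) search, even though the schema also defines a BM25 function on `content`. The following is a minimal sketch of the matching full-text query, assuming the `sparse_vector` field and the BM25 function from the schema above. The helper name `full_text_search` is hypothetical and not part of pymilvus. It passes the raw query text in `data`, on the assumption that the server-side BM25 function tokenizes it, so no embedding call is needed.

```python
from typing import Any, Dict, List


def full_text_search(
    client: Any,
    collection_name: str,
    query: str,
    limit: int = 5,
) -> List[Dict[str, Any]]:
    """Run a BM25 full-text query against the sparse_vector field.

    `client` is assumed to be a connected pymilvus MilvusClient whose
    collection was created with the BM25 function shown in the script.
    """
    results = client.search(
        collection_name=collection_name,
        data=[query],  # raw text, not a vector: BM25 analyzes it on the server
        anns_field="sparse_vector",
        limit=limit,
        output_fields=["content", "metadata"],
    )
    # search() returns one hit list per query; only one query was sent
    return results[0]
```

With the `client` and `collection_name` from the script, calling `full_text_search(client, collection_name, "keywords and phrases")` should return hits in the same shape as `dense_results`, so the same printing loop can be reused.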