
Accessing a Docker-based Elasticsearch from Python

This article assumes Elasticsearch has already been set up with Docker; the detailed procedure is described at the link below.

https://blog.csdn.net/liliang199/article/details/151581138

Here we first verify that ES is running normally, then import vector data and walk through a sample query.

1 Verifying the installation

Set the ES password as an environment variable

export ELASTIC_PASSWORD=xxxxxx

If you have forgotten the password, you can generate a new one

docker exec -it es01 /usr/share/elasticsearch/bin/elasticsearch-reset-password -u elastic

Copy the certificate http_ca.crt from the container to the host and save it in the current directory, at the path "./http_ca.crt"

docker cp es01:/usr/share/elasticsearch/config/certs/http_ca.crt .

Run the following command to check that the REST API is responding

curl --cacert http_ca.crt -u elastic:$ELASTIC_PASSWORD "https://localhost:9200"

If you receive output like the following, ES is running normally

{
  "name" : "xxxf4ac4xxx",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "xxxx1x9j3qQ_xxx",
  "version" : {
    "number" : "9.1.0",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "dasdfa1a2f57de895a73a1391ff8426c0153c8d",
    "build_date" : "2025-05-24T22:05:04.526302670Z",
    "build_snapshot" : false,
    "lucene_version" : "10.2.2",
    "minimum_wire_compatibility_version" : "8.19.0",
    "minimum_index_compatibility_version" : "8.0.0"
  },
  "tagline" : "You Know, for Search"
}
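
The same check can also be scripted. Below is a minimal sketch that uses only the Python standard library. It assumes the certificate is at ./http_ca.crt and the password is the one stored in ELASTIC_PASSWORD, as above. The helper names (basic_auth_header, looks_healthy, fetch_root) are hypothetical and chosen only for illustration.

```python
import base64
import json
import ssl
import urllib.request


def basic_auth_header(user: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def looks_healthy(body: str) -> bool:
    """Return True if the body is JSON with a version number and the ES tagline."""
    try:
        info = json.loads(body)
    except json.JSONDecodeError:
        return False
    if not isinstance(info, dict):
        return False
    version = info.get("version")
    has_version = isinstance(version, dict) and "number" in version
    return has_version and info.get("tagline") == "You Know, for Search"


def fetch_root(password: str, url: str = "https://localhost:9200",
               cafile: str = "./http_ca.crt") -> str:
    """GET the root endpoint, trusting the CA certificate copied from the container."""
    context = ssl.create_default_context(cafile=cafile)
    request = urllib.request.Request(
        url, headers={"Authorization": basic_auth_header("elastic", password)}
    )
    with urllib.request.urlopen(request, context=context) as response:
        return response.read().decode("utf-8")
```

With ES running, looks_healthy(fetch_root(os.environ["ELASTIC_PASSWORD"])) should evaluate to True (after import os); nothing is sent over the network until fetch_root is called.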

2 Connecting from Python

Install the elasticsearch package with pip

pip install elasticsearch

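The Python code for accessing ES is shown below. The certificate file "./http_ca.crt" is in the current directory; replace 'passwd' with your own password.

from elasticsearch import Elasticsearch
import ssl

# Trust the CA certificate copied out of the container
ssl_context = ssl.create_default_context(cafile='./http_ca.crt')

es = Elasticsearch(
    ['https://localhost:9200'],
    basic_auth=('elastic', 'passwd'),  # (username, password)
    ssl_context=ssl_context,
)

def main():
    # Fetch basic cluster information to confirm the connection works
    info = es.info()
    print(info)

# Run the main function
main()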

The output is as follows, which shows that ES is running normally.

{'name': 'c2542f4ac460', xxxx 'tagline': 'You Know, for Search'}
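
To avoid hard-coding the password, the connection settings can be read from the ELASTIC_PASSWORD environment variable set in section 1. The following is a minimal sketch; the function name connection_kwargs is hypothetical, and it assumes a client version that accepts the basic_auth and ca_certs keyword arguments (elasticsearch-py 8.x and later).

```python
import os


def connection_kwargs(env=os.environ, ca_path="./http_ca.crt"):
    """Assemble keyword arguments for Elasticsearch(...) from the environment."""
    password = env.get("ELASTIC_PASSWORD")
    if not password:
        raise RuntimeError("ELASTIC_PASSWORD is not set")
    return {
        "hosts": ["https://localhost:9200"],
        "basic_auth": ("elastic", password),
        # Path to the CA file; the client builds its TLS context from it
        "ca_certs": ca_path,
    }
```

The client would then be created with Elasticsearch(**connection_kwargs()). Passing ca_certs is an alternative to building an ssl_context by hand; the two should not be combined in one call.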

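3 Creating the index

Set up the search-ahz2 index with two fields, vector and text, using the code below. Note that put_mapping updates the mapping of an existing index, so search-ahz2 must already exist when this code runs.

import ssl
from elasticsearch import Elasticsearch

ssl_context = ssl.create_default_context(cafile='./http_ca.crt')
client = Elasticsearch(
    ['https://localhost:9200'],
    basic_auth=('elastic', 'passwd'),
    ssl_context=ssl_context,
)

index_name = "search-ahz2"

# A 3-dimensional dense vector field plus a full-text field
mappings = {
    "properties": {
        "vector": {"type": "dense_vector", "dims": 3},
        "text": {"type": "text"},
    }
}

mapping_response = client.indices.put_mapping(index=index_name, body=mappings)
print(mapping_response)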

The output is as follows, which shows that the index mapping was set up correctly

{'acknowledged': True}
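
If the index does not exist yet, put_mapping fails with an index-not-found error. A minimal sketch of creating the index together with its mapping is shown below. The helper names vector_mappings and ensure_index are hypothetical; the sketch assumes client is an Elasticsearch client from elasticsearch-py 8.x or later, where indices.exists and indices.create accept these keyword arguments.

```python
def vector_mappings(dims: int) -> dict:
    """Mapping with a dense_vector field of the given size and a text field."""
    return {
        "properties": {
            "vector": {"type": "dense_vector", "dims": dims},
            "text": {"type": "text"},
        }
    }


def ensure_index(client, index_name: str, dims: int = 3) -> bool:
    """Create the index with its mapping if it is missing.

    Returns True if the index was created, False if it already existed.
    """
    if client.indices.exists(index=index_name):
        return False
    client.indices.create(index=index_name, mappings=vector_mappings(dims))
    return True
```

Calling ensure_index(client, "search-ahz2") before the import step makes the script safe to re-run, since an existing index is left untouched.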

4 Importing data

The data import code is shown below

import ssl
from elasticsearch import Elasticsearch, helpers

ssl_context = ssl.create_default_context(cafile='./http_ca.crt')
client = Elasticsearch(
    ['https://localhost:9200'],
    basic_auth=('elastic', 'passwd'),
    ssl_context=ssl_context,
)

index_name = "search-ahz2"

# Each document carries a text passage and its 3-dimensional vector
docs = [
    {
        "text": "Yellowstone National Park is one of the largest national parks in the United States. It ranges from the Wyoming to Montana and Idaho, and contains an area of 2,219,791 acress across three different states. Its most famous for hosting the geyser Old Faithful and is centered on the Yellowstone Caldera, the largest super volcano on the American continent. Yellowstone is host to hundreds of species of animal, many of which are endangered or threatened. Most notably, it contains free-ranging herds of bison and elk, alongside bears, cougars and wolves. The national park receives over 4.5 million visitors annually and is a UNESCO World Heritage Site.",
        "vector": [4.667, 0.145, 3.07],
    },
    {
        "text": "Yosemite National Park is a United States National Park, covering over 750,000 acres of land in California. A UNESCO World Heritage Site, the park is best known for its granite cliffs, waterfalls and giant sequoia trees. Yosemite hosts over four million visitors in most years, with a peak of five million visitors in 2016. The park is home to a diverse range of wildlife, including mule deer, black bears, and the endangered Sierra Nevada bighorn sheep. The park has 1,200 square miles of wilderness, and is a popular destination for rock climbers, with over 3,000 feet of vertical granite to climb. Its most famous and cliff is the El Capitan, a 3,000 feet monolith along its tallest face.",
        "vector": [4.222, 3.251, 6.634],
    },
    {
        "text": "Rocky Mountain National Park  is one of the most popular national parks in the United States. It receives over 4.5 million visitors annually, and is known for its mountainous terrain, including Longs Peak, which is the highest peak in the park. The park is home to a variety of wildlife, including elk, mule deer, moose, and bighorn sheep. The park is also home to a variety of ecosystems, including montane, subalpine, and alpine tundra. The park is a popular destination for hiking, camping, and wildlife viewing, and is a UNESCO World Heritage Site.",
        "vector": [6.504, 9.081, 0.003],
    },
]

# Index all documents in a single bulk request
bulk_response = helpers.bulk(client, docs, index=index_name)
print(bulk_response)
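
Because the vector field was mapped with dims set to 3, a document whose vector has a different length is rejected at index time. A small sketch of checking this before the bulk call is shown below; the function name to_actions is hypothetical, and the action format (_index plus _source) is the one helpers.bulk accepts.

```python
def to_actions(docs, index_name, dims=3):
    """Yield bulk actions, rejecting documents whose vector length differs from dims."""
    for position, doc in enumerate(docs):
        vector = doc.get("vector")
        if not isinstance(vector, list) or len(vector) != dims:
            raise ValueError(
                f"document {position}: expected a vector of {dims} numbers"
            )
        # Each action names its target index and carries the document as _source
        yield {"_index": index_name, "_source": doc}
```

It would be used as helpers.bulk(client, to_actions(docs, index_name)). By default helpers.bulk returns a tuple holding the number of successfully indexed documents and a list of errors.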

5 Querying data

Using the vector "query_vector": [-5, 9, -12], run a nearest-neighbor (knn) query. Sample code is shown below.

import ssl
from elasticsearch import Elasticsearch

ssl_context = ssl.create_default_context(cafile='./http_ca.crt')
client = Elasticsearch(
    ['https://localhost:9200'],
    basic_auth=('elastic', 'passwd'),
    ssl_context=ssl_context,
)

# Standard retriever wrapping a knn query on the "vector" field;
# k=1 returns only the single nearest document
retriever_object = {
    "standard": {
        "query": {
            "knn": {
                "field": "vector",
                "query_vector": [-5, 9, -12],
                "num_candidates": 100,
                "k": 1,
            }
        }
    }
}

search_response = client.search(
    index="search-ahz2",
    retriever=retriever_object,
)
print(search_response['hits']['hits'])

The result is as follows.

[{'_index': 'search-ahz2', '_id': '1hF_OJkBGOYBI6JUku22', '_score': 0.7304765, '_source': {'text': 'Rocky Mountain National Park  is one of the most popular national parks in the United States. It receives over 4.5 million visitors annually, and is known for its mountainous terrain, including Longs Peak, which is the highest peak in the park. The park is home to a variety of wildlife, including elk, mule deer, moose, and bighorn sheep. The park is also home to a variety of ecosystems, including montane, subalpine, and alpine tundra. The park is a popular destination for hiking, camping, and wildlife viewing, and is a UNESCO World Heritage Site.', 'vector': [6.504, 9.081, 0.003]}}]
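
As a sanity check, the ranking can be reproduced by brute force outside ES. The sketch below assumes the vector field uses cosine similarity, which to my knowledge is the default for an indexed dense_vector field. The _score that ES reports is a transformed value that can also depend on index options such as quantization, so only the ordering is compared here, not the score itself. The short park names used as keys are labels for illustration only.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Vectors of the three documents imported in section 4
vectors = {
    "Yellowstone": [4.667, 0.145, 3.07],
    "Yosemite": [4.222, 3.251, 6.634],
    "Rocky Mountain": [6.504, 9.081, 0.003],
}
query = [-5, 9, -12]

# Sort document names from most to least similar to the query vector
ranked = sorted(vectors, key=lambda name: cosine(query, vectors[name]), reverse=True)
print(ranked[0])  # Rocky Mountain
```

Rocky Mountain is the only document whose vector has a positive cosine with the query, which agrees with the single hit returned above.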


