当前位置: 首页 > news >正文

LLM + RAG + Vanna 综合实战

这篇文章介绍如何将LLM + RAG + Vanna结合起来,在一个项目中通过和LLM对话的方式查询数据库中的记录,并显示结果。而不需要人为的手动的通过执行SQL的方式来查询数据库记录。


OLLAMA

Install Ollama

curl -fsSL https://ollama.com/install.sh | sh

Note: The default location to store models file is: /usr/share/ollama/.ollama/models, to change the location, refer to: https://github.com/ollama/ollama/blob/main/docs/faq.md#where-are-models-stored

Query Libraries:
library

Config proxy

sudo vi /etc/systemd/system/ollama.service

如果需要,增加http_proxy和https_proxy,

[Unit]
Description=Ollama Service
After=network-online.target[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=/home/sunny/bin:/home/sunny/conda/condabin:/home/sunny/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/home/sunny/yunjiang/tools/apache-maven-3.9.9/bin:/home/sunny/conda/bin:/home/sunny/conda/bin"Environment="http_proxy=http://child-prc.intel.com:913"
Environment="https_proxy=http://child-prc.intel.com:913"[Install]
WantedBy=default.target

重启服务,

sudo systemctl daemon-reload

sudo systemctl restart ollama

Pull model

ollama pull llama3.2

List model

ollama list

Run model

ollama run llama3.2

Ask Question via Restful API

curl http://localhost:11434/api/generate -d '{ "model": "llama3.2", "prompt": "How are you today?", "stream": false}'

Install CrhomaDB

Refer to: https://blog.csdn.net/funnyrand/article/details/148132003

Install Vanna

Install Vanna

pip3 install vanna

For more details of Vanna, refer to:

Vanna.AI Documentation

https://github.com/vanna-ai/vanna?tab=readme-ov-file

Install ClickHouse Connect

pip3 install clickhouse_connect

Install MySQL Connect

pip3 install PyMySQL

Install ClickHouse

Refer to: ClickHouse总结_checkhouse数据库-CSDN博客

注:需要插入真实数据到ClickHouse,下面的部分仅做演示。

Python code to use Vanna

ntr_test1.py

from vanna.ollama import Ollama
from vanna.chromadb import ChromaDB_VectorStoreclass MyVanna(ChromaDB_VectorStore, Ollama):def __init__(self, config=None):ChromaDB_VectorStore.__init__(self, config=config)Ollama.__init__(self, config=config)def train():# The information schema query may need some tweaking depending on your database. This is a good starting point.df_information_schema = vn.run_sql("SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE table_schema = 'ntr'")# This will break up the information schema into bite-sized chunks that can be referenced by the LLMplan = vn.get_training_plan_generic(df_information_schema)#print(plan)# If you like the plan, then uncomment this and run it to train#vn.train(plan=plan)# The following are methods for adding training data. Make sure you modify the examples to match your database.# DDL statements are powerful because they specify table names, colume names, types, and potentially relationshipsvn.train(ddl="""CREATE TABLE ntr.metadata (timestamp DateTime64(6),srcMac FixedString(17),destMac FixedString(17),srcIp IPv4,destIp IPv4,srcPort UInt16,destPort UInt16,etherType UInt16,ipProto UInt16,packetSize UInt16,offset UInt64,pcapFile String)ENGINE = MergeTree()PARTITION BY toStartOfHour(timestamp)ORDER BY (timestamp, srcIp, destIp, srcPort, destPort, etherType, ipProto);""")vn.train(ddl="""CREATE TABLE ntr.metadata_stats_all (timestamp DateTime,throughput Float64,packetCount UInt64,packetSizes UInt64)ENGINE = MergeTree()PARTITION BY toYYYYMMDD(timestamp)ORDER BY (timestamp);""")vn.train(ddl="""CREATE TABLE ntr.global_config (key String,value String,initVal String,regexp String) ENGINE = MergeTree()PARTITION BY (key)ORDER BY (key);""")# Sometimes you may want to add documentation about your business terminology or definitions.# vn.train(documentation="Our business defines OTIF score as the percentage of orders that are delivered on time and in full")# You can also add SQL queries to your training data. This is useful if you have some queries already laying around. You can just copy and paste those from your editor to begin generating new SQL.#vn.train(sql="select * from mydb.person where name='sunny';")# At any time you can inspect what training data the package is able to referencetraining_data = vn.get_training_data()training_data# Main
vn = MyVanna(config={'model': 'llama3.2'})vn.connect_to_clickhouse(host='172.17.28.100', dbname='ntr', user="default", password="changeme", port=8123)#train()# You can remove training data if there's obsolete/incorrect information. 
# vn.remove_training_data(id='1-ddl')## Asking the AI
# Whenever you ask a new question, it will find the 10 most relevant pieces of training data and use it as part of the LLM prompt to generate the SQL.# vn.ask(question="How many metadata records with src IP address 117.35.136.101 in ntr?", allow_llm_to_see_data=True)
# vn.ask(question="Query metadata which src IP address is 117.35.136.101.")
# rt = vn.ask(question="How many items in table ntr.metadata witch src IP address 117.35.136.101?", visualize=True, allow_llm_to_see_data=True)
rt = vn.ask(question="Query items in ntr.metadata table which src IP address is 117.35.136.101.", visualize=False, allow_llm_to_see_data=True)
#print(rt)

运行结果:

sunny@r05u33-nex-wvie-spr-cd:~/work/llm$  /usr/bin/env /bin/python3 /home/sunny/.vscode-server/extensions/ms-python.debugpy-2025.8.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher 39887 -- /home/sunny/work/llm/vanna/ntr/ntr_test1.py 
<clickhouse_connect.driver.httpclient.HttpClient object at 0x7f1b477f29e0>
SQL Prompt: [{'role': 'system', 'content': "You are a SQL expert. Please help to generate a SQL query to answer the question. Your response should ONLY be based on the given context and follow the response guidelines and format instructions. \n===Tables \n\n        CREATE TABLE ntr.metadata (\n            timestamp DateTime64(6),\n            srcMac FixedString(17),\n            destMac FixedString(17),\n            srcIp IPv4,\n            destIp IPv4,\n            srcPort UInt16,\n            destPort UInt16,\n            etherType UInt16,\n            ipProto UInt16,\n            packetSize UInt16,\n            offset UInt64,\n            pcapFile String\n        )\n        ENGINE = MergeTree()\n        PARTITION BY toStartOfHour(timestamp)\n        ORDER BY (\n            timestamp, \n            srcIp, \n            destIp, \n            srcPort, \n            destPort, \n            etherType, \n            ipProto\n        );\n    \n\n\n        CREATE TABLE ntr.metadata_stats_all (\n            timestamp DateTime,\n            throughput Float64,\n            packetCount UInt64,\n            packetSizes UInt64\n        )\n        ENGINE = MergeTree()\n        PARTITION BY toYYYYMMDD(timestamp)\n        ORDER BY (timestamp);\n    \n\n\n        CREATE TABLE ntr.global_config (\n            key String,\n            value String,\n            initVal String,\n            regexp String\n        ) \n        ENGINE = MergeTree()\n        PARTITION BY (key)\n        ORDER BY (key);\n    \n\n===Response Guidelines \n1. If the provided context is sufficient, please generate a valid SQL query without any explanations for the question. \n2. If the provided context is almost sufficient but requires knowledge of a specific string in a particular column, please generate an intermediate SQL query to find the distinct strings in that column. Prepend the query with a comment saying intermediate_sql \n3. If the provided context is insufficient, please explain why it can't be generated. \n4. Please use the most relevant table(s). \n5. If the question has been asked and answered before, please repeat the answer exactly as it was given before. \n6. Ensure that the output SQL is SQL-compliant and executable, and free of syntax errors. \n"}, {'role': 'user', 'content': 'Query items in ntr.metadata table which src IP address is 117.35.136.101.'}, {'role': 'assistant', 'content': "SELECT * \nFROM ntr.metadata\nWHERE srcIp = '117.35.136.101'"}, {'role': 'user', 'content': 'Query metadata which src port is 61446'}, {'role': 'assistant', 'content': 'SELECT * FROM ntr.metadata WHERE srcPort = 61446'}, {'role': 'user', 'content': 'Query metadata which src port is 61446'}, {'role': 'assistant', 'content': 'SELECT DISTINCT srcPort\nFROM ntr.metadata\nWHERE srcPort = 61446'}, {'role': 'user', 'content': 'How many items in table ntr.metadata witch src IP address 117.35.136.101?'}, {'role': 'assistant', 'content': "SELECT COUNT(*) \nFROM ntr.metadata\nWHERE srcIp = '117.35.136.101'"}, {'role': 'user', 'content': 'How many items in table ntr.metadata?'}, {'role': 'assistant', 'content': 'SELECT COUNT(*) \nFROM ntr.metadata'}, {'role': 'user', 'content': 'Query items in ntr.metadata table which src IP address is 117.35.136.101.'}]
Info: Ollama parameters:
model=llama3.2:latest,
options={},
keep_alive=None
Info: Prompt Content:
[{"role": "system", "content": "You are a SQL expert. Please help to generate a SQL query to answer the question. Your response should ONLY be based on the given context and follow the response guidelines and format instructions. \n===Tables \n\n        CREATE TABLE ntr.metadata (\n            timestamp DateTime64(6),\n            srcMac FixedString(17),\n            destMac FixedString(17),\n            srcIp IPv4,\n            destIp IPv4,\n            srcPort UInt16,\n            destPort UInt16,\n            etherType UInt16,\n            ipProto UInt16,\n            packetSize UInt16,\n            offset UInt64,\n            pcapFile String\n        )\n        ENGINE = MergeTree()\n        PARTITION BY toStartOfHour(timestamp)\n        ORDER BY (\n            timestamp, \n            srcIp, \n            destIp, \n            srcPort, \n            destPort, \n            etherType, \n            ipProto\n        );\n    \n\n\n        CREATE TABLE ntr.metadata_stats_all (\n            timestamp DateTime,\n            throughput Float64,\n            packetCount UInt64,\n            packetSizes UInt64\n        )\n        ENGINE = MergeTree()\n        PARTITION BY toYYYYMMDD(timestamp)\n        ORDER BY (timestamp);\n    \n\n\n        CREATE TABLE ntr.global_config (\n            key String,\n            value String,\n            initVal String,\n            regexp String\n        ) \n        ENGINE = MergeTree()\n        PARTITION BY (key)\n        ORDER BY (key);\n    \n\n===Response Guidelines \n1. If the provided context is sufficient, please generate a valid SQL query without any explanations for the question. \n2. If the provided context is almost sufficient but requires knowledge of a specific string in a particular column, please generate an intermediate SQL query to find the distinct strings in that column. Prepend the query with a comment saying intermediate_sql \n3. If the provided context is insufficient, please explain why it can't be generated. \n4. Please use the most relevant table(s). \n5. If the question has been asked and answered before, please repeat the answer exactly as it was given before. \n6. Ensure that the output SQL is SQL-compliant and executable, and free of syntax errors. \n"}, {"role": "user", "content": "Query items in ntr.metadata table which src IP address is 117.35.136.101."}, {"role": "assistant", "content": "SELECT * \nFROM ntr.metadata\nWHERE srcIp = '117.35.136.101'"}, {"role": "user", "content": "Query metadata which src port is 61446"}, {"role": "assistant", "content": "SELECT * FROM ntr.metadata WHERE srcPort = 61446"}, {"role": "user", "content": "Query metadata which src port is 61446"}, {"role": "assistant", "content": "SELECT DISTINCT srcPort\nFROM ntr.metadata\nWHERE srcPort = 61446"}, {"role": "user", "content": "How many items in table ntr.metadata witch src IP address 117.35.136.101?"}, {"role": "assistant", "content": "SELECT COUNT(*) \nFROM ntr.metadata\nWHERE srcIp = '117.35.136.101'"}, {"role": "user", "content": "How many items in table ntr.metadata?"}, {"role": "assistant", "content": "SELECT COUNT(*) \nFROM ntr.metadata"}, {"role": "user", "content": "Query items in ntr.metadata table which src IP address is 117.35.136.101."}]
Info: Ollama Response:
model='llama3.2:latest' created_at='2025-05-30T08:21:59.889943887Z' done=True done_reason='stop' total_duration=8834583846 load_duration=6108105509 prompt_eval_count=679 prompt_eval_duration=2053141142 eval_count=22 eval_duration=651663457 message=Message(role='assistant', content="SELECT * \nFROM ntr.metadata\nWHERE srcIp = '117.35.136.101'", images=None, tool_calls=None)
LLM Response: SELECT * 
FROM ntr.metadata
WHERE srcIp = '117.35.136.101'
SELECT * 
FROM ntr.metadata
WHERE srcIp = '117.35.136.101'timestamp                srcMac               destMac           srcIp  ... ipProto packetSize       offset                      pcapFile
0 2025-05-23 14:56:35.953557+08:00  b'e5:19:63:7b:de:b9'  b'55:61:5b:5e:74:de'  117.35.136.101  ...      17       1514  56228199546  output_0_4_5_1747983214.pcap[1 rows x 14 columns]

可以看到,其正确的输出了查询结果。

相关文章:

  • 移动端图片浏览插件
  • 机器视觉视觉中的棋盘格到底是什么?为什么是棋盘格?
  • python训练 60天挑战-day40
  • 在Mathematica中使用WhenEvent求解微分方程
  • 【数据库】并发控制
  • shell脚本打包成可以在麒麟桌面操作系统上使用的deb包
  • leetcode:479. 最大回文数乘积(python3解法,数学相关算法题)
  • 第十九章 正则表达式
  • 【Web应用】若依框架:基础篇12 项目结构
  • Linux 的主要时钟类型
  • 运行python文件规范日志
  • 开发体育平台,怎么接入最合适的数据接口
  • Display Driver Uninstaller(DDU卸载显卡驱动工具)官网下载
  • element上传文件多选 实现文件排序
  • GROMACS 软件包介绍与使用指南
  • LangChain-LangGraph框架 应用实例
  • Catch That Cow POJ - 3278
  • java代码性能优化
  • 什么是Docker容器?
  • 初探Linux内核:解锁Linux操作系统的基本核心的奥秘(二)
  • 视频剪辑培训班的学费是多少/百度地图优化排名方法
  • 太原网站建设51sole/推广产品最好的方式
  • 建立网站的方式/查淘宝关键词排名软件
  • 河南专业网站建设公司排名/seo就是搜索引擎广告
  • 做视频用的网站有哪些/核心关键词和长尾关键词举例
  • 关于营销方面的网站/查图百度识图