
Data Whale

Source: original, 2025/7/1 6:28:22

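Building a RAG application

Connecting an LLM to LangChain

LangChain provides an efficient framework for building custom applications on top of LLMs, with built-in interfaces to many large models.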

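Calling the model through LangChain

LCEL (LangChain Expression Language) is a new syntax for composing chains. Its main benefits:

Supports asynchronous, batch, and streaming execution, which makes code quick to port.
Provides fallbacks, which help when an LLM returns badly formatted output.
Adds parallel execution of LLM calls, which improves efficiency.
Has built-in logging, so that even as the code grows more complex it remains possible to follow how complex chains and agents run.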
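
As a rough illustration of the points above, the sketch below mimics pipe-style composition, batch execution, and a fallback using only the Python standard library. The `Step` class, its `with_fallback` method, and `strict_parse` are hypothetical stand-ins invented for this example; they are not LangChain's `Runnable` implementation, which is considerably richer.

```python
class Step:
    """Toy stand-in for an LCEL runnable (hypothetical, not LangChain code)."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # `a | b` builds a new step that feeds a's output into b.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        # Run the step on a single input.
        return self.func(value)

    def batch(self, values):
        # Run the step on every input in a list, preserving order.
        return [self.invoke(value) for value in values]

    def with_fallback(self, backup):
        # If this step raises, retry the same input on the backup step.
        def run(value):
            try:
                return self.invoke(value)
            except Exception:
                return backup.invoke(value)

        return Step(run)


def strict_parse(text):
    # Pretend output parser that only accepts "key=value" text.
    key, value = text.split("=")
    return {key: value}


# Strip whitespace, then parse; fall back to wrapping the raw text if parsing fails.
chain = Step(str.strip) | Step(strict_parse).with_fallback(
    Step(lambda text: {"raw": text})
)

single = chain.invoke(" lang=python ")
many = chain.batch(["a=1", "not well formed"])
print(single)
print(many)
```

The second batch item cannot be split into a key and a value, so the backup step handles it instead of the whole chain failing.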

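Building a retrieval QA chain

Combine the retrieved results with the query to build a prompt, then feed that prompt to the LLM to answer the question.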
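
A minimal sketch of that step in plain Python, without any framework: the function name `build_prompt`, the prompt wording, and the sample chunks are assumptions made for illustration only.

```python
def build_prompt(retrieved_chunks, query):
    """Join retrieved text chunks and place them, with the query, into one prompt."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Use the following context to answer the question at the end.\n"
        f"{context}\n"
        f"Question: {query}\n"
    )


chunks = ["Chunk one.", "Chunk two."]  # placeholder retrieval results
prompt_text = build_prompt(chunks, "What do the chunks say?")
print(prompt_text)
```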

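Creating the retrieval chain

LCEL chains can be run asynchronously, in streaming mode, or in batches, and can be traced seamlessly with LangSmith.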

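from langchain_core.runnables import RunnableLambda

def combine_docs(docs):
    # Join the text of all retrieved documents, separated by blank lines
    return "\n\n".join(doc.page_content for doc in docs)

combiner = RunnableLambda(combine_docs)
# `retriever` is assumed to be the vector store retriever built earlier
retrieval_chain = retriever | combiner
retrieval_chain.invoke("What is the Pumpkin Book (南瓜书)?")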
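
To see what the combining step produces without a real vector store, here is a small standard-library sketch. `fake_retriever` and its two passages are hypothetical stand-ins; the only assumption carried over from the chain is that each document exposes a `page_content` attribute.

```python
from types import SimpleNamespace


def combine_docs(docs):
    # Same joining rule as in the chain: a blank line between documents.
    return "\n\n".join(doc.page_content for doc in docs)


def fake_retriever(query):
    # Hypothetical stand-in for a retriever call; returns fixed documents.
    return [
        SimpleNamespace(page_content="First retrieved passage."),
        SimpleNamespace(page_content="Second retrieved passage."),
    ]


combined = combine_docs(fake_retriever("any question"))
print(combined)
```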

Creating the LLM

Building the retrieval QA chain

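from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough, RunnableParallel
from langchain_core.output_parsers import StrOutputParser

template = """Use the following context to answer the question at the end. If you don't know the answer, say that you don't know; do not try to make up an answer. Use at most three sentences and keep the answer concise. Always end your answer with "Thank you for your question!".
{context}
Question: {input}
"""
# Convert the template with PromptTemplate into a type that can be used in LCEL
prompt = PromptTemplate(template=template)

# `llm` is assumed to be the model object from the "Creating the LLM" step
qa_chain = (
    RunnableParallel({"context": retrieval_chain, "input": RunnablePassthrough()})
    | prompt
    | llm
    | StrOutputParser()
)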
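
The first stage of this chain sends the same question down two branches and collects the results in a dict that matches the template's variables. The sketch below imitates only that data flow in plain Python; `run_parallel` and `fake_retrieval_chain` are hypothetical helpers, not LangChain functions.

```python
def run_parallel(branches, value):
    # Give the same input to every branch and collect the results by key.
    return {key: branch(value) for key, branch in branches.items()}


def fake_retrieval_chain(question):
    # Hypothetical stand-in for the retrieval chain; returns fixed context text.
    return "Context about the topic."


template = "{context}\nQuestion: {input}\n"

question = "What is the topic?"
prompt_inputs = run_parallel(
    {
        "context": fake_retrieval_chain,
        "input": lambda value: value,  # identity plays the passthrough role
    },
    question,
)
filled_prompt = template.format(**prompt_inputs)
print(filled_prompt)
```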

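Adding chat history to the retrieval chain

Problem: the application built so far does not remember earlier turns of the conversation. Solution: pass the chat history along with each request.

Passing the chat history:

Earlier turns of the conversation are included in the input to the language model, which gives it the ability to hold a continuous conversation.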

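from langchain_core.prompts import ChatPromptTemplate

# System prompt for the QA chain
system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the retrieved context snippets to answer the question. "
    "If you don't know the answer, say that you don't know. "
    "Keep your answer concise."
    "\n\n"
    "{context}"
)
# Define the prompt template
qa_prompt = ChatPromptTemplate(
    [
        ("system", system_prompt),
        ("placeholder", "{chat_history}"),
        ("human", "{input}"),
    ]
)

# Without chat history
messages = qa_prompt.invoke(
    {
        "input": "What is the Pumpkin Book (南瓜书)?",
        "chat_history": [],
        "context": "",
    }
)
for message in messages.messages:
    print(message.content)

# With chat history
messages = qa_prompt.invoke(
    {
        "input": "Can you tell me more about it?",
        "chat_history": [
            ("human", "What is the Watermelon Book (西瓜书)?"),
            ("ai", "The Watermelon Book is Zhou Zhihua's \"Machine Learning\", one of the classic introductory textbooks in the field."),
        ],
        "context": "",
    }
)
for message in messages.messages:
    print(message.content)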
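
The effect of the `chat_history` placeholder can be pictured as splicing earlier turns between the system message and the new question. The plain-Python sketch below shows that ordering only; `build_messages` is a hypothetical helper and the message texts are illustrative.

```python
def build_messages(system_text, chat_history, user_input):
    # System message first, then every earlier turn, then the new question.
    messages = [("system", system_text)]
    messages.extend(chat_history)
    messages.append(("human", user_input))
    return messages


history = [
    ("human", "What is the Watermelon Book?"),
    ("ai", "It is Zhou Zhihua's Machine Learning textbook."),
]
without_history = build_messages(
    "You are a QA assistant.", [], "What is the Pumpkin Book?"
)
with_history = build_messages(
    "You are a QA assistant.", history, "Can you tell me more about it?"
)
for role, text in with_history:
    print(f"{role}: {text}")
```

With an empty history the result holds only the system message and the question, which is why a follow-up such as "Can you tell me more about it?" is ambiguous unless the earlier turns are passed in.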

Retrieval chain with question condensing

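from langchain_core.runnables import RunnableBranch

# System prompt for condensing the question
condense_question_system_template = (
    "Using the chat history, rewrite the user's latest question so that it is complete; "
    "if the latest question needs no rewriting, return it unchanged."
)
# Build the prompt template for condensing the question
condense_question_prompt = ChatPromptTemplate(
    [
        ("system", condense_question_system_template),
        ("placeholder", "{chat_history}"),
        ("human", "{input}"),
    ]
)
# Build the chain that retrieves documents
# RunnableBranch chooses which branch to run based on a condition
retrieve_docs = RunnableBranch(
    # Branch 1: if the input has no chat_history, query the vector store with the user's question directly
    (lambda x: not x.get("chat_history", False), (lambda x: x["input"]) | retriever),
    # Branch 2: if there is chat_history, let the llm complete the question from the history first, then query the vector store
    condense_question_prompt | llm | StrOutputParser() | retriever,
)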
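
The branching rule itself is simple and can be sketched without an LLM or a vector store. In the example below, `condense_question` and `fake_retriever` are hypothetical stand-ins: the first fakes the LLM rewrite by appending the previous human turn, the second echoes the query it received so the chosen branch is visible.

```python
def condense_question(chat_history, question):
    # Hypothetical stand-in for the LLM rewrite: attach the last human turn.
    last_human = [text for role, text in chat_history if role == "human"][-1]
    return f"{question} (follow-up to: {last_human})"


def fake_retriever(query):
    # Hypothetical stand-in for a vector store lookup; echoes the query.
    return [f"document matching: {query}"]


def retrieve_docs(inputs):
    if not inputs.get("chat_history"):
        # Branch 1: no history, search with the raw question.
        return fake_retriever(inputs["input"])
    # Branch 2: history present, rewrite the question first, then search.
    rewritten = condense_question(inputs["chat_history"], inputs["input"])
    return fake_retriever(rewritten)


first = retrieve_docs({"input": "What is the Pumpkin Book?", "chat_history": []})
follow_up = retrieve_docs(
    {
        "input": "Who wrote it?",
        "chat_history": [
            ("human", "What is the Watermelon Book?"),
            ("ai", "A textbook."),
        ],
    }
)
print(first)
print(follow_up)
```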

Retrieval QA chain with chat history support

# Redefine combine_docs
def combine_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs["context"])  # docs changed to docs["context"]

# Define the QA chain
qa_chain = (
    RunnablePassthrough.assign(context=combine_docs)  # merge the retrieved documents into the context used by qa_prompt
    | qa_prompt  # QA template
    | llm
    | StrOutputParser()  # return the output as a str
)
# Define the QA chain with chat history
qa_history_chain = RunnablePassthrough.assign(
    context=(lambda x: x) | retrieve_docs  # store the retrieval result as context
).assign(answer=qa_chain)  # store the final result as answer
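
Each `assign` call keeps the incoming dict and adds one computed key, so the final output carries the input, the chat history, the retrieved context, and the answer together. A plain-Python sketch of that accumulation, where `assign`, `fake_retrieve`, and `fake_answer` are hypothetical helpers rather than LangChain calls:

```python
def assign(data, **steps):
    # Return a copy of data with one new key per step; each step sees the input dict.
    result = dict(data)
    for key, step in steps.items():
        result[key] = step(data)
    return result


def fake_retrieve(data):
    # Hypothetical stand-in for the document retrieval step.
    return ["passage A", "passage B"]


def fake_answer(data):
    # Hypothetical stand-in for the QA chain; it can already read data["context"].
    return f"Answer based on {len(data['context'])} passages."


state = {"input": "question", "chat_history": []}
state = assign(state, context=fake_retrieve)  # adds "context"
state = assign(state, answer=fake_answer)  # adds "answer"
print(state)
```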

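Deploying the knowledge base assistant

Streamlit lets you demo machine learning models through a web interface directly from Python, using a Python API to reach a working demo quickly.

A brief introduction to Streamlit

Streamlit does not require in-depth knowledge of web development or web frameworks; you only write ordinary Python code. It provides a set of simple but powerful building blocks for data applications: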

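st.write(): one of the most basic building blocks, used to render text, images, tables, and other content in the app.
st.title(), st.header(), st.subheader(): add a title, headers, and subheaders to organize the app's layout.
st.text(), st.markdown(): add text content; st.text() renders plain text, while st.markdown() supports Markdown syntax.
st.image(): adds an image to the app.
st.dataframe(): renders a Pandas DataFrame.
st.table(): renders a simple static table.
st.pyplot(), st.altair_chart(), st.plotly_chart(): render charts drawn with Matplotlib, Altair, or Plotly.
st.selectbox(), st.multiselect(), st.slider(), st.text_input(): add interactive widgets that let the user select, type, or slide a value.
st.button(), st.checkbox(), st.radio(): add buttons, checkboxes, and radio buttons to trigger specific actions.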
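
A minimal sketch of how a few of these building blocks might fit together in a question-and-answer page. `make_reply` is a hypothetical placeholder for the call into the QA chain, and the page title and labels are invented for the example. The `find_spec` guard is not something Streamlit requires; it is there only so the helper function can be run and tested where Streamlit is not installed.

```python
import importlib.util


def make_reply(question):
    # Hypothetical placeholder for the QA chain call; it only echoes the question.
    return f"You asked: {question}"


def render_app():
    import streamlit as st  # imported here so make_reply works without Streamlit

    st.title("Knowledge Base Assistant")
    question = st.text_input("Ask a question")
    # st.button returns True only on the run triggered by the click.
    if st.button("Send") and question:
        st.write(make_reply(question))


# Render the page only when Streamlit is installed.
if importlib.util.find_spec("streamlit") is not None:
    render_app()
```

Saved under a name such as `app.py` (an assumed filename), the page can be started with `streamlit run app.py`.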