
"Seven Weapons" for Improving RAG Retrieval Quality

Source: original, 2025/6/23 9:19:52

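When building a robust retrieval-augmented generation (RAG) system, we often run into the same problem: a user's raw query can be unclear, too broad, or carry several layers of meaning. Using such a query directly for document retrieval often misses the truly relevant knowledge, and the quality of the generated answer suffers. "Query transformation" is the set of "seven weapons" to reach for here: it compensates for the weaknesses of the raw query and can noticeably improve retrieval precision and recall.

This article walks through the key query transformation techniques: Rewrite-Retrieve-Read, Multi-Query generation, step-back prompting, and Hypothetical Document Embeddings (HyDE), followed by sub-question queries, query expansion with re-ranking, and intent recognition with routing. Each one comes with hands-on Python code built on the LangChain framework, so the strategies can be integrated into a RAG pipeline to build a smarter knowledge retrieval system.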

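The challenges of RAG queries: why transform them?

Before looking at the individual methods, it helps to understand why the user's raw query needs to be reworked at all:

Ambiguity: users may use imprecise words or pronouns with unclear referents, so the retrieval system cannot determine their real intent.
Complexity: a single query may contain several sub-questions, or may require reasoning across multiple pieces of knowledge.
Missing context: an isolated query may lack the necessary background, so the retrieval system cannot judge which documents are truly relevant.
Imprecise keywords: the keywords a user chooses may not match the terminology in the knowledge base, so relevant information is missed.

The goal of query transformation is to turn the raw query into a clearer, more precise form that better represents the user's real intent, which improves the efficiency and accuracy of the retrieval step that follows.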

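Hands-on with LangChain

We use the LangChain framework and create a small in-memory vector store (with FAISS), so all of the code can be run and tested without connecting to an external database.

Preparation: environment setup and data

Before starting, make sure all required libraries are installed and your OpenAI API key is set.

pip install langchain langchain-community langchain-openai faiss-cpu tiktoken

Next comes the common setup that every example relies on. We first create a simple RAG pipeline as a baseline for later comparison.

Common setup code: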

import os
from langchain_openai import OpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

# --- 1. Set the API key ---
# Prefer managing the API key through an environment variable
os.environ["OPENAI_API_KEY"] = "sk-..."  # replace with your OpenAI API key

# --- 2. Prepare the document data ---
# In a real project, documents would be loaded from files with a method such as load_data()
documents_text = [
    "The 2024 Summer Olympics, officially known as the Games of the XXXIII Olympiad, were held in Paris, France.",
    "Paris's successful bid for the 2024 Games was announced in 2017. Key factors included its existing world-class venues and strong public support.",
    "The main opening ceremony of the Paris 2024 Olympics was unique, taking place along the River Seine.",
    "The budget for the Paris 2024 Olympics was a major topic, with organizers focusing on using 95% existing or temporary infrastructure to control costs.",
    "Sustainability was a core pillar of the Paris 2024 Games, with a goal to halve the carbon footprint compared to previous games.",
    "Thomas Bach is the President of the International Olympic Committee (IOC) and oversaw the Paris 2024 Games.",
    "A major security challenge for the Paris organizers was securing the open-air opening ceremony on the Seine.",
]

# --- 3. Create the vector store ---
# Use LangChain's text splitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.create_documents(documents_text)

# Use OpenAI's embedding model
embeddings = OpenAIEmbeddings()

# Create an in-memory vector store with FAISS
vectorstore = FAISS.from_documents(texts, embeddings)
retriever = vectorstore.as_retriever()

# --- 4. Create a basic RAG query engine ---
# This is the baseline used for comparison.
# Note: RetrievalQA is one of LangChain's legacy chain classes; recent releases mark it as deprecated.
base_rag_chain = RetrievalQA.from_chain_type(
    llm=OpenAI(temperature=0),
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
)

# Try out the baseline
print("--- Baseline RAG test ---")
base_response = base_rag_chain.invoke({"query": "tell me about the olympics"})
print("Query: tell me about the olympics")
print(f"Answer: {base_response['result']}")
print("-" * 50)

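"Weapon One": Rewrite-Retrieve-Read

Scenario: the user's query is very colloquial or vague, for example "what about the games in france?". Retrieving with it directly may work poorly, so we first let the LLM rewrite it into a more precise query.

Principle:
This strategy uses the natural-language understanding of a large language model (LLM) to enrich the raw query semantically and fill in missing detail. First, the LLM produces a more precise formulation of the query, resolving its ambiguity and missing information. Next, the improved query is used for document retrieval, so that the search terms match the terminology of the knowledge base. Finally, the retrieved results are passed to the LLM together with the original query to generate the answer, which exploits the precise retrieval results while preserving the user's original intent.

Hands-on code: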

import os
from langchain_openai import OpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import LLMChain, RetrievalQA
from langchain.prompts import PromptTemplate

# --- 1. Common setup (same as above) ---
os.environ["OPENAI_API_KEY"] = "sk-..."  # replace with your OpenAI API key
documents_text = [
    "The 2024 Summer Olympics, officially known as the Games of the XXXIII Olympiad, were held in Paris, France.",
    "Paris's successful bid for the 2024 Games was announced in 2017.",
    "The main opening ceremony of the Paris 2024 Olympics was unique, taking place along the River Seine.",
    "The budget for the Paris 2024 Olympics was a major topic, focusing on using existing infrastructure.",
    "Sustainability was a core pillar of the Paris 2024 Games.",
]
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.create_documents(documents_text)
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(texts, embeddings)
retriever = vectorstore.as_retriever()
llm = OpenAI(temperature=0)

# --- 2. Define the query-rewriting chain ---
rewrite_template = """
### Instruction ###
You are a query optimization expert. Rewrite the user's query, which may be vague, into a clearer and more specific query that is better suited to vector database retrieval.
Return only the rewritten query, with no explanation or prefix.

Original query: {query}
Rewritten query:"""
rewrite_prompt = PromptTemplate(input_variables=["query"], template=rewrite_template)
query_rewrite_chain = LLMChain(llm=llm, prompt=rewrite_prompt)

# --- 3. Define the RAG chain ---
rag_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
)

# --- 4. Run and compare ---
original_query = "what about the games in france?"

# 4.1 Use the original query directly
print("--- [Method 1] Retrieve with the original query ---")
original_response = rag_chain.invoke({"query": original_query})
print(f"Original query: '{original_query}'")
print(f"Number of retrieved documents: {len(original_response['source_documents'])}")
print(f"Answer: {original_response['result'].strip()}")
print("-" * 50)

# 4.2 Use the rewritten query
print("\n--- [Method 2] Use the 'Rewrite-Retrieve-Read' strategy ---")
# Step 1: rewrite the query
rewritten_query = query_rewrite_chain.invoke({"query": original_query})['text'].strip()
print(f"Original query: '{original_query}'")
print(f"Query rewritten by the LLM: '{rewritten_query}'")

# Step 2: retrieve and generate with the rewritten query
# Note: RetrievalQA uses the same query string for both retrieval and answer generation.
# To keep the example simple, the rewritten query is passed in for both steps; answering
# against the original query instead would require assembling the two steps by hand.
rewritten_response = rag_chain.invoke({"query": rewritten_query})
print(f"Number of retrieved documents: {len(rewritten_response['source_documents'])}")
print(f"Answer: {rewritten_response['result'].strip()}")
print("-" * 50)
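
The principle above retrieves with the rewritten query but answers the user's original question, while the listing passes the rewritten query to both steps. The following is a minimal sketch of the split version. It is deliberately independent of LangChain: `rewrite_fn`, `retrieve_fn` and `answer_fn` are hypothetical callables, and the `fake_*` functions are stand-ins so the sketch runs without an API key.

```python
from typing import Callable, List


def rewrite_retrieve_read(
    original_query: str,
    rewrite_fn: Callable[[str], str],
    retrieve_fn: Callable[[str], List[str]],
    answer_fn: Callable[[str, List[str]], str],
) -> dict:
    """Retrieve with the rewritten query, but answer the user's original question."""
    rewritten = rewrite_fn(original_query).strip()
    # Fall back to the original query if the rewriter returns nothing usable.
    search_query = rewritten or original_query
    docs = retrieve_fn(search_query)
    answer = answer_fn(original_query, docs)
    return {"search_query": search_query, "docs": docs, "answer": answer}


# Stand-ins (hypothetical behaviour) so the sketch runs offline.
def fake_rewrite(query: str) -> str:
    return "Paris 2024 Summer Olympics overview"


def fake_retrieve(query: str) -> List[str]:
    corpus = [
        "The 2024 Summer Olympics were held in Paris, France.",
        "The Louvre is a museum in Paris.",
    ]
    return [doc for doc in corpus if "Olympics" in doc and "Olympics" in query]


def fake_answer(question: str, docs: List[str]) -> str:
    return f"Q: {question} | context: {' '.join(docs)}"


result = rewrite_retrieve_read(
    "what about the games in france?", fake_rewrite, fake_retrieve, fake_answer
)
print(result["answer"])
```

With the objects from the listing, `rewrite_fn` could wrap `query_rewrite_chain`, `retrieve_fn` could call `retriever.invoke`, and `answer_fn` could format a prompt for `llm`; that wiring is left out here.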

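"Weapon Two": Multi-Query generation

Scenario: the user's query is fairly complex and may contain several sub-questions. For example: "What were the main challenges and key success factors for the Paris Olympics?".

Principle: for complex queries with multi-faceted information needs, the LLM breaks the original query into several sub-queries, each focused on one aspect (definition, applications, pros and cons, and so on). Retrieving for each sub-query independently covers a wider range of relevant documents and avoids the omissions of a single query. All retrieved results are then merged, giving the LLM context from several angles and making the answer more complete.

Hands-on code: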

import os
from langchain_openai import OpenAI, OpenAIEmbeddings
from langchain.prompts import PromptTemplate
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain.chains import RetrievalQA
from langchain_community.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
import logging

# Configure logging so the queries generated by MultiQueryRetriever are visible
logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)

# --- 1. Common setup ---
os.environ["OPENAI_API_KEY"] = "sk-..."  # replace with your OpenAI API key
documents_text = [
    "The 2024 Summer Olympics were held in Paris, France.",
    "Paris's successful bid for the 2024 Games was announced in 2017. Key factors included its existing world-class venues and strong public support.",
    "The budget for the Paris 2024 Olympics was a major topic, with organizers focusing on using 95% existing or temporary infrastructure to control costs.",
    "Sustainability was a core pillar of the Paris 2024 Games, with a goal to halve the carbon footprint.",
    "A major security challenge for the Paris organizers was securing the open-air opening ceremony on the Seine.",
    "The unique opening ceremony along the River Seine was considered a huge success, boosting the city's image.",
]
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.create_documents(documents_text)
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(texts, embeddings)
llm = OpenAI(temperature=0)

# --- 2. Create the MultiQueryRetriever ---
# LangChain ships a built-in MultiQueryRetriever, which simplifies this considerably.
# It builds the prompt, calls the LLM, generates several query variants,
# retrieves for each one, and returns the de-duplicated union of the results.
multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=vectorstore.as_retriever(),
    llm=llm,
)

# --- 3. Create the RAG chain ---
# Build the RAG chain on top of the new retriever
multi_query_rag_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=multi_query_retriever,
    return_source_documents=True,
)

# --- 4. Run the query ---
complex_query = "What were the main challenges and key success factors for the Paris Olympics?"
print("--- Using the 'Multi-Query generation' strategy ---")
print(f"Complex query: '{complex_query}'")

# Run the chain; the log prints the queries generated by the LLM
response = multi_query_rag_chain.invoke({"query": complex_query})

print("\n--- Result ---")
print(f"Number of retrieved documents: {len(response['source_documents'])}")
# The retrieved documents should cover both the "challenge" and the "success" side
print("Excerpts of the source documents:")
for doc in response['source_documents']:
    print(f"  - {doc.page_content[:100]}...")

print(f"\nFinal answer: {response['result'].strip()}")
print("-" * 50)
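
The merge step described in the principle (retrieve once per sub-query, then combine the results) can be sketched without any library. This is an illustration of the idea only, not the internals of `MultiQueryRetriever`; `unique_union`, `retrieve_fn` and `fake_index` are hypothetical names introduced for the example.

```python
from typing import Callable, Iterable, List


def unique_union(
    queries: Iterable[str],
    retrieve_fn: Callable[[str], List[str]],
) -> List[str]:
    """Run every query variant and merge the hits, keeping first-seen order."""
    seen = set()
    merged: List[str] = []
    for query in queries:
        for doc in retrieve_fn(query):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


# A toy lookup table standing in for a real retriever.
fake_index = {
    "challenges of Paris 2024": ["security on the Seine", "budget control"],
    "success factors of Paris 2024": ["existing venues", "budget control"],
}
merged = unique_union(fake_index, lambda query: fake_index[query])
print(merged)
```

Keeping first-seen order means documents found by the earlier sub-queries stay at the front of the context, and a document hit by several sub-queries is only included once.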

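"Weapon Three": Step-Back Prompting

Scenario: the user's question is very specific but needs broader background knowledge to be answered well. For example: "How was the security for the Seine opening ceremony at the Paris 2024 Olympics handled?". A direct search may find no document that matches exactly, whereas searching for "security challenges of the Paris Olympics" supplies useful background.

Principle: the LLM is guided to think about a more abstract background question first, which builds a knowledge frame around the original query. It first generates a query about the underlying concepts (for example, stepping back from the specific question "How is the battery life of the iPhone 15?" to "What factors affect smartphone battery life?") and retrieves background knowledge. It then retrieves a second time with the original question, combining the general concepts with the specific case. This layered retrieval gives the LLM richer context and is especially suited to questions that need cross-domain knowledge or causal reasoning.

Complete runnable code: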

import os
from langchain_openai import OpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

# --- 1. Common setup ---
os.environ["OPENAI_API_KEY"] = "sk-..."  # replace with your OpenAI API key
documents_text = [
    "The 2024 Summer Olympics were held in Paris, France.",
    "The main opening ceremony of the Paris 2024 Olympics was unique, taking place along the River Seine.",
    "A major security challenge for the Paris organizers was securing the open-air opening ceremony on the Seine.",
    "France deployed tens of thousands of police and military personnel to ensure security during the games.",
    "General security measures for large public events in France include surveillance, crowd control, and anti-terror units.",
]
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.create_documents(documents_text)
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(texts, embeddings)
retriever = vectorstore.as_retriever()
llm = OpenAI(temperature=0)

# --- 2. Define the chain that generates the "step-back" question ---
step_back_template = """
### Instruction ###
You are an AI that is good at thinking and reasoning. Given the user's specific question, generate a more general, more abstract "step-back" question. This question helps us retrieve relevant background knowledge first.
Return only the "step-back" question, with no explanation.

Original question: {query}
"Step-back" question:"""
step_back_prompt = PromptTemplate(input_variables=["query"], template=step_back_template)
step_back_chain = LLMChain(llm=llm, prompt=step_back_prompt)

# --- 3. Run and combine ---
specific_query = "How was the security for the Seine opening ceremony at the Paris 2024 Olympics handled?"

print("--- [Method 1] Retrieve with the specific question directly ---")
# A direct search may not find a single document that covers "security", "Seine" and "handled" together
direct_docs = retriever.invoke(specific_query)
print(f"Specific query: '{specific_query}'")
print(f"Number of documents retrieved directly: {len(direct_docs)}")
for doc in direct_docs:
    print(f"  - {doc.page_content}")
print("-" * 50)

print("\n--- [Method 2] Use the 'step-back prompting' strategy ---")
# Step 1: generate the "step-back" question
step_back_response = step_back_chain.invoke({"query": specific_query})
step_back_query = step_back_response['text'].strip()
print(f"Specific query: '{specific_query}'")
print(f"'Step-back' query generated by the LLM: '{step_back_query}'")

# Step 2: retrieve background knowledge with the "step-back" question
step_back_docs = retriever.invoke(step_back_query)
print("\nBackground documents retrieved with the 'step-back' query:")
for doc in step_back_docs:
    print(f"  - {doc.page_content}")

# (Optional) Step 3: also retrieve with the original question and merge the results
# combined_docs = direct_docs + step_back_docs

# Step 4: combine the background knowledge with the original question to generate the final answer
# A new prompt template tells the LLM explicitly to use the background knowledge for the specific question
final_answer_template = """
### Instruction ###
Answer the "specific question" using the "background knowledge" provided below.

Background knowledge:
{context}

Specific question: {question}
Answer:
"""
final_answer_prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=final_answer_template,
)

# Format the background documents and pass them to an LLMChain together with the question
context_str = "\n".join([doc.page_content for doc in step_back_docs])

final_rag_chain = LLMChain(llm=llm, prompt=final_answer_prompt)
final_response = final_rag_chain.invoke({
    "context": context_str,
    "question": specific_query,
})

print(f"\nFinal answer: {final_response['text'].strip()}")
print("-" * 50)
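
The listing leaves the merge of the two retrieval passes as an optional, commented-out step. One possible way to combine them, shown here as a sketch on plain strings, is to de-duplicate and label the two groups so the LLM can tell direct evidence from background. `build_step_back_context` and its section labels are hypothetical choices, not part of the original code.

```python
from typing import List


def build_step_back_context(direct_docs: List[str], background_docs: List[str]) -> str:
    """Combine both retrieval passes into one labelled context string."""
    # dict.fromkeys drops duplicates while keeping the original order.
    direct = list(dict.fromkeys(direct_docs))
    background = [doc for doc in dict.fromkeys(background_docs) if doc not in direct]
    sections = []
    if direct:
        sections.append(
            "Directly relevant passages:\n" + "\n".join(f"- {doc}" for doc in direct)
        )
    if background:
        sections.append(
            "Background passages:\n" + "\n".join(f"- {doc}" for doc in background)
        )
    return "\n\n".join(sections)


context = build_step_back_context(
    ["Securing the open-air ceremony on the Seine was a major challenge."],
    [
        "Securing the open-air ceremony on the Seine was a major challenge.",
        "France deployed tens of thousands of police and military personnel.",
    ],
)
print(context)
```

With the objects from the listing, the inputs would be the `page_content` strings of `direct_docs` and `step_back_docs`, and the result would replace `context_str`.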

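"Weapon Four": Hypothetical Document Embeddings (HyDE)

Scenario: the user's query is very abstract or its intent is vague, so keyword matching may fail. Take "the impact of the Olympics on cities": HyDE first generates a sample of an ideal answer and retrieves with the embedding of that sample, which is more likely to find documents that are semantically relevant.

Principle: HyDE has the LLM generate a hypothetical document that contains a potential answer, turning the semantics of the user's query into a document-level representation. Concretely, the LLM is first asked to imagine "if a document contained the answer to this question, what would it say?" and to write that virtual document. The virtual document is then embedded as a vector and used to retrieve the real documents in the knowledge base that are semantically closest to it. This addresses the problem of a raw query being semantically too narrow and brings retrieval closer to the way the LLM will phrase its answer.

Complete runnable code: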

import os
from langchain_openai import OpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import LLMChain, RetrievalQA
from langchain.prompts import PromptTemplate

# --- 1. Common setup ---
os.environ["OPENAI_API_KEY"] = "sk-..."  # replace with your OpenAI API key
documents_text = [
    "The 2024 Summer Olympics were held in Paris, France, boosting tourism and local economy.",
    "Hosting the Olympics often requires massive investment in infrastructure, which can benefit the city long-term.",
    "The budget for the Paris 2024 Olympics focused on using existing venues to control costs and avoid the 'host city curse' of debt.",
    "Sustainability was a core pillar of the Paris 2024 Games, aiming to set a new model for future games.",
    "A major security challenge for Paris was securing the city against various threats during the event.",
    "The legacy of the Olympics can be complex, including both economic benefits and potential social displacement.",
]
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.create_documents(documents_text)
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(texts, embeddings)
llm = OpenAI(temperature=0)

# --- 2. Define the HyDE chain that generates the hypothetical document ---
hyde_template = """
### Instruction ###
You are a knowledgeable assistant. Based on the user question below, write a short hypothetical document that contains an ideal answer.
The document will be used for vector retrieval, so it should include the keywords and concepts likely to appear in a genuinely good answer.

User question: {query}
Hypothetical document:"""
hyde_prompt = PromptTemplate(input_variables=["query"], template=hyde_template)
hyde_chain = LLMChain(llm=llm, prompt=hyde_prompt)

# --- 3. Define a standard RAG chain ---
# Not used below: the documents retrieved via HyDE are passed to the LLM by hand in step 3 of 4.2
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(),
    return_source_documents=True,
)

# --- 4. Run and compare ---
abstract_query = "What is the impact of the Olympics on a host city?"

# 4.1 Retrieve with the abstract query directly
print("--- [Method 1] Retrieve with the abstract query directly ---")
original_docs = vectorstore.similarity_search(abstract_query)
print(f"Abstract query: '{abstract_query}'")
print("Documents retrieved directly:")
for doc in original_docs:
    print(f"  - {doc.page_content[:100]}...")
print("-" * 50)

# 4.2 Use the HyDE strategy
print("\n--- [Method 2] Use the 'HyDE' strategy ---")
# Step 1: generate the hypothetical document
hypothetical_document = hyde_chain.invoke({"query": abstract_query})['text'].strip()
print(f"Abstract query: '{abstract_query}'")
print(f"Hypothetical document generated by the LLM:\n'{hypothetical_document}'")

# Step 2: retrieve with the hypothetical document (similarity_search embeds it internally)
hyde_docs = vectorstore.similarity_search(hypothetical_document)
print("\nDocuments retrieved with the hypothetical document:")
for doc in hyde_docs:
    print(f"  - {doc.page_content[:100]}...")

# Step 3: generate the final answer from the original query and the HyDE-retrieved documents
# The retrieved documents are placed into the prompt context by hand
context_str = "\n".join([doc.page_content for doc in hyde_docs])
final_prompt = f"Context: {context_str}\n\nQuestion: {abstract_query}\n\nAnswer:"
final_response = llm.invoke(final_prompt)

print(f"\nFinal answer: {final_response.strip()}")
print("-" * 50)
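
To make the mechanism concrete without an API key, here is a toy sketch of the HyDE flow: generate a hypothetical answer, embed it, and rank the corpus by similarity to that embedding instead of the raw query. The bag-of-words `toy_embed` is an assumption that stands in for a real embedding model, and `fake_generate` stands in for the LLM call; neither reflects how OpenAI embeddings behave.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def toy_embed(text: str) -> Counter:
    """Bag-of-words stand-in for a real embedding model (assumption for the demo)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hyde_search(
    query: str,
    corpus: List[str],
    generate_fn: Callable[[str], str],
    k: int = 1,
) -> List[str]:
    """Embed a generated hypothetical answer instead of the raw query, then rank the corpus."""
    hypothetical = generate_fn(query)
    query_vec = toy_embed(hypothetical)
    ranked = sorted(corpus, key=lambda doc: cosine(query_vec, toy_embed(doc)), reverse=True)
    return ranked[:k]


corpus = [
    "Hosting the games requires massive investment in infrastructure and can leave debt.",
    "The opening ceremony took place along the River Seine.",
]


def fake_generate(query: str) -> str:
    # Hypothetical LLM output: a plausible answer, not a retrieved fact.
    return "Host cities see infrastructure investment, tourism growth and sometimes debt."


top = hyde_search("What is the impact of the Olympics on a host city?", corpus, fake_generate)
print(top)
```

In the LangChain listing the same separation could be made explicit by embedding the hypothetical document with `embeddings.embed_query(...)` and searching with `vectorstore.similarity_search_by_vector(...)`; the listing's `similarity_search` call reaches the same result by embedding the text internally.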

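"Weapon Five": Sub-Question Query

Scenario: a query involves several independent but related pieces of information that may be spread across different documents, for example: "Please describe the main venues and the transport plan of the Paris Olympics."

Principle: a sub-question query breaks a complex, multi-point query into a series of smaller, more specific sub-questions. The LLM first parses the original query, identifies the independent units of information it contains, and generates one sub-query for each. Every sub-query is then retrieved independently, so that all relevant aspects are covered. Finally, the results of all sub-queries are pooled to give the LLM a comprehensive context from which to produce a complete and coherent answer. The method is particularly suited to cases where information must be aggregated across several dimensions or facts.

Hands-on code: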

import os
from langchain_openai import OpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

# --- 1. Common setup ---
os.environ["OPENAI_API_KEY"] = "sk-..."  # replace with your OpenAI API key
documents_text = [
    "The 2024 Summer Olympics were held in Paris, France.",
    "The main opening ceremony of the Paris 2024 Olympics was unique, taking place along the River Seine.",
    "Key venues for the Paris 2024 Olympics include the Stade de France for athletics and opening/closing ceremonies, and Roland Garros for tennis.",
    "Many events were held at iconic Parisian landmarks like the Eiffel Tower (beach volleyball) and Grand Palais (fencing, taekwondo).",
    "Public transport in Paris, including the extensive Metro and RER networks, was significantly boosted for the Olympics to handle increased passenger numbers.",
    "Dedicated Olympic lanes were implemented on major roads to ensure timely movement of athletes and officials.",
]
# Fixed: the class is RecursiveCharacterTextSplitter (the original repeated "Character")
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.create_documents(documents_text)
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(texts, embeddings)
retriever = vectorstore.as_retriever()
llm = OpenAI(temperature=0)

# --- 2. Define the chain that generates sub-questions ---
sub_question_template = """
### Instruction ###
You are an expert at decomposing questions. Break the user's complex question into a series of smaller, more specific, independent sub-questions. Each sub-question should be retrievable and answerable on its own.
Return each sub-question as a list item, one per line, with no other explanation.

Original question: {query}
Sub-questions:
"""
sub_question_prompt = PromptTemplate(input_variables=["query"], template=sub_question_template)
sub_question_chain = LLMChain(llm=llm, prompt=sub_question_prompt)

# --- 3. Define a RAG chain that answers a single sub-question ---
# This chain handles each sub-query
qa_prompt_template = """
### Instruction ###
Answer the "question" using the "context" provided below.
If the context is not sufficient to answer the question, say that you cannot answer.

Context:
{context}

Question: {question}
Answer:
"""
qa_prompt = PromptTemplate(input_variables=["context", "question"], template=qa_prompt_template)
qa_chain = LLMChain(llm=llm, prompt=qa_prompt)

# --- 4. Combine all the steps ---
def get_and_answer_sub_questions(query):
    # Step 1: generate the sub-questions
    sub_questions_raw = sub_question_chain.invoke({"query": query})['text'].strip()
    sub_questions = [q.strip() for q in sub_questions_raw.split('\n') if q.strip()]
    print(f"Original query: '{query}'")
    print(f"Sub-questions generated by the LLM: {sub_questions}")

    all_retrieved_docs = []
    sub_answers = []

    # Step 2: retrieve and answer for each sub-question
    for sq in sub_questions:
        print(f"\n--- Processing sub-question: '{sq}' ---")
        # Retrieve the relevant documents
        sq_docs = retriever.invoke(sq)
        all_retrieved_docs.extend(sq_docs)  # collect every retrieved document
        print(f"Retrieved {len(sq_docs)} documents for sub-question '{sq}'")
        for doc in sq_docs:
            print(f"  - {doc.page_content[:80]}...")

        # Generate the answer to the sub-question
        context_for_sq = "\n".join([doc.page_content for doc in sq_docs])
        sq_response = qa_chain.invoke({"context": context_for_sq, "question": sq})
        sub_answers.append(f"Q: {sq}\nA: {sq_response['text'].strip()}")
        print(f"Sub-question answer: {sq_response['text'].strip()}")

    # Step 3: pool the sub-question answers as the context for the final answer
    final_context = "\n\n".join(sub_answers)

    # Step 4: generate the final answer from the original query and the pooled context
    print("\n--- Generating the final answer ---")
    final_answer_template = """
    ### Instruction ###
    Using all the sub-questions and answers provided below, give a combined answer to the original complex question. Make sure the answer is coherent and complete.

    Collected information:
    {context}

    Original complex question: {original_query}
    Final answer:
    """
    final_answer_prompt = PromptTemplate(
        input_variables=["context", "original_query"],
        template=final_answer_template,
    )
    final_rag_chain = LLMChain(llm=llm, prompt=final_answer_prompt)
    final_response = final_rag_chain.invoke({
        "context": final_context,
        "original_query": query,
    })
    return final_response['text'].strip(), all_retrieved_docs

# --- 5. Run the query ---
complex_query_sub = "Please describe the main venues and the transport plan of the Paris Olympics."
final_result, retrieved_documents = get_and_answer_sub_questions(complex_query_sub)

print("\n--- Final result ---")
print(f"Original complex query: '{complex_query_sub}'")
print(f"Final answer: {final_result}")
print(f"Total number of retrieved documents: {len(retrieved_documents)}")
print("-" * 50)
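
The listing splits the LLM reply on newlines, so list markers such as "1." or "-" stay attached to each sub-question and repeated items are retrieved twice. A slightly more defensive parser might look like the sketch below; `parse_sub_questions`, its marker pattern and the `max_questions` cap are illustrative assumptions rather than part of the original code.

```python
import re
from typing import List

# Leading bullets ("-", "*", "•") or numbering ("1.", "2)", "3、") at the start of a line.
_LIST_MARKER = re.compile(r"^\s*(?:[-*•]|\d+[.)、])\s*")


def parse_sub_questions(raw: str, max_questions: int = 5) -> List[str]:
    """Turn an LLM's list-style reply into clean, de-duplicated sub-questions."""
    questions: List[str] = []
    for line in raw.splitlines():
        cleaned = _LIST_MARKER.sub("", line).strip()
        if cleaned and cleaned not in questions:
            questions.append(cleaned)
    return questions[:max_questions]


raw_reply = """1. What were the main venues of the Paris 2024 Olympics?
- How was public transport organised for the Paris 2024 Olympics?

2) What were the main venues of the Paris 2024 Olympics?"""
parsed = parse_sub_questions(raw_reply)
print(parsed)
```

Capping the number of sub-questions also bounds the number of retrieval and LLM calls made per user query.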

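"Weapon Six": Query Expansion and Re-ranking

Scenario: the original query may not capture all the relevant information, or the retrieved documents vary in quality, so the query needs to be expanded and the results ranked more carefully. For example: "Tell me about the innovations of the Paris Olympics."

Principle: query expansion enriches the original query with synonyms, related terms, or derived terms in order to widen retrieval coverage. This can be done with LLM generation, dictionary lookup, or a domain knowledge graph. Re-ranking takes the batch of documents from the initial retrieval and uses an LLM or a dedicated ranking model to re-score and re-order them by semantic relevance to the user's query, so that the most relevant documents come first. Together, the two steps widen the search and then tighten the quality of the results, so the LLM receives the most precise and useful context.

Hands-on code: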

import os
from langchain_openai import OpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain_core.documents import Document  # the Document type

# --- 1. Common setup ---
os.environ["OPENAI_API_KEY"] = "sk-..."  # replace with your OpenAI API key
documents_text = [
    "The 2024 Summer Olympics in Paris focused heavily on sustainability and innovation in event management.",
    "A major innovation was the unique open-air opening ceremony along the River Seine, aiming to make the games more accessible.",
    "Paris 2024 aimed to be the first 'climate-positive' Games, with initiatives like using 95% existing or temporary venues to reduce carbon footprint.",
    "Technological innovations were deployed for security, including AI-powered surveillance systems.",
    "The use of renewable energy sources for event power was another key innovative aspect of the Paris Games.",
    "New urban sports like breaking were introduced, showcasing the innovative spirit of the Olympics.",
]
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.create_documents(documents_text)
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(texts, embeddings)
llm = OpenAI(temperature=0)

# --- 2. Define the query-expansion chain ---
query_expansion_template = """
### Instruction ###
You are a query optimization expert. Based on the user's original query, generate several related, more comprehensive versions of the query to help the retrieval system gather broader information.
Return each expanded query as a list item, one per line, with no other explanation.

Original query: {query}
Expanded queries:
"""
query_expansion_prompt = PromptTemplate(input_variables=["query"], template=query_expansion_template)
query_expansion_chain = LLMChain(llm=llm, prompt=query_expansion_prompt)

# --- 3. Define the document re-ranking chain ---
# This simulates a simple LLM-based re-ranker; real systems are often more involved (cross-encoders, etc.)
rerank_template = """
### Instruction ###
Score the following documents for relevance to the user question (1-10) and sort them from highest to lowest score.
Return only the sorted document contents (no scores, no other explanation), and keep a line containing only '---' between documents.

User question: {query}

Documents to rank:
{documents}
"""
rerank_prompt = PromptTemplate(input_variables=["query", "documents"], template=rerank_template)
rerank_chain = LLMChain(llm=llm, prompt=rerank_prompt)

# --- 4. Combine all the steps ---
def expanded_reranked_rag(query):
    print("--- Using the 'query expansion and re-ranking' strategy ---")
    print(f"Original query: '{query}'")

    # Step 1: query expansion
    expanded_queries_raw = query_expansion_chain.invoke({"query": query})['text'].strip()
    expanded_queries = [q.strip() for q in expanded_queries_raw.split('\n') if q.strip()]
    print(f"Expanded queries generated by the LLM: {expanded_queries}")

    all_retrieved_docs = set()  # a set avoids duplicate documents
    # Retrieve for each expanded query
    for eq in expanded_queries:
        docs = vectorstore.similarity_search(eq)
        for doc in docs:
            all_retrieved_docs.add(doc.page_content)  # store only the content, for easier handling later

    retrieved_docs_list = [Document(page_content=content) for content in list(all_retrieved_docs)]
    print(f"Expanded retrieval returned {len(retrieved_docs_list)} documents after de-duplication.")

    if not retrieved_docs_list:
        return "No relevant documents were retrieved.", []

    # Step 2: re-rank the documents
    docs_for_rerank = "\n---\n".join([doc.page_content for doc in retrieved_docs_list])
    print("\n--- Re-ranking the documents ---")
    reranked_content = rerank_chain.invoke({"query": query, "documents": docs_for_rerank})['text'].strip()

    # Convert the re-ranked text back into Document objects (this relies on the LLM keeping the format).
    # In production, a ranking model would return the ordered document objects directly.
    # For simplicity, we assume the LLM output is already in ranked order.
    reranked_docs_content = [c.strip() for c in reranked_content.split('---') if c.strip()]
    reranked_final_docs = [Document(page_content=content) for content in reranked_docs_content if content]
    print(f"After re-ranking, {len(reranked_final_docs)} documents are available for the final answer.")

    # For the demo, keep only the top N re-ranked documents
    final_context_docs = reranked_final_docs[:3]  # top 3 documents as context

    # Step 3: generate the final answer
    final_answer_template = """
    ### Instruction ###
    Answer the "question" using the "context" provided below.
    If the context is not sufficient to answer the question, say that you cannot answer.

    Context:
    {context}

    Question: {question}
    Answer:
    """
    final_answer_prompt = PromptTemplate(
        input_variables=["context", "question"],
        template=final_answer_template,
    )
    context_str = "\n".join([doc.page_content for doc in final_context_docs])
    final_rag_chain = LLMChain(llm=llm, prompt=final_answer_prompt)
    final_response = final_rag_chain.invoke({
        "context": context_str,
        "question": query,
    })
    return final_response['text'].strip(), retrieved_docs_list

# --- 5. Run the query ---
innovative_query = "Tell me about the innovations of the Paris Olympics."
final_answer, retrieved_docs_all = expanded_reranked_rag(innovative_query)

print("\n--- Final result ---")
print(f"Original query: '{innovative_query}'")
print(f"Final answer: {final_answer}")
print("-" * 50)
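
Asking the LLM to echo whole documents back in ranked order is fragile, because the model may paraphrase the text or drop separators. One alternative, sketched here as an assumption rather than the article's method, is to number the documents, ask the model to return only the indices in ranked order, and apply that order in code. `apply_ranking` is a hypothetical helper that tolerates duplicate, missing, or out-of-range indices.

```python
import re
from typing import List


def apply_ranking(docs: List[str], llm_reply: str, top_n: int = 3) -> List[str]:
    """Reorder docs by the 1-based indices found in llm_reply, ignoring bad entries."""
    order: List[int] = []
    for token in re.findall(r"\d+", llm_reply):
        idx = int(token) - 1
        if 0 <= idx < len(docs) and idx not in order:
            order.append(idx)
    # Documents the model did not mention keep their original relative order at the end.
    order += [i for i in range(len(docs)) if i not in order]
    return [docs[i] for i in order][:top_n]


docs = [
    "renewable energy",
    "opening ceremony on the Seine",
    "breaking as a new sport",
    "AI surveillance",
]
# Hypothetical model reply containing a duplicate (2) and an invalid index (9).
reranked = apply_ranking(docs, "Ranking: 2, 4, 2, 9")
print(reranked)
```

Because the original strings are reused, the context passed to the final prompt is guaranteed to be text from the knowledge base rather than the model's retelling of it.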

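"Weapon Seven": Intent Recognition and Routing

Scenario: the user's query may span several domains or need different kinds of processing, rather than a plain retrieval. For example: "I'd like to know about tickets for the Paris Olympics and how to apply as a volunteer."

Principle: intent recognition and routing first uses an LLM or a dedicated classifier to identify the underlying intent of the query (for example: asking about tickets, asking about venues, asking about volunteering, or simply chatting). Based on the recognized intent, the system routes the query to a different processing module or RAG pipeline. A ticket query might be routed to a dedicated ticketing database, while a volunteer application might be routed to an FAQ or to documents describing the application process. Tailoring the processing flow in this way improves accuracy and efficiency for complex, multi-intent queries and avoids a one-size-fits-all retrieval approach.

Hands-on code: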

import os
from langchain_openai import OpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import LLMChain, RetrievalQA
from langchain.prompts import PromptTemplate

# --- 1. Common setup ---
os.environ["OPENAI_API_KEY"] = "sk-..."  # replace with your OpenAI API key

# Fixed: create the LLM and the embedding model before they are used below
llm = OpenAI(temperature=0)
embeddings = OpenAIEmbeddings()

documents_text_olympics = [
    "The 2024 Summer Olympics were held in Paris, France.",
    "Ticket sales for the Paris 2024 Olympics officially began in early 2023 through a lottery system and phased sales.",
    "Information on ticket prices and availability can be found on the official Paris 2024 website.",
    "Volunteer applications for the Paris 2024 Olympics opened in 2022, seeking over 45,000 volunteers.",
    "Volunteers played a crucial role in various aspects, including welcoming visitors, assisting athletes, and managing venues.",
    "The main opening ceremony of the Paris 2024 Olympics was unique, taking place along the River Seine.",
]

documents_text_general_info = [
    "Paris is the capital city of France, known for its art, fashion, gastronomy, and culture.",
    "Popular tourist attractions in Paris include the Eiffel Tower, the Louvre Museum, and Notre-Dame Cathedral.",
    "The climate in Paris is generally temperate, with warm summers and mild winters.",
]

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)

# Vector store for the Olympics information
texts_olympics = text_splitter.create_documents(documents_text_olympics)
vectorstore_olympics = FAISS.from_documents(texts_olympics, embeddings)
retriever_olympics = vectorstore_olympics.as_retriever()

# Vector store for the general information
texts_general_info = text_splitter.create_documents(documents_text_general_info)
vectorstore_general_info = FAISS.from_documents(texts_general_info, embeddings)
retriever_general_info = vectorstore_general_info.as_retriever()

# --- 2. Define the intent-recognition chain ---
intent_recognition_template = """
### Instruction ###
You are an intelligent intent-recognition system. Determine the main intent of the user query.
Possible intents:
- 'olympics_tickets': information about tickets for the Paris Olympics.
- 'olympics_volunteer': information about volunteering at the Paris Olympics.
- 'olympics_general': other general information about the Paris Olympics (opening ceremony, location, etc.).
- 'general_paris_info': general information about the city of Paris (tourism, culture, etc.).
- 'unknown': the intent cannot be recognized.

Return only the intent label, with no other explanation.

User query: {query}
Intent:
"""
intent_recognition_prompt = PromptTemplate(input_variables=["query"], template=intent_recognition_template)
intent_recognition_chain = LLMChain(llm=llm, prompt=intent_recognition_prompt)

# --- 3. Define the RAG chain for each intent ---
qa_template = """
### Instruction ###
Answer the "question" using the "context" provided below.
If the context is not sufficient to answer the question, say that you cannot answer.

Context:
{context}

Question: {question}
Answer:
"""
qa_prompt = PromptTemplate(input_variables=["context", "question"], template=qa_template)

# Tickets, volunteering and general Olympics questions share one retriever.
# Fixed: a custom prompt must be passed through chain_type_kwargs, not as a direct keyword argument.
olympics_qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever_olympics,
    chain_type_kwargs={"prompt": qa_prompt},
    return_source_documents=True,
)

# General Paris information uses the other retriever
general_paris_qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever_general_info,
    chain_type_kwargs={"prompt": qa_prompt},
    return_source_documents=True,
)

# --- 4. Routing logic ---
def route_query(query: str):
    intent = intent_recognition_chain.invoke({"query": query})['text'].strip()
    print(f"Recognized intent: {intent}")

    if intent in ['olympics_tickets', 'olympics_volunteer', 'olympics_general']:
        print("Routing to the Olympics information chain.")
        response = olympics_qa_chain.invoke({"query": query})
    elif intent == 'general_paris_info':
        print("Routing to the general Paris information chain.")
        response = general_paris_qa_chain.invoke({"query": query})
    else:
        print("Intent unknown or not supported; returning a generic reply.")
        response = {
            "result": "Sorry, I could not work out the intent of this query, or I do not have enough information to answer it.",
            "source_documents": [],
        }
    return response

# --- 5. Run the queries ---
print("--- Using the 'intent recognition and routing' strategy ---")

query1 = "I'd like information about tickets for the Paris Olympics."
response1 = route_query(query1)
print(f"Query: '{query1}'")
print(f"Answer: {response1['result'].strip()}")
if response1['source_documents']:
    print("Relevant source documents:")
    for doc in response1['source_documents']:
        print(f"  - {doc.page_content[:80]}...")
print("-" * 50)

query2 = "What famous sights are there in Paris?"
response2 = route_query(query2)
print(f"Query: '{query2}'")
print(f"Answer: {response2['result'].strip()}")
if response2['source_documents']:
    print("Relevant source documents:")
    for doc in response2['source_documents']:
        print(f"  - {doc.page_content[:80]}...")
print("-" * 50)

query3 = "How do I apply to volunteer at the Paris Olympics?"
response3 = route_query(query3)
print(f"Query: '{query3}'")
print(f"Answer: {response3['result'].strip()}")
if response3['source_documents']:
    print("Relevant source documents:")
    for doc in response3['source_documents']:
        print(f"  - {doc.page_content[:80]}...")
print("-" * 50)

query4 = "How do I make a tiramisu?"  # assumed to be an unrecognized intent
response4 = route_query(query4)
print(f"Query: '{query4}'")
print(f"Answer: {response4['result'].strip()}")
print("-" * 50)
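
The scenario above mentions a query with two intents (tickets and volunteering), while `route_query` handles a single label and compares it by exact string match. The sketch below shows one possible extension: normalise the model's reply, accept several labels, and dispatch to one handler per intent. `parse_intents`, `route`, and the lambda handlers are hypothetical stand-ins for the real chains.

```python
from typing import Callable, Dict, List

KNOWN_INTENTS = {
    "olympics_tickets",
    "olympics_volunteer",
    "olympics_general",
    "general_paris_info",
}


def parse_intents(raw: str) -> List[str]:
    """Normalise a reply such as "'olympics_tickets', Olympics_Volunteer." into known labels."""
    labels: List[str] = []
    for part in raw.replace("\n", ",").split(","):
        # Strip whitespace, quotes, backticks and trailing periods, then lowercase.
        label = part.strip().strip("'\"`. ").lower()
        if label in KNOWN_INTENTS and label not in labels:
            labels.append(label)
    return labels or ["unknown"]


def fallback_handler(query: str) -> str:
    return "Sorry, this question is outside the supported topics."


def route(
    query: str,
    raw_intents: str,
    handlers: Dict[str, Callable[[str], str]],
) -> Dict[str, str]:
    """Send the query to one handler per recognised intent; anything else gets the fallback."""
    return {
        intent: handlers.get(intent, fallback_handler)(query)
        for intent in parse_intents(raw_intents)
    }


# Stand-in handlers; in the listing these would call the RetrievalQA chains.
handlers = {
    "olympics_tickets": lambda query: "tickets pipeline",
    "olympics_volunteer": lambda query: "volunteer pipeline",
}
routed = route("tickets and volunteering?", "'olympics_tickets', Olympics_Volunteer.", handlers)
print(routed)
```

For this to work with the intent-recognition chain, its prompt would also need to allow a comma-separated list of labels instead of a single one.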

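Advanced optimization techniques

Beyond the "seven weapons" above, the following advanced techniques can also noticeably improve a RAG system:

1. Context awareness:
- Conversation history: in multi-turn dialogue, pass the earlier turns to the LLM as context for the current query, so it can follow how the user's intent evolves. Modules such as ConversationBufferMemory in LangChain can manage the conversation history.
- User profile and preferences: personalize RAG behavior with user profiles or past interaction data. For technical users, retrieval can favor more specialized technical documents; for general users, it can favor more accessible explanatory content.
- External tool calls: when a query involves recent information, real-time data (weather, stock prices), or an action to perform (booking a ticket), the RAG system can recognize the intent and call an external API or tool instead of relying only on a static knowledge base. LangChain's Agent capabilities support this.
2. Hybrid retrieval:
- Vector retrieval + keyword retrieval: combine semantic similarity search (vector search) with exact keyword matching. Vector retrieval is good at synonyms and conceptual understanding, while keyword retrieval is stronger for proper nouns, code snippets, or exact-match requirements. Taking the intersection or union of the two result sets and then fusing the rankings can improve both recall and precision.
- Multimodal retrieval: for documents that contain images, tables, charts, and other modalities, use multimodal embeddings so the RAG system can understand and retrieve this non-text information.
3. Iterative retrieval and self-correction:
- Multi-hop reasoning: for complex questions that need several reasoning steps, the RAG system can retrieve iteratively. The first retrieval gathers initial information, the LLM generates a new intermediate query from it, and a further retrieval fetches deeper or complementary information, until a complete answer can be given.
- Answer verification and calibration: after generating an answer, the LLM can be prompted to "reflect" on its quality. For example, it can generate verification questions and retrieve again to check whether the answer conflicts with the knowledge base or is missing information, and then correct itself.
4. Knowledge graph integration:
- Combine a structured knowledge graph with unstructured documents. When a query involves relationships between entities, the system can first reason over the knowledge graph and then use the results or the related entities to strengthen document retrieval. This is especially effective where complex relationships must be understood and inferred.
5. Offline optimization and evaluation:
- Data cleaning and preprocessing: make sure the documents in the knowledge base are high quality, free of redundancy, and consistently formatted, since this directly affects retrieval.
- Embedding model selection and fine-tuning: choose or fine-tune an embedding model for the specific domain so that it captures domain-specific semantics better.
- RAG evaluation framework: define quantitative metrics (retrieval precision and recall; relevance, faithfulness, and fluency of generated answers) and use tools such as Ragas and TruLens to evaluate the RAG pipeline systematically and improve it iteratively.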
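
For the fusion step in hybrid retrieval (item 2 above), one widely used option is reciprocal rank fusion (RRF), which merges several ranked lists using only rank positions. The sketch below is an illustration of that technique, not something the article prescribes; the document ids are made up, and `k = 60` is a commonly used default rather than a tuned value.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists: each doc scores sum(1 / (k + rank)) over the lists."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score (descending); ties fall back to the doc id for a stable result.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a vector search and a keyword search.
vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_c", "doc_a", "doc_d"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

Because RRF ignores the raw scores, it sidesteps the problem that cosine similarities and keyword scores live on different scales; documents ranked well by both retrievers rise to the top.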