
AI Agent Frameworks: LlamaIndex

news 2025/10/7 14:42:30

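What are components in LlamaIndex?
- Building a RAG pipeline with components
  - Step 1: Loading and embedding data
  - Step 2: Storing and indexing documents
  - Step 3: Querying a VectorStoreIndex with prompts and an LLM
Using tools in LlamaIndex
- Creating a FunctionTool
- Creating a QueryEngineTool
- Creating Toolspecs
Using agents in LlamaIndex
- Function-calling agents
- Creating RAG agents with `QueryEngineTools`
  - Creating multi-agent systems
Creating agentic workflows in LlamaIndex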

LlamaIndex
HF Agent Course

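Although LlamaIndex resembles other frameworks such as smolagents in some respects, it has the following key advantages:

A clear workflow system. With an event-driven, async-first syntax, workflows help break an agent's decision-making down step by step, so that logic can be composed and organized clearly.
Rich ready-to-use components. Having been around for a while, LlamaIndex works with many other frameworks and offers a large set of tested, reliable components such as LLMs, retrievers, and indexes.
Advanced document parsing with LlamaParse. LlamaParse is a document parsing tool built specifically for LlamaIndex; it is a paid feature, but it integrates seamlessly.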

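What are components in LlamaIndex?

See the LlamaIndex documentation for details on all of the framework's core modules and components.

This section focuses on the QueryEngine component.

Why? Because it can serve as a retrieval-augmented generation (RAG) tool for an agent. Any agent needs a way to find and understand relevant data, and QueryEngine provides exactly that capability.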

Building a RAG pipeline with components

RAG involves five key stages: loading, indexing, storing, querying, and evaluation.

Step 1: Loading and embedding data

The first task is to load the data. The raw data may come as text files, PDFs, websites, databases, APIs, and so on.


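LlamaIndex offers three main ways to load data:

SimpleDirectoryReader: a built-in loader that reads many file types from a local directory;
LlamaParse: LlamaIndex's official PDF parsing tool, offered as a managed API;
LlamaHub: a registry of hundreds of data-loading libraries for ingesting data from any source.

The simplest way to load data is SimpleDirectoryReader. This versatile component loads all kinds of files from a folder and converts them into Document objects that LlamaIndex can work with. It is used as follows: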

from llama_index.core import SimpleDirectoryReader

reader = SimpleDirectoryReader(input_dir="path/to/directory")
documents = reader.load_data()

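After loading, the documents need to be broken into smaller units called Node objects. A Node is a chunk of text from the original document that is easy for an AI model to work with and still keeps a reference to the original Document object.

IngestionPipeline helps create these nodes through two key transformations:

SentenceSplitter breaks documents into manageable chunks by splitting at natural sentence boundaries;
HuggingFaceInferenceAPIEmbedding converts each chunk into a numerical vector representation (an embedding) that captures its semantics in a form AI models can process efficiently.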

from llama_index.core import Document
from llama_index.embeddings.huggingface_api import HuggingFaceInferenceAPIEmbedding
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.ingestion import IngestionPipeline

# Create the pipeline with its transformations
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_overlap=0),
        HuggingFaceInferenceAPIEmbedding(model_name="BAAI/bge-small-en-v1.5"),
    ]
)

# Top-level await works in a notebook; in a script, run this inside a coroutine
nodes = await pipeline.arun(documents=[Document.example()])
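To make the chunking idea concrete, here is a minimal pure-Python sketch of splitting text at sentence boundaries and packing whole sentences into chunks. It is only an illustration of the concept, not how SentenceSplitter is implemented; the function name split_into_chunks and its max_words parameter are hypothetical.

```python
import re


def split_into_chunks(text: str, max_words: int = 12) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_words words."""
    # Split after sentence-ending punctuation followed by whitespace
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for sentence in sentences:
        n_words = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the budget
        if current and count + n_words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks


text = "Alfred is a butler. He lives in Gotham. He plans a gala. Guests arrive at eight."
chunks = split_into_chunks(text, max_words=8)
for chunk in chunks:
    print(chunk)
```

Each chunk here plays the role of one Node's text; a real pipeline would also attach metadata and a reference to the source document.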

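To see what a Node actually looks like, see components.ipynb. The notebook produces Nodes as follows:

Load a dataset and write it to files: use Hugging Face's load_dataset and write each record's persona text to its own .txt file under data/.
Read the files and create Documents: SimpleDirectoryReader walks the text files in data/ and returns a list of Document objects, one per file.
Turn Documents into finer-grained Nodes and embed them: SentenceSplitter() splits each Document by sentence into several smaller chunks, each of which becomes a Node; HuggingFaceEmbedding(...) then embeds those Nodes, so the final nodes is a list of Node objects that carry embedding vectors.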

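Step 2: Storing and indexing documents

Once the Node objects are created, they need to be indexed so that they can be searched, but first they need somewhere to be stored.

A vector store can be attached directly to the pipeline, which then populates it; see the vector stores that LlamaIndex provides. This example uses Chroma to store the documents.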

import chromadb
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.huggingface_api import HuggingFaceInferenceAPIEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore

db = chromadb.PersistentClient(path="./alfred_chroma_db")
chroma_collection = db.get_or_create_collection("alfred")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=25, chunk_overlap=0),
        HuggingFaceInferenceAPIEmbedding(model_name="BAAI/bge-small-en-v1.5"),
    ],
    vector_store=vector_store,
)

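This is where vector embeddings come in: by embedding both the query and the nodes in the same vector space, relevant matches can be found.

Text is embedded with one model at ingestion time, and the query must later be embedded with that same model, so that query vectors and stored vectors can be compared by distance or similarity in the same space.

The key point is that the vector space must be shared: the same model means the same mapping. An embedding model can be viewed as a function
$\text{text} \;\longrightarrow\; \mathbb{R}^d$

Suppose ingestion uses model $f$, which maps every document chunk to a vector, giving the set $\{f(\text{doc}_i)\}$.
If the query side uses a different model $g \neq f$, the query vector $g(q)$ and the stored vectors $f(\text{doc}_i)$ live in different spaces whose coordinate systems do not match, and cosine similarity between them is meaningless.

Only when the query vector and the document vectors both lie in the space defined by the same function $f$ can $\mathrm{sim}(f(q),\,f(\text{doc}_i))$ measure whether they are about the same topic and how close they are semantically.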
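The argument can be illustrated with a toy example. The sketch below uses two made-up word-count "embedding models" that differ only in the order of their axes; VOCAB_F, VOCAB_G, embed, and cosine are hypothetical names, and real embedding models are of course learned rather than word counts. Embedding the query with the ingestion-time model ranks the right document first, while embedding it with the other model ranks the wrong one first.

```python
import math

# Two toy "models": same words, but each assigns them to different axes
VOCAB_F = ["cat", "dog", "travel", "paris"]
VOCAB_G = ["paris", "travel", "dog", "cat"]


def embed(text: str, vocab: list[str]) -> list[float]:
    """Map text to a word-count vector whose axes follow the given vocabulary."""
    words = text.lower().split()
    return [float(words.count(w)) for w in vocab]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


docs = ["cat dog cat", "travel paris travel"]
doc_vecs = [embed(d, VOCAB_F) for d in docs]  # ingestion uses model f

query = "paris travel"
same_model = [cosine(embed(query, VOCAB_F), v) for v in doc_vecs]   # query with f
mixed_model = [cosine(embed(query, VOCAB_G), v) for v in doc_vecs]  # query with g

print("f vs f:", same_model)   # the travel document scores highest
print("g vs f:", mixed_model)  # the cat/dog document scores highest
```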

The index is created from the vector store and the embedding model as follows:

from llama_index.core import VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Embedding model: the same one is used for ingestion and for querying
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
# vector_store already holds the vectors and metadata of every document chunk
index = VectorStoreIndex.from_vector_store(
    vector_store=vector_store,
    embed_model=embed_model,  # the same BAAI/bge-small-en-v1.5 model used at ingestion
)

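Step 3: Querying a VectorStoreIndex with prompts and an LLM

With the index in place, it can be saved and loaded easily. Before it can be queried, it has to be converted into a query interface. The most common conversion options are:

as_retriever: for basic document retrieval; returns a list of NodeWithScore objects with similarity scores;
as_query_engine: for single question-and-answer interactions; returns a written response;
as_chat_engine: for conversational interactions that need memory across messages; returns a written response based on the chat history and the indexed context.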

from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI
import nest_asyncio

nest_asyncio.apply()  # This is needed to run the query engine

llm = HuggingFaceInferenceAPI(model_name="Qwen/Qwen2.5-Coder-32B-Instruct")
query_engine = index.as_query_engine(
    llm=llm,
    response_mode="tree_summarize",
)
response = query_engine.query(
    "Respond using a persona that describes author and travel experiences?"
)

See components.ipynb for the resulting response.

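Using tools in LlamaIndex

A clear tool interface makes a tool easier for an LLM to use. Just as with the software APIs that human engineers use, an LLM gets more out of a tool whose behavior is easy to understand.

LlamaIndex has four main types of tools:

FunctionTool: turns any Python function into a tool that an agent can use. It works out how the function behaves automatically.
QueryEngineTool: a tool that lets agents use query engines. Because agents are themselves built on query engines, they can also use other agents as tools.
Toolspecs: predefined sets of tools created by the community, usually containing tools for a specific service such as Gmail.
Utility Tools: special tools that help handle large amounts of data returned by other tools.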

What must be provided when creating a tool?

Only the function. The name and description default to the name and docstring of the function provided.

Creating a FunctionTool

A synchronous or asynchronous function can be passed to FunctionTool, optionally with name and description arguments.

from llama_index.core.tools import FunctionTool

def get_weather(location: str) -> str:
    """Useful for getting the weather for a given location."""
    print(f"Getting weather for {location}")
    return f"The weather in {location} is sunny"

tool = FunctionTool.from_defaults(
    get_weather,
    name="my_weather_tool",
    description="Useful for getting the weather for a given location.",
)
tool.call("New York")
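The idea that a tool's name and description can default to the function's own name and docstring can be sketched in plain Python. This is a conceptual illustration only, not LlamaIndex's implementation; SimpleTool and tool_from_function are hypothetical names, and the real FunctionTool may format its default description differently.

```python
import inspect
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SimpleTool:
    """Hypothetical minimal tool: a function plus the metadata an LLM would read."""
    fn: Callable
    name: str
    description: str

    def call(self, *args, **kwargs):
        return self.fn(*args, **kwargs)


def tool_from_function(
    fn: Callable,
    name: Optional[str] = None,
    description: Optional[str] = None,
) -> SimpleTool:
    """Wrap fn, falling back to its __name__ and docstring for the metadata."""
    return SimpleTool(
        fn=fn,
        name=name or fn.__name__,
        description=description or (inspect.getdoc(fn) or ""),
    )


def get_weather(location: str) -> str:
    """Useful for getting the weather for a given location."""
    return f"The weather in {location} is sunny"


weather_tool = tool_from_function(get_weather)
print(weather_tool.name, "-", weather_tool.description)
print(weather_tool.call("New York"))
```

This is why a descriptive function name and docstring matter: when no overrides are given, they are all the model sees of the tool.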

Creating a QueryEngineTool

The QueryEngine defined in "Building a RAG pipeline with components" can easily be turned into a tool with the QueryEngineTool class.

import chromadb
from llama_index.core import VectorStoreIndex
from llama_index.core.tools import QueryEngineTool
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI
from llama_index.embeddings.huggingface_api import HuggingFaceInferenceAPIEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore

embed_model = HuggingFaceInferenceAPIEmbedding(model_name="BAAI/bge-small-en-v1.5")

db = chromadb.PersistentClient(path="./alfred_chroma_db")
chroma_collection = db.get_or_create_collection("alfred")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

index = VectorStoreIndex.from_vector_store(vector_store, embed_model=embed_model)

llm = HuggingFaceInferenceAPI(model_name="Qwen/Qwen2.5-Coder-32B-Instruct")
query_engine = index.as_query_engine(llm=llm)
tool = QueryEngineTool.from_defaults(query_engine, name="some useful name", description="some useful description")

Creating Toolspecs

Think of ToolSpecs as collections of tools that work well together.

from llama_index.tools.google import GmailToolSpec

tool_spec = GmailToolSpec()
tool_spec_list = tool_spec.to_tool_list()

To look at the tools in more detail, inspect each tool's metadata.

[(tool.metadata.name, tool.metadata.description) for tool in tool_spec_list]

See tools.ipynb for the results.

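Using agents in LlamaIndex

LlamaIndex supports three main types of reasoning agents:

Function-calling agents: for AI models that support calling specific functions;
ReAct agents: for AI models with chat or text-completion capabilities, and good at complex reasoning tasks;
Advanced custom agents: use more sophisticated methods to handle higher-order tasks and workflows.

To create an agent, first give it the set of functions/tools that define what it can do.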

Function-calling agents

from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI
from llama_index.core.agent.workflow import AgentWorkflow, ToolCallResult, AgentStream

def add(a: int, b: int) -> int:
    """Add two numbers"""
    return a + b

def subtract(a: int, b: int) -> int:
    """Subtract two numbers"""
    return a - b

def multiply(a: int, b: int) -> int:
    """Multiply two numbers"""
    return a * b

def divide(a: int, b: int) -> float:
    """Divide two numbers"""
    return a / b  # true division returns a float

llm = HuggingFaceInferenceAPI(model_name="Qwen/Qwen2.5-Coder-32B-Instruct")

agent = AgentWorkflow.from_tools_or_functions(
    tools_or_functions=[subtract, multiply, divide, add],
    llm=llm,
    system_prompt="You are a math agent that can add, subtract, multiply, and divide numbers using provided tools.",
)

Agents are stateless by default. To remember past interactions, use a Context object explicitly.

# stateless
response = await agent.run("What is 2 times 2?")

# remembering state
from llama_index.core.workflow import Context

ctx = Context(agent)

response = await agent.run("My name is Bob.", ctx=ctx)
response = await agent.run("What was my name again?", ctx=ctx)

Note that agents in LlamaIndex are asynchronous: they are run with Python's await operator.
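Top-level await, as used in the snippets in this post, works in a notebook. In a plain Python script, a coroutine has to be driven by an event loop, for example with asyncio.run. The sketch below shows the pattern with FakeAgent, a hypothetical stand-in that only mimics the shape of an async run method; it is not a LlamaIndex class.

```python
import asyncio


class FakeAgent:
    """Hypothetical stand-in exposing an async run(), like the agents above."""

    async def run(self, user_msg: str) -> str:
        await asyncio.sleep(0)  # yield control, as a real network call would
        return f"echo: {user_msg}"


async def main() -> str:
    agent = FakeAgent()
    # Inside a coroutine, await is allowed
    return await agent.run("What is 2 times 2?")


# In a script, asyncio.run starts the event loop and runs main() to completion
result = asyncio.run(main())
print(result)
```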

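Creating RAG agents with `QueryEngineTools`

Agentic RAG is a powerful way to use agents to answer questions about data. Alfred can be given a variety of tools to help him answer questions.

Unlike traditional RAG, which answers questions directly from the documents, Alfred can decide for himself whether to use some other tool or flow to respond to a query.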

from llama_index.core.tools import QueryEngineTool

query_engine = index.as_query_engine(llm=llm, similarity_top_k=3)  # as shown in the previous section

query_engine_tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,
    name="name",
    description="a specific description",
    return_direct=False,
)
query_engine_agent = AgentWorkflow.from_tools_or_functions(
    [query_engine_tool],
    llm=llm,
    system_prompt="You are a helpful assistant that has access to a database containing persona descriptions. ",
)

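Creating multi-agent systems

The AgentWorkflow class natively supports multi-agent systems. Each agent is given a name and a description, so the system keeps a single active speaker while allowing agents to hand tasks off to one another.

Agents in LlamaIndex can also be used directly as tools by other agents.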

from llama_index.core.agent.workflow import (
    AgentWorkflow,
    FunctionAgent,
    ReActAgent,
)

# Define some tools
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

def subtract(a: int, b: int) -> int:
    """Subtract two numbers."""
    return a - b

# Create the agent configurations
# Note: either FunctionAgent or ReActAgent can be used here
# FunctionAgent works for LLMs with a function-calling API
# ReActAgent works for any LLM
calculator_agent = ReActAgent(
    name="calculator",
    description="Performs basic arithmetic operations",
    system_prompt="You are a calculator assistant. Use your tools for any math operation.",
    tools=[add, subtract],
    llm=llm,
)

query_agent = ReActAgent(
    name="info_lookup",
    description="Looks up information about XYZ",
    system_prompt="Use your tool to query a RAG system to answer information about XYZ",
    tools=[query_engine_tool],
    llm=llm,
)

# Create the workflow
agent = AgentWorkflow(agents=[calculator_agent, query_agent], root_agent="calculator")

# Run the system
response = await agent.run(user_msg="Can you add 5 and 3?")

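Creating agentic workflows in LlamaIndex

A workflow is built by defining Steps that are triggered by Events; the steps themselves emit events that trigger further steps. See Workflow for details.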


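Workflows strike a balance between agent autonomy and control over the overall flow.

A single-step workflow is created by defining a class that inherits from Workflow and decorating its functions with @step. It also needs StartEvent and StopEvent, the special events that mark where the workflow starts and ends.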

from llama_index.core.workflow import StartEvent, StopEvent, Workflow, step

class MyWorkflow(Workflow):
    @step
    async def my_step(self, ev: StartEvent) -> StopEvent:
        # do something here
        return StopEvent(result="Hello, world!")

w = MyWorkflow(timeout=10, verbose=False)
result = await w.run()


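To connect multiple steps, create custom events that carry data between steps. This means adding an event that is passed between steps and carries the output of the first step to the second.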

from llama_index.core.workflow import Event, StartEvent, StopEvent, Workflow, step

class ProcessingEvent(Event):
    intermediate_result: str

class MultiStepWorkflow(Workflow):
    @step
    async def step_one(self, ev: StartEvent) -> ProcessingEvent:
        # Process initial data
        return ProcessingEvent(intermediate_result="Step 1 complete")

    @step
    async def step_two(self, ev: ProcessingEvent) -> StopEvent:
        # Use the intermediate result
        final_result = f"Finished processing: {ev.intermediate_result}"
        return StopEvent(result=final_result)

w = MultiStepWorkflow(timeout=10, verbose=False)
result = await w.run()
result

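Type hints matter here because they are what make the workflow execute correctly: they tell the framework which step can handle which event and where to go next.

step_one explicitly takes a StartEvent and returns a ProcessingEvent;
step_two explicitly takes a ProcessingEvent and returns a StopEvent.

If step_two were accidentally written as async def step_two(self, ev: StartEvent), the types would no longer line up, and the framework would raise an error at startup or at run time indicating that no handler accepts ProcessingEvent, which prevents invalid or confusing call chains.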
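How type hints can drive this kind of check is sketched below in plain Python. It is a conceptual illustration under stated assumptions, not LlamaIndex's actual validation code: the event classes are bare stand-ins, and unhandled_events is a hypothetical helper that reads each step's annotations and reports events that some step produces but no step accepts.

```python
import types
from typing import Union, get_args, get_origin, get_type_hints


# Bare stand-ins for the event classes (hypothetical, not the library's)
class Event: ...
class StartEvent(Event): ...
class StopEvent(Event): ...
class ProcessingEvent(Event): ...


# typing.Union and, on Python 3.10+, the X | Y union type
_UNION_ORIGINS = {Union, getattr(types, "UnionType", Union)}


def event_types(hint) -> set:
    """Expand a type hint into the set of event classes it names."""
    if get_origin(hint) in _UNION_ORIGINS:
        return set(get_args(hint))
    return {hint}


def unhandled_events(steps) -> set:
    """Return events that some step produces but no step accepts."""
    accepted, produced = set(), set()
    for step in steps:
        hints = dict(get_type_hints(step))
        produced |= event_types(hints.pop("return"))
        for hint in hints.values():
            accepted |= event_types(hint)
    # StopEvent ends the workflow, so it never needs a handler
    return {e for e in produced - accepted if e is not StopEvent}


def step_one(ev: StartEvent) -> ProcessingEvent:
    return ProcessingEvent()

def step_two(ev: ProcessingEvent) -> StopEvent:
    return StopEvent()

def broken_step_two(ev: StartEvent) -> StopEvent:  # wrong input type
    return StopEvent()


ok = unhandled_events([step_one, step_two])
bad = unhandled_events([step_one, broken_step_two])
print(ok)   # empty: every produced event has a handler
print(bad)  # contains ProcessingEvent: nothing accepts it
```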

A workflow that only runs straight from start to finish is still not very interesting, so the next example adds branching and looping.

from llama_index.core.workflow import Event, StartEvent, StopEvent, Workflow, step
import random

class ProcessingEvent(Event):
    intermediate_result: str

class LoopEvent(Event):
    loop_output: str

class MultiStepWorkflow(Workflow):
    @step
    async def step_one(self, ev: StartEvent | LoopEvent) -> ProcessingEvent | LoopEvent:
        # Randomly either loop back to this step or move on to step_two
        if random.randint(0, 1) == 0:
            print("Bad thing happened")
            return LoopEvent(loop_output="Back to step one.")
        else:
            print("Good thing happened")
            return ProcessingEvent(intermediate_result="First step complete.")

    @step
    async def step_two(self, ev: ProcessingEvent) -> StopEvent:
        # Use the intermediate result
        final_result = f"Finished processing: {ev.intermediate_result}"
        return StopEvent(result=final_result)

w = MultiStepWorkflow(verbose=False)
result = await w.run()
result
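The control flow of such a looping workflow can be traced with a small, deterministic dispatcher in plain Python. This is a simplified sketch, not how LlamaIndex schedules steps: the event classes, the HANDLERS table, and run are hypothetical, and the random branch is replaced by a retry counter so that the outcome is reproducible.

```python
from dataclasses import dataclass


@dataclass
class StartEvent:
    pass

@dataclass
class LoopEvent:
    attempt: int

@dataclass
class ProcessingEvent:
    intermediate_result: str

@dataclass
class StopEvent:
    result: str


def step_one(ev):
    """Loop back until the third attempt, then move on."""
    attempt = ev.attempt + 1 if isinstance(ev, LoopEvent) else 1
    if attempt < 3:
        return LoopEvent(attempt=attempt)  # branch: go around again
    return ProcessingEvent(intermediate_result=f"succeeded on attempt {attempt}")


def step_two(ev):
    return StopEvent(result=f"Finished processing: {ev.intermediate_result}")


# Which step handles which event type (what the type hints express in a Workflow)
HANDLERS = {StartEvent: step_one, LoopEvent: step_one, ProcessingEvent: step_two}


def run(max_events: int = 10) -> str:
    """Dispatch events to their handlers until a StopEvent appears."""
    event = StartEvent()
    for _ in range(max_events):
        if isinstance(event, StopEvent):
            return event.result
        event = HANDLERS[type(event)](event)
    raise RuntimeError("workflow did not finish within max_events")


result = run()
print(result)
```

The max_events cap plays a role similar to a timeout: a loop that never produces a StopEvent is cut off instead of running forever.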