当前位置：首页 > news >正文

让 Agent 说“机器能懂的话”——LlamaIndex 构建 Agent 的结构化输出策略

news 2025/10/2 7:06:05

在构建 Agent 时，单纯依赖自然语言输出非常脆弱 —— 下游系统、存储、可视化都难解析。LlamaIndex 支持把 Agent 的回答“收口”成结构化 JSON / Pydantic 模型。本文带你理解背后的原理、常见方案、实践示例与工程思考。

背景：为什么要结构化输出？

当你用 LLM／Agent 驱动系统做自动化操作（如合同分析、知识问答、文档处理、流程编排等），最终输出往往要被：

存入数据库（表、JSON、NoSQL），
被前端渲染（图表、卡片、报表），
被下一个模块消费（路由、决策引擎、链路调用）……

如果让 Agent 直接输出自然语言（“这个合同里有三条风险：……；建议是……”），后面就要做 NLP 提取、正则匹配、人工检查，脆弱度飙高。结构化输出，就是让 Agent “说机器能懂的话”：固定字段、固定格式、强校验。

LlamaIndex 在 “询问 / Agent / 多步工作流” 的构建中，自身就对结构化输出提供支持 —— 你不必手写一堆正则或 JSON 验错逻辑。

LlamaIndex 支持结构化输出的几个关键能力

在 LlamaIndex 的文档里，“Structured Outputs”是一个专门模块，它讲了整个管线如何前插/后插检查、注入提示、做解析重试。

下面是几个你常会用到的机制／工具：

机制 / 名称	作用	适用场景
Pydantic Program	把“输入 + 提示 → Pydantic 模型”抽象为一段可复用程序	当你要把多个不同输入都变成结构化对象
Output Parser	在 LLM 调用前后插入格式提示 / 解析逻辑	用普通文本生成但希望“包一层 JSON”
LLM 的结构化模式 / 函数调用机制	如果模型支持直接函数调用 / JSON 输出，就跳过手动解析	最简洁、错误率最低的方法
Workflows / AgentWorkflow 中的事件机制	多步流程中，各节点可以传递 Pydantic 事件对象	中间节点输出结构化事件，后续节点直接读取字段

此外，在 LlamaIndex 的 Workflow 示例里，有一个 “Reflection Workflow for Structured Outputs” 的 notebook，是用来做「如果输出不合法就重试／反思」的常用套路。

在 Agent 中做结构化输出：常见方案

在 Agent 场景里，你通常会定义一个 Agent（FunctionAgent / ReActAgent / AgentWorkflow 等），然后让它做“接收用户 → 判断 / 检索 / 决策 → 输出”。在这个末尾输出环节，就要设计结构化输出。常见的几条路径如下。

方案一：output_cls（用 Pydantic 模型约束输出）

这是最直接、最推荐的方式：在 Agent 的构造里，给它一个 output_cls 参数（一个 Pydantic 模型）。Agent 会：

在 prompt 外加格式说明（expect JSON 字段），
接收 LLM 原始输出，交给内部输出解析器做 parse → output_cls，
如果 parse 失败、格式不符，可触发重试机制或报错。

优点是你在业务层拿到的就是一个类型化对象，不用再做额外解析。缺点是当输出逻辑复杂、跨工具融合、聚合多个子结果时，这个模型可能不够灵活。

方案二：structured_output_fn（聚合拼装式）

如果你的 Agent 做多步调用（比如 ReAct 多次检索＋合成），你可能想让中间步骤随意，最后一步统一 “把这些碎片拼成一个结构化对象”。这时可以给 Agent 提供 structured_output_fn，它接收 Agent 的上下文 / 中间输出 / 工具返回值，自己写代码把它 “组装进你定义的结构体 / JSON” 返回。

这条路径给你最大的灵活性 — 它让你把「工具调用、证据融合、排序、打分」的逻辑完全掌控。

方案三：在 AgentWorkflow / 工作流中，用事件（Event）传结构化数据

如果你的 Agent 驱动在一个更大的 Workflow / AgentWorkflow 框架里，那么中间各步可以通过 事件（Event）对象 来交换结构化数据（这些事件通常是基于 Pydantic 的模型）。这样，Agent 的各个子节点、工具调用、决策节点之间，就可以用强类型、字段可读的方式传递数据，而不是丢一个字符串给下一个节点去 parse。

这种方式语义清晰、易于调试、可观测性好，适合复杂流程编排场景。

实战演示（代码 + 说明）

示例1：使用 `output_cls`

下面给一个简化的代码示例，演示 Agent 用 Pydantic 进行结构化输出。

# ================== 初始化 大模型 ==================
from common.muxue_model_loader import model_loaders# ================== 初始化大模型 end ==================
import asyncio  
from llama_index.core.agent.workflow import FunctionAgent, AgentWorkflow
from llama_index.llms.openai import OpenAI
from pydantic import BaseModel, Field## define structured output format  and tools
class MathResult(BaseModel):operation: str = Field(description="执行的操作")result: int = Field(description="操作的结果")def multiply(x: int, y: int):"""将两个数字相乘"""return x * y## define agent
agent = FunctionAgent(tools=[multiply],name="calculator",system_prompt="你是一个计算器代理，你可以使用`multiply`工具来相乘两个数字。",output_cls=MathResult,# llm=llm,
)
async def test():response = await agent.run("将3415和43144相乘")print("-" * 20)print(response)print("----结构化输出--" )print(response.structured_response)print("----结构化输出2--" )print(response.get_pydantic_model(MathResult))asyncio.run(test())

结果：

--------------------
3415和43144的乘积是147336760。
----结构化输出--
{'operation': 'multiply', 'result': 147336760}
----结构化输出2--
operation='multiply' result=147336760

示例2：多智能体工作流程

## define structured output format  and tools
class Weather(BaseModel):location: str = Field(description="The location")weather: str = Field(description="The weather")def get_weather(location: str):"""Get the weather for a given location"""return f"The weather in {location} is sunny"## define single agents
agent = FunctionAgent(llm=llm,tools=[get_weather],system_prompt="You are a weather agent that can get the weather for a given location",name="WeatherAgent",description="The weather forecaster agent.",
)
main_agent = FunctionAgent(name="MainAgent",tools=[],description="The main agent",system_prompt="You are the main agent, your task is to dispatch tasks to secondary agents, specifically to WeatherAgent",can_handoff_to=["WeatherAgent"],llm=llm,
)## define multi-agent workflow
workflow = AgentWorkflow(agents=[main_agent, agent],root_agent=main_agent.name,output_cls=Weather,
)response = await workflow.run("What is the weather in Tokyo?")
print(response.structured_response)
print(response.get_pydantic_model(Weather))

注意这种方式的Agentworkflow的多个节点，只能在最后一个Agent中实现结构化，其中间Agent节点想要做结构化，则只能是Workflow 的事件里传递对象。

示例3：使用 `structured_output_fn`

自定义函数应将智能体工作流生成的 ChatMessage 对象序列作为输入，并返回一个字典（可以转换为 BaseModel 子类）：

import json
from llama_index.core.llms import ChatMessage
from typing import List, Dict, Anyclass Flavor(BaseModel):flavor: strwith_sugar: boolasync def structured_output_parsing(messages: List[ChatMessage],
) -> Dict[str, Any]:sllm = llm.as_structured_llm(Flavor)messages.append(ChatMessage(role="user",content="Given the previous message history, structure the output based on the provided format.",))response = await sllm.achat(messages)return json.loads(response.message.content)def get_flavor(ice_cream_shop: str):return "Strawberry with no extra sugar"agent = FunctionAgent(tools=[get_flavor],name="ice_cream_shopper",system_prompt="You are an agent that knows the ice cream flavors in various shops.",structured_output_fn=structured_output_parsing,llm=llm,
)response = await agent.run("What strawberry flavor is available at Gelato Italia?"
)
print(response.structured_response)
print(response.get_pydantic_model(Flavor))

示例4：流式传输结构化输出

您可以使用 AgentStreamStructuredOutput 事件在工作流运行时获取结构化输出：

from llama_index.core.agent.workflow import (AgentInput,AgentOutput,ToolCall,ToolCallResult,AgentStreamStructuredOutput,
)handler = agent.run("What strawberry flavor is available at Gelato Italia?")async for event in handler.stream_events():if isinstance(event, AgentInput):print(event)elif isinstance(event, AgentStreamStructuredOutput):print(event.output)print(event.get_pydantic_model(Weather))elif isinstance(event, ToolCallResult):print(event)elif isinstance(event, ToolCall):print(event)elif isinstance(event, AgentOutput):print(event)else:passresponse = await handler

工程视角：结构化输出要注意的坑与技巧

为了让你的 Agent 在真实环境中稳定输出结构化数据，下列经验值得牢记。

schema 先设计，后填充：先在业务/前端层定义好字段、必需项、类型约束，再在 Agent 那边把这个模型作为 output_cls 或拼装模板。
Prompt + Format 指令要严谨：让 LLM 在 prompt 里明确“必须严格返回纯 JSON”或“不得多写说明文字”。
输出解析要有重试：即使模型支持结构化输出，也可能出错。引入 1–2 次重试、反思机制是常见做法（见 “Reflection Workflow” 示例）
中间节点输出也要结构化：不要在中间节点乱返回字符串，在 Workflow / AgentWorkflow 中用事件 /中转模型，让每一步都有明确输入/输出。
监控 & 日志：记录原始文本输出 + 解析失败原因 + 最终模型对象，这样方便排错。
模型能力与约束结合：如果使用 LLM 本身支持 JSON 模式 / 函数调用模式，那就优先用它；对于不可靠模型，额外加入强提示 / 验证层。
版本演化兼容：未来可能要扩字段或调整输出，给 schema 保留可扩展性（可选字段 / 版本号）。
测试覆盖：对典型输入、极端输入都写单测，验证模型在多种情形下能否成功 parse。

让 Agent 说“机器能懂的话”——LlamaIndex 构建 Agent 的结构化输出策略

背景：为什么要结构化输出？

LlamaIndex 支持结构化输出的几个关键能力

在 Agent 中做结构化输出：常见方案

方案一：output_cls（用 Pydantic 模型约束输出）

方案二：structured_output_fn（聚合拼装式）

方案三：在 AgentWorkflow / 工作流中，用事件（Event）传结构化数据

实战演示（代码 + 说明）

示例1：使用 `output_cls`

示例2：多智能体工作流程

示例3：使用 `structured_output_fn`

示例4：流式传输结构化输出

工程视角：结构化输出要注意的坑与技巧

相关网址

相关文章：

背景：为什么要结构化输出？

LlamaIndex 支持结构化输出的几个关键能力

在 Agent 中做结构化输出：常见方案

方案一：output_cls（用 Pydantic 模型约束输出）

方案二：structured_output_fn（聚合拼装式）

方案三：在 AgentWorkflow / 工作流中，用事件（Event）传结构化数据

实战演示（代码 + 说明）

示例1：使用 output_cls

示例2：多智能体工作流程

示例3：使用 structured_output_fn

示例4：流式传输结构化输出

工程视角：结构化输出要注意的坑与技巧

相关网址

相关文章：

示例1：使用 `output_cls`

示例3：使用 `structured_output_fn`