
[LangChain] Theory and Hands-On Practice (5): Agent

news 2025/7/19 7:47:56

Contents

1. Overview
- 1.1 Introduction to Agents
- 1.2 Agent Example
2. Main Agent Types
- 2.1 ZERO_SHOT_REACT_DESCRIPTION
- 2.2 CHAT_ZERO_SHOT_REACT_DESCRIPTION
- 2.3 CONVERSATIONAL_REACT_DESCRIPTION
- 2.4 CHAT_CONVERSATIONAL_REACT_DESCRIPTION
- 2.5 OPENAI_FUNCTIONS
3. Adding Memory to an Agent
4. Sharing Memory Between the Agent and Tools
References

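This article draws mainly on the following reference: "AI Agent development: building an agent development environment step by step (requirements analysis, technology selection, technical breakdown)".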

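1. Overview

Like the human brain, a large model stores a great deal of knowledge. We want to use that knowledge not only for simple question answering but also for autonomous decision-making, which means the model must be able to complete moderately complex tasks on its own, without human involvement. Completing such a task involves reasoning: splitting the task into concrete subtasks and deciding what to do next after each step. The Agent feature in LangChain is built for this goal.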


1.1 Introduction to Agents


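An Agent's capabilities fall into four main parts:

Memory: the component an agent uses to store and retrieve historical information. It lets the agent keep context across multiple interactions and so give more coherent, relevant responses. Memory is either short-term or long-term:
- Short-term memory: usually holds information from the current session, such as the recent conversation history.
- Long-term memory: holds more persistent information, such as user preferences or historical data.
Plan: the strategy or sequence of steps an agent uses to decide how to carry out a task. It analyzes the current state and the goal to produce a series of action steps. A plan can be static (predefined) or dynamic (generated from the current situation).
- Static plan: predefined steps, suited to structured tasks.
- Dynamic plan: steps generated at run time from the current context and goal, suited to complex, changing tasks.
Action: a concrete operation the agent performs. Each action carries out one step of the plan. An action can be a tool call, a piece of generated text, or a call to an external API.
- Tool call: the agent can call tools to perform specific tasks such as search, calculation, or data retrieval.
- Text generation: the agent can generate natural-language responses to interact with the user.
Tools: the functions or APIs an agent uses to perform specific tasks. Tools can include search engines, database queries, calculators, translation services, and so on. The agent calls these tools to obtain information or perform operations.
- Built-in tools: LangChain ships with some built-in tools, such as search and calculation tools.
- Custom tools: developers can create their own tools as needed and integrate them into the agent.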

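Main flow:

1. The user states a need or question
2. The question is combined with the Prompt
3. ReAct loop
4. Look up Memory
5. Look up the available tools
6. Run a tool and observe the result
7. Repeat steps 2-6 until the final result is obtained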
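The flow above can be sketched in plain Python without LangChain. This is a minimal illustration, not LangChain's implementation: `ScriptedLLM`, `calculator`, and `react_loop` are hypothetical names, and the "model" simply replays canned replies so the loop can run offline.

```python
import operator
import re
from typing import Callable, Dict, List, Tuple

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "//": operator.floordiv}


class ScriptedLLM:
    """Hypothetical stand-in for a model: replays canned replies in order."""

    def __init__(self, replies: List[str]) -> None:
        self._replies = iter(replies)

    def __call__(self, prompt: str) -> str:
        return next(self._replies)


def calculator(expression: str) -> str:
    """Toy tool: evaluate one 'a <op> b' expression on non-negative integers."""
    match = re.fullmatch(r"\s*(\d+)\s*(//|[-+*])\s*(\d+)\s*", expression)
    if match is None:
        raise ValueError(f"unsupported expression: {expression!r}")
    left, op, right = match.groups()
    return str(OPS[op](int(left), int(right)))


def react_loop(
    question: str,
    llm: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    memory: List[Tuple[str, str]],
    max_steps: int = 5,
) -> str:
    scratchpad = ""
    for _ in range(max_steps):
        # Combine the question with memory, the tool list, and earlier steps.
        prompt = (
            f"History: {memory}\n"
            f"Tools: {', '.join(tools)}\n"
            f"Question: {question}\n{scratchpad}"
        )
        reply = llm(prompt)  # one reasoning turn
        final = re.search(r"Final Answer:\s*(.*)", reply)
        if final:
            answer = final.group(1).strip()
            memory.append((question, answer))  # remember the finished exchange
            return answer
        action = re.search(r"Action:\s*(.*?)\nAction Input:\s*(.*)", reply)
        if action is None:
            raise ValueError("reply contains neither an action nor a final answer")
        name, tool_input = action.group(1).strip(), action.group(2).strip()
        observation = tools[name](tool_input)  # run the tool, observe the result
        scratchpad += f"{reply}\nObservation: {observation}\n"
    raise RuntimeError("no final answer within max_steps")


llm = ScriptedLLM([
    "Thought: I need to halve 78.\nAction: calculator\nAction Input: 78 // 2",
    "Thought: I now know the final answer\nFinal Answer: 39",
])
memory: List[Tuple[str, str]] = []
answer = react_loop("What is 78 divided by 2?", llm, {"calculator": calculator}, memory)
print(answer)
```

The scratchpad grows by one Thought/Action/Observation group per iteration, which is how each later reasoning turn gets to see earlier tool results.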

1.2 Agent Example

A minimal Agent example with two abilities:

- It can solve math problems
- It can search when it does not know the answer

The implementation:

import os
from langchain.llms import OpenAI
from langchain.agents import load_tools
from langchain.agents import initialize_agent, AgentType

# Define the LLM
llm = OpenAI(
    temperature=0,
    model="gpt-3.5-turbo-instruct",
)

# Set up the tools: the SerpAPI search engine provides online search
# pip install google-search-results

os.environ["SERPAPI_API_KEY"] = "xxxxxxxxxxxxx"

tools = load_tools(["serpapi", "llm-math"], llm=llm)

agent = initialize_agent(
    tools=tools,
    llm=llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,  # the agent type
    verbose=True
)
agent.run("Who was the previous US president? What is the integer part of his age divided by 2?")

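2. Main Agent Types

The main Agent types built into LangChain include:

- OPENAI_FUNCTIONS: the OpenAI function-calling type, which follows OpenAI-style usage
- ZERO_SHOT_REACT_DESCRIPTION: the zero-shot ReAct type (reason first, then act), for an LLM
- CHAT_ZERO_SHOT_REACT_DESCRIPTION: the zero-shot ReAct type, for a Chat Model
- CONVERSATIONAL_REACT_DESCRIPTION: the conversational ReAct type

Here is the official source code for AgentType: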

class AgentType(str, Enum):
    """An enum for agent types.

    See documentation: https://python.langchain.com/docs/modules/agents/agent_types/
    """

    ZERO_SHOT_REACT_DESCRIPTION = "zero-shot-react-description"
    """A zero shot agent that does a reasoning step before acting."""

    REACT_DOCSTORE = "react-docstore"
    """A zero shot agent that does a reasoning step before acting.
    
    This agent has access to a document store that allows it to look up 
    relevant information to answering the question.
    """

    SELF_ASK_WITH_SEARCH = "self-ask-with-search"
    """An agent that breaks down a complex question into a series of simpler questions.
    
    This agent uses a search tool to look up answers to the simpler questions
    in order to answer the original complex question.
    """
    CONVERSATIONAL_REACT_DESCRIPTION = "conversational-react-description"
    CHAT_ZERO_SHOT_REACT_DESCRIPTION = "chat-zero-shot-react-description"
    """A zero shot agent that does a reasoning step before acting.
    
    This agent is designed to be used in conjunction 
    """

    CHAT_CONVERSATIONAL_REACT_DESCRIPTION = "chat-conversational-react-description"

    STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION = (
        "structured-chat-zero-shot-react-description"
    )
    """An zero-shot react agent optimized for chat models.
    
    This agent is capable of invoking tools that have multiple inputs.
    """

    OPENAI_FUNCTIONS = "openai-functions"
    """An agent optimized for using open AI functions."""

    OPENAI_MULTI_FUNCTIONS = "openai-multi-functions"

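In practice, each Agent type uses a different Prompt template when handling a task (you can see this from the AgentType definitions above, or by printing the Agent's intermediate output while it solves a problem).

The sections below walk through each Agent type with code. First, here is the reusable base code; in the later sections only the LLM and Agent definitions need to be swapped in:

# Shared scaffold for the main agent types
import os
from langchain.agents import load_tools, initialize_agent, AgentType

# Set up the tools: SerpAPI provides online search
# pip install google-search-results
# pip install numexpr  # required by the llm-math tool

os.environ["SERPAPI_API_KEY"] = "xxxxxxxxxxxxx"

# 1) Put the LLM definition here: llm must exist before load_tools is called

tools = load_tools(["serpapi", "llm-math"], llm=llm)

# 2) Put the Agent definition here

agent.run("Who was the previous US president? What is the integer part of his age divided by 2?")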

2.1 ZERO_SHOT_REACT_DESCRIPTION

ZERO_SHOT_REACT_DESCRIPTION is the zero-shot ReAct Agent for an LLM: it reasons and chooses its actions on its own, without being given any examples.

from langchain.llms import OpenAI

# Define the LLM
llm = OpenAI(
    temperature=0,
    model="gpt-3.5-turbo-instruct",
)

# Define the Agent: ZERO_SHOT_REACT_DESCRIPTION
agent = initialize_agent(
    tools=tools,
    llm=llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True
)

Output:

{'input': 'Who is the current US president? What is his age divided by 2?',
'output': 'Joe Biden is the current US president and his age divided by 2 is 39.0.'}

Note: each Agent type uses a different Prompt template when handling a task. An example of the Prompt used by the ZERO_SHOT_REACT_DESCRIPTION Agent:

template='Answer the following questions as best you can. You have access to the following tools:\n\nCalculator(*args: Any, callbacks: Union[list[langchain_core.callbacks.base.BaseCallbackHandler], langchain_core.callbacks.base.BaseCallbackManager, NoneType] = None, tags: Optional[List[str]] = None, metadata: Optional[Dict[str, Any]] = None, **kwargs: Any) -> Any - Useful for when you need to answer questions about math.\n\nUse the following format:\n\nQuestion: the input question you must answer\nThought: you should always think about what to do\nAction: the action to take, should be one of [Calculator]\nAction Input: the input to the action\nObservation: the result of the action\n... (this Thought/Action/Action Input/Observation can repeat N times)\nThought: I now know the final answer\nFinal Answer: the final answer to the original input question\n\nBegin!\n\nQuestion: {input}\nThought:{agent_scratchpad}')
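To make the structure of this template easier to see, here is a minimal sketch of how a ReAct-style prompt of this shape could be assembled from tool names and descriptions. It is an illustration only, not LangChain's code: `ToolSpec`, `build_react_template`, and the `Search` tool description are hypothetical, while the wording of the prefix, format instructions, and suffix is copied from the template printed above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToolSpec:
    """Hypothetical minimal tool record: just a name and a description."""
    name: str
    description: str


PREFIX = "Answer the following questions as best you can. You have access to the following tools:"
FORMAT_INSTRUCTIONS = (
    "Use the following format:\n\n"
    "Question: the input question you must answer\n"
    "Thought: you should always think about what to do\n"
    "Action: the action to take, should be one of [{tool_names}]\n"
    "Action Input: the input to the action\n"
    "Observation: the result of the action\n"
    "... (this Thought/Action/Action Input/Observation can repeat N times)\n"
    "Thought: I now know the final answer\n"
    "Final Answer: the final answer to the original input question"
)
SUFFIX = "Begin!\n\nQuestion: {input}\nThought:{agent_scratchpad}"


def build_react_template(tools: List[ToolSpec]) -> str:
    # One line per tool so the model can pick a tool from its description.
    tool_lines = "\n".join(f"{t.name}: {t.description}" for t in tools)
    tool_names = ", ".join(t.name for t in tools)
    instructions = FORMAT_INSTRUCTIONS.format(tool_names=tool_names)
    # {input} and {agent_scratchpad} stay as placeholders until run time.
    return "\n\n".join([PREFIX, tool_lines, instructions, SUFFIX])


template = build_react_template([
    ToolSpec("Calculator", "Useful for when you need to answer questions about math."),
    ToolSpec("Search", "Useful for looking up current facts."),  # hypothetical tool
])
prompt = template.format(input="What is 2 + 2?", agent_scratchpad="")
print(prompt)
```

The two remaining placeholders explain the `{input}` and `{agent_scratchpad}` variables in the printed template: the first carries the user question, the second accumulates the Thought/Action/Observation history between model calls.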

2.2 CHAT_ZERO_SHOT_REACT_DESCRIPTION

The CHAT_ZERO_SHOT_REACT_DESCRIPTION Agent type is similar to ZERO_SHOT_REACT_DESCRIPTION; the main difference is that it requires a Chat Model.

from langchain.chat_models import ChatOpenAI

# Define the Chat Model
llm = ChatOpenAI(
    temperature=0,
    model="gpt-3.5-turbo",
)

# Define the Agent: CHAT_ZERO_SHOT_REACT_DESCRIPTION
agent = initialize_agent(
    tools=tools,
    llm=llm,
    agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True
)

Output:

{'input': 'Who is the current US president? What is his age divided by 2?',
'output': 'Joe Biden, 40.5.'}

2.3 CONVERSATIONAL_REACT_DESCRIPTION

CONVERSATIONAL_REACT_DESCRIPTION is a conversational Agent. It uses an LLM (not a Chat Model) and must be used together with memory.

from langchain.llms import OpenAI
from langchain.memory import ConversationBufferMemory

# Define the LLM
llm = OpenAI(
    temperature=0,
    model="gpt-3.5-turbo-instruct",
)

# Define the Memory
memory = ConversationBufferMemory(
    memory_key="chat_history"
)

# Define the Agent: CONVERSATIONAL_REACT_DESCRIPTION
agent = initialize_agent(
    tools=tools,
    llm=llm,
    agent=AgentType.CONVERSATIONAL_REACT_DESCRIPTION,
    memory=memory,
    verbose=True
)

2.4 CHAT_CONVERSATIONAL_REACT_DESCRIPTION

CHAT_CONVERSATIONAL_REACT_DESCRIPTION is similar to CONVERSATIONAL_REACT_DESCRIPTION; the main difference is that it uses a Chat Model.

from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory

# Define the Chat Model
llm = ChatOpenAI(
    temperature=0,
    model="gpt-3.5-turbo",
)

# Define the Memory
# return_messages=True makes the history a list of messages, which chat agents expect
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True
)

# Define the Agent: CHAT_CONVERSATIONAL_REACT_DESCRIPTION
agent = initialize_agent(
    tools=tools,
    llm=llm,
    agent=AgentType.CHAT_CONVERSATIONAL_REACT_DESCRIPTION,
    memory=memory,
    verbose=True
)

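2.5 OPENAI_FUNCTIONS

OPENAI_FUNCTIONS is implemented with OpenAI function calling (function_call) and supports only OpenAI models. Function calling is a feature of the chat models, so the model is defined with ChatOpenAI here.

from langchain.chat_models import ChatOpenAI

# Define the Chat Model (function calling needs a chat model)
llm = ChatOpenAI(
    temperature=0,
    model="gpt-3.5-turbo",
)

# Define the Agent: OPENAI_FUNCTIONS
agent = initialize_agent(
    tools=tools,
    llm=llm,
    agent=AgentType.OPENAI_FUNCTIONS,
    verbose=True
)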
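As a rough illustration of what function calling involves, the sketch below describes a tool with a JSON schema and dispatches a call whose arguments arrive as a JSON string. It does not call the OpenAI API: `floor_divide`, `REGISTRY`, `dispatch`, and `simulated_call` are hypothetical, and the "model reply" is written by hand.

```python
import json
from typing import Any, Callable, Dict


def floor_divide(a: int, b: int) -> int:
    """The Python function we want the model to be able to request."""
    return a // b


# Function description in the name / description / JSON-schema parameters shape
# used by OpenAI function calling.
FUNCTIONS = [
    {
        "name": "floor_divide",
        "description": "Divide a by b and round down to an integer.",
        "parameters": {
            "type": "object",
            "properties": {
                "a": {"type": "integer"},
                "b": {"type": "integer"},
            },
            "required": ["a", "b"],
        },
    }
]

# Maps the function name the model returns to the real callable.
REGISTRY: Dict[str, Callable[..., Any]] = {"floor_divide": floor_divide}


def dispatch(function_call: Dict[str, str]) -> Any:
    """Run the function named in the call with its JSON-encoded arguments."""
    func = REGISTRY[function_call["name"]]
    kwargs = json.loads(function_call["arguments"])  # arguments are a JSON string
    return func(**kwargs)


# Hand-written stand-in for a model reply that requests a function call.
simulated_call = {"name": "floor_divide", "arguments": '{"a": 78, "b": 2}'}
result = dispatch(simulated_call)
print(result)
```

Compared with the ReAct types, the model here returns a structured call instead of free text that has to be parsed for "Action:" and "Action Input:" lines.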

3. Adding Memory to an Agent

In essence, adding Memory to an Agent means inserting the Memory into the Agent's Prompt template.
A code example:

from langchain.agents import (
    load_tools,
    AgentType,
    initialize_agent
)
from langchain.memory import ConversationBufferMemory
from langchain_ollama import OllamaLLM
from langchain.prompts import MessagesPlaceholder


# Define the LLM
llm = OllamaLLM(model="llama3.1:8b")

# Define the tools
tools = load_tools(["llm-math"], llm=llm)

# Define the Memory
memory = ConversationBufferMemory(
    memory_key="Memory",
    return_messages=True
)

# Define the Agent
agent = initialize_agent(
    tools=tools,
    llm=llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    agent_kwargs={  # pass parameters via agent_kwargs so the memory_key reaches the prompt
        "extra_prompt_messages":[
            MessagesPlaceholder(variable_name="Memory"),
            MessagesPlaceholder(variable_name="agent_scratchpad")
        ]
    },
    memory=memory,
    verbose=True
)

# agent.run("Hello, I am Mary.")
print(agent)

Output:

 memory=ConversationBufferMemory(chat_memory=InMemoryChatMessageHistory(messages=[]), return_messages=True, memory_key='Memory') verbose=True tags=['zero-shot-react-description'] agent=ZeroShotAgent(llm_chain=LLMChain(verbose=False, prompt=PromptTemplate(input_variables=['agent_scratchpad', 'input'], input_types={}, partial_variables={}, template='Answer the following questions as best you can. You have access to the following tools:\n\nCalculator(*args: Any, callbacks: Union[list[langchain_core.callbacks.base.BaseCallbackHandler], langchain_core.callbacks.base.BaseCallbackManager, NoneType] = None, tags: Optional[List[str]] = None, metadata: Optional[Dict[str, Any]] = None, **kwargs: Any) -> Any - Useful for when you need to answer questions about math.\n\nUse the following format:\n\nQuestion: the input question you must answer\nThought: you should always think about what to do\nAction: the action to take, should be one of [Calculator]\nAction Input: the input to the action\nObservation: the result of the action\n... (this Thought/Action/Action Input/Observation can repeat N times)\nThought: I now know the final answer\nFinal Answer: the final answer to the original input question\n\nBegin!\n\nQuestion: {input}\nThought:{agent_scratchpad}'), llm=OllamaLLM(model='llama3.1:8b'), output_parser=StrOutputParser(), llm_kwargs={}), output_parser=MRKLOutputParser(), allowed_tools=['Calculator']) tools=[Tool(name='Calculator', description='Useful for when you need to answer questions about math.', func=<bound method Chain.run of LLMMathChain(verbose=False, llm_chain=LLMChain(verbose=False, prompt=PromptTemplate(input_variables=['question'], input_types={}, partial_variables={}, template='Translate a math problem into a expression that can be executed using Python\'s numexpr library. 
Use the output of running this code to answer the question.\n\nQuestion: ${{Question with math problem.}}\n```text\n${{single line mathematical expression that solves the problem}}\n```\n...numexpr.evaluate(text)...\n```output\n${{Output of running the code}}\n```\nAnswer: ${{Answer}}\n\nBegin.\n\nQuestion: What is 37593 * 67?\n```text\n37593 * 67\n```\n...numexpr.evaluate("37593 * 67")...\n```output\n2518731\n```\nAnswer: 2518731\n\nQuestion: 37593^(1/5)\n```text\n37593**(1/5)\n```\n...numexpr.evaluate("37593**(1/5)")...\n```output\n8.222831614237718\n```\nAnswer: 8.222831614237718\n\nQuestion: {question}\n'), llm=OllamaLLM(model='llama3.1:8b'), output_parser=StrOutputParser(), llm_kwargs={}))>, coroutine=<bound method Chain.arun of LLMMathChain(verbose=False, llm_chain=LLMChain(verbose=False, prompt=PromptTemplate(input_variables=['question'], input_types={}, partial_variables={}, template='Translate a math problem into a expression that can be executed using Python\'s numexpr library. Use the output of running this code to answer the question.\n\nQuestion: ${{Question with math problem.}}\n```text\n${{single line mathematical expression that solves the problem}}\n```\n...numexpr.evaluate(text)...\n```output\n${{Output of running the code}}\n```\nAnswer: ${{Answer}}\n\nBegin.\n\nQuestion: What is 37593 * 67?\n```text\n37593 * 67\n```\n...numexpr.evaluate("37593 * 67")...\n```output\n2518731\n```\nAnswer: 2518731\n\nQuestion: 37593^(1/5)\n```text\n37593**(1/5)\n```\n...numexpr.evaluate("37593**(1/5)")...\n```output\n8.222831614237718\n```\nAnswer: 8.222831614237718\n\nQuestion: {question}\n'), llm=OllamaLLM(model='llama3.1:8b'), output_parser=StrOutputParser(), llm_kwargs={}))>)]

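Note: the key to adding Memory to an Agent is to pass parameters through agent_kwargs so that the memory_key reaches the prompt. Printing the agent is a quick way to check the result: in the output above, the prompt's input_variables are only ['agent_scratchpad', 'input'], which suggests that this agent type did not pick up the Memory placeholder.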
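The idea that memory ends up as one more variable in the prompt can be sketched in plain Python. This is a simplified illustration under stated assumptions, not LangChain's implementation: `BufferMemory`, `TEMPLATE`, and `render_prompt` are hypothetical, with method names chosen to echo LangChain's memory interface.

```python
from typing import Dict, List, Tuple


class BufferMemory:
    """Hypothetical, simplified conversation buffer."""

    def __init__(self, memory_key: str = "chat_history") -> None:
        self.memory_key = memory_key
        self.turns: List[Tuple[str, str]] = []

    def load_memory_variables(self) -> Dict[str, str]:
        # Expose the history under memory_key so it can fill a prompt variable.
        history = "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)
        return {self.memory_key: history}

    def save_context(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))


# The template must contain a placeholder whose name equals the memory_key.
TEMPLATE = (
    "Previous conversation:\n{chat_history}\n\n"
    "Question: {input}\nThought:{agent_scratchpad}"
)


def render_prompt(memory: BufferMemory, user_input: str, scratchpad: str = "") -> str:
    variables = {"input": user_input, "agent_scratchpad": scratchpad}
    variables.update(memory.load_memory_variables())  # inject the history
    return TEMPLATE.format(**variables)


memory = BufferMemory()
first = render_prompt(memory, "Hello, I am Mary.")
memory.save_context("Hello, I am Mary.", "Hello Mary!")
second = render_prompt(memory, "What is my name?")
print(second)
```

If the template has no placeholder matching the memory key, the history is stored but never reaches the model, which is why checking the printed prompt is worthwhile.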

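4. Sharing Memory Between the Agent and Tools

A Tool can read the Agent's Memory in read-only form (it cannot write to it). This is done mainly through ReadOnlySharedMemory in the langchain.memory module.

Here we use a text-summarization chain and rely on ReadOnlySharedMemory to share Memory between the Tool and the Agent.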

from langchain.agents import initialize_agent, AgentType
from langchain.chains import LLMChain
from langchain.memory import ConversationBufferMemory, ReadOnlySharedMemory

# llm, prompt, and tools are assumed to be defined elsewhere

# Define the Memory
memory = ConversationBufferMemory(
    memory_key="Memory",
    return_messages=True
)

# Read-only view of the same Memory
readonly_memory = ReadOnlySharedMemory(memory=memory)

# Define the Tool: a summarization chain
summary_chain = LLMChain(
    llm=llm,
    prompt=prompt,
    memory=readonly_memory,
    verbose=True
)

# Define the Agent
agent_chain = initialize_agent(
    tools,
    llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
    handle_parsing_errors=True,
    memory=memory,
)

Note: only the core code is shown here.
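The read-only sharing pattern itself can be shown with a small self-contained sketch. It is an assumption-laden illustration rather than LangChain's implementation: `BufferMemory` and `ReadOnlyView` are hypothetical classes, and making `save_context` a silent no-op is this sketch's design choice.

```python
from typing import Dict, List, Tuple


class BufferMemory:
    """Hypothetical, simplified conversation buffer owned by the agent."""

    def __init__(self) -> None:
        self.turns: List[Tuple[str, str]] = []

    def load_memory_variables(self) -> Dict[str, List[Tuple[str, str]]]:
        return {"Memory": list(self.turns)}  # return a copy, not the live list

    def save_context(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))


class ReadOnlyView:
    """Wraps a memory so that a tool can read it but cannot change it."""

    def __init__(self, memory: BufferMemory) -> None:
        self._memory = memory

    def load_memory_variables(self) -> Dict[str, List[Tuple[str, str]]]:
        return self._memory.load_memory_variables()

    def save_context(self, human: str, ai: str) -> None:
        pass  # writes from the tool side are deliberately ignored


agent_memory = BufferMemory()
tool_view = ReadOnlyView(agent_memory)

agent_memory.save_context("Hello, I am Mary.", "Hello Mary!")      # agent writes
tool_view.save_context("Summarize the chat", "internal summary")  # ignored

print(tool_view.load_memory_variables())
```

Because the view delegates reads to the same underlying object, the tool always sees the agent's latest history, while its own intermediate prompts never pollute the conversation record.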