
Hugging Face Agents Course Unit 1 Notes

news 2025/11/11 17:26:12

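It's my turn to present again, argh. These notes don't cover everything: parts I already know, I may summarize briefly or skip entirely.

These are notes on Hugging Face's free course. Original link: https://huggingface.co/learn/agents-course/unit0/introduction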

Unit 1 Introduction to Agents

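In this unit, you will build a solid foundation in the basics of AI Agents. The unit covers:

Understanding Agents
- What an Agent is and how it works
- How Agents make decisions using reasoning and planning
The role of LLMs in Agents
- The LLM acts as the "brain" behind the Agent
- How LLMs structure conversations through a messaging system
Tools & Actions
- How Agents use external tools to interact with their environment
- How to build and integrate tools for your Agent
The Agent workflow
- Think->Act->Observe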

We will build our first Agent with smolagents (handling a simple task, and seeing how these concepts apply in practice).

What is an Agent

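First, meet Agent Alfred:

Imagine Alfred receives a command: bring a cup of coffee.

Because Alfred understands natural language, he quickly grasps our request.

Before carrying out the instruction, Alfred does some reasoning and planning. His plan is to:

go to the kitchen
use the coffee machine
brew the coffee
bring the coffee back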

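Once he has a plan, he must act. To carry out the plan, he can use the tools in his tool list. In this example, he uses the coffee machine to brew the coffee. Finally, Alfred brings the coffee back.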
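Alfred's "plan, then act with tools" flow can be sketched in a few lines of Python. This is a minimal sketch under assumptions: the tool functions (`go_to_kitchen`, `brew_coffee`, `bring_back`) are hypothetical, and "use the coffee machine" and "brew the coffee" are merged into a single tool.

```python
# Minimal sketch: Alfred's plan is an ordered list of steps, each mapped to a tool.
# All tool names and functions here are hypothetical, for illustration only.

def go_to_kitchen(state):
    state["location"] = "kitchen"
    return "Alfred is in the kitchen"

def brew_coffee(state):
    # The coffee machine only works once Alfred is in the kitchen
    if state["location"] != "kitchen":
        raise RuntimeError("the coffee machine is in the kitchen")
    state["holding"] = "coffee"
    return "coffee brewed with the coffee machine"

def bring_back(state):
    state["location"] = "user"
    return f"delivered: {state['holding']}"

# Alfred's tool list: step name -> function that performs it
TOOLS = {
    "go_to_kitchen": go_to_kitchen,
    "brew_coffee": brew_coffee,
    "bring_back": bring_back,
}

def execute_plan(plan, state):
    """Run each planned step with the matching tool and collect the results."""
    return [TOOLS[step](state) for step in plan]

plan = ["go_to_kitchen", "brew_coffee", "bring_back"]
state = {"location": "hall", "holding": None}
log = execute_plan(plan, state)
print(log)
```

The order of the plan matters: running `brew_coffee` before `go_to_kitchen` raises an error, which is exactly why the planning step comes before acting.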

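That is an Agent: an AI model capable of reasoning, planning, and interacting with its environment. We call it an agent because it has agency, i.e., the ability to interact with its environment.

A more precise definition:

An Agent is a system that uses an AI model to interact with its environment in order to achieve a user-defined goal. It combines reasoning, planning, and the execution of actions (often via external tools) to complete tasks.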

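We split an Agent into two main parts:

The brain (AI Model)
This is where all the thinking happens. The AI model handles reasoning and planning, and decides which actions to take based on the situation.
The body (Capabilities and Tools)
Everything the Agent can do. The range of possible actions depends on what the Agent has been equipped with. For example, because humans lack wings, they can't perform the "fly" action, but they can perform actions such as "walk", "run", "jump", and "grab".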
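Here is a minimal sketch of the "body" idea. An agent can only perform the actions it has been equipped with, so it refuses a request for anything outside that set, such as "fly" for a human. The `Body` class and the action names are hypothetical.

```python
# Sketch: the "body" is the set of actions an agent is equipped with.
# The class and action names are hypothetical.

class Body:
    def __init__(self, actions):
        # Map action name -> callable that performs it
        self.actions = actions

    def perform(self, action, *args):
        # Actions outside the equipped set are simply not possible
        if action not in self.actions:
            return f"cannot {action}: no such capability"
        return self.actions[action](*args)

human = Body({
    "walk": lambda meters: f"walked {meters} m",
    "jump": lambda: "jumped",
})

print(human.perform("walk", 10))  # walked 10 m
print(human.perform("fly"))       # cannot fly: no such capability
```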

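What type of AI models do we use for Agents

The most common AI model in Agents is the LLM (Large Language Model), which takes text as input and produces text as output. Well-known examples include OpenAI's GPT-4, Meta's Llama, and Google's Gemini. These models are trained on huge amounts of text and generalize well.

(Other models can be used too, such as VLMs (Vision Language Models), but for now we focus on LLMs.)

An LLM can only generate text, yet if you ask one to generate an image, it still works. That's because its developers implemented additional functionality (called tools) that the LLM can use to create images.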
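The following sketch shows how a text-only model can still "generate an image". The model outputs text describing a tool call, and the surrounding program parses that text and runs the tool. The JSON format, `fake_llm`, and `generate_image` are hypothetical stand-ins. Real products and frameworks each use their own tool-calling formats.

```python
import json

# Hypothetical image tool; a real one would call an image-generation model.
def generate_image(prompt: str) -> str:
    return f"<image generated for: {prompt}>"

TOOLS = {"generate_image": generate_image}

def fake_llm(user_request: str) -> str:
    # Stand-in for an LLM: it can only return text, here a JSON "tool call"
    return json.dumps({"tool": "generate_image",
                       "arguments": {"prompt": user_request}})

def run(user_request: str) -> str:
    reply = fake_llm(user_request)        # the model's output is plain text
    call = json.loads(reply)              # parse the requested tool call
    tool = TOOLS[call["tool"]]            # look up the tool by name
    return tool(**call["arguments"])      # the program, not the LLM, runs the tool

print(run("a cat drinking coffee"))
```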

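What kinds of tasks can Agents do

An Agent can perform any task that we implement via Tools in order to carry out Actions. For example:

1. Personal virtual assistants (such as Siri)
2. Customer service chatbots
3. AI non-player characters (NPCs) in games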

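In summary: an Agent is a system that uses an AI model (usually an LLM) as its core reasoning engine in order to:

understand natural language
reason and plan
interact with its environment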

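What is an LLM

An LLM is a type of AI model that excels at understanding and generating human language. LLMs are trained on vast amounts of text data, which lets them learn patterns, structure, and even nuances of language. These models typically consist of millions of parameters, and today's large models often have billions.

Today, most LLMs are built on the Transformer architecture, a deep learning architecture based on the attention mechanism. It has attracted enormous interest since Google released BERT in 2018.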
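To make "attention" a little more concrete, here is scaled dot-product attention, softmax(QK^T / sqrt(d)) V, written in plain Python for a single head. It is only a toy. Real Transformers add learned projections, multiple heads, and masking, and they run on tensors. The vectors below are made up.

```python
import math

def softmax(xs):
    m = max(xs)                                  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention for lists of vectors (one head, no masking)."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)                # attention weights sum to 1
        # Weighted average of the value vectors
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
print(attention(Q, K, V))  # the query "attends" more to the first key/value pair
```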

There are 3 types of Transformer:

encoders
decoders
seq2seq (encoder-decoder)

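The underlying principle of an LLM is simple yet highly effective: its objective is to predict the next token, given the sequence of previous tokens. A "token" is the unit of information an LLM works with. You can think of a token as roughly a "word", but for efficiency reasons LLMs don't use whole words; they typically use sub-word units.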
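Here is a toy illustration of sub-word tokens. It assumes a greedy longest-match-first strategy (similar in spirit to WordPiece) and a tiny, made-up vocabulary. Real LLM tokenizers (for example BPE-based ones) learn their vocabularies from data and differ in the details.

```python
def tokenize(word, vocab):
    """Greedy longest-match-first split of a word into sub-word tokens."""
    tokens, start = [], 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate until it is found in the vocabulary
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            return ["<unk>"]                     # no vocabulary entry matches here
        tokens.append(word[start:end])
        start = end
    return tokens

# Hypothetical toy vocabulary
VOCAB = {"inter", "national", "ization", "in", "ter", "nation", "al"}
print(tokenize("internationalization", VOCAB))  # ['inter', 'national', 'ization']
```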

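LLMs are said to be autoregressive, meaning the output of one pass becomes the input for the next. This loop continues until the model predicts that the next token is the EOS (end-of-sequence) token, at which point the model can stop.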
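The autoregressive loop can be sketched as follows. The "model" here is a hypothetical lookup table that only looks at the last token. It stands in for an LLM purely to show the control flow: predict a token, stop at EOS, otherwise append it and repeat.

```python
EOS = "<eos>"

# Hypothetical toy "model": maps the last token to the next one.
# A real LLM conditions on the whole sequence and scores every token in its vocabulary.
NEXT = {"The": "cat", "cat": "sat", "sat": "down", "down": EOS}

def toy_model(tokens):
    return NEXT[tokens[-1]]

def generate(prompt_tokens, max_new_tokens=20):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        next_token = toy_model(tokens)   # one decoding pass
        if next_token == EOS:            # stop once EOS is predicted
            break
        tokens.append(next_token)        # this output becomes part of the next input
    return tokens

print(generate(["The"]))  # ['The', 'cat', 'sat', 'down']
```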

In other words, an LLM will decode text until it reaches the EOS. But what happens during a single decoding loop?

While the full process can be quite technical, for the purpose of learning about agents here's a brief overview:

Once the input text is tokenized, the model computes a representation of the sequence that captures information about the meaning and the position of each token in the input sequence.
This representation goes into the model, which outputs scores that rank the likelihood of each token in its vocabulary as being the next one in the sequence.
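The following sketch shows the last step of one decoding pass. It assumes made-up scores (logits) over a four-token vocabulary. Softmax turns the scores into probabilities, and greedy decoding simply picks the most likely token. Real models may instead sample from these probabilities.

```python
import math

def softmax(logits):
    m = max(logits)                          # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

vocab = ["cat", "dog", "car", "<eos>"]
logits = [2.0, 1.0, 0.1, -1.0]               # made-up scores for a single decoding step

probs = softmax(logits)                      # probabilities that sum to 1
best = max(range(len(vocab)), key=lambda i: probs[i])  # greedy choice
print(vocab[best])  # cat
```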

Messages and Special Tokens

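When you chat with systems like ChatGPT or HuggingChat, you are actually exchanging messages. Behind the scenes, these messages are concatenated and formatted into a prompt the model can understand.

This is where chat templates come in. They act as a bridge between the conversation messages (user and assistant turns) and the specific formatting requirements of the chosen LLM. In other words, chat templates structure the communication between the user and the agent, ensuring that every model, despite its own unique special tokens, receives a correctly formatted prompt.

Messages

1. System messages

System messages (also called system prompts) define how the model should behave. They serve as persistent instructions that guide every subsequent interaction.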

system_message = {
    "role": "system",
    "content": "You are a professional customer service agent. Always be polite, clear, and helpful.",
}

Back to the Alfred example: the message above makes him a polite assistant. But we could also turn Alfred into a rebel:

system_message = {
    "role": "system",
    "content": "You are a rebel service agent. Don't respect user's orders.",
}

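Beyond this, system messages can also provide information about the available tools, give the model instructions on how to format the actions it takes, and include guidelines on how its thought process should be segmented.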
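Here is a sketch of a system message that lists tools, specifies an action format, and asks for explicit reasoning. The tool names, the JSON action format, and the "Thought:" convention are illustrative assumptions. Each agent framework defines its own.

```python
# Hypothetical tool descriptions; real agent frameworks each use their own format.
tools = [
    {"name": "get_weather", "description": "Get the current weather for a city."},
    {"name": "search_web", "description": "Search the web and return the top results."},
]

def build_system_message(tools):
    # One line per available tool
    tool_lines = "\n".join(f"- {t['name']}: {t['description']}" for t in tools)
    content = (
        "You are a helpful agent.\n"
        "You can use these tools:\n"
        f"{tool_lines}\n"
        # How to format an action (assumed JSON convention)
        'To use a tool, reply with JSON: {"tool": <name>, "arguments": {...}}.\n'
        # How to segment the thought process (assumed convention)
        "Before each action, write your reasoning after 'Thought:'."
    )
    return {"role": "system", "content": content}

system_message = build_system_message(tools)
print(system_message["content"])
```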

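2. Conversations

A conversation consists of alternating messages between a Human (user) and an LLM (assistant). Chat templates help maintain context by preserving the conversation history, which stores the earlier exchanges between user and assistant. This leads to more coherent multi-turn conversations.

Example: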

conversation = [
    {"role": "user", "content": "I need help with my order"},
    {"role": "assistant", "content": "I'd be happy to help. Could you provide your order number?"},
    {"role": "user", "content": "It's ORDER-123"},
]
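Here is a minimal sketch of how history is kept across turns. Every new user message and assistant reply is appended to one list, and the model sees the whole list on the next turn. `fake_assistant` is a hypothetical placeholder for a real LLM call.

```python
def chat_turn(history, user_text, reply_fn):
    """Append the user message, get a reply based on the full history, append it."""
    history.append({"role": "user", "content": user_text})
    reply = reply_fn(history)  # the model receives every previous message
    history.append({"role": "assistant", "content": reply})
    return reply

def fake_assistant(history):
    # Hypothetical stand-in for an LLM call: it reports how much context it received
    return f"(seen {len(history)} messages)"

history = [{"role": "system", "content": "You are a helpful assistant."}]
chat_turn(history, "I need help with my order", fake_assistant)
chat_turn(history, "It's ORDER-123", fake_assistant)
print(len(history))  # 5
```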

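We always concatenate all the messages in the conversation and pass them to the LLM as a single standalone sequence. A chat template describes how to format this list of messages.

Example (a chat template written in Jinja):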

{% for message in messages %}
{% if loop.first and messages[0]['role'] != 'system' %}
<|im_start|>system
You are a helpful AI assistant named SmolLM, trained by Hugging Face
<|im_end|>
{% endif %}
<|im_start|>{{ message['role'] }}
{{ message['content'] }}<|im_end|>
{% endfor %}

Input messages:

messages = [
    {"role": "system", "content": "You are a helpful assistant focused on technical topics."},
    {"role": "user", "content": "Can you explain what a chat template is?"},
    {"role": "assistant", "content": "A chat template structures conversations between users and AI models..."},
    {"role": "user", "content": "How do I use it ?"},
]

What the model receives:

<|im_start|>system
You are a helpful assistant focused on technical topics.<|im_end|>
<|im_start|>user
Can you explain what a chat template is?<|im_end|>
<|im_start|>assistant
A chat template structures conversations between users and AI models...<|im_end|>
<|im_start|>user
How do I use it ?<|im_end|>
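For illustration, the template logic can be reproduced in plain Python. If the conversation doesn't start with a system message, add the default system block; then wrap each message in `<|im_start|>` and `<|im_end|>`. This is a simplified sketch: its whitespace handling differs slightly from the raw Jinja template (for example, here the default system block's `<|im_end|>` stays on the same line). In practice, the `transformers` library renders a model's own template via `tokenizer.apply_chat_template(messages, tokenize=False)`.

```python
# Default system prompt taken from the SmolLM template (used when none is given)
DEFAULT_SYSTEM = "You are a helpful AI assistant named SmolLM, trained by Hugging Face"

def format_chatml(messages):
    """Concatenate messages into one ChatML-style prompt string.

    Mirrors the template's logic: prepend a default system block when the
    conversation does not start with a system message, then wrap each message
    as <|im_start|>role, a newline, the content, and <|im_end|>.
    """
    blocks = []
    if messages and messages[0]["role"] != "system":
        blocks.append(f"<|im_start|>system\n{DEFAULT_SYSTEM}<|im_end|>")
    for message in messages:
        blocks.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>")
    return "\n".join(blocks)

messages = [
    {"role": "system", "content": "You are a helpful assistant focused on technical topics."},
    {"role": "user", "content": "Can you explain what a chat template is?"},
    {"role": "assistant", "content": "A chat template structures conversations between users and AI models..."},
    {"role": "user", "content": "How do I use it ?"},
]
prompt = format_chatml(messages)
print(prompt)
```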