
17_FastMCP 2.x Chinese Documentation: Advanced FastMCP Server Features, LLM Sampling in Detail

news 2025/11/10 9:34:34

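1. LLM Sampling

Request LLM text generation from the client or a configured provider through the MCP context.

New in version 2.0.0

LLM sampling lets MCP tools request LLM text generation based on the messages they provide. By default, sampling requests go to the client's LLM, but you can also configure a fallback handler or always use a specific LLM provider. This is useful whenever a tool needs LLM capabilities to process data, generate responses, or perform text-based analysis.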

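2. Why Use LLM Sampling?

LLM sampling enables tools to:

Leverage AI capabilities: use the client's LLM for text generation and analysis
Offload complex reasoning: let the LLM handle tasks that require natural-language understanding
Generate dynamic content: create responses, summaries, or transformations based on data
Maintain context: use the same LLM instance the user is already interacting with

2.1 Basic Usage

Use ctx.sample() to request text generation from the client's LLM: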

from fastmcp import FastMCP, Context

mcp = FastMCP("SamplingDemo")


@mcp.tool
async def analyze_sentiment(text: str, ctx: Context) -> dict:
    """Analyze the sentiment of text using the client's LLM."""
    prompt = f"""Classify the sentiment of the following text as positive, negative, or neutral.
Output only one word: 'positive', 'negative', or 'neutral'.

Text to analyze: {text}"""

    # Ask the client's LLM for the analysis
    response = await ctx.sample(prompt)

    # Normalize the LLM's reply
    sentiment = response.text.strip().lower()

    # Map to a standard sentiment value
    if "positive" in sentiment:
        sentiment = "positive"
    elif "negative" in sentiment:
        sentiment = "negative"
    else:
        sentiment = "neutral"

    return {"text": text, "sentiment": sentiment}
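Calling a decorated tool directly from a test is not always convenient, so one option is to keep the prompt and label-mapping logic in a plain async function and pass it a stand-in context. The sketch below assumes that approach; StubContext, StubResponse and analyze_sentiment_logic are hypothetical names, and the stub only mimics the .text attribute of a real sampling response.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class StubResponse:
    # Mimics only the .text attribute of a TextContent response
    text: str


class StubContext:
    """Hypothetical test double standing in for fastmcp.Context."""

    def __init__(self, reply: str) -> None:
        self.reply = reply
        self.prompts: list[str] = []

    async def sample(self, messages, **kwargs) -> StubResponse:
        # Record the prompt and return a canned LLM reply
        self.prompts.append(messages)
        return StubResponse(text=self.reply)


async def analyze_sentiment_logic(text: str, ctx) -> dict:
    """Same logic as the analyze_sentiment tool, left undecorated for testing."""
    prompt = (
        "Classify the sentiment of the following text as positive, negative, or neutral.\n"
        "Output only one word: 'positive', 'negative', or 'neutral'.\n\n"
        f"Text to analyze: {text}"
    )
    response = await ctx.sample(prompt)
    sentiment = response.text.strip().lower()
    if "positive" in sentiment:
        sentiment = "positive"
    elif "negative" in sentiment:
        sentiment = "negative"
    else:
        sentiment = "neutral"
    return {"text": text, "sentiment": sentiment}


ctx = StubContext(reply="  Positive.\n")
result = asyncio.run(analyze_sentiment_logic("I love this!", ctx))
print(result)  # {'text': 'I love this!', 'sentiment': 'positive'}
```

The registered analyze_sentiment tool could then simply return await analyze_sentiment_logic(text, ctx), keeping a single copy of the logic.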

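2.2 Method Signature

ctx.sample (async method)
Requests text generation from the client's LLM.

Parameters:

messages (str | list[str | SamplingMessage]): A string, or a list of strings and/or message objects, to send to the LLM
system_prompt (str | None, default None): Optional system prompt that guides the LLM's behavior
temperature (float | None, default None): Optional sampling temperature (controls randomness, typically 0.0-1.0)
max_tokens (int | None, default 512): Maximum number of tokens to generate
model_preferences (ModelPreferences | str | list[str] | None, default None): Optional model selection preferences (a model hint string, a list of hints, or a ModelPreferences object)
Returns:
response (TextContent | ImageContent): The LLM's response content (usually TextContent, which has a .text attribute)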
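Since the return type is a union, reading response.text blindly fails if a client ever returns image content. A minimal guard might look like the following; the two dataclasses are simplified stand-ins for mcp.types.TextContent and mcp.types.ImageContent (only the fields used here), and response_text is a hypothetical helper, not part of FastMCP.

```python
from dataclasses import dataclass


@dataclass
class TextContent:
    # Simplified stand-in for mcp.types.TextContent
    type: str
    text: str


@dataclass
class ImageContent:
    # Simplified stand-in for mcp.types.ImageContent (no .text attribute)
    type: str
    data: str
    mimeType: str


def response_text(response) -> str:
    """Return the text of a sampling response, rejecting non-text content."""
    kind = getattr(response, "type", type(response).__name__)
    if kind == "text":
        return response.text
    raise TypeError(f"Expected text content, got {kind!r}")


print(response_text(TextContent(type="text", text="hello")))  # hello
```

Inside a tool, return response_text(await ctx.sample(prompt)) then fails with a clear message instead of an AttributeError.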

3. Simple Text Generation

3.1 Basic Prompts

Generate text from a simple string prompt:

@mcp.tool
async def generate_summary(content: str, ctx: Context) -> str:
    """Generate a summary of the provided content."""
    prompt = f"Please provide a concise summary of the following content:\n\n{content}"
    response = await ctx.sample(prompt)
    return response.text
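A single prompt can exceed what the client's model accepts, and the reply is capped by max_tokens (512 by default). One common workaround is a map-reduce summary: sample once per chunk, then once more to merge the partial summaries. The sketch below assumes a character count is an acceptable proxy for tokens; summarize_long, chunk_text and CountingContext are hypothetical names, and the stub context only mimics ctx.sample returning an object with .text.

```python
import asyncio
from types import SimpleNamespace


def chunk_text(content: str, chunk_size: int) -> list[str]:
    """Split content into consecutive pieces of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [content[i : i + chunk_size] for i in range(0, len(content), chunk_size)]


async def summarize_long(content: str, ctx, chunk_size: int = 4000) -> str:
    """Map-reduce summary: summarize each chunk, then merge the partial summaries."""
    chunks = chunk_text(content, chunk_size)
    if len(chunks) <= 1:
        # Short input: a single request, just like generate_summary
        response = await ctx.sample(
            f"Please provide a concise summary of the following content:\n\n{content}"
        )
        return response.text
    partials = []
    for chunk in chunks:
        # One sampling request per chunk, issued sequentially
        response = await ctx.sample(f"Summarize this excerpt concisely:\n\n{chunk}")
        partials.append(response.text)
    combined = "\n".join(partials)
    response = await ctx.sample(
        f"Combine these partial summaries into one concise summary:\n\n{combined}"
    )
    return response.text


class CountingContext:
    """Hypothetical stub: counts sampling calls and returns a numbered reply."""

    def __init__(self) -> None:
        self.calls = 0

    async def sample(self, messages, **kwargs):
        self.calls += 1
        return SimpleNamespace(text=f"summary {self.calls}")


ctx = CountingContext()
summary = asyncio.run(summarize_long("abcdef", ctx, chunk_size=4))
print(summary, ctx.calls)  # summary 3 3
```

The chunks are sampled one after another rather than with asyncio.gather, which avoids sending a burst of concurrent requests to the user's client; whether parallelism is acceptable depends on the client.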

3.2 System Prompts

Use a system prompt to guide the LLM's behavior:

@mcp.tool
async def generate_code_example(concept: str, ctx: Context) -> str:
    """Generate a Python code example for a given concept."""
    response = await ctx.sample(
        messages=f"Write a simple Python code example demonstrating '{concept}'.",
        system_prompt="You are an expert Python programmer. Provide concise, working code examples without explanations.",
        temperature=0.7,
        max_tokens=300,
    )
    code_example = response.text
    return f"```python\n{code_example}\n```"
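Models often wrap their answer in a Markdown fence despite the system prompt, in which case generate_code_example would return a doubly fenced block. A small normalization step could run before wrapping; strip_code_fences and wrap_python are hypothetical helper names, and the sketch only unwraps a reply that consists of exactly one fenced block.

```python
import re

# One optional language tag after the opening fence, then the body, then the closing fence
_FENCE_RE = re.compile(r"^\s*```[\w+-]*\s*\n(.*?)\n?\s*```\s*$", re.DOTALL)


def strip_code_fences(text: str) -> str:
    """Remove one surrounding Markdown code fence, if the whole reply is fenced."""
    match = _FENCE_RE.match(text)
    return match.group(1) if match else text.strip()


def wrap_python(code: str) -> str:
    """Wrap code in a python fence without doubling an existing one."""
    return f"```python\n{strip_code_fences(code)}\n```"
```

In the tool, the last line would become return wrap_python(response.text).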

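3.3 Model Preferences

Specify model preferences for different use cases:

@mcp.tool
async def creative_writing(topic: str, ctx: Context) -> str:
    """Generate creative content using a specific model."""
    response = await ctx.sample(
        messages=f"Write a short creative story about {topic}",
        model_preferences="claude-3-sonnet",  # Prefer a specific model
        include_context="thisServer",  # Ask the client to include context from this server
        temperature=0.9,  # High creativity
        max_tokens=1000,
    )
    return response.text


@mcp.tool
async def technical_analysis(data: str, ctx: Context) -> str:
    """Perform technical analysis using a reasoning-oriented model."""
    response = await ctx.sample(
        messages=f"Analyze this technical data and provide insights: {data}",
        model_preferences=["claude-3-opus", "gpt-4"],  # Prefer reasoning models
        temperature=0.2,  # Low randomness for consistency
        max_tokens=800,
    )
    return response.text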

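3.4 Complex Message Structures

Use structured messages for more complex interactions:

from mcp.types import SamplingMessage, TextContent


@mcp.tool
async def multi_turn_analysis(user_query: str, context_data: str, ctx: Context) -> str:
    """Perform analysis using a multi-turn conversation structure."""
    # SamplingMessage.content must be a content object such as TextContent, not a bare string
    messages = [
        SamplingMessage(role="user", content=TextContent(type="text", text=f"I have this data: {context_data}")),
        SamplingMessage(role="assistant", content=TextContent(type="text", text="I can see your data. What would you like me to analyze?")),
        SamplingMessage(role="user", content=TextContent(type="text", text=user_query)),
    ]
    response = await ctx.sample(
        messages=messages,
        system_prompt="You are a data analyst. Provide detailed insights based on the conversation context.",
        temperature=0.3,
    )
    return response.text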
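On the wire, each SamplingMessage is a role (user or assistant) plus a content object such as {"type": "text", "text": ...}. If a conversation history is already kept as (role, text) pairs, a small converter like the hypothetical to_sampling_messages below produces that shape; with the mcp package installed, calling SamplingMessage.model_validate on each dict should yield the typed objects used above.

```python
def to_sampling_messages(history: list[tuple[str, str]]) -> list[dict]:
    """Convert (role, text) pairs into SamplingMessage-shaped dicts."""
    messages = []
    for role, text in history:
        # MCP sampling messages only use the user and assistant roles;
        # system instructions go in system_prompt instead
        if role not in ("user", "assistant"):
            raise ValueError(f"Unsupported role: {role!r}")
        messages.append({"role": role, "content": {"type": "text", "text": text}})
    return messages
```

Keeping the history as plain tuples and converting at the last moment keeps the tool's own state independent of the message classes.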

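4. Sampling Fallback Handlers

Client support for sampling is optional. If the client does not support sampling, the server reports an error saying so.

However, you can give the FastMCP server a sampling_handler, which sends sampling requests directly to an LLM provider instead of routing them through the client.

The sampling_handler_behavior parameter controls when this handler is used:

"fallback" (default): Use the handler only when the client does not support sampling. Requests go to the client first and fall back to the handler if needed.
"always": Always use the handler, bypassing the client entirely. Useful when you want full control over which LLM is used for sampling.

A sampling handler can be implemented with any LLM provider; an example OpenAI implementation is provided as a contrib module. Sampling requests expose only a subset of what a typical LLM completion API offers, so the OpenAI sampling handler, pointed at a third-party provider's OpenAI-compatible API, is usually all you need.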
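The two behaviors amount to a small routing decision. The function below restates the rules from the list above as code to make the edge cases explicit; it is an illustrative sketch, not FastMCP's implementation, and choose_sampling_route and Route are hypothetical names. The error raised when neither the client nor a handler can sample stands in for the error the server reports in that case.

```python
from enum import Enum


class Route(str, Enum):
    CLIENT = "client"
    HANDLER = "handler"


def choose_sampling_route(
    behavior: str, client_supports_sampling: bool, has_handler: bool
) -> Route:
    """Decide where a sampling request goes under the fallback/always rules."""
    if behavior not in ("fallback", "always"):
        raise ValueError(f"Unknown behavior: {behavior!r}")
    if behavior == "always":
        # The client is never consulted, so a handler is mandatory
        if not has_handler:
            raise RuntimeError("'always' requires a sampling_handler")
        return Route.HANDLER
    # "fallback": prefer the client, use the handler only as a backup
    if client_supports_sampling:
        return Route.CLIENT
    if has_handler:
        return Route.HANDLER
    raise RuntimeError("Client does not support sampling and no handler is configured")
```

Note that under "fallback" a configured handler is never used for a client that does support sampling, so the client's model choice still wins in that case.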

4.1 Fallback Mode (Default)

Use the handler only when the client does not support sampling:

import asyncio
import os

from mcp.types import ContentBlock
from openai import OpenAI

from fastmcp import FastMCP
from fastmcp.experimental.sampling.handlers.openai import OpenAISamplingHandler
from fastmcp.server.context import Context


async def async_main():
    server = FastMCP(
        name="OpenAI Sampling Fallback Example",
        sampling_handler=OpenAISamplingHandler(
            default_model="gpt-4o-mini",
            client=OpenAI(
                api_key=os.getenv("API_KEY"),
                base_url=os.getenv("BASE_URL"),
            ),
        ),
        sampling_handler_behavior="fallback",  # Default: use the handler only when the client does not support sampling
    )

    @server.tool
    async def test_sample_fallback(ctx: Context) -> ContentBlock:
        # Uses the client's LLM if available, otherwise falls back to the handler
        return await ctx.sample(
            messages=["hello world!"],
        )

    await server.run_http_async()


if __name__ == "__main__":
    asyncio.run(async_main())

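4.2 Always Mode

Always use the handler, bypassing the client:

import os

from openai import OpenAI

from fastmcp import FastMCP
from fastmcp.experimental.sampling.handlers.openai import OpenAISamplingHandler
from fastmcp.server.context import Context

server = FastMCP(
    name="Server-Controlled Sampling",
    sampling_handler=OpenAISamplingHandler(
        default_model="gpt-4o-mini",
        client=OpenAI(api_key=os.getenv("API_KEY")),
    ),
    sampling_handler_behavior="always",  # Always use the handler, never the client
)


@server.tool
async def analyze_data(data: str, ctx: Context) -> str:
    # Always uses the server-configured LLM rather than the client's
    result = await ctx.sample(
        messages=f"Analyze this data: {data}",
        system_prompt="You are a data analyst.",
    )
    return result.text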