
Solving Prompt Pain Points: A Complete Approach to Detecting Contradictions and Optimizing Formats Automatically with AI Agents

news 2025/8/2 16:32:42

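This article introduces a project that optimizes prompts based on user intent and matches the intended use case to the most suitable model. The multi-agent solution automates the process, which makes prompt optimization more scalable and reduces manual intervention. It is particularly suited to complex few-shot learning scenarios.

Andreessen Horowitz recently described research as a transformative use case for generative AI. The growing investment in, and strategic focus on, deep research by major providers such as OpenAI and xAI reflects the same view.

Research and reasoning tasks typically run for a long time and are expensive to compute, so the precision of the user's query and its alignment with the intended goal become critical. To keep the system efficient, ambiguity needs to be resolved early in the pipeline.

To address this, OpenAI has integrated prompt optimization into ChatGPT. The system uses an agentic architecture in which a more cost-effective model (such as o4-mini) disambiguates and refines the query before the deep research task starts. Aligning the output with user intent in this way improves the overall research experience.

OpenAI applies a similar strategy in its Deep Research API, where dedicated models such as o3-deep-research and o4-mini-deep-research carry out multi-step investigation tasks, balancing accuracy with execution efficiency.

The driver behind this evolution is a use case with broad impact: the application of generative AI to advanced research, which has drawn widespread attention across the industry. At the implementation level, multi-model orchestration is now being deployed in practice. Modern systems no longer rely on a single model; they integrate and coordinate several specialized models to get the best result. This trend matches NVIDIA's vision of building future AI systems by orchestrating small language models (SLMs), each optimized for a specific task, to improve both efficiency and performance.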

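From Models to SDKs

Model providers are now extending their offerings to advanced command-line interfaces (CLIs) and integrating their models more closely with software development kits (SDKs). OpenAI recently published a project that brings three areas together: prompt optimization, multi-agent orchestration, and matching models to use cases.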

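Selecting the Best Model

The system uses the OpenAI Evals framework to measure prompt performance. The evaluation is based on a dataset of 20 hand-labelled examples, each containing the original message content, the developer prompt, the user and assistant turns, and the expected changes. The examples cover common problem types, including logical contradictions, few-shot inconsistencies, and format ambiguity.

The evaluation runs through a Python string-check grader. Agent instructions are adjusted according to accuracy, cost, and speed, and the best model is selected (o3 in the example). This ensures that the system identifies and resolves all of the issues in the golden outputs, which is what produces high-quality optimized prompts.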
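The selection step can be sketched in plain Python. This is a minimal illustration, not the project's evaluation code: the `ModelRun` record, the `string_check` and `pick_model` helpers, the tie-breaking order (cost first, then latency), the placeholder model names, and all numbers are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class ModelRun:
    """Aggregated result of one candidate model on the labelled examples (hypothetical)."""
    model: str
    outputs: list[str]   # one output per labelled example
    cost_usd: float      # total cost of the run
    latency_s: float     # mean latency per example


def string_check(output: str, expected: str) -> bool:
    """Pass when the expected string appears in the output (case-insensitive)."""
    return expected.lower() in output.lower()


def accuracy(run: ModelRun, golden: list[str]) -> float:
    """Fraction of examples whose output passes the string check."""
    passed = sum(string_check(o, g) for o, g in zip(run.outputs, golden))
    return passed / len(golden)


def pick_model(runs: list[ModelRun], golden: list[str], min_accuracy: float = 1.0):
    """Keep models that meet the accuracy bar, then prefer the cheapest, then the fastest."""
    eligible = [r for r in runs if accuracy(r, golden) >= min_accuracy]
    if not eligible:
        return None
    return min(eligible, key=lambda r: (r.cost_usd, r.latency_s)).model


# Made-up data: two expected substrings and two candidate runs.
golden = ["english", "explanations"]
runs = [
    ModelRun("model-a", ["Insists on English and forbids it", "Examples include explanations"], 0.40, 6.0),
    ModelRun("model-b", ["No issues found", "Examples include explanations"], 0.05, 2.0),
]
best = pick_model(runs, golden)
```

Here the cheaper candidate misses one of the two expected strings, so it is filtered out before cost is considered. A real setup might weigh the three criteria differently.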

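Core Prompt Optimization Features

Prompt optimization is the core of the system. It detects common prompt problems: logical contradictions in the instructions, unclear or missing format specifications (especially for structured output such as JSON or CSV), and inconsistencies between the prompt's rules and its few-shot examples. Once it identifies a problem, the system rewrites the prompt to fix it while preserving the original intent.

The system can also update few-shot examples to make them consistent. In practice, fixes include adding an explicit output format section or regenerating assistant responses so that they comply with the prompt.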

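Multi-Agent Collaboration Architecture

The project demonstrates multi-agent collaboration through a structured workflow built on the Agents SDK. It deploys several specialized agents: Dev-Contradiction-Checker, Format-Checker, Few-Shot-Consistency-Checker, Dev-Rewriter, and Few-Shot-Rewriter. The checkers run in parallel, which improves overall throughput.

In the workflow, the checkers identify issues concurrently, and the rewriters are activated conditionally, based on what the checkers find, to apply the fixes. The agents communicate through Pydantic data models, which keeps structured output consistent and reliable. The architecture reflects the design of an early version of the optimization feature in the OpenAI Playground and serves as a reference for building scalable agent systems.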
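The control flow described above (checkers in parallel, rewriter only when needed) can be sketched offline with stub coroutines. Nothing below calls a model: the stub functions and their keyword heuristics are hypothetical placeholders chosen so the example runs on its own, and they do not represent how the project's agents decide anything.

```python
import asyncio


async def check_contradictions(prompt: str) -> list[str]:
    """Stub checker: flags a hard-coded pattern instead of calling a model."""
    await asyncio.sleep(0)  # stands in for a network call
    text = prompt.lower()
    if "always" in text and "never" in text:
        return ["Prompt contains both an 'always' and a 'never' clause."]
    return []


async def check_format(prompt: str) -> list[str]:
    """Stub checker: asks for a schema when JSON is requested without one."""
    await asyncio.sleep(0)
    if "json" in prompt.lower() and "{" not in prompt:
        return ["JSON output requested but no schema or example given."]
    return []


async def rewrite(prompt: str, issues: list[str]) -> str:
    """Stub rewriter: only records which issues it was asked to fix."""
    return prompt + "\n\n# Resolved: " + "; ".join(issues)


async def optimize(prompt: str) -> dict:
    # Both checkers run concurrently; gather returns results in argument order.
    contradictions, format_issues = await asyncio.gather(
        check_contradictions(prompt), check_format(prompt)
    )
    issues = contradictions + format_issues
    # The rewriter is invoked only when at least one checker reported something.
    new_prompt = await rewrite(prompt, issues) if issues else prompt
    return {"changed": bool(issues), "issues": issues, "prompt": new_prompt}


clean = asyncio.run(optimize("Answer briefly."))
flagged = asyncio.run(optimize("Always reply in JSON. Never reply in JSON."))
```

In a notebook, where an event loop is already running, `await optimize(...)` would replace the `asyncio.run(...)` calls.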

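System Architecture Overview

The optimization system takes a multi-agent approach in which specialized agents work together to analyze and rewrite prompts. It automatically identifies and handles several common problem types: contradictions in the prompt instructions, missing or unclear format specifications, and inconsistencies between the prompt and its few-shot examples.

The implementation integrates the OpenAI SDK with the Evals framework and represents an early prototype of OpenAI's prompt optimization system.

Running the system requires the openai Python package, the openai-agents package, and an OpenAI API key set in the OPENAI_API_KEY environment variable.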

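The prompt optimization system uses a collaborative multi-agent architecture to analyze and improve prompts. Each agent specializes in detecting or rewriting one type of problem:

Dev-Contradiction-Checker

Scans the prompt for logical contradictions or instructions that are impossible to follow. For example, it flags a prompt that says both "use only positive numbers" and "include negative number examples".

Format-Checker

Identifies cases where the prompt calls for structured output (such as JSON, CSV, or Markdown) but does not specify the format clearly. It ensures that all required fields, data types, and formatting rules are explicitly defined, so that the output format is unambiguous.

Few-Shot-Consistency-Checker

Examines the example conversations to verify that the assistant responses actually follow the rules stated in the prompt. It catches mismatches between what the prompt requires and what the examples demonstrate.

Dev-Rewriter

Once the issues are identified, this agent rewrites the prompt to resolve contradictions and clarify format specifications while preserving the original intent. The rewrite follows strict rules so that the changes are valid and accurate.

Few-Shot-Rewriter

Updates inconsistent example responses so that they align with the rules in the prompt, ensuring that every example complies with the updated developer prompt.

Working together, these agents systematically identify and fix prompt problems and automate the optimization.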

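Structured Data Exchange Between Agents

Agent inputs and outputs are usually unstructured, but passing structured data between agents unlocks significant optimization potential. To do this, the system uses Pydantic models to define precise formats for agent inputs and outputs. The models enforce data validation and keep the data consistent throughout the workflow, which reduces errors and improves efficiency.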
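To show what validation at an agent boundary buys, the sketch below uses a standard-library dataclass as a dependency-free stand-in for a Pydantic model. The `CheckerReport` class and `parse_report` helper are hypothetical names. The invariant they enforce (the flag is true exactly when the issue list is non-empty) is the one stated in the contradiction checker's instructions in the implementation code.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckerReport:
    """Hypothetical stand-in for a Pydantic output model."""
    has_issues: bool
    issues: list

    def __post_init__(self):
        if not isinstance(self.has_issues, bool):
            raise ValueError("has_issues must be a boolean")
        if not isinstance(self.issues, list) or not all(isinstance(i, str) for i in self.issues):
            raise ValueError("issues must be a list of strings")
        if self.has_issues != bool(self.issues):
            raise ValueError("has_issues must be true exactly when issues is non-empty")


def parse_report(raw: str) -> CheckerReport:
    """Parse one agent reply and reject anything that does not fit the schema."""
    data = json.loads(raw)  # malformed JSON raises a ValueError subclass
    if not isinstance(data, dict) or set(data) != {"has_issues", "issues"}:
        raise ValueError("expected an object with exactly the keys has_issues and issues")
    return CheckerReport(**data)


report = parse_report('{"has_issues": true, "issues": ["Rule A conflicts with rule B."]}')

# An inconsistent reply is rejected before it can reach a rewriter.
try:
    parse_report('{"has_issues": false, "issues": ["stray issue"]}')
    rejected = False
except ValueError:
    rejected = True
```

With Pydantic, the type checks come from the field annotations, and a cross-field rule like this one would typically live in a validator on the model.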

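Best Practices for Agent Instructions

Building an effective agent system means following these principles when writing instructions:

Precise scope

Each agent should be confined to a specific, clearly bounded role. The contradiction checker, for example, is told to identify "genuine self-contradictions" and that "overlaps or redundancies are not contradictions". This explicit scoping keeps the agent focused and efficient.

Systematic step-by-step guidance

Instructions should lay out a clear, ordered procedure. The format checker illustrates this: it first categorizes the task type and only then evaluates the format specification, which keeps the analysis systematic and reliable.

Explicit definitions of key concepts

Defining key concepts up front removes ambiguity from the instructions. The few-shot consistency checker includes a comprehensive "compliance rubric" that spells out what counts as compliant, giving it a clear framework for evaluation.

Clear boundaries and exclusions

Stating what an agent is not responsible for prevents scope creep. The few-shot checker includes a detailed "out-of-scope" list, such as ignoring minor stylistic variation, which minimizes false positives.

Strict output structure

Every agent must follow a consistent response format, and the instructions provide a complete output example for reference. This standardization across agents makes them easy to integrate into a multi-agent pipeline.

With these practices built into their design, the agents become more reliable and work together well, which improves the performance of the prompt optimization system as a whole. The full definitions and instructions for each agent appear in the sections that follow.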
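One way to keep instructions consistent across agents is to assemble them from the same kinds of sections every time. The helper below is a hypothetical sketch of that idea. The `build_instructions` function, its parameters, and the section headings it emits are assumptions for illustration and are not part of the Agents SDK or the original project.

```python
def build_instructions(name, goal, steps, definitions, out_of_scope, output_example):
    """Assemble agent instructions: scope, ordered steps, definitions, exclusions, output format."""
    lines = [f"You are {name}.", "", "Goal", goal, "", "Steps"]
    lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    lines += ["", "Definitions"]
    lines += [f"- {term}: {meaning}" for term, meaning in definitions.items()]
    lines += ["", "Out of scope (do not flag)"]
    lines += [f"- {item}" for item in out_of_scope]
    lines += ["", "Output format", output_example]
    return "\n".join(lines)


# Illustrative values, loosely modelled on the format checker described in this article.
text = build_instructions(
    name="Format-Checker",
    goal="Decide whether the prompt requires structured output and flag unclear format rules.",
    steps=[
        "Categorise the task.",
        "For structured output, list missing or ambiguous format details.",
    ],
    definitions={"structured output": "JSON, CSV, XML, or a Markdown table"},
    out_of_scope=["Tone or writing style"],
    output_example='{"has_issues": <bool>, "issues": ["<desc 1>"]}',
)
```

The resulting string could be passed as the instructions of an agent. The template simply makes it harder to forget a section such as the exclusions list.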

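OpenAI Evaluation Dashboard

The figure below shows the evaluation feature in the OpenAI dashboard. Running the code (at the end of this article) populates the dashboard with evaluation results, giving a visual overview of the test run. The main goal of this feature is to optimize prompts automatically and determine the best-matching model.

Clicking a row shows the detailed grading information, including the reasoning and the grader configuration, which makes deeper analysis convenient.

Implementation Code

The following code comes from the official OpenAI repository and has been verified to run in Google Colab. The Python code can be copied and run directly in a Jupyter notebook.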

# Notebook magics: run these in a Jupyter or Colab cell.
%pip install openai-agents
%pip install openai

###############################
# Import required modules
from openai import AsyncOpenAI
import asyncio
import json
import os
from enum import Enum
from typing import Any, List, Dict
from pydantic import BaseModel, Field
from agents import Agent, Runner, set_default_openai_client, trace

openai_client: AsyncOpenAI | None = None


def _get_openai_client() -> AsyncOpenAI:
    """Create the shared async client on first use."""
    global openai_client
    if openai_client is None:
        openai_client = AsyncOpenAI(
            api_key=os.environ.get("OPENAI_API_KEY", "Your API Key"),
        )
    return openai_client


set_default_openai_client(_get_openai_client())

##################################

class Role(str, Enum):
    """Role enum for chat messages."""
    user = "user"
    assistant = "assistant"


class ChatMessage(BaseModel):
    """Single chat message used in few-shot examples."""
    role: Role
    content: str


class Issues(BaseModel):
    """Structured output returned by checkers."""
    has_issues: bool
    issues: List[str]

    @classmethod
    def no_issues(cls) -> "Issues":
        return cls(has_issues=False, issues=[])


class FewShotIssues(Issues):
    """Output for few-shot contradiction detector including optional rewrite suggestions."""
    rewrite_suggestions: List[str] = Field(default_factory=list)

    @classmethod
    def no_issues(cls) -> "FewShotIssues":
        return cls(has_issues=False, issues=[], rewrite_suggestions=[])


class MessagesOutput(BaseModel):
    """Structured output returned by `rewrite_messages_agent`."""
    messages: list[ChatMessage]


class DevRewriteOutput(BaseModel):
    """Rewriter returns the cleaned-up developer prompt."""
    new_developer_message: str

##################################

dev_contradiction_checker = Agent(
    name="contradiction_detector",
    model="gpt-4.1",
    output_type=Issues,
    instructions="""
You are **Dev-Contradiction-Checker**.

Goal
Detect *genuine* self-contradictions or impossibilities **inside** the developer prompt supplied in the variable `DEVELOPER_MESSAGE`.

Definitions
• A contradiction = two clauses that cannot both be followed.
• Overlaps or redundancies in the DEVELOPER_MESSAGE are *not* contradictions.

What you MUST do
1. Compare every imperative / prohibition against all others.
2. List at most FIVE contradictions (each as ONE bullet).
3. If no contradiction exists, say so.

Output format (**strict JSON**)
Return **only** an object that matches the `Issues` schema:

```json
{"has_issues": <bool>,
"issues": [
    "<bullet 1>",
    "<bullet 2>"
]
}
```
- has_issues = true IFF the issues array is non-empty.
- Do not add extra keys, comments or markdown.
""",
)

format_checker = Agent(
    name="format_checker",
    model="gpt-4.1",
    output_type=Issues,
    instructions="""
You are Format-Checker.

Task
Decide whether the developer prompt requires a structured output (JSON/CSV/XML/Markdown table, etc.).
If so, flag any missing or unclear aspects of that format.

Steps
Categorise the task as:
a. "conversation_only", or
b. "structured_output_required".

For case (b):
- Point out absent fields, ambiguous data types, unspecified ordering, or missing error-handling.

Do NOT invent issues if unsure. be a little bit more conservative in flagging format issues

Output format
Return strictly-valid JSON following the Issues schema:

{
"has_issues": <bool>,
"issues": ["<desc 1>", "..."]
}
Maximum five issues. No extra keys or text.
""",
)

fewshot_consistency_checker = Agent(
    name="fewshot_consistency_checker",
    model="gpt-4.1",
    output_type=FewShotIssues,
    instructions="""
You are FewShot-Consistency-Checker.

Goal
Find conflicts between the DEVELOPER_MESSAGE rules and the accompanying **assistant** examples.

USER_EXAMPLES:      <all user lines>          # context only
ASSISTANT_EXAMPLES: <all assistant lines>     # to be evaluated

Method
Extract key constraints from DEVELOPER_MESSAGE:
- Tone / style
- Forbidden or mandated content
- Output format requirements

Compliance Rubric - read carefully
Evaluate only what the developer message makes explicit.

Objective constraints you must check when present:
- Required output type syntax (e.g., "JSON object", "single sentence", "subject line").
- Hard limits (length ≤ N chars, language required to be English, forbidden words, etc.).
- Mandatory tokens or fields the developer explicitly names.

Out-of-scope (DO NOT FLAG):
- Whether the reply "sounds generic", "repeats the prompt", or "fully reflects the user's request" - unless the developer text explicitly demands those qualities.
- Creative style, marketing quality, or depth of content unless stated.
- Minor stylistic choices (capitalisation, punctuation) that do not violate an explicit rule.

Pass/Fail rule
- If an assistant reply satisfies all objective constraints, it is compliant, even if you personally find it bland or loosely related.
- Only record an issue when a concrete, quoted rule is broken.

Empty assistant list ⇒ immediately return has_issues=false.

For each assistant example:
- USER_EXAMPLES are for context only; never use them to judge compliance.
- Judge each assistant reply solely against the explicit constraints you extracted from the developer message.
- If a reply breaks a specific, quoted rule, add a line explaining which rule it breaks.
- Optionally, suggest a rewrite in one short sentence (add to rewrite_suggestions).
- If you are uncertain, do not flag an issue.
- Be conservative—uncertain or ambiguous cases are not issues.

be a little bit more conservative in flagging few shot contradiction issues

Output format
Return JSON matching FewShotIssues:

{
"has_issues": <bool>,
"issues": ["<explanation 1>", "..."],
"rewrite_suggestions": ["<suggestion 1>", "..."] // may be []
}
List max five items for both arrays.
Provide empty arrays when none.
No markdown, no extra keys.
""",
)

dev_rewriter = Agent(
    name="dev_rewriter",
    model="gpt-4.1",
    output_type=DevRewriteOutput,
    instructions="""
You are Dev-Rewriter.

You receive:
- ORIGINAL_DEVELOPER_MESSAGE
- CONTRADICTION_ISSUES (may be empty)
- FORMAT_ISSUES (may be empty)

Rewrite rules
Preserve the original intent and capabilities.
Resolve each contradiction:
- Keep the clause that preserves the message intent; remove/merge the conflicting one.
If FORMAT_ISSUES is non-empty:
- Append a new section titled ## Output Format that clearly defines the schema or gives an explicit example.
Do NOT change few-shot examples.
Do NOT add new policies or scope.

Output format (strict JSON)
{
"new_developer_message": "<full rewritten text>"
}
No other keys, no markdown.
""",
)

fewshot_rewriter = Agent(
    name="fewshot_rewriter",
    model="gpt-4.1",
    output_type=MessagesOutput,
    instructions="""
You are FewShot-Rewriter.

Input payload
- NEW_DEVELOPER_MESSAGE (already optimized)
- ORIGINAL_MESSAGES (list of user/assistant dicts)
- FEW_SHOT_ISSUES (non-empty)

Task
Regenerate only the assistant parts that were flagged.
User messages must remain identical.
Every regenerated assistant reply MUST comply with NEW_DEVELOPER_MESSAGE.

After regenerating each assistant reply, verify:
- It matches NEW_DEVELOPER_MESSAGE. ENSURE THAT THIS IS TRUE.

Output format
Return strict JSON that matches the MessagesOutput schema:

{
"messages": [
    {"role": "user", "content": "..."},
    {"role": "assistant", "content": "..."}
]
}
Guidelines
- Preserve original ordering and total count.
- If a message was unproblematic, copy it unchanged.
""",
)

###############################
# Two sample rows of labelled evaluation data (displayed as cell output in a notebook)
[
    {
        "focus": "contradiction_issues",
        "input_payload": {
            "developer_message": "Always answer in **English**.\nNunca respondas en inglés.",
            "messages": [{"role": "user", "content": "¿Qué hora es?"}],
        },
        "golden_output": {
            "changes": True,
            "new_developer_message": "Always answer **in English**.",
            "new_messages": [{"role": "user", "content": "¿Qué hora es?"}],
            "contradiction_issues": "Developer message simultaneously insists on English and forbids it.",
            "few_shot_contradiction_issues": "",
            "format_issues": "",
            "general_improvements": "",
        },
    },
    {
        "focus": "few_shot_contradiction_issues",
        "input_payload": {
            "developer_message": "Respond with **only 'yes' or 'no'** – no explanations.",
            "messages": [
                {"role": "user", "content": "Is the sky blue?"},
                {"role": "assistant", "content": "Yes, because wavelengths …"},
                {"role": "user", "content": "Is water wet?"},
                {"role": "assistant", "content": "Yes."},
            ],
        },
        "golden_output": {
            "changes": True,
            "new_developer_message": "Respond with **only** the single word \"yes\" or \"no\".",
            "new_messages": [
                {"role": "user", "content": "Is the sky blue?"},
                {"role": "assistant", "content": "yes"},
                {"role": "user", "content": "Is water wet?"},
                {"role": "assistant", "content": "yes"},
            ],
            "contradiction_issues": "",
            "few_shot_contradiction_issues": "Assistant examples include explanations despite instruction not to.",
            "format_issues": "",
            "general_improvements": "",
        },
    },
]
###############################

The output is as follows:

 [{'focus': 'contradiction_issues','input_payload': {'developer_message': 'Always answer in **English**.\nNunca respondas en inglés.','messages': [{'role': 'user', 'content': '¿Qué hora es?'}]},'golden_output': {'changes': True,'new_developer_message': 'Always answer **in English**.','new_messages': [{'role': 'user', 'content': '¿Qué hora es?'}],'contradiction_issues': 'Developer message simultaneously insists on English and forbids it.','few_shot_contradiction_issues': '','format_issues': '','general_improvements': ''}},{'focus': 'few_shot_contradiction_issues','input_payload': {'developer_message': "Respond with **only 'yes' or 'no'** – no explanations.",'messages': [{'role': 'user', 'content': 'Is the sky blue?'},{'role': 'assistant', 'content': 'Yes, because wavelengths …'},{'role': 'user', 'content': 'Is water wet?'},{'role': 'assistant', 'content': 'Yes.'}]},'golden_output': {'changes': True,'new_developer_message': 'Respond with **only** the single word "yes" or "no".','new_messages': [{'role': 'user', 'content': 'Is the sky blue?'},{'role': 'assistant', 'content': 'yes'},{'role': 'user', 'content': 'Is water wet?'},{'role': 'assistant', 'content': 'yes'}],'contradiction_issues': '','few_shot_contradiction_issues': 'Assistant examples include explanations despite instruction not to.','format_issues': '','general_improvements': ''}}]

The following example shows how the system handles a prompt with contradictory instructions:

def _normalize_messages(messages: List[Any]) -> List[Dict[str, str]]:
    """Convert list of pydantic message models to JSON-serializable dicts."""
    result = []
    for m in messages:
        if hasattr(m, "model_dump"):
            result.append(m.model_dump())
        elif isinstance(m, dict) and "role" in m and "content" in m:
            result.append({"role": str(m["role"]), "content": str(m["content"])})
    return result


async def optimize_prompt_parallel(
    developer_message: str,
    messages: List["ChatMessage"],
) -> Dict[str, Any]:
    """
    Runs contradiction, format, and few-shot checkers in parallel,
    then rewrites the prompt/examples if needed.
    Returns a unified dict suitable for an API or endpoint.
    """
    with trace("optimize_prompt_workflow"):
        # 1. Run all checkers in parallel (contradiction, format, fewshot if there are examples)
        tasks = [
            Runner.run(dev_contradiction_checker, developer_message),
            Runner.run(format_checker, developer_message),
        ]
        if messages:
            fs_input = {
                "DEVELOPER_MESSAGE": developer_message,
                "USER_EXAMPLES": [m.content for m in messages if m.role == "user"],
                "ASSISTANT_EXAMPLES": [m.content for m in messages if m.role == "assistant"],
            }
            tasks.append(Runner.run(fewshot_consistency_checker, json.dumps(fs_input)))

        results = await asyncio.gather(*tasks)

        # 2. Unpack results (gather preserves the order of the tasks list)
        cd_issues: Issues = results[0].final_output
        fi_issues: Issues = results[1].final_output
        fs_issues: FewShotIssues = results[2].final_output if messages else FewShotIssues.no_issues()

        # 3. Rewrites as needed
        final_prompt = developer_message
        if cd_issues.has_issues or fi_issues.has_issues:
            pr_input = {
                "ORIGINAL_DEVELOPER_MESSAGE": developer_message,
                "CONTRADICTION_ISSUES": cd_issues.model_dump(),
                "FORMAT_ISSUES": fi_issues.model_dump(),
            }
            pr_res = await Runner.run(dev_rewriter, json.dumps(pr_input))
            final_prompt = pr_res.final_output.new_developer_message

        final_messages: list[ChatMessage] | list[dict[str, str]] = messages
        if fs_issues.has_issues:
            mr_input = {
                "NEW_DEVELOPER_MESSAGE": final_prompt,
                "ORIGINAL_MESSAGES": _normalize_messages(messages),
                "FEW_SHOT_ISSUES": fs_issues.model_dump(),
            }
            mr_res = await Runner.run(fewshot_rewriter, json.dumps(mr_input))
            final_messages = mr_res.final_output.messages

        return {
            "changes": True,
            "new_developer_message": final_prompt,
            "new_messages": _normalize_messages(final_messages),
            "contradiction_issues": "\n".join(cd_issues.issues),
            "few_shot_contradiction_issues": "\n".join(fs_issues.issues),
            "format_issues": "\n".join(fi_issues.issues),
        }

#######################################

async def example_contradiction():
    # A prompt with contradictory instructions
    prompt = """Quick-Start Card — Product Parser

Goal
Digest raw HTML of an e-commerce product detail page and emit **concise, minified JSON** describing the item.

**Required fields:**
name | brand | sku | price.value | price.currency | images[] | sizes[] | materials[] | care_instructions | features[]

**Extraction priority:**
1. schema.org/JSON-LD blocks
2. <meta> & microdata tags
3. Visible DOM fallback (class hints: "product-name", "price")

** Rules:**
- If *any* required field is missing, short-circuit with: `{"error": "FIELD_MISSING:<field>"}`.
- Prices: Numeric with dot decimal; strip non-digits (e.g., "1.299,00 EUR" → 1299.00 + "EUR").
- Deduplicate images differing only by query string. Keep ≤10 best-res.
- Sizes: Ensure unit tag ("EU", "US") and ascending sort.
- Materials: Title-case and collapse synonyms (e.g., "polyester 100%" → "Polyester").

**Sample skeleton (minified):**
```json
{"name":"","brand":"","sku":"","price":{"value":0,"currency":"USD"},"images":[""],"sizes":[],"materials":[],"care_instructions":"","features":[]}
Note: It is acceptable to output null for any missing field instead of an error ###
"""

    result = await optimize_prompt_parallel(prompt, [])

    # Display the results
    if result["contradiction_issues"]:
        print("Contradiction issues:")
        print(result["contradiction_issues"])
        print()

    print("Optimized prompt:")
    print(result["new_developer_message"])


# Run the example (top-level await works in a notebook cell)
await example_contradiction()

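The result shows that the system identified the logical contradiction in the instructions: the prompt requires short-circuiting with an error that names the field whenever a required field is missing, yet later states that outputting null for a missing field is acceptable. The two requirements cannot both be followed.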

Contradiction issues:
The instructions mandate that if any required field is missing, the system must short-circuit and return an error with the field name (e.g., {"error": "FIELD_MISSING:<field>"}), but then contradict this by stating that it is acceptable to output null for any missing field instead of an error. These two requirements cannot both be followed.

Optimized prompt:
Quick-Start Card — Product Parser

Goal
Digest raw HTML of an e-commerce product detail page and emit **concise, minified JSON** describing the item.

**Required fields:**
name | brand | sku | price.value | price.currency | images[] | sizes[] | materials[] | care_instructions | features[]

**Extraction priority:**
1. schema.org/JSON-LD blocks
2. <meta> & microdata tags
3. Visible DOM fallback (class hints: "product-name", "price")

**Rules:**
- If any required field is missing, short-circuit with: {"error": "FIELD_MISSING:<field>"} and do not return a JSON skeleton.
- Prices: Numeric with dot decimal; strip non-digits (e.g., "1.299,00 EUR" → 1299.00 + "EUR").
- Deduplicate images that differ only by query string. Output up to 10 unique best-resolution images (URLs as strings).
- sizes[]: List of objects. Each object must have a "value" (string or number) and a "unit" (e.g., "EU", "US") property. Sort ascending by value.
- materials[]: List of strings. Each value should be title-cased and common synonyms should be collapsed (e.g., "polyester 100%" → "Polyester").
- care_instructions: String. If absent, trigger missing field error.
- features[]: List of strings. Each element should be a concise attribute or bullet-point feature.

## Output Format

If ALL required fields are present, output a minified JSON object with this shape:
{"name":"string","brand":"string","sku":"string","price":{"value":number,"currency":"string"},"images":["string"],"sizes":[{"value":string|number,"unit":"string"}],"materials":["string"],"care_instructions":"string","features":["string"]}

If ANY required field is missing, output:
{"error": "FIELD_MISSING:<field>"}

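The optimized version generated by the system removes the contradiction and makes the instructions logically consistent.

The following example shows how the system handles few-shot examples that are inconsistent with the prompt's requirements: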

async def example_fewshot_fix():
    prompt = "Respond **only** with JSON using keys `city` (string) and `population` (integer)."
    messages = [
        {"role": "user", "content": "Largest US city?"},
        {"role": "assistant", "content": "New York City"},
        {"role": "user", "content": "Largest UK city?"},
        {"role": "assistant", "content": "{\"city\":\"London\",\"population\":9541000}"},
    ]

    print("Few-shot examples before optimization:")
    print(f"User: {messages[0]['content']}")
    print(f"Assistant: {messages[1]['content']}")
    print(f"User: {messages[2]['content']}")
    print(f"Assistant: {messages[3]['content']}")
    print()

    # Call the optimization API
    result = await optimize_prompt_parallel(prompt, [ChatMessage(**m) for m in messages])

    # Display the results
    if result["few_shot_contradiction_issues"]:
        print("Inconsistency found:", result["few_shot_contradiction_issues"])
        print()

    # Show the optimized few-shot examples
    optimized_messages = result["new_messages"]
    print("Few-shot examples after optimization:")
    print(f"User: {optimized_messages[0]['content']}")
    print(f"Assistant: {optimized_messages[1]['content']}")
    print(f"User: {optimized_messages[2]['content']}")
    print(f"Assistant: {optimized_messages[3]['content']}")


# Run the example (top-level await works in a notebook cell)
await example_fewshot_fix()

Result:

Few-shot examples before optimization:
User: Largest US city?
Assistant: New York City
User: Largest UK city?
Assistant: {"city":"London","population":9541000}

Inconsistency found: The first assistant example does not use JSON or include both `city` and `population` keys as required by 'Respond **only** with JSON using keys `city` (string) and `population` (integer).'

Few-shot examples after optimization:
User: Largest US city?
Assistant: {"city":"New York City","population":8419000}
User: Largest UK city?
Assistant: {"city":"London","population":9541000}
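Because the rule in this example is purely mechanical, the rewritten examples can also be checked without a model. The sketch below is one possible check, not part of the original project: the `complies` helper is a hypothetical name, and reading the rule as "exactly these two keys" is an assumption about how strictly the prompt should be interpreted.

```python
import json


def complies(reply: str) -> bool:
    """True when reply is a JSON object with a string `city` and an integer `population` only."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict) or set(data) != {"city", "population"}:
        return False
    population = data["population"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    is_int = isinstance(population, int) and not isinstance(population, bool)
    return isinstance(data["city"], str) and is_int


# Assistant replies copied from the run above.
before = ["New York City", '{"city":"London","population":9541000}']
after = [
    '{"city":"New York City","population":8419000}',
    '{"city":"London","population":9541000}',
]

before_ok = [complies(r) for r in before]
after_ok = [complies(r) for r in after]
```

A deterministic check like this only confirms the format. It says nothing about whether the population figures themselves are accurate.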

The following example shows how the system handles a prompt with unclear format specifications:

async def example_format_issue():
    # A prompt with unclear or inconsistent formatting instructions
    prompt = """Task → Translate dense patent claims into 200-word lay summaries with a glossary.

Operating Steps:
1. Split the claim at semicolons, "wherein", or numbered sub-clauses.
2. For each chunk:
   a) Identify its purpose.
   b) Replace technical nouns with everyday analogies.
   c) Keep quantitative limits intact (e.g., "≥150 C").
3. Flag uncommon science terms with asterisks, and later define them.
4. Re-assemble into a flowing paragraph; do **not** broaden or narrow the claim’s scope.
5. Omit boilerplate if its removal does not alter legal meaning.

Output should follow a Markdown template:
- A summary section.
- A glossary section with the marked terms and their definitions.

Corner Cases:
- If the claim is over 5 kB, respond with CLAIM_TOO_LARGE.
- If claim text is already plain English, skip glossary and state no complex terms detected.

Remember: You are *not* providing legal advice—this is for internal comprehension only."""

    # Call the optimization API to check for format issues
    result = await optimize_prompt_parallel(prompt, [])

    # Display the results
    if result.get("format_issues"):
        print("Format issues found:", result["format_issues"])
        print()

    print("Optimized prompt:")
    print(result["new_developer_message"])


# Run the example (top-level await works in a notebook cell)
await example_format_issue()

The result shows that the system identified several format-related issues and produced an optimized prompt:

Format issues found: Output format requires Markdown sections for summary and glossary, but formatting instructions for Markdown are implicit, not explicitly defined (e.g., should sections use headers?).
No template or example given for section titles or glossary formatting, which could lead to inconsistency across outputs.
How to handle glossary entries for terms with multiple asterisks or same term appearing multiple times is not specified.
No instruction on what to do if the input is exactly at 5 kB: is that CLAIM_TOO_LARGE or permissible?
Ambiguous handling if no glossary terms are detected: should the glossary section be omitted or included with a placeholder statement?

Optimized prompt:
Task → Translate dense patent claims into 200-word lay summaries with a glossary.

Operating Steps:
1. Split the claim at semicolons, "wherein", or numbered sub-clauses.
2. For each chunk:
   a) Identify its purpose.
   b) Replace technical nouns with everyday analogies.
   c) Keep quantitative limits intact (e.g., "≥150 C").
3. Flag uncommon science terms with asterisks, and later define them in a glossary.
4. Re-assemble into a flowing paragraph; do **not** broaden or narrow the claim’s scope.
5. Omit boilerplate if its removal does not alter legal meaning.

Output constraints:
- If the claim text exceeds 5 kB (greater than 5,120 characters), respond with CLAIM_TOO_LARGE.
- If the claim text is already in plain English, skip the glossary and state no complex terms detected.

Remember: You are *not* providing legal advice—this is for internal comprehension only.

## Output Format
Produce your output in Markdown, structured as follows:

### Summary
A 200-word layperson summary generated as described above.

### Glossary
A bullet list of all unique asterisk-marked terms from the summary. For each, provide a concise definition suitable for a non-expert. If a term appears multiple times, include it only once. If no terms are marked, include the message: "No complex or technical terms were detected; no glossary necessary."

#### Example Output

### Summary
[Concise lay summary here, with marked technical terms like *photolithography* and *substrate*.]

### Glossary
- *photolithography*: A process that uses light to transfer patterns onto a surface.
- *substrate*: The base layer or material on which something is built.

If no terms warrant inclusion:

### Glossary
No complex or technical terms were detected; no glossary necessary.

The optimized version generated by the system includes a detailed output format specification.