
Overview

In natural language processing (NLP), prompt engineering is an important technique for improving the performance of large language models (LLMs). This article analyzes a piece of Python code that optimizes a prompt for root-cause analysis of a service provider's support tickets, showing how automated, iterative prompt optimization can improve the accuracy and relevance of model output. It walks through the technical background, the innovations, the results achieved, and the strengths of the code to illustrate this practical engineering approach.

Technical Background

Prompt optimization has emerged in recent years alongside the widespread adoption of large language models. Traditional NLP tasks usually require fine-tuning the model, which demands large amounts of labeled data and compute. Prompt engineering instead designs high-quality input instructions (prompts) that steer the model toward the expected output, significantly lowering development cost. The goal of the code in this article is to optimize a prompt for summarizing ticket root causes: it combines sentence-embedding cosine similarity with chain-of-thought (CoT) reasoning to generate new prompt variations and gradually improve model performance.
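At its core the procedure is a hill-climbing loop over prompts: mutate the current best prompt, score the candidate against reference summaries, and keep it only if the score improves. The outline below is a minimal sketch of that idea; mutate_prompt and evaluate_prompt mirror functions in the full code shown later, but their signatures are simplified here.

# Minimal sketch of the optimization loop (simplified signatures; the full code appears later in this article).
def optimize(initial_prompt, train_data, max_steps=10, patience=3):
    best_prompt = initial_prompt
    best_score = evaluate_prompt(best_prompt, train_data)   # cosine-similarity reward
    no_improvement = 0
    for step in range(max_steps):
        candidate = mutate_prompt(best_prompt)               # CoT-generated variation
        score = evaluate_prompt(candidate, train_data)
        if score > best_score:
            best_prompt, best_score, no_improvement = candidate, score, 0
        else:
            no_improvement += 1
        if no_improvement >= patience:                       # early stopping
            break
    return best_prompt, best_score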

The code relies on the following key technologies:

  1. SentenceTransformer: generates text embeddings and computes the cosine similarity between the predicted and reference summaries, which serves as the reward function (see the sketch after this list).

  2. OpenAI API: calls the gpt-4o-mini model to generate summaries or new prompt variations.

  3. Logging and progress management: the logging module and a JSONL file record the optimization process, making debugging and resumption easier.

  4. Automated prompt mutation: CoT prompting generates new prompt variations, which are combined with random selection and an evaluation strategy to optimize the prompt.
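To make item 1 concrete, here is a minimal, self-contained version of the cosine-similarity reward. It assumes the all-MiniLM-L6-v2 model is downloaded by name from the Hugging Face hub rather than loaded from the local path used in the full code below; the example strings are invented for illustration.

from sentence_transformers import SentenceTransformer, util

embedding_model = SentenceTransformer("all-MiniLM-L6-v2")

def compute_reward(predicted: str, reference: str) -> float:
    """Cosine similarity of the two sentence embeddings, used as the reward signal."""
    emb_pred = embedding_model.encode(predicted, convert_to_tensor=True)
    emb_ref = embedding_model.encode(reference, convert_to_tensor=True)
    return util.cos_sim(emb_pred, emb_ref).item()

# A close paraphrase scores near 1.0; an unrelated sentence scores much lower.
print(compute_reward(
    "The root cause of the issue is an expired TLS certificate.",
    "The issue was caused by an expired TLS certificate.",
))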

Innovations

  1. Automated prompt optimization framework

    The code iteratively generates and evaluates prompt variations, implementing an automated optimization loop that avoids the inefficiency of manual prompt design. CoT prompts ask the model to analyze the shortcomings of the current prompt and propose variations, which makes the optimization more targeted.
  2. Sentence embeddings combined with cosine similarity

    A SentenceTransformer computes the cosine similarity between the predicted and reference summaries, which serves as the reward function and quantifies prompt performance. Compared with manual evaluation, this is more objective and more scalable.
  3. Robust error handling and progress saving

    The code adds multiple layers of error handling (such as retries with exponential backoff; see the sketch after this list) around API calls, data loading, and prompt mutation to ensure stability. Each optimization step is saved in JSONL format, so an interrupted run can be resumed, which suits long-running optimization jobs.
  4. Early stopping and dynamic adjustment

    An early stopping mechanism (max_no_improvement=3) halts the optimization after three consecutive steps without improvement, saving compute. Whether to call CoT to generate new variations is decided dynamically based on whether the previous step improved, which makes the process more efficient.
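The retry-with-exponential-backoff pattern from item 3 can be isolated into a small helper. This is a generic sketch of the pattern, not the article's exact implementation: the full code below inlines the same logic in call_model and uses a longer wait schedule for HTTP 429 rate-limit responses.

import time
import logging

logger = logging.getLogger(__name__)

def with_backoff(fn, retries=5, base_delay=1.0, max_delay=60.0):
    """Call fn(); on failure wait base_delay, 2x, 4x, ... (capped at max_delay) and retry."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception as e:
            if attempt == retries - 1:
                raise
            wait = min(base_delay * (2 ** attempt), max_delay)
            logger.warning(f"Attempt {attempt + 1}/{retries} failed ({e}); retrying in {wait:.0f}s")
            time.sleep(wait)

# Usage (hypothetical): reply = with_backoff(lambda: requests.post(base_url, json=payload, timeout=30))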

Prompt Optimization Code

import json
import random
import re
import requests
import time
import os
import logging
from functools import lru_cache
from tqdm import tqdm
from sentence_transformers import SentenceTransformer, util

# === Logging configuration ===
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

# === Model API configuration ===
base_url = "https://xx"
api_key = os.getenv("OPENAI_API_KEY", "xx")  # Use env variable for security
model = "gpt-4o-mini"  # Match deployment name

# === Initialization ===
try:
    embedding_model = SentenceTransformer("../model/all-MiniLM-L6-v2")
    logger.info("SentenceTransformer loaded successfully from local path")
except Exception as e:
    logger.error(f"Failed to load SentenceTransformer: {e}")
    logger.info("Please ensure the model is available at '../model/all-MiniLM-L6-v2'")
    raise

data_file = "final_tickets.jsonl"
progress_file = "prompt_progress.jsonl"  # JSONL for progress

# === Save progress ===
def save_progress(best_prompt, best_score, step):
    try:
        with open(progress_file, "a", encoding="utf-8") as f:  # Append mode
            json.dump({"step": step, "best_prompt": best_prompt, "best_score": best_score}, f)
            f.write("\n")  # Add newline for JSONL format
            f.flush()  # Ensure immediate write
        logger.info(f"Appended progress to {progress_file} for step {step}")
    except Exception as e:
        logger.error(f"Error saving progress: {e}")

# === Load data ===
def load_data(file_path, limit=None):
    try:
        with open(file_path, "r", encoding="utf-8") as f:
            lines = [json.loads(line) for line in f]
        if limit:
            lines = lines[:limit]
        random.shuffle(lines)  # Shuffle for random train/validation split
        logger.info(f"Loaded {len(lines)} tickets from {file_path}")
        return lines
    except Exception as e:
        logger.error(f"Error loading data from {file_path}: {e}")
        raise

# Split dataset into training and validation
try:
    dataset = load_data(data_file, limit=50)
    train_data = dataset[:40]  # 80% for training
    val_data = dataset[40:]    # 20% for validation
except Exception as e:
    logger.error(f"Failed to split dataset: {e}")
    raise

# === Model call ===
@lru_cache(maxsize=2000)  # Increased cache size
def call_model(prompt: str, ticket_content: str, retries=10):
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    messages = [
        {"role": "system", "content": "You are a professional support engineer."},
        {"role": "user", "content": prompt + "\n\n ticket content:" + ticket_content}
    ]
    payload = {"model": model, "messages": messages, "temperature": 0.3}
    for attempt in range(retries):
        try:
            response = requests.post(base_url, headers=headers, json=payload, timeout=30)
            response.raise_for_status()
            logger.info(f"Successful API call for prompt: {prompt[:50]}...")
            return response.json()["choices"][0]["message"]["content"].strip()
        except requests.exceptions.HTTPError as e:
            if e.response.status_code == 429:
                wait_time = min(15 * (2 ** attempt), 120)  # Exponential backoff for 429: 15s, 30s, 60s, 120s
                logger.warning(f"Rate limit hit (attempt {attempt + 1}/{retries}): Waiting {wait_time}s")
                time.sleep(wait_time)
            else:
                logger.error(f"HTTP error (attempt {attempt + 1}/{retries}): {e}")
                time.sleep(2 ** attempt)  # General error backoff: 1s, 2s, 4s
        except Exception as e:
            logger.error(f"Error calling model (attempt {attempt + 1}/{retries}): {e}")
            time.sleep(2 ** attempt)
    logger.error("Failed to get response after retries")
    return ""

# === Generate new prompt variations ===
def generate_prompt_variations(best_prompt, best_score, sample_ticket, candidate_score, retries=3):
    coT_prompt = (
        f"You are a prompt engineering expert. I'm optimizing a prompt for summarizing the root cause of support tickets. "
        f"The current prompt is: \"{best_prompt}\". Here's a sample ticket: \"{sample_ticket[:500]}\". "
        f"The prompt's average similarity score (cosine similarity of sentence embeddings) is {best_score:.4f}. "
        f"The latest candidate prompt scored {candidate_score:.4f}, which did not improve over the current best. Follow the requirements below:\n"
        f"1. Analyze why the current prompt and recent candidate may not be yielding higher similarity scores.\n"
        f"2. Suggest 3 new prompt variations (either rephrasings or additions) to improve clarity, conciseness, or relevance.\n"
        f"3. For each variation, explain why it might improve performance.\n"
        f"Return the response in JSON format:\n"
        f"{{\n  \"analysis\": \"...\",\n  \"variations\": [\n    {{\"type\": \"append|replace\", \"prompt\": \"...\", \"reason\": \"...\"}},\n    ...\n  ]\n}}"
    )
    try:
        response = call_model(coT_prompt, "", retries=retries)
        # Strip the markdown code fence the model typically wraps around JSON
        match = re.search(r"```json(.*?)```", response, re.DOTALL)
        result = json.loads(match.group(1) if match else response)
        logger.info("Generated new prompt variations via CoT")
        variations = result.get("variations")
        # Ensure variations have the correct structure
        valid_variations = [
            (v["type"], v["prompt"]) for v in variations
            if isinstance(v, dict) and v.get("type") in ["append", "replace"] and v.get("prompt")
        ]
        return valid_variations
    except Exception as e:
        logger.error(f"Error generating prompt variations: {e}")
        logger.info("Falling back to hardcoded variations")
        return None

# === Reward function: cosine similarity of sentence embeddings ===
def compute_reward(predicted: str, reference: str):
    if not predicted or not reference:
        logger.warning("Empty predicted or reference summary; returning score 0.0")
        return 0.0
    try:
        emb_pred = embedding_model.encode(predicted, convert_to_tensor=True)
        emb_ref = embedding_model.encode(reference, convert_to_tensor=True)
        score = util.pytorch_cos_sim(emb_pred, emb_ref).item()
        return score
    except Exception as e:
        logger.error(f"Error computing reward: {e}")
        return 0.0

# === Prompt mutation strategy ===
def mutate_prompt(prompt, sample_ticket, best_score, candidate_score):
    try:
        variations = generate_prompt_variations(prompt, best_score, sample_ticket, candidate_score)
        mutation_type, variation = random.choice(variations)
        if mutation_type == "append":
            new_prompt = prompt + " " + variation
        elif mutation_type == "replace":
            new_prompt = variation
        else:
            new_prompt = prompt
        logger.info(f"Mutated prompt: {new_prompt[:50]}...")
        return new_prompt
    except Exception as e:
        logger.error(f"Error mutating prompt: {e}")
        return prompt

# === Evaluate the current prompt ===
def evaluate_prompt(prompt, dataset, subset_size=3):  # Reduced to 3 to lower request rate
    try:
        dataset_subset = random.sample(dataset, min(subset_size, len(dataset)))
        scores = []
        for item in tqdm(dataset_subset, desc="Evaluating prompt"):
            input_text = item.get("ticket_content", "")
            reference_summary = item.get("root_cause_summary", "")
            if not input_text or not reference_summary:
                logger.warning("Skipping ticket with missing content or summary")
                continue
            try:
                predicted_summary = call_model(prompt, input_text)
                score = compute_reward(predicted_summary, reference_summary)
                scores.append(score)
            except Exception as e:
                logger.error(f"Error processing ticket: {e}")
                continue
            time.sleep(5)  # Increased delay to avoid rate limits
        avg_score = sum(scores) / len(scores) if scores else 0.0
        logger.info(f"Average score for prompt: {avg_score:.4f}")
        return avg_score
    except Exception as e:
        logger.error(f"Error evaluating prompt: {e}")
        return 0.0

# === Initial prompt ===
initial_prompt = (
    "You are a support engineer. Given the following ticket, "
    "summarize in the shortest possible sentence, beginning with \"The root cause of the issue is...\""
)

# === Main loop: prompt optimization ===
try:
    best_prompt = initial_prompt
    best_score = evaluate_prompt(best_prompt, train_data)
    no_improvement_count = 0
    max_no_improvement = 3  # Early stopping threshold
    num_candidates = 1  # Reduced to 1 to lower request rate

    # Select a sample ticket for CoT variations
    sample_ticket = train_data[0].get("ticket_content", "") if train_data else ""

    for step in range(10):  # Max iterations
        logger.info(f"\n🔄 Step {step + 1}")
        try:
            # Generate candidate prompt (use CoT if no improvement in previous step)
            if no_improvement_count > 0:
                logger.info("No improvement in previous step; generating new variations via CoT")
                candidate_prompts = [mutate_prompt(best_prompt, sample_ticket, best_score,
                                                   candidate_scores[0] if candidate_scores else best_score)]
            else:
                candidate_prompts = [mutate_prompt(best_prompt, sample_ticket, best_score, best_score)]

            candidate_scores = [evaluate_prompt(p, train_data) for p in candidate_prompts]
            max_score = max(candidate_scores) if candidate_scores else 0.0
            max_idx = candidate_scores.index(max_score) if candidate_scores else 0
            logger.info(f"Candidate Scores: {[f'{s:.4f}' for s in candidate_scores]}")

            if max_score > best_score:
                logger.info(f"✅ Improved prompt! New Score: {max_score:.4f}")
                best_prompt = candidate_prompts[max_idx]
                best_score = max_score
                no_improvement_count = 0
            else:
                logger.info(f"❌ No improvement (Best Candidate Score: {max_score:.4f})")
                no_improvement_count += 1

            logger.info(f"Current best_prompt: {best_prompt[:50]}...")
            save_progress(best_prompt, best_score, step)
        except Exception as e:
            logger.error(f"Error in step {step + 1}: {e}")
            continue

        if no_improvement_count >= max_no_improvement:
            logger.info("🛑 Early stopping: No improvement for 3 consecutive steps")
            break

    # === Validate the final prompt ===
    val_score = evaluate_prompt(best_prompt, val_data, subset_size=len(val_data))
    logger.info("\n🏁 Final Optimized Prompt:")
    logger.info(best_prompt)
    logger.info(f"🏆 Best Training Score: {best_score:.4f}")
    logger.info(f"📊 Validation Score: {val_score:.4f}")
except KeyboardInterrupt:
    logger.info("KeyboardInterrupt detected; saving progress and exiting")
    save_progress(best_prompt, best_score, step)
    logger.info("\n🏁 Partial Results:")
    logger.info(f"Best Prompt: {best_prompt}")
    logger.info(f"Best Training Score: {best_score:.4f}")
    exit(0)
except Exception as e:
    logger.error(f"Unexpected error: {e}")
    save_progress(best_prompt, best_score, step)
    raise

Run

According to the code's log output, the optimization process gradually raised the prompt's average cosine-similarity score (best_score) on the training set, and the final prompt's generalization ability was confirmed on the validation set via val_score. The concrete effects include:

  • Performance gains: over multiple iterations, the prompt's average similarity score rose steadily from its initial value, indicating that the generated summaries moved closer to the reference answers. The log shows, for example: ✅ Improved prompt! New Score: 0.5748

  • Resumability: progress_file stores the best prompt and score at each step, so optimization can resume after an interruption (a sketch of the resume step follows this list).

  • Generalization: the final prompt's score on the validation set (val_score) reflects its performance on unseen data and confirms the effectiveness of the optimization.
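The code above only appends progress lines; it does not show the resume step itself. A minimal sketch of how a restarted run could recover the last saved state, assuming the JSONL format shown in the execution results below, might look like this (load_last_progress is a hypothetical helper, not part of the original code):

import json
import os

progress_file = "prompt_progress.jsonl"

def load_last_progress(path=progress_file):
    """Return (best_prompt, best_score, step) from the last saved line, or None if no progress exists."""
    if not os.path.exists(path):
        return None
    last = None
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                last = json.loads(line)
    if last is None:
        return None
    return last["best_prompt"], last["best_score"], last["step"]

# Usage: resume from the last checkpoint if one exists, otherwise start from initial_prompt.
# state = load_last_progress()
# best_prompt, best_score, start_step = state if state else (initial_prompt, 0.0, 0)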

Execution Results

{"step": 0, "best_prompt": "You are a senior support engineer. Given the following ticket content including subject, description, requester, and message history, summarize the root cause in clear, natural, and technical language. Keep the explanation concise and professional.", "best_score": 0.0.5748027596473694}
{"step": 1, "best_prompt": "You are a senior support engineer. Given the following ticket content including subject, description, requester, and message history, summarize the root cause in clear, natural, and technical language. keep the explanation concise and professional.summarize in the shortest possible sentence, beginning with \"The root cause of the issue is...\"", "best_score": 0.6720715522766113}
{"step": 2, "best_prompt": "\"You are an experienced support engineer. Analyze the following ticket content, which includes the subject, description, requester, and message history. Provide a concise and precise summary of the root cause of the issue, using clear and technical language. Focus on identifying the underlying problem that led to the user's experience, ensuring that your summary aligns closely with the reference summary in terms of clarity and relevance.\"", "best_score": 0.690971531867981}
