当前位置：首页 > news >正文

工单分类微调训练运维管理工具原型

news 2025/10/13 9:55:49

简述

需求进展

之前，我尝试用Longformer模型来训练工单分类系统，但问题很快就暴露出来：Longformer训练时间长得让人抓狂，每次训练只能针对一个租户的数据，无法快速适配多个租户的需求。

切换一个使用相同标签的租户还能够保持较高的准确率

但是切换到一个使用客户自定义issue type和priority的租户就出现问题了

更糟的是，当客户提交紧急工单时，系统因为训练耗时过长，根本无法及时响应，客户满意度直线下降。这让我们意识到，必须找到一个更高效、更灵活的解决方案。

于是我开启了一场技术冒险，目标是打造一个智能工单分类系统（Ticket Triage AI），不仅能快速微调模型以服务多个租户，还能实时响应用户需求。我们选择了Unsloth来加速模型微调，结合其他技术，构建了一个高效、用户友好的系统

从需求到落地的技术冒险

我们的目标是打造一个既能训练AI模型，又能实时部署和预测的工单分类系统。整个开发过程就像一场技术冒险：我们需要选择合适的工具、设计直观的界面、解决实时日志和端口冲突等问题，还要让系统既强大又易用。以下是我们如何一步步实现这个目标的旅程。

需求拆解，明确蓝图

我们希望系统能完成以下任务：

模型训练：用户上传包含工单数据的JSON文件，系统自动加载并微调一个大语言模型（如Qwen2-7B），生成可用于预测的检查点（checkpoint）。
实时日志：训练过程中，实时显示进度和损失值，让用户随时了解模型状态。
模型部署：从生成的检查点中选择一个，快速部署为FastAPI服务，避免端口冲突。
工单预测：输入工单标题和描述，系统自动预测分类（如工单类型、优先级等），并支持用户提供的候选选项。
用户友好：界面清晰，训练和部署分开布局，操作简单，即使是非技术人员也能轻松上手。

这些需求看似简单，实现起来却需要协调多种技术解决一系列实际问题。

集成的技术

为了实现这个系统，我们调用了一支“技术全明星阵容”，每种技术都在冒险中扮演了关键角色：

Gradio：直观的交互界面Gradio是我们打造用户界面的利器。它简单易用，能快速生成Web界面，让用户通过浏览器上传文件、调整参数、查看日志和预测结果。我们使用了Gradio的Blocks API，通过Row和Column设计了一个左右布局：左边是训练专区，右边是部署和预测专区，清晰直观，像一个精心设计的控制台。
Unsloth与Transformers：高效微调大模型我们选择了Qwen2-7B作为基础模型，结合Unsloth和Hugging Face的Transformers库进行高效微调。Unsloth的LoRA（低秩适配）技术让我们能在有限的GPU资源下快速微调模型，节省内存的同时保持性能。Transformers的SFTTrainer则负责训练过程，确保模型能从工单数据中学习分类逻辑。
FastAPI：实时预测的API后端FastAPI是我们部署模型的“高速公路”。它轻量高效，支持异步请求，非常适合将微调后的模型快速转为在线预测服务。我们通过FastAPI提供/predict端点，接受工单标题和描述，返回分类结果和置信度。
SentenceTransformers：精准匹配候选选项对于用户提供的候选选项（比如工单类型可能的值），我们用SentenceTransformers的all-MiniLM-L6-v2模型计算语义相似度，确保模型预测的结果与候选选项最匹配。这种“后处理”机制让分类更贴合实际业务需求。
Psutil与Socket：解决端口冲突部署模型时，端口冲突是个大麻烦。我们引入psutil来检测和终止占用8000端口的进程，用socket验证端口是否可用，确保每次部署都能顺利启动FastAPI服务。
Threading与异步日志：实时反馈训练过程可能耗时较长，我们使用Python的threading模块在后台运行训练，同时通过监控trainer_log.jsonl文件，实时提取训练进度（如步数和损失值），通过Gradio的yield机制动态更新界面，让用户感觉像在看一场“直播”。

这些技术就像拼图的每一块组合在一起，产出一个简洁的工单分单工具。

模型训练和评估

训练代码

按照一般的方式做模型训练和评估

import json
import os
import torch
from datasets import Dataset
from unsloth import FastLanguageModel, is_bfloat16_supported
from transformers import TrainingArguments
from trl import SFTTrainer# ✅ Step 1: 加载和格式化数据
def load_and_format_data(path):def format_example(sample):user_msg = f"""
Title: {sample['title']}
Description: {sample['description']}Predict the following fields:
- ticket_type
- issue_type
- subissue_type
- priority_name
- queue_id_name
- billing_codes
"""assistant_msg = f"""
ticket_type: {sample['ticket_type']}
issue_type: {sample['issue_type']}
subissue_type: {sample['subissue_type']}
priority_name: {sample['priority_name']}
queue_id_name: {sample['queue_id_name']}
billing_codes: {sample['billing_codes']}
"""return {"input": user_msg.strip(), "output": assistant_msg.strip()}with open(path, 'r', encoding='utf-8') as f:data = json.load(f)formatted = [format_example(x) for x in data]return Dataset.from_list(formatted)max_seq_length = 8096
lora_rank = 16
dtype = torch.bfloat16 if is_bfloat16_supported() else torch.float16
load_in_4bit = True
model_path = '/opt/chenrui/qwq32b/base_model/qwen2-7b'def load_or_create_peft_model(model_path_or_checkpoint: str,max_seq_length: int = 2048,dtype=None,load_in_4bit: bool = False,lora_config: dict = None,
):assert os.path.exists(model_path_or_checkpoint), f"路径不存在：{model_path_or_checkpoint}"print(f"Loading model from {model_path_or_checkpoint}")try:if torch.cuda.is_available():torch.cuda.empty_cache()model, tokenizer = FastLanguageModel.from_pretrained(model_name=model_path_or_checkpoint,max_seq_length=max_seq_length,dtype=dtype,load_in_4bit=load_in_4bit,)print(f"Model loaded, parameters: {sum(p.numel() for p in model.parameters())}")except Exception as e:print(f"Failed to load model: {e}")raiseadapter_config_path = os.path.join(model_path_or_checkpoint, "adapter_config.json")if os.path.exists(adapter_config_path):print(f"检测到 LoRA adapter: {adapter_config_path}")else:print("未检测到 LoRA adapter，初始化新 LoRA...")assert lora_config is not None, "首次训练需提供 lora_config"try:model = FastLanguageModel.get_peft_model(model,r=lora_config.get("r", 16),target_modules=lora_config.get("target_modules", ["q_proj", "k_proj", "v_proj", "o_proj","gate_proj", "up_proj", "down_proj"]),lora_alpha=lora_config.get("lora_alpha", 32),lora_dropout=lora_config.get("lora_dropout", 0),bias=lora_config.get("bias", "none"),use_gradient_checkpointing=lora_config.get("use_gradient_checkpointing", "unsloth"),random_state=lora_config.get("random_state", 42),use_rslora=lora_config.get("use_rslora", False),loftq_config=lora_config.get("loftq_config", None),)print(f"LoRA adapter initialized, parameters: {sum(p.numel() for p in model.parameters())}")except Exception as e:print(f"Failed to initialize LoRA: {e}")raisereturn model, tokenizerlora_config = {"r": lora_rank,"lora_alpha": lora_rank*2,"lora_dropout": 0,"target_modules": ["q_proj", "k_proj", "v_proj", "o_proj","gate_proj", "up_proj", "down_proj"],"use_gradient_checkpointing": "unsloth","bias": "none","use_rslora": False,"loftq_config": None
}model, tokenizer = load_or_create_peft_model(model_path_or_checkpoint=model_path,max_seq_length=max_seq_length,dtype=dtype,load_in_4bit=load_in_4bit,lora_config=lora_config,
)# ✅ Step 3: 加载数据集
dataset = load_and_format_data("adit_net_au.json")
print(f"Dataset loaded, size: {len(dataset)}")  # 应为 58,000def formatting_prompts_func(examples):system_prompt = """
You are an intelligent Ticket Triage AI Agent designed to assist Managed Service Providers (MSPs) in efficiently managing and prioritizing support tickets. Your goal is to improve ticket handling speed, accuracy, and overall client satisfaction.**Core Responsibilities**:
- Analyze the ticket's **title** and **description** to understand user intent, system context, and impact scope.
- Prioritize technical accuracy, business logic, and enterprise IT norms over superficial keyword matching.
- Avoid speculation, hallucination, or subjective interpretation. Resolve ambiguity by focusing on actionable intent and system behavior.
- Output **exactly one** classification for requested fields in the format (eg, ticket_type: <ticket_type_value>)
"""texts = []for input_text, output_text in zip(examples["input"], examples["output"]):messages = [{"role": "system", "content": system_prompt},{"role": "user", "content": input_text},{"role": "assistant", "content": output_text}]prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)texts.append(prompt)return texts# ✅ Step 4: 构建 Trainer
trainer = SFTTrainer(model=model,tokenizer=tokenizer,train_dataset=dataset,dataset_text_field="text",formatting_func=formatting_prompts_func,max_seq_length=max_seq_length,dataset_num_proc=2,args=TrainingArguments(per_device_train_batch_size=16,gradient_accumulation_steps=5,  # 有效 batch size = 100learning_rate=2e-4,lr_scheduler_type="linear",num_train_epochs=1,warmup_ratio=0.03,logging_steps=50,save_steps=200,save_total_limit=1,fp16=not is_bfloat16_supported(),bf16=is_bfloat16_supported(),optim="adamw_8bit",weight_decay=0.01,seed=3407,max_steps=1000,output_dir="outputs",resume_from_checkpoint=True,report_to="none",dataloader_num_workers=1,)
)# ✅ Step 5: 开始训练
trainer.train()

步骤 1：加载和格式化数据 (load_and_format_data)

作用：读取JSON文件（adit_net_au.json）中的工单数据，并将其格式化为适合训练的结构。
实现方式：
- 打开JSON文件，加载工单数据（如标题、描述、工单类型等）。
- 对每条工单，生成一个用户提示，包含标题和描述，要求模型预测字段（如ticket_type、issue_type等）。
- 将预期输出格式化为结构化文本（例如：ticket_type: Incident）。
- 将格式化后的数据转换为Hugging Face的Dataset对象，包含input（输入提示）和output（预期输出）字段。
关键输出：一个包含58,000条样本的数据集，每条样本都有输入提示和预期输出。

步骤 2：设置模型配置

作用：定义模型和训练的全局参数。
细节：
- max_seq_length = 8096：设置最大输入序列长度为8096个token，能处理长描述的工单。
- lora_rank = 16：设置LoRA（低秩适配）的秩为16，优化内存使用。
- dtype = torch.bfloat16（或torch.float16）：如果支持，使用bfloat16加速计算，否则用float16。
- load_in_4bit = True：启用4位量化，减少内存占用，适合Qwen2-7B这样的大型模型。
- model_path：指定预训练模型路径为/opt/chenrui/qwq32b/base_model/qwen2-7b。

步骤 3：加载或创建PEFT模型 (load_or_create_peft_model)

作用：加载Qwen2-7B模型，并应用LoRA进行高效微调。
实现方式：
- 检查模型路径是否存在，并清空GPU缓存以释放内存。
- 使用FastLanguageModel.from_pretrained加载模型和分词器，启用4位量化和指定数据类型。
- 检查是否存在LoRA适配器（adapter_config.json）。如果没有，初始化新的LoRA适配器，设置：
  - r=16：低秩矩阵大小，优化效率。
  - target_modules：对特定Transformer层（如q_proj、k_proj）应用LoRA。
  - lora_alpha=32、lora_dropout=0：配置LoRA的缩放和丢弃率。
  - use_gradient_checkpointing="unsloth"：降低训练时的内存占用。
- 记录模型参数数量和任何错误信息。
关键输出：一个准备好微调的模型和分词器，已应用LoRA（如果需要）。

步骤 4：加载数据集

作用：准备工单数据集用于训练。
实现方式：
- 调用load_and_format_data加载adit_net_au.json。
- 打印数据集大小（58,000条样本）以验证。
关键输出：一个格式化的Dataset对象，准备好供训练器使用。

步骤 5：格式化提示 (formatting_prompts_func)

作用：将数据集样本转换为模型能理解的训练提示。
实现方式：
- 定义系统提示，指导模型作为工单分类AI，强调准确性和业务逻辑。
- 对每条样本，生成聊天风格的提示，包括：
  - 系统消息：工单分类AI的指令。
  - 用户消息：工单标题、描述和要预测的字段。
  - 助手消息：预期输出（如ticket_type: Incident）。
- 使用分词器的apply_chat_template将消息格式化为单条文本。
关键输出：一组格式化的提示文本，用于训练。

步骤 6：构建训练器 (SFTTrainer)

作用：配置训练过程，包括模型、数据集和超参数。
实现方式：
- 初始化SFTTrainer（监督微调训练器），设置：
  - model和tokenizer：Qwen2-7B模型及其分词器。
  - train_dataset：格式化的数据集。
  - dataset_text_field="text"：指定包含格式化提示的字段。
  - formatting_func：使用formatting_prompts_func格式化数据。
  - max_seq_length=8096：与模型序列长度匹配。
  - dataset_num_proc=2：使用2个CPU核心加速数据预处理。
- 配置TrainingArguments：
  - per_device_train_batch_size=16、gradient_accumulation_steps=5：有效批量大小为80（16×5），平衡内存和训练速度。
  - learning_rate=2e-4：控制模型学习速度。
  - num_train_epochs=1：数据集训练一次。
  - warmup_ratio=0.03：训练开始时逐渐增加学习率。
  - logging_steps=50、save_steps=200：每50步记录进度，每200步保存检查点。
  - save_total_limit=1：仅保留最新检查点，节省磁盘空间。
  - fp16或bf16：根据硬件支持使用混合精度。
  - optim="adamw_8bit"：使用8位Adam优化器，提升效率。
  - output_dir="outputs"：检查点保存到outputs目录。
  - resume_from_checkpoint=True：支持从上次检查点恢复训练。
关键输出：一个配置好的训练器，准备微调模型。

步骤 7：开始训练 (trainer.train)

作用：执行微调过程。
实现方式：
- 运行训练循环，训练1000步（max_steps=1000）。
- 根据数据集更新模型权重，优化工单分类任务。
- 每200步保存检查点到outputs目录。
- 每50步记录训练进度（如损失值）到控制台或日志文件。
关键输出：微调后的模型，以检查点形式保存在outputs目录，供后续部署使用。

模型服务化

import json
import json
import torch
import reimport uvicorn
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from unsloth import FastLanguageModel
from sentence_transformers import SentenceTransformer, util
from typing import Dict, List
import numpy as npapp = FastAPI()model = './outputs/checkpoint-1000'
load_in_4bit = True
max_seq_length = 8096# Load the trained Qwen-7B model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(model_name=model,max_seq_length=max_seq_length,dtype=None,load_in_4bit=load_in_4bit,
)
model.eval()
FastLanguageModel.for_inference(model)# Load the sentence transformer model for similarity comparison
similarity_model = SentenceTransformer('all-MiniLM-L6-v2')# Define input schema
class TicketInput(BaseModel):title: strdescription: strclass Choices(BaseModel):ticket_type: List[str] = Noneissue_type: List[str] = Nonesubissue_type: List[str] = Nonepriority_name: List[str] = Nonequeue_id_name: List[str] = Nonebilling_codes: List[str] = Noneclass PredictionRequest(BaseModel):input: TicketInputchoices: Choices# System message for the model
system_prompt = """
You are an intelligent Ticket Triage AI Agent designed to assist Managed Service Providers (MSPs) in efficiently managing and prioritizing support tickets. Your goal is to improve ticket handling speed, accuracy, and overall client satisfaction.**Core Responsibilities**:
- Analyze the ticket's **title** and **description** to understand user intent, system context, and impact scope.
- Prioritize technical accuracy, business logic, and enterprise IT norms over superficial keyword matching.
- Avoid speculation, hallucination, or subjective interpretation. Resolve ambiguity by focusing on actionable intent and system behavior.
- Output **exactly one** classification for requested fields in the format (eg:ticket_type: <ticket_type_value>)
"""# Format input for the model with clear instruction for JSON output
def format_input(title: str, description: str) -> str:return f"""
<|startoftext|>System: {system_prompt}User: Title: {title}
Description: {description}Predict the following fields:
- ticket_type
- issue_type
- subissue_type
- priority_name
- queue_id_name
- billing_codesAssistant: """def parse_output(output):# Split the output to isolate the assistant responseassistant_response = output.split("Assistant: ")[1].strip()# Parse the assistant response into a dictionarypredictions = {}clean_key = Nonefor line in assistant_response.split("\n"):if line.strip():  # Ensure the line is not emptyif ':' in line:  # Handle colon formatkey, value = line.split(':', 1)  # Split at the first colonclean_key = key.split('. ', 1)[-1].strip()  # Remove numeric prefixpredictions[clean_key] = value.strip()elif line.startswith('- '):  # Handle hyphen formatvalue = line.split('- ', 1)[-1].strip()if clean_key:  # Ensure there's a valid clean_keypredictions[clean_key] = valueelse:  # Handle keys without hyphens or colonskey_parts = line.split('. ')if len(key_parts) > 1:clean_key = key_parts[1].strip()return predictions# Parse model output to extract field predictions
def parse_model_output(output: str) -> Dict[str, str]:try:# Extract the assistant responsepredictions = parse_output(output)print('predictions', predictions)# Validate expected keysexpected_keys = {"ticket_type", "issue_type", "subissue_type", "priority_name", "queue_id_name","billing_codes"}if not all(key in predictions for key in expected_keys):raise ValueError("Missing expected keys in model output")return predictionsexcept json.JSONDecodeError as e:raise ValueError(f"Failed to parse model output as JSON: {e}")except Exception as e:raise ValueError(f"Error parsing model output: {e}")# Compute confidence score for a single field# Compute similarity and select the best choice with confidence
def get_best_choice(predicted: str, choices: List[str], similarity_model) -> tuple[str, float]:if not choices:return predicted, Nonepredicted_embedding = similarity_model.encode(predicted, convert_to_tensor=True)choice_embeddings = similarity_model.encode(choices, convert_to_tensor=True)similarities = util.cos_sim(predicted_embedding, choice_embeddings)[0]best_idx = similarities.argmax().item()confidence = similarities[best_idx].item() * 100  # Scale to 0-100confidence = min(max(confidence, 0.0), 100.0)print(f'predicted:{predicted}, best choice:{choices[best_idx]}')return choices[best_idx], round(confidence, 5)  # Format to 5 decimal places@app.post("/predict")
async def predict(request: PredictionRequest):try:# Format input promptprompt = format_input(request.input.title, request.input.description)# Tokenize and generateinputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=8096).to("cuda")with torch.no_grad():outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)print('generated_text', generated_text)# Parse the assistant responsepredictions = parse_model_output(generated_text)# Initialize output with predictions and confidencesresult = {"predictions": {}, "confidences": {}}choices_dict = request.choices.dict(exclude_none=True)# Compute confidence for each fieldfor field in predictions:if field in choices_dict and choices_dict[field]:# Use similarity model for fields with choicespredicted_value, confidence = get_best_choice(predictions[field], choices_dict[field], similarity_model)result["predictions"][field] = predicted_valueresult["confidences"][field] = confidence* 0.9 if confidence is not None else 0.0else:# Use LLM output directly and compute confidenceresult["predictions"][field] = predictions[field]result["confidences"][field] = 0.9return resultexcept ValueError as e:raise HTTPException(status_code=500, detail=str(e))except Exception as e:raise HTTPException(status_code=500, detail=f"Unexpected error: {str(e)}")if __name__ == "__main__":uvicorn.run(app, host="0.0.0.0", port=8000)

SentenceTransformers进行语义匹配，接受工单标题和描述，预测分类字段（如工单类型、优先级等），并支持用户提供的候选选项。以下是代码的逐步解释：

1. 初始化与模型加载

导入库：
- FastAPI：用于构建高效的API服务。
- Unsloth：加载微调后的Qwen2-7B模型，支持4位量化以节省内存。
- SentenceTransformer：使用all-MiniLM-L6-v2模型计算语义相似度，匹配预测结果与候选选项。
- torch：处理模型推理的张量操作。
- 其他库（json, numpy, pydantic等）用于数据处理和API输入验证。
模型配置：
- model_path = './outputs/checkpoint-1000'：指定微调后的模型检查点路径。
- load_in_4bit = True：启用4位量化，降低内存需求。
- max_seq_length = 8096：设置最大输入序列长度，适配长描述。
模型加载：
- 使用FastLanguageModel.from_pretrained加载Qwen2-7B模型和分词器，设置为推理模式（model.eval()和for_inference）。
- 加载SentenceTransformer模型（all-MiniLM-L6-v2）用于语义相似度计算。
输出：准备好用于推理的模型和分词器，以及语义匹配模型。

2. 定义API输入结构

Pydantic模型：
- TicketInput：定义输入的标题（title）和描述（description）。
- Choices：定义可选的候选选项（如ticket_type: ["Incident", "Request"]），支持六个字段（ticket_type, issue_type, subissue_type, priority_name, queue_id_name, billing_codes）。
- PredictionRequest：组合TicketInput和Choices，作为API的输入格式。
作用：确保API接收的数据结构清晰，自动验证输入格式。

3. 系统提示与输入格式化 (format_input)

作用：生成模型的输入提示，确保模型理解任务。
实现：
- 定义system_prompt，指导模型作为工单分类AI，强调准确性和业务逻辑，避免猜测。
- format_input函数将标题和描述格式化为包含系统提示、用户输入（标题+描述）和预测字段的文本，格式为：
  <|startoftext|>System: <system_prompt>
  User: Title: <title>
  Description: <description>
  Predict the following fields:
  - ticket_type
  - issue_type
  ...
  Assistant:
输出：格式化的提示文本，供模型推理。

4. 解析模型输出 (parse_output 和 parse_model_output)

作用：从模型生成的文本中提取分类结果。
实现：
- parse_output：
  - 提取“Assistant:”后的响应部分。
  - 逐行解析，处理包含冒号（:）或连字符（-）的行，提取字段名（如ticket_type）和值（如Incident）。
  - 处理可能出现的格式变体（如带编号的行）。
  - 返回字段名和值的字典。
- parse_model_output：
  - 调用parse_output解析响应。
  - 验证是否包含所有预期字段（ticket_type, issue_type等）。
  - 捕获JSON解析错误或缺失字段，抛出异常。
输出：一个包含分类结果的字典（如{"ticket_type": "Incident", ...}）。

5. 语义相似度匹配 (get_best_choice)

作用：将模型预测结果与用户提供的候选选项进行语义匹配，输出最接近的选项和置信度。
实现：
- 如果没有候选选项，直接返回模型预测值和None置信度。
- 使用SentenceTransformer将预测值和候选选项编码为嵌入向量。
- 计算预测值与候选选项的余弦相似度（util.cos_sim）。
- 选择相似度最高的候选选项，计算置信度（0-100%）。
输出：最匹配的候选值和置信度（如("Incident", 95.12345)）。

6. FastAPI预测端点 (/predict)

作用：处理POST请求，接受工单标题、描述和候选选项，返回分类结果和置信度。
实现：
- 接收PredictionRequest（包含标题、描述和候选选项）。
- 调用format_input生成提示，tokenize后输入模型。
- 使用model.generate生成预测（最大256个新token，无随机采样）。
- 解码生成文本，调用parse_model_output提取分类结果。
- 对每个字段：
  - 如果有候选选项，使用get_best_choice匹配最接近的选项，置信度乘以0.9。
  - 如果无候选选项，直接使用模型预测值，置信度设为0.9。
- 返回JSON结果，包含predictions（分类结果）和confidences（置信度）。

覆盖测试

def load_test_data(file_path: str) -> List[Dict]:"""Load test data from JSON file with UTF-8 encoding and shuffle dictionary keys."""try:with open(file_path, 'r', encoding='utf-8') as f:data = json.load(f)if not isinstance(data, list):raise ValueError("Test file must contain a list of test cases")# Shuffle keys in each dictionary to achieve disordered loadingshuffled_data = []for item in data:if not isinstance(item, dict):raise ValueError("Each test case must be a dictionary")# Get keys and shuffle themkeys = list(item.keys())random.shuffle(keys)# Create a new dictionary with shuffled key ordershuffled_item = {key: item[key] for key in keys}shuffled_data.append(shuffled_item)return shuffled_dataexcept UnicodeDecodeError as e:print(f"Encoding error in test file: {e}")# Attempt to read file content to show problematic linetry:with open(file_path, 'rb') as f:lines = f.readlines()error_position = e.startfor i, line in enumerate(lines):if error_position >= len(line):error_position -= len(line)else:print(f"Problematic line {i + 1}: {line.decode('utf-8', errors='replace')}")breakexcept Exception as ex:print(f"Could not read file for debugging: {ex}")sys.exit(1)except Exception as e:print(f"Error loading test file: {e}")sys.exit(1)def make_api_request(test_case: Dict) -> Dict:"""Send test case to API using requests and return predictions."""# Prepare API request payloadpayload = {"input": {"title": test_case["title"],"description": test_case["description"]},"choices": {"ticket_type": ["Service Request","Problem","Incident"],"issue_type": [...],"subissue_type": [...],"priority_name": ["Low","High","Critical","Medium"],"queue_id_name": [...],"billing_codes": [...]}}try:# Make POST requestresponse = requests.post(API_URL, headers=HEADERS, json=payload, timeout=10)response.raise_for_status()  # Raise exception for bad status codesreturn response.json()["predictions"]except requests.exceptions.RequestException as e:print(f"API request failed: {e}")return Noneexcept (KeyError, ValueError) as e:print(f"Invalid response format: {e}")return Nonedef calculate_accuracy(predictions: List[Dict], ground_truth: List[Dict]) -> Dict[str, float]:"""Calculate accuracy for each field."""fields = ["ticket_type", "issue_type", "subissue_type", "priority_name", "queue_id_name", "billing_codes"]correct_counts = {field: 0 for field in fields}total_counts = {field: 0 for field in fields}for pred, truth in zip(predictions, ground_truth):for field in fields:if field in pred and field in truth:total_counts[field] += 1if pred[field] == truth[field]:correct_counts[field] += 1return {field: (correct_counts[field] / total_counts[field] * 100) if total_counts[field] > 0 else 0.0for field in fields}def main(test_file: str):"""Run coverage test and report progress and accuracy."""# Load test datatest_cases = load_test_data(test_file)total_cases = len(test_cases)if total_cases == 0:print("No test cases found in the file.")returnpredictions = []print(f"Starting coverage test with {total_cases} test cases...")# Process each test case with tqdm progress barfor i, test_case in enumerate(tqdm(test_cases, desc="Processing test cases", unit="case")):prediction = make_api_request(test_case)if prediction:predictions.append(prediction)else:predictions.append({})# Report intermediate accuracy every 10 test cases or at the endif (i + 1) % 10 == 0 or (i + 1) == total_cases:accuracy = calculate_accuracy(predictions, test_cases[:i + 1])print("\nIntermediate Accuracy Report:")for field, acc in accuracy.items():print(f"{field}: {acc:.2f}%")print()# Final accuracy reportaccuracy = calculate_accuracy(predictions, test_cases)print("\nFinal Accuracy Report:")for field, acc in accuracy.items():print(f"{field}: {acc:.2f}%")

随机抽取进行测试，在跨租户数据上进行测试，使用没有新租户的数据进行微调

Final Accuracy Report:
ticket_type: 85.00%
issue_type: 81.00%
subissue_type: 9.00%
priority_name: 72.00%
queue_id_name: 2.00%
billing_codes: 95.00%

可以看到某些字段如queue_id_name，由于租户和租户之间的数据差异太大简单的使用在基础数据上做相似度匹配方式做预测已经不管用了

尽管已经有了单独的微调脚本和评估脚本，但开发一个集成化的系统，允许用户随时上传文件进行微调和评估，仍然具有重要意义。以下是原因的简洁说明：

简化操作流程：单独的微调和评估脚本需要用户在命令行或不同环境中手动运行，涉及文件路径配置、参数调整和结果查看，操作复杂且易出错。集成系统提供一个直观的界面（如Gradio的Web界面），让用户只需上传JSON文件、设置参数并点击按钮，就能完成微调和评估，极大降低使用门槛。
实时反馈与监控：微调脚本运行时，用户往往只能通过日志文件或控制台查看进度，缺乏实时交互。集成系统通过实时显示训练进度（如损失值、步数）
多租户支持与快速迭代：单独脚本每次运行通常针对单一数据集，难以快速适配多租户场景。集成系统允许用户随时上传不同租户的JSON文件，快速微调模型并生成检查点（checkpoint），支持动态切换租户数据，满足实时业务需求

应用集成

全量代码

import gradio as gr
import json
import os
import torch
import uvicorn
from datasets import Dataset
from unsloth import FastLanguageModel, is_bfloat16_supported
from transformers import TrainingArguments
from trl import SFTTrainer
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer, util
from typing import Dict, List
import numpy as np
import subprocess
import threading
import requests
import time
import logging
import shutil
import glob
import psutil
import socket# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)# FastAPI app for predictions
app = FastAPI()# Global variables for model and tokenizer
global_model = None
global_tokenizer = None
global_similarity_model = None
max_seq_length = 8096
load_in_4bit = True
server_thread = None  # Track the FastAPI server thread# Define input schema for FastAPI
class TicketInput(BaseModel):title: strdescription: strclass Choices(BaseModel):ticket_type: List[str] = Noneissue_type: List[str] = Nonesubissue_type: List[str] = Nonepriority_name: List[str] = Nonequeue_id_name: List[str] = Nonebilling_codes: List[str] = Noneclass PredictionRequest(BaseModel):input: TicketInputchoices: Choices# System prompt for predictions
system_prompt = """
You are an intelligent Ticket Triage AI Agent designed to assist Managed Service Providers (MSPs) in efficiently managing and prioritizing support tickets. Your goal is to improve ticket handling speed, accuracy, and overall client satisfaction.**Core Responsibilities**:
- Analyze the ticket's **title** and **description** to understand user intent, system context, and impact scope.
- Prioritize technical accuracy, business logic, and enterprise IT norms over superficial keyword matching.
- Avoid speculation, hallucination, or subjective interpretation. Resolve ambiguity by focusing on actionable intent and system behavior.
- Output **exactly one** classification for requested fields in the format (eg:ticket_type: <ticket_type_value>)
"""def format_input(title: str, description: str) -> str:return f"""
<|startoftext|>System: {system_prompt}User: Title: {title}
Description: {description}Predict the following fields:
- ticket_type
- issue_type
- subissue_type
- priority_name
- queue_id_name
- billing_codesAssistant: """def parse_output(output):assistant_response = output.split("Assistant: ")[1].strip()predictions = {}clean_key = Nonefor line in assistant_response.split("\n"):if line.strip():if ':' in line:key, value = line.split(':', 1)clean_key = key.split('. ', 1)[-1].strip()predictions[clean_key] = value.strip()elif line.startswith('- '):value = line.split('- ', 1)[-1].strip()if clean_key:predictions[clean_key] = valueelse:key_parts = line.split('. ')if len(key_parts) > 1:clean_key = key_parts[1].strip()return predictionsdef parse_model_output(output: str) -> Dict[str, str]:try:predictions = parse_output(output)logger.info(f"Parsed predictions: {predictions}")expected_keys = {"ticket_type", "issue_type", "subissue_type", "priority_name", "queue_id_name","billing_codes"}if not all(key in predictions for key in expected_keys):raise ValueError("Missing expected keys in model output")return predictionsexcept Exception as e:raise ValueError(f"Error parsing model output: {e}")def get_best_choice(predicted: str, choices: List[str], similarity_model) -> tuple[str, float]:if not choices:return predicted, Nonepredicted_embedding = similarity_model.encode(predicted, convert_to_tensor=True)choice_embeddings = similarity_model.encode(choices, convert_to_tensor=True)similarities = util.cos_sim(predicted_embedding, choice_embeddings)[0]best_idx = similarities.argmax().item()confidence = similarities[best_idx].item() * 100  # Scale to 0-100confidence = min(max(confidence, 0.0), 100.0)print(f'predicted:{predicted}, best choice:{choices[best_idx]}')return choices[best_idx], round(confidence, 5)@app.post("/predict")
async def predict(request: PredictionRequest):try:prompt = format_input(request.input.title, request.input.description)inputs = global_tokenizer(prompt, return_tensors="pt", truncation=True, max_length=8096).to("cuda")with torch.no_grad():outputs = global_model.generate(**inputs, max_new_tokens=256, do_sample=False)generated_text = global_tokenizer.decode(outputs[0], skip_special_tokens=True)logger.info(f"Generated text: {generated_text}")predictions = parse_model_output(generated_text)result = {"predictions": {}, "confidences": {}}choices_dict = request.choices.dict(exclude_none=True)for field in predictions:if field in choices_dict and choices_dict[field]:predicted_value, confidence = get_best_choice(predictions[field], choices_dict[field],global_similarity_model)result["predictions"][field] = predicted_valueresult["confidences"][field] = confidence * 0.9 if confidence is not None else 0.0else:result["predictions"][field] = predictions[field]result["confidences"][field] = 0.9return resultexcept ValueError as e:raise HTTPException(status_code=500, detail=str(e))except Exception as e:raise HTTPException(status_code=500, detail=f"Unexpected error: {str(e)}")# Fine-tuning functions
def load_and_format_data(path):def format_example(sample):user_msg = f"""
Title: {sample['title']}
Description: {sample['description']}Predict the following fields:
- ticket_type
- issue_type
- subissue_type
- priority_name
- queue_id_name
- billing_codes
"""assistant_msg = f"""
ticket_type: {sample['ticket_type']}
issue_type: {sample['issue_type']}
subissue_type: {sample['subissue_type']}
priority_name: {sample['priority_name']}
queue_id_name: {sample['queue_id_name']}
billing_codes: {sample['billing_codes']}
"""return {"input": user_msg.strip(), "output": assistant_msg.strip()}with open(path, 'r', encoding='utf-8') as f:data = json.load(f)formatted = [format_example(x) for x in data]return Dataset.from_list(formatted)def load_or_create_peft_model(model_path_or_checkpoint: str,max_seq_length: int = 2048,dtype=None,load_in_4bit: bool = False,lora_config: dict = None,
):assert os.path.exists(model_path_or_checkpoint), f"路径不存在：{model_path_or_checkpoint}"logger.info(f"Loading model from {model_path_or_checkpoint}")try:if torch.cuda.is_available():torch.cuda.empty_cache()model, tokenizer = FastLanguageModel.from_pretrained(model_name=model_path_or_checkpoint,max_seq_length=max_seq_length,dtype=dtype,load_in_4bit=load_in_4bit,)logger.info(f"Model loaded, parameters: {sum(p.numel() for p in model.parameters())}")except Exception as e:logger.error(f"Failed to load model: {e}")raiseadapter_config_path = os.path.join(model_path_or_checkpoint, "adapter_config.json")if os.path.exists(adapter_config_path):logger.info(f"检测到 LoRA adapter: {adapter_config_path}")else:logger.info("未检测到 LoRA adapter，初始化新 LoRA...")assert lora_config is not None, "首次训练需提供 lora_config"try:model = FastLanguageModel.get_peft_model(model,r=lora_config.get("r", 16),target_modules=lora_config.get("target_modules", ["q_proj", "k_proj", "v_proj", "o_proj","gate_proj", "up_proj", "down_proj"]),lora_alpha=lora_config.get("lora_alpha", 32),lora_dropout=lora_config.get("lora_dropout", 0),bias=lora_config.get("bias", "none"),use_gradient_checkpointing=lora_config.get("use_gradient_checkpointing", "unsloth"),random_state=lora_config.get("random_state", 42),use_rslora=lora_config.get("use_rslora", False),loftq_config=lora_config.get("loftq_config", None),)logger.info(f"LoRA adapter initialized, parameters: {sum(p.numel() for p in model.parameters())}")except Exception as e:logger.error(f"Failed to initialize LoRA: {e}")raisereturn model, tokenizerdef train_model(json_file, model_path, batch_size, grad_steps, learning_rate, max_steps):try:# Initialize progress messagesprogress_messages = []# Extract filename for checkpoint directoryjson_filename = os.path.basename(json_file.name)output_dir = os.path.join("outputs", os.path.splitext(json_filename)[0])os.makedirs(output_dir, exist_ok=True)# Load datasetdataset = load_and_format_data(json_file.name)progress_messages.append(f"Dataset loaded, size: {len(dataset)}\n")yield "\n".join(progress_messages), []# Model configurationmax_seq_length = 2048lora_rank = 16dtype = torch.bfloat16 if is_bfloat16_supported() else torch.float16load_in_4bit = Truelora_config = {"r": lora_rank,"lora_alpha": lora_rank * 2,"lora_dropout": 0,"target_modules": ["q_proj", "k_proj", "v_proj", "o_proj","gate_proj", "up_proj", "down_proj"],"use_gradient_checkpointing": "unsloth","bias": "none","use_rslora": False,"loftq_config": None}# Load modelmodel, tokenizer = load_or_create_peft_model(model_path_or_checkpoint=model_path,max_seq_length=max_seq_length,dtype=dtype,load_in_4bit=load_in_4bit,lora_config=lora_config,)progress_messages.append("Model and tokenizer loaded successfully.\n")yield "\n".join(progress_messages), []# Formatting function for promptsdef formatting_prompts_func(examples):system_prompt = """
You are an intelligent Ticket Triage AI Agent designed to assist Managed Service Providers (MSPs) in efficiently managing and prioritizing support tickets. Your goal is to improve ticket handling speed, accuracy, and overall client satisfaction.**Core Responsibilities**:
- Analyze the ticket's **title** and **description** to understand user intent, system context, and impact scope.
- Prioritize technical accuracy, business logic, and enterprise IT norms over superficial keyword matching.
- Avoid speculation, hallucination, or subjective interpretation. Resolve ambiguity by focusing on actionable intent and system behavior.
- Output **exactly one** classification for requested fields in the format (eg, ticket_type: <ticket_type_value>)
"""texts = []for input_text, output_text in zip(examples["input"], examples["output"]):messages = [{"role": "system", "content": system_prompt},{"role": "user", "content": input_text},{"role": "assistant", "content": output_text}]prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)texts.append(prompt)return texts# Configure trainertrainer = SFTTrainer(model=model,tokenizer=tokenizer,train_dataset=dataset,dataset_text_field="text",formatting_func=formatting_prompts_func,max_seq_length=max_seq_length,dataset_num_proc=2,args=TrainingArguments(per_device_train_batch_size=batch_size,gradient_accumulation_steps=grad_steps,learning_rate=learning_rate,lr_scheduler_type="linear",num_train_epochs=1,warmup_ratio=0.03,logging_steps=50,save_steps=200,save_total_limit=1,fp16=not is_bfloat16_supported(),bf16=is_bfloat16_supported(),optim="adamw_8bit",weight_decay=0.01,seed=3407,max_steps=max_steps,output_dir=output_dir,resume_from_checkpoint=True,report_to="none",dataloader_num_workers=1,))# Train and update progressdef log_training_progress(trainer, progress_queue):trainer.train()progress_queue.append("Training completed successfully!\n")# Use a list as a simple queue for thread-safe progress updatesprogress_queue = []training_thread = threading.Thread(target=log_training_progress, args=(trainer, progress_queue))training_thread.start()# Monitor training progresslog_file = os.path.join(output_dir, "trainer_log.jsonl")last_step = 0while training_thread.is_alive():if os.path.exists(log_file):with open(log_file, 'r') as f:lines = f.readlines()if lines:last_line = lines[-1]try:log_entry = json.loads(last_line)current_step = log_entry.get('global_step', last_step)if current_step > last_step:loss = log_entry.get('loss', 'N/A')progress_messages.append(f"Step {current_step}/{max_steps}, Loss: {loss}\n")last_step = current_stepyield "\n".join(progress_messages), []except json.JSONDecodeError:passtime.sleep(2)training_thread.join()# Append any final messages from the training threadprogress_messages.extend(progress_queue)# Return checkpoint pathscheckpoints = glob.glob(os.path.join(output_dir, "checkpoint-*"))progress_messages.append(f"Training completed. Checkpoints saved at: {output_dir}\n")yield "\n".join(progress_messages), checkpointsexcept Exception as e:progress_messages.append(f"Training failed: {str(e)}\n")yield "\n".join(progress_messages), []def check_and_free_port(port):"""Check if the port is in use and terminate any process using it."""progress_messages = []try:# Check if port is in usefor conn in psutil.net_connections():if conn.laddr.port == port and conn.status == psutil.CONN_LISTEN:pid = conn.pidif pid:progress_messages.append(f"Found process {pid} using port {port}. Terminating...\n")try:p = psutil.Process(pid)p.terminate()p.wait(timeout=3)  # Wait for process to terminateprogress_messages.append(f"Process {pid} terminated.\n")except psutil.NoSuchProcess:progress_messages.append(f"Process {pid} no longer exists.\n")except psutil.TimeoutExpired:p.kill()  # Force kill if it doesn't terminateprogress_messages.append(f"Process {pid} forcefully terminated.\n")except Exception as e:progress_messages.append(f"Failed to terminate process {pid}: {str(e)}\n")# Verify port is freewith socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)try:s.bind(("0.0.0.0", port))progress_messages.append(f"Port {port} is free.\n")except socket.error:progress_messages.append(f"Port {port} still in use, may require manual intervention.\n")except Exception as e:progress_messages.append(f"Error checking/freeing port {port}: {str(e)}\n")return progress_messagesdef start_fastapi(checkpoint_path):global global_model, global_tokenizer, global_similarity_model, server_threadprogress_messages = []try:# Free port 8000 if in useprogress_messages.extend(check_and_free_port(8000))# Load model and tokenizerglobal_model, global_tokenizer = FastLanguageModel.from_pretrained(model_name=checkpoint_path,max_seq_length=max_seq_length,dtype=None,load_in_4bit=load_in_4bit,)global_model.eval()FastLanguageModel.for_inference(global_model)global_similarity_model = SentenceTransformer('all-MiniLM-L6-v2')progress_messages.append(f"Model loaded from {checkpoint_path}\n")# Start FastAPI server in a separate threaddef run_server():uvicorn.run(app, host="0.0.0.0", port=8000, log_level="error")# Terminate existing server thread if it existsif server_thread and server_thread.is_alive():progress_messages.append("Existing FastAPI server thread detected. Attempting to stop...\n")# Note: Thread termination is tricky in Python; rely on port freeing insteadserver_thread = Noneserver_thread = threading.Thread(target=run_server)server_thread.daemon = Trueserver_thread.start()time.sleep(2)  # Wait for server to start# Verify server is runningtry:response = requests.get("http://localhost:8000", timeout=2)if response.status_code == 200:progress_messages.append("FastAPI server started at http://0.0.0.0:8000\n")else:progress_messages.append("FastAPI server started but returned unexpected status.\n")except requests.ConnectionError:progress_messages.append("FastAPI server failed to start (connection error).\n")return "\n".join(progress_messages)except Exception as e:progress_messages.append(f"Failed to start FastAPI server: {str(e)}\n")return "\n".join(progress_messages)def make_prediction(title, description, ticket_type_choices, issue_type_choices, subissue_type_choices,priority_name_choices, queue_id_name_choices, billing_codes_choices, progress_output):try:# Initialize progress messagesprogress_messages = [progress_output] if progress_output else []choices_dict = {"ticket_type": ticket_type_choices.split(",") if ticket_type_choices else None,"issue_type": issue_type_choices.split(",") if issue_type_choices else None,"subissue_type": subissue_type_choices.split(",") if subissue_type_choices else None,"priority_name": priority_name_choices.split(",") if priority_name_choices else None,"queue_id_name": queue_id_name_choices.split(",") if queue_id_name_choices else None,"billing_codes": billing_codes_choices.split(",") if billing_codes_choices else None,}request_data = {"input": {"title": title,"description": description},"choices": choices_dict}response = requests.post("http://localhost:8000/predict", json=request_data)response.raise_for_status()result = response.json()progress_messages.append(f"Prediction result: {json.dumps(result, indent=2)}\n")return "\n".join(progress_messages)except Exception as e:progress_messages.append(f"Prediction failed: {str(e)}\n")return "\n".join(progress_messages)# Gradio interface
def create_gradio_app():with gr.Blocks() as demo:gr.Markdown("# Ticket Triage AI Fine-Tuning and Prediction")with gr.Row():# Left Column: Trainingwith gr.Column():gr.Markdown("## Fine-Tune Model")json_file = gr.File(label="Upload JSON File")model_path = gr.Textbox(label="Model Path", value="/opt/chenrui/qwq32b/base_model/qwen2-7b")batch_size = gr.Slider(minimum=1, maximum=32, value=16, step=1, label="Per Device Train Batch Size")grad_steps = gr.Slider(minimum=1, maximum=10, value=5, step=1, label="Gradient Accumulation Steps")learning_rate = gr.Slider(minimum=1e-5, maximum=1e-3, value=2e-4, step=1e-5, label="Learning Rate")max_steps = gr.Slider(minimum=100, maximum=5000, value=100, step=100, label="Max Steps")train_button = gr.Button("Start Training")training_progress = gr.Textbox(label="Training Progress", lines=10, interactive=False)# Right Column: Deployment and Predictionwith gr.Column():gr.Markdown("## Deploy and Predict")checkpoint_dropdown = gr.Dropdown(label="Select Checkpoint", choices=[])refresh_checkpoints = gr.Button("Refresh Checkpoints")start_fastapi_button = gr.Button("Start FastAPI Server")fastapi_status = gr.Textbox(label="FastAPI Status", interactive=False)gr.Markdown("### Make Prediction")title_input = gr.Textbox(label="Ticket Title")description_input = gr.Textbox(label="Ticket Description", lines=5)ticket_type_choices = gr.Textbox(label="Ticket Type Choices (comma-separated)",placeholder="e.g., Incident,Request")issue_type_choices = gr.Textbox(label="Issue Type Choices (comma-separated)",placeholder="e.g., Hardware,Software")subissue_type_choices = gr.Textbox(label="Subissue Type Choices (comma-separated)",placeholder="e.g., Network,Application")priority_name_choices = gr.Textbox(label="Priority Name Choices (comma-separated)",placeholder="e.g., High,Medium,Low")queue_id_name_choices = gr.Textbox(label="Queue ID Name Choices (comma-separated)",placeholder="e.g., IT,Support")billing_codes_choices = gr.Textbox(label="Billing Codes Choices (comma-separated)",placeholder="e.g., BILL001,BILL002")predict_button = gr.Button("Make Prediction")prediction_output = gr.Textbox(label="Prediction Output", lines=10, interactive=False)# Event handlersdef update_checkpoints():output_dir = "outputs"checkpoints = []if os.path.exists(output_dir):# Scan all subdirectories in outputs for checkpoint-* foldersfor root, dirs, _ in os.walk(output_dir):for dir_name in dirs:if dir_name.startswith("checkpoint-"):checkpoints.append(os.path.join(root, dir_name))checkpoints.sort()  # Sort for consistent displayreturn gr.Dropdown(choices=checkpoints, value=checkpoints[0] if checkpoints else None)train_button.click(fn=train_model,inputs=[json_file, model_path, batch_size, grad_steps, learning_rate, max_steps],outputs=[training_progress, checkpoint_dropdown])refresh_checkpoints.click(fn=update_checkpoints,inputs=[],outputs=[checkpoint_dropdown])start_fastapi_button.click(fn=start_fastapi,inputs=[checkpoint_dropdown],outputs=[fastapi_status])predict_button.click(fn=make_prediction,inputs=[title_input, description_input, ticket_type_choices, issue_type_choices,subissue_type_choices, priority_name_choices, queue_id_name_choices,billing_codes_choices, prediction_output],outputs=[prediction_output])return demoif __name__ == "__main__":demo = create_gradio_app()demo.launch(server_name="0.0.0.0")

上传特定格式的json 格式文件开始训练