
[GPT Intro] Lesson 50: Objective Evaluation of Model Quality with LlamaFactory

  • Overview
  • 1. Preparation before training
  • 2. Evaluation
    • 2.1 Subjective evaluation
      • 2.1.1 Checking results before convergence
      • 2.1.2 Training to convergence
    • 2.2 Objective evaluation with LlamaFactory

Overview

Fine-tune Qwen with LlamaFactory, then use its evaluation feature to produce objective metrics, with subjective checks along the way to verify training quality. The final objective metrics are:
{
  "predict_bleu-4": 88.85199886363637,
  "predict_model_preparation_time": 0.0056,
  "predict_rouge-1": 92.91442994652407,
  "predict_rouge-2": 90.54005568181817,
  "predict_rouge-l": 92.0889979946524,
  "predict_runtime": 146.0479,
  "predict_samples_per_second": 10.243,
  "predict_steps_per_second": 0.205
}
Metric interpretation: BLEU-4 and ROUGE-1/2/L measure n-gram overlap between the model's predictions and the reference answers (reported here on a 0-100 scale), so scores near 90 mean the outputs nearly reproduce the references. For a deeper analysis you can simply ask an AI.

1. Preparation before training

Set up the environment again. The training data consists of two datasets: identity and ruozhiba_qaswift_train.
The new dataset is the Ruozhiba (弱智吧) Q&A data from ModelScope, converted to the training format (a conversion sketch follows the sample below). Sample records:

    {"instruction": "只剩一个心脏了还能活吗?","input": "","output": "能,人本来就只有一个心脏。"},{"instruction": "爸爸再婚,我是不是就有了个新娘?","input": "","output": "不是的,你有了一个继母。\"新娘\"是指新婚的女方,而你爸爸再婚,他的新婚妻子对你来说是继母。"},{"instruction": "樟脑丸是我吃过最难吃的硬糖有奇怪的味道怎么还有人买","input": "","output": "樟脑丸并不是硬糖,而是一种常见的驱虫药,不能食用。虽然它的味道可能不太好,但是由于其有效的驱虫效果,所以仍然有很多人会购买。"}

Place the file in the data directory and register it in dataset_info.json, as shown in the screenshot (a registration sketch follows):
[screenshot: registering ruozhiba_qaswift_train in dataset_info.json]
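
Since the screenshot may not render here, the registration can also be done programmatically. A minimal sketch following LlamaFactory's data/dataset_info.json convention (the path and the Alpaca-style column mapping are assumptions to verify against your install):

    import json

    info_path = "LLaMA-Factory/data/dataset_info.json"  # path is an assumption

    with open(info_path, encoding="utf-8") as f:
        dataset_info = json.load(f)

    # Register the dataset under the name selected later in the training UI.
    dataset_info["ruozhiba_qaswift_train"] = {
        "file_name": "ruozhiba_qaswift_train.json",
        "columns": {"prompt": "instruction", "query": "input", "response": "output"},
    }

    with open(info_path, "w", encoding="utf-8") as f:
        json.dump(dataset_info, f, ensure_ascii=False, indent=2)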

2. Evaluation

2.1 Subjective evaluation

2.1.1 Checking results before convergence

At this stage subjective evaluation is generally used. The two kinds of evaluation are:

  • objective evaluation
  • subjective evaluation
    The subjective check here uses the Ruozhiba data, e.g. this training sample:

{
  "instruction": "只剩一个心脏了还能活吗?",
  "input": "",
  "output": "能,人本来就只有一个心脏。"
}
At this point only about 1,000 of 30,000 had been trained. The model answered:
[screenshot: model response before convergence]
Analysis: the answer clearly does not match the training data, so training needs to continue.

  • The training loss has not yet converged
    [screenshot: loss curve, not yet converged]
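
The subjective check above was done in the LlamaFactory web UI chat, but the same spot-check can be scripted. A minimal sketch using transformers + peft (a swapped-in approach, not what this walkthrough used; the base-model and adapter paths are taken from the logs further below):

    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    base = "/root/autodl-tmp/models/Qwen/Qwen2.5-0.5B-Instruct"
    adapter = "saves/Qwen2.5-0.5B-Instruct/lora/train_2025-08-20-14-15-47"

    tokenizer = AutoTokenizer.from_pretrained(base)
    model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto", device_map="auto")
    model = PeftModel.from_pretrained(model, adapter)  # attach the LoRA adapter

    # Ask a training question and compare the reply with the reference answer.
    messages = [{"role": "user", "content": "只剩一个心脏了还能活吗?"}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=128)
    print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))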

2.1.2 Training to convergence

Retrain with the batch size adjusted to 50 samples per batch to speed up training.
[screenshot: training configuration]
Check the chat results:

[screenshots: chat results after convergence]
Training samples:

    {"instruction": "只剩一个心脏了还能活吗?","input": "","output": "能,人本来就只有一个心脏。"},{"instruction": "爸爸再婚,我是不是就有了个新娘?","input": "","output": "不是的,你有了一个继母。\"新娘\"是指新婚的女方,而你爸爸再婚,他的新婚妻子对你来说是继母。"}

Training result analysis: after the loss converged, the model's answers match the training data as expected.

2.2 Objective evaluation with LlamaFactory

  • Parameter configuration
    [screenshot: evaluation parameter configuration]

  • Execution first fails with missing dependencies; install them (a sketch of what they compute follows this list):
    pip install jieba
    pip install nltk
    pip install rouge_chinese
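
These three packages are what the scorer relies on: jieba segments Chinese text into words, nltk computes BLEU, and rouge_chinese computes ROUGE. A minimal sketch of how such scores are computed, illustrative rather than LlamaFactory's exact code:

    import jieba
    from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
    from rouge_chinese import Rouge

    prediction = "能,人本来就只有一个心脏。"
    reference = "能,人本来就只有一个心脏。"

    # Segment Chinese text so that n-gram metrics are meaningful.
    pred_tokens = list(jieba.cut(prediction))
    ref_tokens = list(jieba.cut(reference))

    # BLEU-4: up-to-4-gram precision with smoothing, scaled to 0-100.
    bleu4 = sentence_bleu(
        [ref_tokens], pred_tokens,
        smoothing_function=SmoothingFunction().method3,
    ) * 100

    # ROUGE-1/2/L: n-gram and longest-common-subsequence F1 scores.
    rouge = Rouge()
    scores = rouge.get_scores(" ".join(pred_tokens), " ".join(ref_tokens))[0]

    print(f"bleu-4: {bleu4:.2f}")
    print(f"rouge-1: {scores['rouge-1']['f'] * 100:.2f}")
    print(f"rouge-l: {scores['rouge-l']['f'] * 100:.2f}")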

  • Evaluation results

{"predict_bleu-4": 88.85199886363637,"predict_model_preparation_time": 0.0056,"predict_rouge-1": 92.91442994652407,"predict_rouge-2": 90.54005568181817,"predict_rouge-l": 92.0889979946524,"predict_runtime": 146.0479,"predict_samples_per_second": 10.243,"predict_steps_per_second": 0.205
}

Score analysis: the scores are this high because the evaluation set is the same data the model was trained on. A held-out, non-training split should be used for a meaningful test.
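
One simple fix is to hold out part of the data before training. A minimal sketch (output file names are assumptions; both files would then be registered in dataset_info.json, with training pointed at one split and evaluation at the other):

    import json
    import random

    with open("ruozhiba_qaswift_train.json", encoding="utf-8") as f:
        records = json.load(f)

    # Shuffle, then hold out 10% as an evaluation set the model never trains on.
    random.seed(42)
    random.shuffle(records)
    split = int(len(records) * 0.9)

    with open("ruozhiba_train_split.json", "w", encoding="utf-8") as f:
        json.dump(records[:split], f, ensure_ascii=False, indent=2)
    with open("ruozhiba_eval_split.json", "w", encoding="utf-8") as f:
        json.dump(records[split:], f, ensure_ascii=False, indent=2)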

  • Evaluation log (excerpt)
[INFO|2025-08-20 15:29:14] tokenization_utils_base.py:2065 >> loading file vocab.json
[INFO|2025-08-20 15:29:14] tokenization_utils_base.py:2065 >> loading file merges.txt
[INFO|2025-08-20 15:29:14] tokenization_utils_base.py:2065 >> loading file tokenizer.json
[INFO|2025-08-20 15:29:14] tokenization_utils_base.py:2065 >> loading file added_tokens.json
[INFO|2025-08-20 15:29:14] tokenization_utils_base.py:2065 >> loading file special_tokens_map.json
[INFO|2025-08-20 15:29:14] tokenization_utils_base.py:2065 >> loading file tokenizer_config.json
[INFO|2025-08-20 15:29:14] tokenization_utils_base.py:2065 >> loading file chat_template.jinja
[INFO|2025-08-20 15:29:15] tokenization_utils_base.py:2336 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|2025-08-20 15:29:15] configuration_utils.py:750 >> loading configuration file /root/autodl-tmp/models/Qwen/Qwen2.5-0.5B-Instruct/config.json
[INFO|2025-08-20 15:29:15] configuration_utils.py:817 >> Model config Qwen2Config {"architectures": ["Qwen2ForCausalLM"],"attention_dropout": 0.0,"bos_token_id": 151643,"eos_token_id": 151645,"hidden_act": "silu","hidden_size": 896,"initializer_range": 0.02,"intermediate_size": 4864,"layer_types": ["full_attention","full_attention","full_attention","full_attention","full_attention","full_attention","full_attention","full_attention","full_attention","full_attention","full_attention","full_attention","full_attention","full_attention","full_attention","full_attention","full_attention","full_attention","full_attention","full_attention","full_attention","full_attention","full_attention","full_attention"],"max_position_embeddings": 32768,"max_window_layers": 21,"model_type": "qwen2","num_attention_heads": 14,"num_hidden_layers": 24,"num_key_value_heads": 2,"rms_norm_eps": 1e-06,"rope_scaling": null,"rope_theta": 1000000.0,"sliding_window": null,"tie_word_embeddings": true,"torch_dtype": "bfloat16","transformers_version": "4.55.0","use_cache": true,"use_sliding_window": false,"vocab_size": 151936
}
[INFO|2025-08-20 15:29:15] tokenization_utils_base.py:2065 >> loading file vocab.json
[INFO|2025-08-20 15:29:15] tokenization_utils_base.py:2065 >> loading file merges.txt
[INFO|2025-08-20 15:29:15] tokenization_utils_base.py:2065 >> loading file tokenizer.json
[INFO|2025-08-20 15:29:15] tokenization_utils_base.py:2065 >> loading file added_tokens.json
[INFO|2025-08-20 15:29:15] tokenization_utils_base.py:2065 >> loading file special_tokens_map.json
[INFO|2025-08-20 15:29:15] tokenization_utils_base.py:2065 >> loading file tokenizer_config.json
[INFO|2025-08-20 15:29:15] tokenization_utils_base.py:2065 >> loading file chat_template.jinja
[INFO|2025-08-20 15:29:15] tokenization_utils_base.py:2336 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|2025-08-20 15:29:15] logging.py:143 >> Loading dataset ruozhiba_qaswift_train.json...
[INFO|2025-08-20 15:29:19] configuration_utils.py:750 >> loading configuration file /root/autodl-tmp/models/Qwen/Qwen2.5-0.5B-Instruct/config.json
[INFO|2025-08-20 15:29:19] configuration_utils.py:817 >> Model config Qwen2Config {"architectures": ["Qwen2ForCausalLM"],"attention_dropout": 0.0,"bos_token_id": 151643,"eos_token_id": 151645,"hidden_act": "silu","hidden_size": 896,"initializer_range": 0.02,"intermediate_size": 4864,"layer_types": ["full_attention","full_attention","full_attention","full_attention","full_attention","full_attention","full_attention","full_attention","full_attention","full_attention","full_attention","full_attention","full_attention","full_attention","full_attention","full_attention","full_attention","full_attention","full_attention","full_attention","full_attention","full_attention","full_attention","full_attention"],"max_position_embeddings": 32768,"max_window_layers": 21,"model_type": "qwen2","num_attention_heads": 14,"num_hidden_layers": 24,"num_key_value_heads": 2,"rms_norm_eps": 1e-06,"rope_scaling": null,"rope_theta": 1000000.0,"sliding_window": null,"tie_word_embeddings": true,"torch_dtype": "bfloat16","transformers_version": "4.55.0","use_cache": true,"use_sliding_window": false,"vocab_size": 151936
}
[INFO|2025-08-20 15:29:19] logging.py:143 >> KV cache is enabled for faster generation.
[INFO|2025-08-20 15:29:20] modeling_utils.py:1305 >> loading weights file /root/autodl-tmp/models/Qwen/Qwen2.5-0.5B-Instruct/model.safetensors
[INFO|2025-08-20 15:29:20] modeling_utils.py:2411 >> Instantiating Qwen2ForCausalLM model under default dtype torch.bfloat16.
[INFO|2025-08-20 15:29:20] configuration_utils.py:1098 >> Generate config GenerationConfig {"bos_token_id": 151643,"eos_token_id": 151645
}
[INFO|2025-08-20 15:29:20] modeling_utils.py:5606 >> All model checkpoint weights were used when initializing Qwen2ForCausalLM.
[INFO|2025-08-20 15:29:20] modeling_utils.py:5614 >> All the weights of Qwen2ForCausalLM were initialized from the model checkpoint at /root/autodl-tmp/models/Qwen/Qwen2.5-0.5B-Instruct.
If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2ForCausalLM for predictions without further training.
[INFO|2025-08-20 15:29:20] configuration_utils.py:1051 >> loading configuration file /root/autodl-tmp/models/Qwen/Qwen2.5-0.5B-Instruct/generation_config.json
[INFO|2025-08-20 15:29:20] configuration_utils.py:1098 >> Generate config GenerationConfig {"bos_token_id": 151643,"do_sample": true,"eos_token_id": [151645,151643],"pad_token_id": 151643,"repetition_penalty": 1.1,"temperature": 0.7,"top_k": 20,"top_p": 0.8
}
[INFO|2025-08-20 15:29:20] logging.py:143 >> Using torch SDPA for faster training and inference.
[INFO|2025-08-20 15:29:22] logging.py:143 >> Merged 1 adapter(s).
[INFO|2025-08-20 15:29:22] logging.py:143 >> Loaded adapter(s): saves/Qwen2.5-0.5B-Instruct/lora/train_2025-08-20-14-15-47
[INFO|2025-08-20 15:29:22] logging.py:143 >> all params: 494,032,768
[WARNING|2025-08-20 15:29:22] logging.py:154 >> Batch generation can be very slow. Consider using `scripts/vllm_infer.py` instead.
[INFO|2025-08-20 15:29:22] trainer.py:4408 >> 
***** Running Prediction *****
[INFO|2025-08-20 15:29:22] trainer.py:4410 >>   Num examples = 1496
[INFO|2025-08-20 15:29:22] trainer.py:4413 >>   Batch size = 50
[INFO|2025-08-20 15:31:48] logging.py:143 >> Saving prediction results to saves/Qwen2.5-0.5B-Instruct/lora/eval_2025-08-20-14-15-47/generated_predictions.jsonl