
[LLaMA-Factory] Fine-Tuning DeepSeek-R1-Distill-Qwen-7B with LoRA


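  • Local environment
  • Disabling the open-source nouveau driver
  • Installing the NVIDIA driver (nvidia-smi)
  • Installing Git
  • Installing Anaconda (conda)
  • Downloading the `DeepSeek-R1-Distill-Qwen-7B` model
  • Installing LLaMA-Factory
  • Downloading LLaMA-Factory
  • Installing LLaMA-Factory dependencies
  • Updating environment variables
  • Installing deepspeed
  • Preparing the Alpaca dataset
  • Preparing the LoRA config file
  • Setting the number of GPUs
  • Starting fine-tuning
  • Checking GPU status during fine-tuning
  • Starting the webui service
  • Chat
  • Loss curve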

Local environment

Dependency   Version
Linux        BigCloud Enterprise Linux 8.6
GPU          NVIDIA Tesla T4 16G * 8

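Disabling the open-source nouveau driver

If the open-source nouveau driver is still active, installing the NVIDIA driver (which provides nvidia-smi) fails, and the log file /var/log/nvidia-installer.log contains the following error:
ERROR: Unable to load the kernel module 'nvidia.ko'

  • Check whether nouveau is loaded:
lsmod | grep nouveau

If the command prints nothing, nouveau is already disabled. Output like the following means it is still loaded: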

nouveau              2334720  0
video                  57344  1 nouveau
mxm_wmi                16384  1 nouveau
drm_kms_helper        262144  5 drm_vram_helper,ast,nouveau
ttm                   114688  3 drm_vram_helper,drm_ttm_helper,nouveau
i2c_algo_bit           16384  3 igb,ast,nouveau
drm                   614400  7 drm_kms_helper,drm_vram_helper,ast,drm_ttm_helper,ttm,nouveau
i2c_core               98304  9 drm_kms_helper,i2c_algo_bit,igb,ast,i2c_smbus,i2c_i801,ipmi_ssif,nouveau,drm
wmi                    32768  2 mxm_wmi,nouveau
  • Disable nouveau
# Write the blacklist file (created if missing, overwritten if present)
sudo sh -c 'cat > /etc/modprobe.d/blacklist.conf << EOF
blacklist nouveau
options nouveau modeset=0
EOF'
# Regenerate the initramfs
sudo dracut --force
# Reboot the machine
sudo reboot
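
After the reboot, the same check can be scripted. The following is a minimal sketch, not part of the original setup: it assumes module listings in the `lsmod` or `/proc/modules` layout, where the module name is the first field on each line. The function names are made up for illustration.

```python
def loaded_modules(modules_text: str) -> set[str]:
    """Return module names from text in lsmod or /proc/modules layout."""
    names = set()
    for line in modules_text.splitlines():
        fields = line.split()
        # The first field is the module name; skip the lsmod header row.
        if fields and fields[0] != "Module":
            names.add(fields[0])
    return names


def nouveau_is_loaded(modules_text: str) -> bool:
    """True if a module named exactly 'nouveau' appears in the listing."""
    return "nouveau" in loaded_modules(modules_text)


# Sample text shortened from the lsmod output shown earlier.
sample = """nouveau              2334720  0
video                  57344  1 nouveau
wmi                    32768  2 mxm_wmi,nouveau
"""
print(nouveau_is_loaded(sample))
```

On the real machine the input would come from `open("/proc/modules").read()`; a result of `False` after the reboot indicates that the blacklist took effect.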

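Installing the NVIDIA driver (nvidia-smi)

  • Open https://www.nvidia.com/en-us/drivers/ in a browser

    A proxy (VPN) is needed for the Manual Driver Search form to load

  • Fill in the driver details
    (Screenshot: driver search form)
  • Review the search results
    (Screenshot: driver search results)
  • Open the download page
    (Screenshot: driver download page)
  • Download the driver file NVIDIA-Linux-x86_64-570.133.20.run and upload it to the Linux machine

    Copying the download URL and fetching it directly with wget on the machine does not work; the request returns 403

  • Install the driver, combining the options below
sudo sh NVIDIA-Linux-x86_64-570.133.20.run --no-x-check --no-nouveau-check --no-opengl-files
  • The installer must be run with root privileges
  • --no-x-check # do not check whether an X server is running
  • --no-nouveau-check # do not check whether nouveau is present
  • --no-opengl-files # install only the driver files, without the OpenGL files
  • During installation a prompt asks for the kernel module type; choose NVIDIA Proprietary as suggested (left/right arrow keys to select, then Enter)
Multiple kernel module types are available for this system. Which would you like to use?
NVIDIA Proprietary        MIT/GPL
  • After installation, run the following command to confirm that the driver works
nvidia-smi

The GPU information is shown below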

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.133.20             Driver Version: 570.133.20     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla T4                       Off |   00000000:3D:00.0 Off |                    0 |
| N/A   51C    P0             27W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  Tesla T4                       Off |   00000000:3E:00.0 Off |                    0 |
| N/A   54C    P0             28W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  Tesla T4                       Off |   00000000:40:00.0 Off |                    0 |
| N/A   51C    P0             27W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  Tesla T4                       Off |   00000000:41:00.0 Off |                    0 |
| N/A   48C    P0             26W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   4  Tesla T4                       Off |   00000000:B1:00.0 Off |                    0 |
| N/A   47C    P0             26W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   5  Tesla T4                       Off |   00000000:B2:00.0 Off |                    0 |
| N/A   53C    P0             30W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   6  Tesla T4                       Off |   00000000:B4:00.0 Off |                    0 |
| N/A   56C    P0             30W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   7  Tesla T4                       Off |   00000000:B5:00.0 Off |                    0 |
| N/A   55C    P0             30W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
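
For scripted checks, the table above is awkward to parse. A hedged alternative is the CSV query mode of nvidia-smi (`--query-gpu=... --format=csv,noheader,nounits`). The sketch below only parses such output; the sample text is hand-written to match this machine's cards, and the `GpuInfo` type is introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class GpuInfo:
    index: int
    name: str
    memory_used_mib: int
    memory_total_mib: int


def parse_gpu_csv(text: str) -> list[GpuInfo]:
    """Parse output of:
    nvidia-smi --query-gpu=index,name,memory.used,memory.total \
               --format=csv,noheader,nounits
    """
    gpus = []
    for line in text.strip().splitlines():
        index, name, used, total = [field.strip() for field in line.split(",")]
        gpus.append(GpuInfo(int(index), name, int(used), int(total)))
    return gpus


# Hand-written sample in the expected layout (two of the eight cards).
sample = "0, Tesla T4, 0, 15360\n1, Tesla T4, 0, 15360\n"
gpus = parse_gpu_csv(sample)
print(len(gpus), sum(g.memory_total_mib for g in gpus))
```

To use it on the machine, the text could be captured with `subprocess.run([...], capture_output=True, text=True, check=True).stdout`.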

Installing Git

Install git and git-lfs:

sudo dnf install git git-lfs

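Installing Anaconda (conda)

  • Download page: https://www.anaconda.com/download/success
    (Screenshot: Anaconda download page)
  • Download the 64-Bit (x86) Installer
wget https://repo.anaconda.com/archive/Anaconda3-2024.10-1-Linux-x86_64.sh
  • Install
sudo sh Anaconda3-2024.10-1-Linux-x86_64.sh
  • The installer shows a series of prompts; answer yes to each, and press q to skip through the license text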
Do you accept the license terms? [yes|no]
>>> yes

Anaconda3 will now be installed into this location:
/root/anaconda3

  - Press ENTER to confirm the location
  - Press CTRL-C to abort the installation
  - Or specify a different location below

[/root/anaconda3] >>> /data/ProgramFiles/anaconda3

You can undo this by running `conda init --reverse $SHELL`? [yes|no]
[no] >>> yes
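  • Set environment variables
# The quoted 'EOF' keeps $ANACONDA3_HOME and $PATH unexpanded until login time
cat >> ~/.bash_profile << 'EOF'
export ANACONDA3_HOME=/data/ProgramFiles/anaconda3
export CONDA_ENVS_PATH=$ANACONDA3_HOME/envs
export PATH="$ANACONDA3_HOME/bin:$PATH"
EOF
source ~/.bash_profile
# The directory was installed as root, so hand ownership to the working user
sudo chown -R tkyj:tkyj /data
  • Check the conda version to verify the installation
conda -V
  • Configure mirror channels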
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
conda config --set show_channel_urls yes

Downloading the DeepSeek-R1-Distill-Qwen-7B model

  • ModelScope: https://modelscope.cn/models/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
mkdir -pv /data/llm/models
cd /data/llm/models
# Clone first without the large LFS files (only pointer files are fetched)
GIT_LFS_SKIP_SMUDGE=1 git clone https://www.modelscope.cn/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B.git
# Make sure Git LFS is installed correctly
cd DeepSeek-R1-Distill-Qwen-7B
git lfs install
# Download the large files
git lfs pull
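
Because the clone uses GIT_LFS_SKIP_SMUDGE=1, the weight files are small pointer stubs until `git lfs pull` finishes. The sketch below is one way to confirm that no stubs remain. It relies on the Git LFS pointer format, whose first line is `version https://git-lfs.github.com/spec/v1`, and it assumes the weights use the `.safetensors` extension. The helper names are hypothetical.

```python
import tempfile
from pathlib import Path

LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """True if the file is still a Git LFS pointer stub, not real content."""
    with open(path, "rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def unresolved_weights(model_dir: Path) -> list[str]:
    """Names of *.safetensors files in model_dir that are still pointer stubs."""
    return sorted(
        p.name for p in model_dir.glob("*.safetensors") if is_lfs_pointer(p)
    )


# Self-contained demo on a temporary directory with one stub and one "real" file.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)
    (demo_dir / "a.safetensors").write_bytes(
        LFS_POINTER_PREFIX + b"\noid sha256:0000\nsize 1\n"
    )
    (demo_dir / "b.safetensors").write_bytes(b"\x00" * 64)
    demo_result = unresolved_weights(demo_dir)
print(demo_result)
```

For the real model, calling `unresolved_weights(Path("/data/llm/models/DeepSeek-R1-Distill-Qwen-7B"))` should return an empty list once the download is complete.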

Installing LLaMA-Factory

  • Software requirements
Mandatory      Minimum   Recommend
python         3.9       3.10
torch          2.0.0     2.6.0
transformers   4.45.0    4.50.0
datasets       2.16.0    3.2.0
accelerate     0.34.0    1.2.1
peft           0.14.0    0.15.1
trl            0.8.6     0.9.6

Optional       Minimum   Recommend
CUDA           11.6      12.2
deepspeed      0.10.0    0.16.4
bitsandbytes   0.39.0    0.43.1
vllm           0.4.3     0.8.2
flash-attn     2.5.6     2.7.2
  • Hardware requirements
Method                            Bits   7B      14B     30B     70B      xB
Full (bf16 or fp16)               32     120GB   240GB   600GB   1200GB   18xGB
Full (pure_bf16)                  16     60GB    120GB   300GB   600GB    8xGB
Freeze/LoRA/GaLore/APOLLO/BAdam   16     16GB    32GB    64GB    160GB    2xGB
QLoRA                             8      10GB    20GB    40GB    80GB     xGB
QLoRA                             4      6GB     12GB    24GB    48GB     x/2GB
QLoRA                             2      4GB     8GB     16GB    24GB     x/4GB
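
The xB column of the hardware table is a rule of thumb in GB per billion parameters. As a rough sketch, it can be turned into a lookup for other model sizes. The method keys below are labels invented for this example, and the results are estimates only; the fixed-size columns of the table are slightly more generous (16GB for 7B LoRA, against 14 from the formula).

```python
# GB of GPU memory per billion parameters, copied from the "xB" column above.
# The keys ("full", "lora", ...) are labels made up for this sketch.
GB_PER_BILLION = {
    ("full", 32): 18.0,
    ("full_pure_bf16", 16): 8.0,
    ("lora", 16): 2.0,
    ("qlora", 8): 1.0,
    ("qlora", 4): 0.5,
    ("qlora", 2): 0.25,
}


def estimate_vram_gb(method: str, bits: int, params_billion: float) -> float:
    """Rough memory estimate in GB for a model with params_billion parameters."""
    return GB_PER_BILLION[(method, bits)] * params_billion


for (method, bits), factor in GB_PER_BILLION.items():
    print(f"{method:15s} {bits:2d}-bit  7B -> {estimate_vram_gb(method, bits, 7):6.1f} GB")
```

By this estimate, 16-bit LoRA on a 7B model needs roughly as much memory as one 16 GB T4 provides, which matches the remark later in this article that a single card is tight.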

Downloading LLaMA-Factory

  • Clone the project with git
cd /data/ProgramFiles
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git

Installing LLaMA-Factory dependencies

cd LLaMA-Factory
conda create --name llama_factory python=3.10
conda activate llama_factory
# Install the PyTorch build that matches CUDA 12.8 from the official PyTorch wheel index
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
pip install -e ".[torch,metrics]" -i https://pypi.tuna.tsinghua.edu.cn/simple
# Verify the installation
llamafactory-cli version
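
Beyond `llamafactory-cli version`, the installed package versions can be compared against the minimums listed in the software requirements. This is a minimal sketch using only the standard library (`importlib.metadata`). The version parsing is deliberately simple and ignores suffixes such as `+cu128` or `rc1`, and the demo uses a fake version lookup with made-up versions so that it runs anywhere.

```python
import re
from importlib import metadata

# Minimum versions copied from the "Mandatory" requirements table.
MINIMUMS = {
    "torch": "2.0.0", "transformers": "4.45.0", "datasets": "2.16.0",
    "accelerate": "0.34.0", "peft": "0.14.0", "trl": "0.8.6",
}


def version_tuple(version: str) -> tuple[int, ...]:
    """Turn '2.6.0+cu128' into (2, 6, 0); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def is_older(installed: str, minimum: str) -> bool:
    """Compare two versions after padding them to the same length."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) < b + (0,) * (width - len(b))


def find_outdated(minimums, get_version=metadata.version):
    """Map package -> (installed or None, minimum) for missing or old packages."""
    problems = {}
    for name, minimum in minimums.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            problems[name] = (None, minimum)
            continue
        if is_older(installed, minimum):
            problems[name] = (installed, minimum)
    return problems


def fake_get_version(name: str) -> str:
    """Stand-in for metadata.version with invented versions, for the demo only."""
    versions = {"torch": "2.6.0+cu128", "peft": "0.13.2"}
    if name not in versions:
        raise metadata.PackageNotFoundError(name)
    return versions[name]


demo = find_outdated({"torch": "2.0.0", "peft": "0.14.0", "trl": "0.8.6"}, fake_get_version)
print(demo)
```

Inside the llama_factory environment, `find_outdated(MINIMUMS)` with the default lookup reports the real state; an empty dictionary means every mandatory package meets its minimum.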

Updating environment variables

This makes every new login shell switch to the matching Python environment automatically

cat >> ~/.bash_profile << EOF
conda activate llama_factory
EOF
source ~/.bash_profile

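Installing deepspeed

Fine-tuning a 7B model on a single card with 16 GB of memory is tight; deepspeed can be used to ease the pressure

# List the available cuda versions
conda search cuda -c nvidia
# Install the version that matches the one reported by nvidia-smi (the CUDA toolkit is a prerequisite for installing deepspeed)
conda install cuda=12.8.1 -c nvidia
DS_BUILD_CPU_ADAM=1 pip3 install deepspeed==0.16.7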

Preparing the Alpaca dataset

  • Example of the alpaca format
[
  {
    "instruction": "Invent a new word by combining two existing words.",
    "input": "",
    "output": "\"Chillax\" - a combination of the words \"chill\" and \"relax\". It means to calm down and unwind, enjoying oneself in a laidback manner."
  },
  {
    "instruction": "Transform the following sentence using a synonym: The car sped quickly.",
    "input": "",
    "output": "The car accelerated rapidly."
  }
]
  • Edit LLaMA-Factory/data/dataset_info.json and add the following entry
"cmic_financial_apaca": {
  "file_name": "/data/llm/dataset/cmic_financial_apaca.json",
  "columns": {
    "prompt": "instruction",
    "query": "input",
    "response": "output",
    "system": "system"
  }
}
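
Before training, it can help to check that the dataset file really follows the alpaca layout. The sketch below is an illustrative validator, not part of LLaMA-Factory. It assumes that `instruction` and `output` are required non-empty strings and that `input` and `system` are optional strings, mirroring the column mapping above.

```python
import json


def validate_alpaca(records) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    if not isinstance(records, list):
        return ["top level must be a JSON array"]
    errors = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            errors.append(f"record {i}: not a JSON object")
            continue
        for key in ("instruction", "output"):
            value = rec.get(key)
            if not isinstance(value, str) or not value.strip():
                errors.append(f"record {i}: '{key}' must be a non-empty string")
        for key in ("input", "system"):
            if key in rec and not isinstance(rec[key], str):
                errors.append(f"record {i}: '{key}' must be a string")
    return errors


good = json.loads(
    '[{"instruction": "Transform the following sentence using a synonym: '
    'The car sped quickly.", "input": "", "output": "The car accelerated rapidly."}]'
)
bad = [{"instruction": "", "output": "x"}, {"instruction": "y", "input": 3}]
print(validate_alpaca(good))
print(validate_alpaca(bad))
```

For the real file, the records would be loaded with `json.load(open("/data/llm/dataset/cmic_financial_apaca.json", encoding="utf-8"))`.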

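Preparing the LoRA config file

  • Copy the original file so that it stays intact
cd LLaMA-Factory/examples/train_lora
# Make a new copy of the original file
cp llama3_lora_sft.yaml ds_qwen7b_lora_sft.yaml
vi ds_qwen7b_lora_sft.yaml
  • Edit ds_qwen7b_lora_sft.yaml; the main fields to change are
    • model_name_or_path
    • dataset
    • template
    • cutoff_len
    • max_samples
    • output_dir
  • Parameters that deserve attention
    • model_name_or_path: path to the model
    • dataset: dataset name, matching the cmic_financial_apaca entry declared above
    • template: the chat template
    • cutoff_len: maximum length of the input sequence in tokens; longer samples are truncated
    • output_dir: where the fine-tuned weights are saved
    • gradient_accumulation_steps: number of micro-batches accumulated before each optimizer step; when GPU memory is short, lower per_device_train_batch_size and raise this value to keep the effective batch size
    • num_train_epochs: number of training epochs
  • The complete ds_qwen7b_lora_sft.yaml is as follows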
### model
model_name_or_path: /data/llm/models/DeepSeek-R1-Distill-Qwen-7B
trust_remote_code: true

### method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all

### dataset
dataset: cmic_financial_apaca
template: deepseek3
cutoff_len: 4096
max_samples: 4019
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4

### output
output_dir: /data/llm/models/sft/DeepSeek-R1-Distill-Qwen-7B
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true
save_only_model: false
report_to: none  # choices: [none, wandb, tensorboard, swanlab, mlflow]

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 1.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true  # note: Tesla T4 (Turing) has no native bf16 support; fp16: true is the usual choice on this GPU
ddp_timeout: 180000000
resume_from_checkpoint: null

### eval
#eval_dataset: alpaca_en_demo
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
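
To see how the batch settings in this config interact, the following sketch computes the effective batch size and an approximate number of optimizer steps per epoch. It assumes that all 8 GPUs take part in training and that the dataset holds at least max_samples (4019) records, of which val_size (0.1) is held out. The exact step count reported by the trainer may differ slightly because of rounding.

```python
import math


def effective_batch_size(per_device_batch: int, grad_accum_steps: int, num_gpus: int) -> int:
    """Samples consumed per optimizer step across all GPUs."""
    return per_device_batch * grad_accum_steps * num_gpus


def approx_steps_per_epoch(num_train_samples: int, batch: int) -> int:
    """Approximate optimizer steps needed to see every training sample once."""
    return math.ceil(num_train_samples / batch)


# Values from ds_qwen7b_lora_sft.yaml; the GPU count is an assumption.
max_samples, val_size = 4019, 0.1
train_samples = max_samples - math.ceil(max_samples * val_size)
batch = effective_batch_size(per_device_batch=1, grad_accum_steps=8, num_gpus=8)
steps = approx_steps_per_epoch(train_samples, batch)
print(train_samples, batch, steps)
```

Under these assumptions a single epoch takes only a few dozen optimizer steps, so save_steps: 500 and eval_steps: 500 would not be reached before training ends. Lowering them is worth considering if intermediate checkpoints or evaluations are wanted.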

Setting the number of GPUs

For multi-GPU training, set the number of GPUs by editing LLaMA-Factory/examples/accelerate/fsdp_config.yaml:

num_processes: 8  # the number of GPUs in all nodes

Starting fine-tuning

conda activate llama_factory
# Run in the background
nohup llamafactory-cli train /data/ProgramFiles/LLaMA-Factory/examples/train_lora/ds_qwen7b_lora_sft.yaml > nohup.log 2>&1 &
# Follow the log
tail -fn200 nohup.log
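
While the job runs, loss values can be pulled out of nohup.log instead of reading the scrolling output. The sketch below assumes the log contains the dictionary-style progress lines that the Hugging Face Trainer prints at every logging_steps interval, such as `{'loss': ..., 'epoch': ...}`. The sample log is invented for illustration and is not output from this training run.

```python
import ast
import re

# Matches a flat Python-dict literal that starts with the 'loss' key.
LOG_DICT = re.compile(r"\{'loss':[^{}]*\}")


def extract_losses(log_text: str) -> list[tuple[float, float]]:
    """Return (epoch, loss) pairs found in Trainer-style log lines."""
    points = []
    for match in LOG_DICT.finditer(log_text):
        try:
            entry = ast.literal_eval(match.group())
        except (ValueError, SyntaxError):
            continue  # skip lines that are not plain literals (e.g. nan values)
        points.append((entry.get("epoch"), entry["loss"]))
    return points


# Invented sample; real values will differ.
sample_log = """
 18%| 10/57 [05:00<23:30, 30.0s/it]
{'loss': 1.8123, 'grad_norm': 0.91, 'learning_rate': 9.5e-05, 'epoch': 0.18}
 35%| 20/57 [10:00<18:30, 30.0s/it]
{'loss': 1.5031, 'grad_norm': 0.77, 'learning_rate': 8.1e-05, 'epoch': 0.35}
"""
print(extract_losses(sample_log))
```

On the machine, `extract_losses(open("nohup.log", encoding="utf-8", errors="replace").read())` gives a quick view of whether the loss is trending down.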

Checking GPU status during fine-tuning

Run: watch -n 0.5 nvidia-smi

Every 0.5s: nvidia-smi                                localhost.localdomain: Mon Apr 28 15:33:10 2025

Mon Apr 28 15:33:14 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.133.20             Driver Version: 570.133.20     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla T4                       Off |   00000000:3D:00.0 Off |                    0 |
| N/A   39C    P8             19W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  Tesla T4                       Off |   00000000:3E:00.0 Off |                    0 |
| N/A   40C    P0             26W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  Tesla T4                       Off |   00000000:40:00.0 Off |                    0 |
| N/A   37C    P0             26W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  Tesla T4                       Off |   00000000:41:00.0 Off |                    0 |
| N/A   37C    P0             25W /   70W |       0MiB /  15360MiB |      4%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   4  Tesla T4                       Off |   00000000:B1:00.0 Off |                    0 |
| N/A   36C    P8             13W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   5  Tesla T4                       Off |   00000000:B2:00.0 Off |                    0 |
| N/A   40C    P8             15W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   6  Tesla T4                       Off |   00000000:B4:00.0 Off |                    0 |
| N/A   42C    P8             15W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   7  Tesla T4                       Off |   00000000:B5:00.0 Off |                    0 |
| N/A   41C    P8             11W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

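Starting the webui service

# Stop the firewall so that the web service port is reachable
sudo systemctl stop firewalld
# GRADIO_SHARE=1 must be part of the command's environment to take effect
GRADIO_SHARE=1 nohup llamafactory-cli webui > webui.log 2>&1 &
tail -fn200 webui.log

(Screenshot: hyperparameter settings 1)
(Screenshot: hyperparameter settings 2)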

Chat

llamafactory-cli chat /data/ProgramFiles/LLaMA-Factory/examples/inference/ds_qwen7b_lora_sft.yaml
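
The chat command points to examples/inference/ds_qwen7b_lora_sft.yaml, which this article does not show. The sketch below renders one plausible version of that file. The keys are modeled on the inference examples shipped with LLaMA-Factory (such as llama3_lora_sft.yaml), and the paths and template are taken from the training config above. Treat the result as an assumption to verify, not the author's actual file.

```python
def render_inference_config(model_path: str, adapter_path: str, template: str) -> str:
    """Build the text of a minimal LoRA inference config (assumed layout)."""
    lines = [
        f"model_name_or_path: {model_path}",
        f"adapter_name_or_path: {adapter_path}",  # LoRA weights from output_dir
        f"template: {template}",
        "infer_backend: huggingface",
        "trust_remote_code: true",
    ]
    return "\n".join(lines) + "\n"


config_text = render_inference_config(
    model_path="/data/llm/models/DeepSeek-R1-Distill-Qwen-7B",
    adapter_path="/data/llm/models/sft/DeepSeek-R1-Distill-Qwen-7B",
    template="deepseek3",
)
print(config_text)
```

The printed text can be saved under examples/inference/ with the file name used in the chat command.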

Loss curve

(Figure: loss curve)
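
The curve can also be rebuilt from the raw numbers. The Hugging Face Trainer keeps its logged metrics under the `log_history` key of trainer_state.json. The sketch below assumes that file was written to the output directory, and the sample state is invented for illustration. It extracts (step, loss) pairs that any plotting tool can consume.

```python
import json


def loss_points(trainer_state: dict) -> list[tuple[int, float]]:
    """Return (step, training loss) pairs from a trainer_state.json payload."""
    return [
        (entry["step"], entry["loss"])
        for entry in trainer_state.get("log_history", [])
        if "loss" in entry and "step" in entry
    ]


# Invented sample in the trainer_state.json layout; real values will differ.
sample_state = json.loads("""
{
  "log_history": [
    {"step": 10, "loss": 1.81, "learning_rate": 9.5e-05, "epoch": 0.18},
    {"step": 20, "loss": 1.50, "learning_rate": 8.1e-05, "epoch": 0.35},
    {"step": 20, "eval_loss": 1.47, "epoch": 0.35},
    {"step": 57, "train_loss": 1.42, "epoch": 1.0}
  ]
}
""")
print(loss_points(sample_state))
```

Entries that carry only eval_loss or the final train_loss summary are skipped, so the result contains just the per-interval training loss.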

