[LLaMA-Factory] Fine-Tuning DeepSeek-R1-Distill-Qwen-7B with LoRA
- Local environment
- Disable the open-source nouveau driver
- Install the NVIDIA driver (nvidia-smi)
- Install Git
- Install Anaconda (conda)
- Download the `DeepSeek-R1-Distill-Qwen-7B` model
- Install LLaMA-Factory
- Download LLaMA-Factory
- Install LLaMA-Factory dependencies
- Update environment variables
- Install deepspeed
- Prepare the Alpaca dataset
- Prepare the LoRA config file
- Set the number of GPUs
- Launch fine-tuning
- Watch GPU usage during fine-tuning
- Start the webui service
- Chat
- Loss curve
Local environment
Dependency | Version |
---|---|
Linux | BigCloud Enterprise Linux 8.6 |
GPU | NVIDIA Tesla T4 16GB × 8 |
Disable the open-source nouveau driver
If the open-source driver is not disabled, installing the NVIDIA driver fails, and the log file /var/log/nvidia-installer.log contains the following error:
ERROR: Unable to load the kernel module 'nvidia.ko'
- Check whether nouveau is currently loaded:
lsmod | grep nouveau
If output like the following does not appear, nouveau is already disabled:
nouveau 2334720 0
video 57344 1 nouveau
mxm_wmi 16384 1 nouveau
drm_kms_helper 262144 5 drm_vram_helper,ast,nouveau
ttm 114688 3 drm_vram_helper,drm_ttm_helper,nouveau
i2c_algo_bit 16384 3 igb,ast,nouveau
drm 614400 7 drm_kms_helper,drm_vram_helper,ast,drm_ttm_helper,ttm,nouveau
i2c_core 98304 9 drm_kms_helper,i2c_algo_bit,igb,ast,i2c_smbus,i2c_i801,ipmi_ssif,nouveau,drm
wmi 32768 2 mxm_wmi,nouveau
- Disable nouveau:
# Create the file if it does not exist
sudo sh -c 'cat > /etc/modprobe.d/blacklist.conf << EOF
blacklist nouveau
options nouveau modeset=0
EOF'
# Regenerate the initramfs
sudo dracut --force
# Reboot the machine
sudo reboot
Install the NVIDIA driver (nvidia-smi)
- Open in a browser: https://www.nvidia.com/en-us/drivers/
A proxy (VPN) may be needed for the Manual Driver Search widget to load
- Fill in the driver information
- Driver search results
- Driver download page
- Download the driver file NVIDIA-Linux-x86_64-570.133.20.run and upload it to the Linux machine. Do not copy the download URL and fetch it directly on the machine with wget; that request returns 403
- Install the driver (the full command is shown after this list)
- It must be installed as root
- --no-x-check: skip the X server check during installation
- --no-nouveau-check: skip the nouveau check during installation
- --no-opengl-files: install only the driver files, not the OpenGL files
- When prompted to choose a kernel module type, select NVIDIA Proprietary with the left/right arrow keys and press Enter:
Multiple kernel module types are available for this system. Which would you like to use?
NVIDIA Proprietary    MIT/GPL
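Putting the flags above together, the install command looks like this (run it from the directory the .run file was uploaded to):
# Run as root; the three flags correspond to the list above
sudo sh NVIDIA-Linux-x86_64-570.133.20.run --no-x-check --no-nouveau-check --no-opengl-files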
- After installation, run the following command to confirm the driver installed successfully
nvidia-smi
The GPU information looks like this:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.133.20 Driver Version: 570.133.20 CUDA Version: 12.8 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla T4 Off | 00000000:3D:00.0 Off | 0 |
| N/A 51C P0 27W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 Tesla T4 Off | 00000000:3E:00.0 Off | 0 |
| N/A 54C P0 28W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 2 Tesla T4 Off | 00000000:40:00.0 Off | 0 |
| N/A 51C P0 27W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 3 Tesla T4 Off | 00000000:41:00.0 Off | 0 |
| N/A 48C P0 26W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 4 Tesla T4 Off | 00000000:B1:00.0 Off | 0 |
| N/A 47C P0 26W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 5 Tesla T4 Off | 00000000:B2:00.0 Off | 0 |
| N/A 53C P0 30W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 6 Tesla T4 Off | 00000000:B4:00.0 Off | 0 |
| N/A 56C P0 30W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 7 Tesla T4 Off | 00000000:B5:00.0 Off | 0 |
| N/A 55C P0 30W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
Install Git
Install git and git-lfs:
sudo dnf install git git-lfs
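A quick check that both commands are available:
git --version
git lfs version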
Install Anaconda (conda)
- Download page: https://www.anaconda.com/download/success
Choose the 64-Bit (x86) Installer and download it:
wget https://repo.anaconda.com/archive/Anaconda3-2024.10-1-Linux-x86_64.sh
- Install
sudo sh Anaconda3-2024.10-1-Linux-x86_64.sh
- The installer prints a lot of output; answer yes at each prompt, and press q to skip through the license text
Do you accept the license terms? [yes|no]
>>> yes
Anaconda3 will now be installed into this location:
/root/anaconda3
  - Press ENTER to confirm the location
  - Press CTRL-C to abort the installation
  - Or specify a different location below
[/root/anaconda3] >>> /data/ProgramFiles/anaconda3
You can undo this by running `conda init --reverse $SHELL`? [yes|no]
[no] >>> yes
- Set environment variables
cat >> ~/.bash_profile << EOF
export ANACONDA3_HOME=/data/ProgramFiles/anaconda3
export CONDA_ENVS_PATH=\$ANACONDA3_HOME/envs
export PATH="\$ANACONDA3_HOME/bin:\$PATH"
EOF
source ~/.bash_profile
# The directory was installed as root, so grant ownership to the working user
sudo chown -R tkyj.tkyj /data
- Check the conda version to verify the installation
conda -V
- Configure mirror channels
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
conda config --set show_channel_urls yes
Download the DeepSeek-R1-Distill-Qwen-7B model
- ModelScope: https://modelscope.cn/models/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
mkdir -pv /data/llm/models
cd /data/llm/models
# To skip the LFS large-file download for now, clone like this
GIT_LFS_SKIP_SMUDGE=1 git clone https://www.modelscope.cn/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B.git
# Make sure git-lfs is installed correctly
cd DeepSeek-R1-Distill-Qwen-7B
git lfs install
# Download the large files
git lfs pull
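To confirm the weights actually downloaded (rather than remaining LFS pointer stubs), list the LFS files; an asterisk after the object hash means the file content is present locally:
# '*' = object downloaded, '-' = pointer only
git lfs ls-files
# The safetensors shards should be multi-GB on disk
ls -lh *.safetensors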
安装LLaMA-Factory
- Software requirements
Mandatory | Minimum | Recommend |
---|---|---|
python | 3.9 | 3.10 |
torch | 2.0.0 | 2.6.0 |
transformers | 4.45.0 | 4.50.0 |
datasets | 2.16.0 | 3.2.0 |
accelerate | 0.34.0 | 1.2.1 |
peft | 0.14.0 | 0.15.1 |
trl | 0.8.6 | 0.9.6 |

Optional | Minimum | Recommend |
---|---|---|
CUDA | 11.6 | 12.2 |
deepspeed | 0.10.0 | 0.16.4 |
bitsandbytes | 0.39.0 | 0.43.1 |
vllm | 0.4.3 | 0.8.2 |
flash-attn | 2.5.6 | 2.7.2 |
- Hardware requirements
Method | Bits | 7B | 14B | 30B | 70B | x B |
---|---|---|---|---|---|---|
Full (bf16 or fp16) | 32 | 120GB | 240GB | 600GB | 1200GB | 18x GB |
Full (pure_bf16) | 16 | 60GB | 120GB | 300GB | 600GB | 8x GB |
Freeze/LoRA/GaLore/APOLLO/BAdam | 16 | 16GB | 32GB | 64GB | 160GB | 2x GB |
QLoRA | 8 | 10GB | 20GB | 40GB | 80GB | x GB |
QLoRA | 4 | 6GB | 12GB | 24GB | 48GB | x/2 GB |
QLoRA | 2 | 4GB | 8GB | 16GB | 24GB | x/4 GB |
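Reading this table for the setup at hand: 16-bit LoRA needs about 2 × 7 = 14 GB for a 7B model, so a single 16 GB T4 is right at the limit. That headroom problem is what the deepspeed section below addresses.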
Download LLaMA-Factory
- Clone the project with git
cd /data/ProgramFiles
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
Install LLaMA-Factory dependencies
cd LLaMA-Factory
conda create --name llama_factory python=3.10
conda activate llama_factory
# Install the CUDA 12.8 build from the PyTorch index
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
pip install -e ".[torch,metrics]" -i https://pypi.tuna.tsinghua.edu.cn/simple
# Verify the installation
llamafactory-cli version
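It is also worth confirming that this torch build can see all eight T4s before going further:
# Expect "True 8" on this machine
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"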
Update environment variables
So that every new login shell switches to the right Python environment automatically:
cat >> ~/.bash_profile << EOF
conda activate llama_factory
EOF
source ~/.bash_profile
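Note that `conda activate` only works once conda is initialized in the shell; if a fresh login reports that conda is not set up, source the conda hook first (the path assumes the install location chosen above):
# Run once, or place before the activate line in ~/.bash_profile
source /data/ProgramFiles/anaconda3/etc/profile.d/conda.sh
conda activate llama_factory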
Install deepspeed
A single 16 GB card is tight for fine-tuning a 7B model; deepspeed can be used to solve this.
# Query the available cuda versions
conda search cuda -c nvidia
# Install the version matching nvidia-smi (installing cuda is a prerequisite for installing deepspeed)
conda install cuda=12.8.1 -c nvidia
DS_BUILD_CPU_ADAM=1 pip3 install deepspeed==0.16.7
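deepspeed ships a diagnostic command that reports whether ops such as CPU Adam can build against the detected torch and CUDA versions:
# Prints op compatibility plus the detected torch/cuda versions
ds_report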
Prepare the Alpaca dataset
Example of the alpaca format:
[{"instruction": "Invent a new word by combining two existing words.","input": "","output": "\"Chillax\" - a combination of the words \"chill\" and \"relax\". It means to calm down and unwind, enjoying oneself in a laidback manner."},{"instruction": "Transform the following sentence using a synonym: The car sped quickly.","input": "","output": "The car accelerated rapidly."}
- Edit LLaMA-Factory/data/dataset_info.json and add the following entry
"cmic_financial_apaca": {"file_name": "/data/llm/dataset/cmic_financial_apaca.json","columns": {"prompt": "instruction","query": "input","response": "output","system": "system"}
}
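A malformed dataset file only fails later, at preprocessing time, so a cheap syntax check up front saves a wasted run (path as declared above):
# Exits non-zero and reports the first syntax error, if any
python -m json.tool /data/llm/dataset/cmic_financial_apaca.json > /dev/null && echo "JSON OK"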
Prepare the LoRA config file
- Start from a copy of the original file
cd LLaMA-Factory/examples/train_lora
# Copy a new file from the original
cp llama3_lora_sft.yaml ds_qwen7b_lora_sft.yaml
vi ds_qwen7b_lora_sft.yaml
- Edit ds_qwen7b_lora_sft.yaml; the main fields to change are model_name_or_path, dataset, template, cutoff_len, max_samples, and output_dir
- Parameters worth paying attention to:
- model_name_or_path: path to the model
- dataset: dataset name, matching the cmic_financial_apaca entry declared above
- template: chat template
- cutoff_len: maximum length of the input sequence
- output_dir: where the fine-tuned weights are saved
- gradient_accumulation_steps: number of gradient accumulation steps; reduce it when GPU resources are insufficient
- num_train_epochs: number of training epochs
The complete ds_qwen7b_lora_sft.yaml:
### model
model_name_or_path: /data/llm/models/DeepSeek-R1-Distill-Qwen-7B
trust_remote_code: true

### method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all

### dataset
dataset: cmic_financial_apaca
template: deepseek3
cutoff_len: 4096
max_samples: 4019
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4

### output
output_dir: /data/llm/models/sft/DeepSeek-R1-Distill-Qwen-7B
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true
save_only_model: false
report_to: none  # choices: [none, wandb, tensorboard, swanlab, mlflow]

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 1.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
resume_from_checkpoint: null

### eval
#eval_dataset: alpaca_en_demo
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
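Sanity-checking these numbers: with val_size: 0.1, roughly 4019 × 0.9 ≈ 3617 samples go to training; the effective batch size is per_device_train_batch_size (1) × 8 GPUs × gradient_accumulation_steps (8) = 64, so one epoch is about 3617 / 64 ≈ 57 optimizer steps.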
Set the number of GPUs
For multi-GPU parallelism, set the GPU count by editing LLaMA-Factory/examples/accelerate/fsdp_config.yaml:
num_processes: 8 # the number of GPUs in all nodes
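To train on only a subset of the cards instead, the usual approach (generic CUDA behavior, not specific to LLaMA-Factory) is to mask devices and keep num_processes in sync:
# Expose only the first four T4s; num_processes must then be set to 4
export CUDA_VISIBLE_DEVICES=0,1,2,3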
Launch fine-tuning
conda activate llama_factory
# Run in the background
nohup llamafactory-cli train /data/ProgramFiles/LLaMA-Factory/examples/train_lora/ds_qwen7b_lora_sft.yaml > nohup.log 2>&1 &
# Tail the log
tail -fn200 nohup.log
Watch GPU usage during fine-tuning
Run: watch -n 0.5 nvidia-smi
Every 0.5s: nvidia-smi    localhost.localdomain: Mon Apr 28 15:33:10 2025

Mon Apr 28 15:33:14 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.133.20 Driver Version: 570.133.20 CUDA Version: 12.8 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla T4 Off | 00000000:3D:00.0 Off | 0 |
| N/A 39C P8 19W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 Tesla T4 Off | 00000000:3E:00.0 Off | 0 |
| N/A 40C P0 26W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 2 Tesla T4 Off | 00000000:40:00.0 Off | 0 |
| N/A 37C P0 26W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 3 Tesla T4 Off | 00000000:41:00.0 Off | 0 |
| N/A 37C P0 25W / 70W | 0MiB / 15360MiB | 4% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 4 Tesla T4 Off | 00000000:B1:00.0 Off | 0 |
| N/A 36C P8 13W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 5 Tesla T4 Off | 00000000:B2:00.0 Off | 0 |
| N/A 40C P8 15W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 6 Tesla T4 Off | 00000000:B4:00.0 Off | 0 |
| N/A 42C P8 15W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 7 Tesla T4 Off | 00000000:B5:00.0 Off | 0 |
| N/A 41C P8 11W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
Start the webui service
# Stop the firewall so the web service port is reachable
sudo systemctl stop firewalld
GRADIO_SHARE=1 nohup llamafactory-cli webui > webui.log 2>&1 &
tail -fn200 webui.log
Chat
llamafactory-cli chat /data/ProgramFiles/LLaMA-Factory/examples/inference/ds_qwen7b_lora_sft.yaml
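The inference config referenced by this command is not shown in the post; a minimal sketch of creating it, assuming the standard fields of LLaMA-Factory's examples/inference/llama3_lora_sft.yaml and the paths used earlier:
# Hypothetical reconstruction; the adapter path is the training output_dir above
cat > /data/ProgramFiles/LLaMA-Factory/examples/inference/ds_qwen7b_lora_sft.yaml << EOF
model_name_or_path: /data/llm/models/DeepSeek-R1-Distill-Qwen-7B
adapter_name_or_path: /data/llm/models/sft/DeepSeek-R1-Distill-Qwen-7B
template: deepseek3
finetuning_type: lora
infer_backend: huggingface
trust_remote_code: true
EOF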
Loss curve
With plot_loss: true set in the config, LLaMA-Factory saves the loss curve as training_loss.png under output_dir when training finishes.
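To eyeball the raw numbers behind the curve without the image, the trainer also writes a trainer_log.jsonl in output_dir (the field name and spacing below are assumptions about LLaMA-Factory's log format):
# Print the logged loss values in order; pattern assumes default json spacing
grep -o '"loss": [0-9.]*' /data/llm/models/sft/DeepSeek-R1-Distill-Qwen-7B/trainer_log.jsonl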