当前位置：首页 > news >正文

使用unsloth模型微调过程

news 2025/11/6 19:18:47

1、写在前面

由于微调程序在运行的时候要不断地从头开始执行，而模型的训练过程耗时比较多，因此笔者建议使用jupyter notebook进行训练。能方便我们看到每一次执行的结果，也便于我们及时发现异常，检查代码执行情况。

2、检查本地环境

我们使用以下代码，检查目前cuda环境状态

import torch# 检查 PyTorch 是否安装成功
print("PyTorch 版本:", torch.__version__)# 检查 CUDA 是否可用
print("CUDA 是否可用:", torch.cuda.is_available())# 如果 CUDA 可用，获取 GPU 设备信息
if torch.cuda.is_available():print("GPU 数量:", torch.cuda.device_count())print("当前 GPU 设备:", torch.cuda.current_device())print("GPU 设备名称:", torch.cuda.get_device_name(torch.cuda.current_device()))

如果正常的话，你可以看到类似以下的结果，这就证明你的电脑运行环境具备显卡资源，可以做下一步的微调工作。
在这里插入图片描述

2、引入预训练模型

我们可以参照以下官方文档。

https://docs.unsloth.ai/get-started/fine-tuning-guide
https://docs.unsloth.ai/get-started/unsloth-notebooks
https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen2.5_VL_(7B)-Vision.ipynb
https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_(14B)-Reasoning-Conversational.ipynb#scrollTo=QmUBVEnvCDJv

根据这个文档，我们的具体代码如下：

from unsloth import FastVisionModel  # FastLanguageModel for LLMs
import torch# 4bit pre-quantized models we support for faster downloading + no OOMs.
fourbit_models = ["model_download/Qwen2.5-VL-7B-Instruct"  # 本地模型路径
]  # 更多模型信息见 https://huggingface.co/unslothmodel, tokenizer = FastVisionModel.from_pretrained("model_download/Qwen2.5-VL-7B-Instruct",load_in_4bit=True,  # 使用4bit来减少内存占用use_gradient_checkpointing="unsloth",  # 用于长上下文
)

首先，官网强调了，建议使用尾缀为instruct的模型文件，因为它们支持使用对话模板（如 ChatML、ShareGPT）进行直接微调，并且相比基础模型（base model，例如 Alpaca、Vicuna 等）来说，所需数据更少。
这里要解释的是，首先他定义了一个名为fourbit_models的列表。这个列表的作用是罗列出目前unsloth所支持的预训练模型都有哪些。
在官网上，他是这样写的

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = ["unsloth/Llama-3.2-11B-Vision-Instruct-bnb-4bit", # Llama 3.2 vision support"unsloth/Llama-3.2-11B-Vision-bnb-4bit","unsloth/Llama-3.2-90B-Vision-Instruct-bnb-4bit", # Can fit in a 80GB card!"unsloth/Llama-3.2-90B-Vision-bnb-4bit","unsloth/Pixtral-12B-2409-bnb-4bit",              # Pixtral fits in 16GB!"unsloth/Pixtral-12B-Base-2409-bnb-4bit",         # Pixtral base model"unsloth/Qwen2-VL-2B-Instruct-bnb-4bit",          # Qwen2 VL support"unsloth/Qwen2-VL-7B-Instruct-bnb-4bit","unsloth/Qwen2-VL-72B-Instruct-bnb-4bit","unsloth/llava-v1.6-mistral-7b-hf-bnb-4bit",      # Any Llava variant works!"unsloth/llava-1.5-7b-hf-bnb-4bit",
] # More models at https://huggingface.co/unsloth

他是按照如下结果来写的。也就是每一个模型的名称都可以在hugging face上下载下来。

fourbit_models = ["unsloth/...-bnb-4bit",  # 每个模型名都是 Hugging Face 上的路径
]

当然，我们可以将模型下载到本地，在本地调用。这个列表的作用就像是一个菜单，你可以从中挑选适合自己任务的模型。

在列出的所有可用的fourbit模型列表后，我们通过以下代码加载具体要进行微调的预训练模型：

model, tokenizer = FastVisionModel.from_pretrained("model_download/Qwen2.5-VL-7B-Instruct",load_in_4bit=True,  # 使用4bit来减少内存占用use_gradient_checkpointing="unsloth",  # 用于长上下文
)