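Advanced Island - InternVL Multimodal Model Deployment and Fine-Tuning in Practice
1. Introduction (What Is InternVL)
InternVL is a deep learning model for multimodal tasks, designed to process and understand several kinds of input, such as images and text. It combines a vision model with a language model and can carry out complex cross-modal tasks such as image-text matching and image captioning. By fusing visual features with language information, InternVL achieves stronger results on multimodal tasks.
2. InternVL Model Overview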

In InternVL, the vision module is a fine-tuned ViT and the LLM module is an InternLM model. What sets the vision module apart is Dynamic High Resolution.
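3. Dynamic High Resolution
Dynamic High Resolution lets the ViT capture as much fine-grained image detail as possible, which improves the expressiveness of the visual features. An input image is first resized so that its sides are multiples of 448, and is then cropped into regions according to a predefined set of aspect ratios. The figure shows the details.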
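The resize-and-crop step described above can be sketched in a few lines of Python. This is a simplified illustration, not InternVL's actual preprocessing code: the function names, the `max_tiles=12` limit, and the tie-breaking rule (prefer fewer tiles) are assumptions made for the example.

```python
from itertools import product

TILE = 448  # tile side length, taken from the text above


def candidate_grids(min_tiles=1, max_tiles=12):
    """All (cols, rows) grids whose tile count lies in [min_tiles, max_tiles]."""
    grids = {
        (cols, rows)
        for cols, rows in product(range(1, max_tiles + 1), repeat=2)
        if min_tiles <= cols * rows <= max_tiles
    }
    # Sort by tile count so that ties are resolved in favour of fewer tiles.
    return sorted(grids, key=lambda g: (g[0] * g[1], g))


def closest_grid(width, height, grids):
    """Pick the grid whose aspect ratio is closest to the image's."""
    aspect = width / height
    return min(grids, key=lambda g: abs(aspect - g[0] / g[1]))


def tile_boxes(width, height, max_tiles=12):
    """Return the resize target and the crop box of every 448x448 tile."""
    cols, rows = closest_grid(width, height, candidate_grids(max_tiles=max_tiles))
    resized = (cols * TILE, rows * TILE)
    boxes = [
        (c * TILE, r * TILE, (c + 1) * TILE, (r + 1) * TILE)
        for r in range(rows)
        for c in range(cols)
    ]
    return resized, boxes


resized, boxes = tile_boxes(800, 600)
print(resized, len(boxes))
```

For an 800x600 image the closest grid is 4x3, so the image is resized to 1792x1344 and cut into 12 tiles. Each box is `(left, upper, right, lower)`, the same order that `PIL.Image.crop` expects.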
 
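Pixel Shuffle
Pixel Shuffle is a common operation in super-resolution, and PyTorch ships an official implementation, nn.PixelShuffle(upscale_factor). It rearranges the elements of a tensor: given an input of shape [B, C·r², H, W] and an upscale factor r, the output has shape [B, C, H·r, W·r], so the operation changes both the channel count and the spatial size of the feature map.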
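To make the rearrangement concrete, here is a small pure-Python sketch of the same index mapping that `nn.PixelShuffle` applies, written for a single image stored as nested lists. The function name `pixel_shuffle` is illustrative; in real code you would call the PyTorch module directly.

```python
def pixel_shuffle(x, r):
    """Rearrange a [C * r * r][H][W] nested list into [C][H * r][W * r].

    Output element (c, h * r + i, w * r + j) is taken from input channel
    c * r * r + i * r + j at position (h, w).
    """
    c_in, height, width = len(x), len(x[0]), len(x[0][0])
    assert c_in % (r * r) == 0, "channel count must be divisible by r * r"
    c_out = c_in // (r * r)
    out = [[[None] * (width * r) for _ in range(height * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):
            for j in range(r):
                src = x[c * r * r + i * r + j]
                for h in range(height):
                    for w in range(width):
                        out[c][h * r + i][w * r + j] = src[h][w]
    return out


# Four 1x1 channels become one 2x2 channel.
x = [[[1]], [[2]], [[3]], [[4]]]
print(pixel_shuffle(x, 2))  # [[[1, 2], [3, 4]]]
```

The four input channels end up as the four pixels of one 2x2 block: the channel dimension shrinks by r² while height and width each grow by r.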
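4. InternVL Deployment and Fine-Tuning in Practice
The task we chose is to have InternVL-2B generate text-to-image prompts, which requires the VLM to produce a formatted description of an image.
In this walkthrough, let's use a VLM to generate deadpan jokes and get the model to say something genuinely funny. We fine-tune InternVL with xtuner and deploy it with lmdeploy.
4.1 Prepare the InternVL Model
We use the InternVL2-2B model. It is already mounted under the share folder, so let's copy it out.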
cd /root
mkdir -p model
# Copy the model
cp -r /root/share/new_models/OpenGVLab/InternVL2-2B /root/model/
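4.2 Prepare the Environment
Here we set up xtuner manually.
- Configure the virtual environment
conda create --name xtuner python=3.10 -y
# Activate the virtual environment (note: all later steps must be run inside it)
conda activate xtuner
# Install the required libraries
conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=12.1 -c pytorch -c nvidia -y
# Install other dependencies
apt install libaio-dev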
pip install transformers==4.39.3
pip install streamlit==1.36.0
- Install xtuner
# Create a directory for the source code
mkdir -p /root/InternLM/code
cd /root/InternLM/code
git clone -b v0.1.23  https://github.com/InternLM/XTuner
# Enter the XTuner directory
cd /root/InternLM/code/XTuner
pip install -e '.[deepspeed]'
- Install LMDeploy
pip install lmdeploy==0.5.3
- Verify the installation
xtuner version
xtuner help
Check that your version numbers match ours.
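If you prefer to check every pinned version at once, a short script along these lines can help. It is a sketch that assumes the distribution names `torch`, `transformers`, `streamlit`, `lmdeploy`, and `xtuner`; the expected versions are the ones pinned in the commands above.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned in the install commands above (distribution names are assumed).
PINNED = {
    "torch": "2.1.2",
    "transformers": "4.39.3",
    "streamlit": "1.36.0",
    "lmdeploy": "0.5.3",
    "xtuner": "0.1.23",
}


def check_versions(pinned=PINNED):
    """Map each package to (installed_version_or_None, expected, matches)."""
    report = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        # startswith tolerates local build suffixes such as "2.1.2+cu121".
        matches = installed is not None and installed.startswith(expected)
        report[name] = (installed, expected, matches)
    return report


for name, (installed, expected, matches) in check_versions().items():
    status = "OK" if matches else "MISMATCH"
    print(f"{name:14s} installed={installed} expected={expected} {status}")
```

Run it inside the activated xtuner environment; any line marked MISMATCH points to a package to reinstall.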
4.3 Prepare the Fine-Tuning Dataset
Here we use the zhongshsh/CLoT-Oogiri-GO dataset from Hugging Face. Special thanks to its authors.
@misc{zhong2023clot,
  title={Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation},
  author={Zhong, Shanshan and Huang, Zhongzhan and Gao, Shanghua and Wen, Wushao and Lin, Liang and Zitnik, Marinka and Zhou, Pan},
  journal={arXiv preprint arXiv:2312.02439},
  year={2023}
}

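We downloaded the dataset from the official site, deduplicated it, kept only the Chinese data, and converted it to the format XTuner expects. The result is already in share, so let's copy it out from there.
## First, install the required packages
pip install datasets matplotlib Pillow timm
## Copy the dataset out (create the target directory first)
mkdir -p /root/InternLM/datasets
cp -r /root/share/new_models/datasets/CLoT_cn_2000 /root/InternLM/datasets/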
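Let's look at one image from the dataset: the one that belongs to the first record in the jsonl file. First, copy that image into the InternLM folder.
cp /root/InternLM/datasets/CLoT_cn_2000/ex_images/007aPnLRgy1hb39z0im50j30ci0el0wm.jpg /root/InternLM/
Ha, it's two cats fighting. So which joke does the dataset pair with it?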
 
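4.4 InternVL Inference and Deployment Guide
We use LMDeploy to run inference on this image. Can it explain the meme?
Inference with the pipeline
We use the pipeline tool that ships with lmdeploy for out-of-the-box inference. First, create a new file.
touch /root/InternLM/code/test_lmdeploy.py
cd /root/InternLM/code/
Then copy the following code into test_lmdeploy.py.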
from lmdeploy import pipeline
from lmdeploy.vl import load_image
pipe = pipeline('/root/model/InternVL2-2B')
image = load_image('/root/InternLM/007aPnLRgy1hb39z0im50j30ci0el0wm.jpg')
# The prompt is Chinese for: "Based on this picture, tell a wildly imaginative joke."
response = pipe(('请你根据这张图片,讲一个脑洞大开的梗', image))
print(response.text)
Run the script to get the inference result.
python3 test_lmdeploy.py
Output after inference (the model replies in Chinese):
[WARNING] gemm_config.in is not found; using default GEMM algo
这张图片中的猫咪看起来非常可爱和调皮,它穿着一件“衬衫”,像是在模仿人类的装扮。这个场景让人联想到一些“衬衫猫咪”的梗,即猫咪穿着人类的服装,表现出一种反差萌。
这种“衬衫猫咪”梗源自于网络上的一种幽默表达方式,用来形容那些看起来非常可爱,但实际上却有些“不正经”的行为。这种“不正经”的表现往往让人忍俊不禁,因为猫咪的行为往往显得很无辜和无辜,就像穿着一件不合身的衣服一样。
具体来说,这张图片中的猫咪穿着一件“衬衫”,看起来像是在模仿人类的装扮,尤其是衬衫领口的设计和衣领的样式。这种装扮不仅显得它像人类一样,而且也突显了它的可爱和无辜,就像一个“天真”的小孩子一样。
这种“衬衫猫咪”的梗在网络上非常流行,经常被用来形容那些行为可爱但有些“不正经”的猫咪。无论是作为搞笑素材,还是作为猫咪装扮的参考,这种“衬衫猫咪”都深受大家的喜爱。
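The result shows that the 2B model, used as-is, can't really tell a joke: the reply above just describes a cat that seems to be "wearing a shirt" and explains a "shirt cat" meme at length. So next we fine-tune this 2B model.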
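4.5 InternVL Fine-Tuning Guide
Prepare the dataset
The dataset format is:
# For efficient training, make sure the data has this format:
{
    "id": "000000033471",
    "image": ["coco/train2017/000000033471.jpg"], # for text-only samples this field is None or absent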
    "conversations": [
      {
        "from": "human",
        "value": "<image>\nWhat are the colors of the bus in the image?"
      },
      {
        "from": "gpt",
        "value": "The bus in the image is white and red."
      }
    ]
  }
We have also prepared a dataset that is ready for fine-tuning: it is the data we just copied into InternLM/datasets.
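Before training on your own data, it can be worth checking that each record follows the format shown above. The sketch below is a hypothetical helper, not part of XTuner; it assumes that turns alternate between "human" and "gpt" and that a record with images contains at least one `<image>` placeholder, as in the example record.

```python
def validate_record(record):
    """Return a list of problems found in one training record (empty if none)."""
    problems = []
    turns = record.get("conversations")
    if not isinstance(turns, list) or not turns:
        return ["'conversations' must be a non-empty list"]
    for index, turn in enumerate(turns):
        # Assumed convention: human speaks first, then turns alternate.
        expected = "human" if index % 2 == 0 else "gpt"
        if turn.get("from") != expected:
            problems.append(f"turn {index}: expected 'from' == {expected!r}")
        if not isinstance(turn.get("value"), str):
            problems.append(f"turn {index}: 'value' must be a string")
    # 'image' may be missing or None for text-only samples.
    images = record.get("image") or []
    if isinstance(images, str):
        images = [images]
    placeholders = sum(
        turn["value"].count("<image>")
        for turn in turns
        if isinstance(turn.get("value"), str)
    )
    if images and placeholders == 0:
        problems.append("record has images but no '<image>' placeholder")
    return problems


sample = {
    "id": "000000033471",
    "image": ["coco/train2017/000000033471.jpg"],
    "conversations": [
        {"from": "human", "value": "<image>\nWhat are the colors of the bus in the image?"},
        {"from": "gpt", "value": "The bus in the image is white and red."},
    ],
}
print(validate_record(sample))  # []
```

To check a whole file, load it with `json.load` and call `validate_record` on every entry, printing the `id` of any record that returns a non-empty list.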
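(1) Configure the fine-tuning parameters
Let's edit the InternVL config that ships with XTuner. The file is: /root/InternLM/code/XTuner/xtuner/configs/internvl/v2/internvl_v2_internlm2_2b_qlora_finetune.py
First, a walkthrough of the fine-tuning config:
- The Settings part defines the basic parameters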
#######################################################################
#                          PART 1  Settings                           #
#######################################################################
# Model
# Path to the model
path = '/root/model/InternVL2-2B'
# Data
# Paths to the data
data_root = '/root/data/'
# data_path = data_root + 'LLaVA-Instruct-150K/llava_v1_5_mix665k.json'
data_path = '/root/data/screenshot_od/layout_ocr_multi.json'
image_folder = data_root + 'screenshot_od/images'
prompt_template = PROMPT_TEMPLATE.internlm2_chat
# Maximum sequence length (in tokens) of a training sample
max_length = 8192
# Scheduler & Optimizer
# Batch size on each GPU
batch_size = 8  # per_device
# Number of gradient accumulation steps
accumulative_counts = 2
# Number of dataloader worker processes
dataloader_num_workers = 4
# Number of training epochs
max_epochs = 1
# Optimizer type
optim_type = AdamW
# official 1024 -> 4e-5
lr = 1e-6
betas = (0.9, 0.999)
weight_decay = 0.05
max_norm = 1  # grad clip
warmup_ratio = 0.03
# Save
save_steps = 1000
save_total_limit = 1  # Maximum checkpoints to keep (-1 means unlimited)
- Definitions of the model, tokenizer, data, and so on
#######################################################################
#            PART 2  Model & Tokenizer & Image Processor              #
#######################################################################
model = dict(
    type=InternVL_V1_5,
    model_path=path,
    freeze_llm=True,
    freeze_visual_encoder=True,
    quantization_llm=True,  # or False
    quantization_vit=False,  # or True and uncomment visual_encoder_lora
    # comment the following lines if you don't want to use Lora in llm
    llm_lora=dict(
        type=LoraConfig,
        r=128,
        lora_alpha=256,
        lora_dropout=0.05,
        target_modules=None,
        task_type='CAUSAL_LM'),
    # uncomment the following lines if you want to use Lora in visual encoder # noqa
    # visual_encoder_lora=dict(
    #     type=LoraConfig, r=64, lora_alpha=16, lora_dropout=0.05,
    #     target_modules=['attn.qkv', 'attn.proj', 'mlp.fc1', 'mlp.fc2'])
)
#######################################################################
#                      PART 3  Dataset & Dataloader                   #
#######################################################################
llava_dataset = dict(
    type=InternVL_V1_5_Dataset,
    model_path=path,
    data_paths=data_path,
    image_folders=image_folder,
    template=prompt_template,
    max_length=max_length)
train_dataloader = dict(
    batch_size=batch_size,
    num_workers=dataloader_num_workers,
    dataset=llava_dataset,
    sampler=dict(
        type=LengthGroupedSampler,
        length_property='modality_length',
        per_device_batch_size=batch_size * accumulative_counts),
    collate_fn=dict(type=default_collate_fn))
- Definitions of the scheduler, optimizer, and so on
#######################################################################
#                    PART 4  Scheduler & Optimizer                    #
#######################################################################
# optimizer
optim_wrapper = dict(
    type=AmpOptimWrapper,
    optimizer=dict(
        type=optim_type, lr=lr, betas=betas, weight_decay=weight_decay),
    clip_grad=dict(max_norm=max_norm, error_if_nonfinite=False),
    accumulative_counts=accumulative_counts,
    loss_scale='dynamic',
    dtype='float16')
# learning policy
# More information: https://github.com/open-mmlab/mmengine/blob/main/docs/en/tutorials/param_scheduler.md  # noqa: E501
param_scheduler = [
    dict(
        type=LinearLR,
        start_factor=1e-5,
        by_epoch=True,
        begin=0,
        end=warmup_ratio * max_epochs,
        convert_to_iter_based=True),
    dict(
        type=CosineAnnealingLR,
        eta_min=0.0,
        by_epoch=True,
        begin=warmup_ratio * max_epochs,
        end=max_epochs,
        convert_to_iter_based=True)
]
# train, val, test setting
train_cfg = dict(type=TrainLoop, max_epochs=max_epochs)
#######################################################################
#                           PART 5  Runtime                           #
#######################################################################
# Log the dialogue periodically during the training process, optional
tokenizer = dict(
    type=AutoTokenizer.from_pretrained,
    pretrained_model_name_or_path=path,
    trust_remote_code=True)
custom_hooks = [
    dict(type=DatasetInfoHook, tokenizer=tokenizer),
]
# configure default hooks
default_hooks = dict(
    # record the time of every iteration.
    timer=dict(type=IterTimerHook),
    # print log every 10 iterations.
    logger=dict(type=LoggerHook, log_metric_by_epoch=False, interval=10),
    # enable the parameter scheduler.
    param_scheduler=dict(type=ParamSchedulerHook),
    # save checkpoint per `save_steps`.
    checkpoint=dict(
        type=CheckpointHook,
        save_optimizer=False,
        by_epoch=False,
        interval=save_steps,
        max_keep_ckpts=save_total_limit),
    # set sampler seed in distributed environment.
    sampler_seed=dict(type=DistSamplerSeedHook),
)
# configure environment
env_cfg = dict(
    # whether to enable cudnn benchmark
    cudnn_benchmark=False,
    # set multi process parameters
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
    # set distributed parameters
    dist_cfg=dict(backend='nccl'),
)
- Parts that need to change
At a minimum, change the model path and the data paths.
#######################################################################
#                          PART 1  Settings                           #
#######################################################################
# Model
path = '/root/model/InternVL2-2B'
# Data
#data_root = './data/llava_data/'
#data_path = data_root + 'LLaVA-Instruct-150K/llava_v1_5_mix665k.json'
#image_folder = data_root + 'llava_images'
#prompt_template = PROMPT_TEMPLATE.internlm2_chat
#max_length = 8192
data_root = '/root/InternLM/datasets/CLoT_cn_2000/'
data_path = data_root + 'ex_cn.json'
image_folder = data_root
prompt_template = PROMPT_TEMPLATE.internlm2_chat
max_length = 6656
# Scheduler & Optimizer
batch_size = 2  # per_device
- The complete config file (just copy it)
# Copyright (c) OpenMMLab. All rights reserved.
from mmengine.hooks import (CheckpointHook, DistSamplerSeedHook, IterTimerHook,
                            LoggerHook, ParamSchedulerHook)
from mmengine.optim import AmpOptimWrapper, CosineAnnealingLR, LinearLR
from peft import LoraConfig
from torch.optim import AdamW
from transformers import AutoTokenizer
from xtuner.dataset import InternVL_V1_5_Dataset
from xtuner.dataset.collate_fns import default_collate_fn
from xtuner.dataset.samplers import LengthGroupedSampler
from xtuner.engine.hooks import DatasetInfoHook
from xtuner.engine.runner import TrainLoop
from xtuner.model import InternVL_V1_5
from xtuner.utils import PROMPT_TEMPLATE
#######################################################################
#                          PART 1  Settings                           #
#######################################################################
# Model
path = '/root/model/InternVL2-2B'
# Data
data_root = '/root/InternLM/datasets/CLoT_cn_2000/'
data_path = data_root + 'ex_cn.json'
image_folder = data_root
prompt_template = PROMPT_TEMPLATE.internlm2_chat
max_length = 6656
# Scheduler & Optimizer
batch_size = 4  # per_device; lower to 2 if you hit CUDA out-of-memory
accumulative_counts = 4
dataloader_num_workers = 4
max_epochs = 1
optim_type = AdamW
# official 1024 -> 4e-5
lr = 2e-5
betas = (0.9, 0.999)
weight_decay = 0.05
max_norm = 1  # grad clip
warmup_ratio = 0.03
# Save
save_steps = 1000
save_total_limit = 1  # Maximum checkpoints to keep (-1 means unlimited)
#######################################################################
#            PART 2  Model & Tokenizer & Image Processor              #
#######################################################################
model = dict(
    type=InternVL_V1_5,
    model_path=path,
    freeze_llm=True,
    freeze_visual_encoder=True,
    quantization_llm=True,  # or False
    quantization_vit=False,  # or True and uncomment visual_encoder_lora
    # comment the following lines if you don't want to use Lora in llm
    llm_lora=dict(
        type=LoraConfig,
        r=128,
        lora_alpha=256,
        lora_dropout=0.05,
        target_modules=None,
        task_type='CAUSAL_LM'),
    # uncomment the following lines if you want to use Lora in visual encoder # noqa
    # visual_encoder_lora=dict(
    #     type=LoraConfig, r=64, lora_alpha=16, lora_dropout=0.05,
    #     target_modules=['attn.qkv', 'attn.proj', 'mlp.fc1', 'mlp.fc2'])
)
#######################################################################
#                      PART 3  Dataset & Dataloader                   #
#######################################################################
llava_dataset = dict(
    type=InternVL_V1_5_Dataset,
    model_path=path,
    data_paths=data_path,
    image_folders=image_folder,
    template=prompt_template,
    max_length=max_length)
train_dataloader = dict(
    batch_size=batch_size,
    num_workers=dataloader_num_workers,
    dataset=llava_dataset,
    sampler=dict(
        type=LengthGroupedSampler,
        length_property='modality_length',
        per_device_batch_size=batch_size * accumulative_counts),
    collate_fn=dict(type=default_collate_fn))
#######################################################################
#                    PART 4  Scheduler & Optimizer                    #
#######################################################################
# optimizer
optim_wrapper = dict(
    type=AmpOptimWrapper,
    optimizer=dict(
        type=optim_type, lr=lr, betas=betas, weight_decay=weight_decay),
    clip_grad=dict(max_norm=max_norm, error_if_nonfinite=False),
    accumulative_counts=accumulative_counts,
    loss_scale='dynamic',
    dtype='float16')
# learning policy
# More information: https://github.com/open-mmlab/mmengine/blob/main/docs/en/tutorials/param_scheduler.md  # noqa: E501
param_scheduler = [
    dict(
        type=LinearLR,
        start_factor=1e-5,
        by_epoch=True,
        begin=0,
        end=warmup_ratio * max_epochs,
        convert_to_iter_based=True),
    dict(
        type=CosineAnnealingLR,
        eta_min=0.0,
        by_epoch=True,
        begin=warmup_ratio * max_epochs,
        end=max_epochs,
        convert_to_iter_based=True)
]
# train, val, test setting
train_cfg = dict(type=TrainLoop, max_epochs=max_epochs)
#######################################################################
#                           PART 5  Runtime                           #
#######################################################################
# Log the dialogue periodically during the training process, optional
tokenizer = dict(
    type=AutoTokenizer.from_pretrained,
    pretrained_model_name_or_path=path,
    trust_remote_code=True)
custom_hooks = [
    dict(type=DatasetInfoHook, tokenizer=tokenizer),
]
# configure default hooks
default_hooks = dict(
    # record the time of every iteration.
    timer=dict(type=IterTimerHook),
    # print log every 10 iterations.
    logger=dict(type=LoggerHook, log_metric_by_epoch=False, interval=10),
    # enable the parameter scheduler.
    param_scheduler=dict(type=ParamSchedulerHook),
    # save checkpoint per `save_steps`.
    checkpoint=dict(
        type=CheckpointHook,
        save_optimizer=False,
        by_epoch=False,
        interval=save_steps,
        max_keep_ckpts=save_total_limit),
    # set sampler seed in distributed environment.
    sampler_seed=dict(type=DistSamplerSeedHook),
)
# configure environment
env_cfg = dict(
    # whether to enable cudnn benchmark
    cudnn_benchmark=False,
    # set multi process parameters
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
    # set distributed parameters
    dist_cfg=dict(backend='nccl'),
)
# set visualizer
visualizer = None
# set log level
log_level = 'INFO'
# load from which checkpoint
load_from = None
# whether to resume training from the loaded checkpoint
resume = False
# Defaults to use random seed and disable `deterministic`
randomness = dict(seed=None, deterministic=False)
# set log processor
log_processor = dict(by_epoch=False)
(2) Start training
Train with the config prepared above. We need to lower the batch size and use QLoRA; otherwise half a GPU does not have enough memory.
NPROC_PER_NODE=1 xtuner train /root/InternLM/code/XTuner/xtuner/configs/internvl/v2/internvl_v2_internlm2_2b_qlora_finetune.py  --work-dir /root/InternLM/work_dir/internvl_ft_run_8_filter  --deepspeed deepspeed_zero1
With the default batch_size=4 I still hit an OutOfMemoryError; after changing it to 2, training ran normally.
[rank0]: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 4.71 GiB. GPU
(xtuner0121) root@intern-studio-50211982:~/InternLM/code/XTuner#
Because the batch is small, fine-tuning took about 11 hours in total on a 30% A100 instance.
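A quick back-of-the-envelope calculation shows what the smaller batch means. The sketch below uses batch_size=2, accumulative_counts=4, and one GPU from this run, plus the 6000 iterations reported in the training log; the 6.5 seconds per iteration is an assumed rough average read off the later log lines, and both helper names are illustrative.

```python
def effective_batch_size(per_device, accumulative_counts, num_gpus=1):
    """Samples that contribute to one optimizer step."""
    return per_device * accumulative_counts * num_gpus


def estimated_hours(total_iters, seconds_per_iter):
    """Rough wall-clock training time in hours."""
    return total_iters * seconds_per_iter / 3600


print(effective_batch_size(2, 4))            # 8
print(round(estimated_hours(6000, 6.5), 1))  # 10.8
```

The estimate of roughly 10.8 hours is consistent with the 11 hours observed. Note that halving the per-device batch size also halves the effective batch size unless accumulative_counts is doubled to compensate.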
...
请你根据这张图片,讲一个脑洞大开的梗<|im_end|><|im_start|> assistant
果然!大家都会把鼻屎抹在课桌下面<|im_end|>
08/21 17:44:42 - mmengine - WARNING - "FileClient" will be deprecated in future. Please use io functions in https://mmengine.readthedocs.io/en/latest/api/fileio.html#file-io
08/21 17:44:42 - mmengine - WARNING - "HardDiskBackend" is the alias of "LocalBackend" and the former will be deprecated in future.
08/21 17:44:42 - mmengine - INFO - Checkpoints will be saved to /root/InternLM/work_dir/internvl_ft_run_8_filter.
/root/.conda/envs/xtuner0121/lib/python3.10/site-packages/torch/utils/checkpoint.py:464: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.4 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.
  warnings.warn(
/root/.conda/envs/xtuner0121/lib/python3.10/site-packages/torch/utils/checkpoint.py:91: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn(
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3417
08/21 17:45:09 - mmengine - INFO - Iter(train) [  10/6000]  lr: 1.0058e-06  eta: 4:31:27  time: 2.7191  data_time: 0.0114  memory: 14652  loss: 5.1294
08/21 17:45:33 - mmengine - INFO - Iter(train) [  20/6000]  lr: 2.1231e-06  eta: 4:12:06  time: 2.3398  data_time: 0.0235  memory: 14651  loss: 5.7605
08/21 17:46:02 - mmengine - INFO - Iter(train) [  30/6000]  lr: 3.2404e-06  eta: 4:24:06  time: 2.9042  data_time: 0.0163  memory: 14646  loss: 5.9527
08/21 17:46:32 - mmengine - INFO - Iter(train) [  40/6000]  lr: 4.3577e-06  eta: 4:31:54  time: 2.9860  data_time: 0.0172  memory: 14632  loss: 6.1117
08/21 17:47:01 - mmengine - INFO - Iter(train) [  50/6000]  lr: 5.4750e-06  eta: 4:34:24  time: 2.8867  data_time: 0.0166  memory: 14593  loss: 5.4724
08/21 17:47:25 - mmengine - INFO - Iter(train) [  60/6000]  lr: 6.5923e-06  eta: 4:28:36  time: 2.4438  data_time: 0.0131  memory: 14632  loss: 5.4773
08/21 17:48:03 - mmengine - INFO - Iter(train) [  70/6000]  lr: 7.7096e-06  eta: 4:43:12  time: 3.7792  data_time: 0.0168  memory: 14630  loss: 5.7110
08/21 17:48:36 - mmengine - INFO - Iter(train) [  80/6000]  lr: 8.8269e-06  eta: 4:48:23  time: 3.3245  data_time: 0.0153  memory: 14637  loss: 5.7963
08/21 17:49:27 - mmengine - INFO - Iter(train) [  90/6000]  lr: 9.9442e-06  eta: 5:11:12  time: 5.0525  data_time: 0.0162  memory: 14625  loss: 5.3584
08/21 17:50:24 - mmengine - INFO - Iter(train) [ 100/6000]  lr: 1.1062e-05  eta: 5:36:22  time: 5.7718  data_time: 0.0193  memory: 14622  loss: 5.1775
...
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3406
08/21 23:49:43 - mmengine - INFO - Iter(train) [3410/6000]  lr: 8.2866e-06  eta: 4:37:14  time: 6.1520  data_time: 0.0189  memory: 14672  loss: 2.7470
08/21 23:50:49 - mmengine - INFO - Iter(train) [3420/6000]  lr: 8.2334e-06  eta: 4:36:11  time: 6.6438  data_time: 0.0154  memory: 14648  loss: 2.4156
08/21 23:52:01 - mmengine - INFO - Iter(train) [3430/6000]  lr: 8.1803e-06  eta: 4:35:12  time: 7.1605  data_time: 0.0180  memory: 14577  loss: 2.0693
08/21 23:53:12 - mmengine - INFO - Iter(train) [3440/6000]  lr: 8.1272e-06  eta: 4:34:13  time: 7.0844  data_time: 0.0171  memory: 14624  loss: 1.7735
08/21 23:54:29 - mmengine - INFO - Iter(train) [3450/6000]  lr: 8.0743e-06  eta: 4:33:18  time: 7.7216  data_time: 0.0191  memory: 14630  loss: 1.8995
08/21 23:55:41 - mmengine - INFO - Iter(train) [3460/6000]  lr: 8.0213e-06  eta: 4:32:20  time: 7.2184  data_time: 0.0177  memory: 14631  loss: 1.7147
08/21 23:56:46 - mmengine - INFO - Iter(train) [3470/6000]  lr: 7.9684e-06  eta: 4:31:16  time: 6.4690  data_time: 0.0154  memory: 14636  loss: 1.8168
08/21 23:57:51 - mmengine - INFO - Iter(train) [3480/6000]  lr: 7.9156e-06  eta: 4:30:12  time: 6.5319  data_time: 0.0155  memory: 14644  loss: 1.8507
08/21 23:59:01 - mmengine - INFO - Iter(train) [3490/6000]  lr: 7.8628e-06  eta: 4:29:12  time: 7.0294  data_time: 0.0159  memory: 14624  loss: 1.5084
08/22 00:00:07 - mmengine - INFO - Iter(train) [3500/6000]  lr: 7.8101e-06  eta: 4:28:08  time: 6.5171  data_time: 0.0157  memory: 14620  loss: 1.7641
...
2024/08/22 04:23:17 - mmengine - INFO - Iter(train) [5890/6000]  lr: 1.7945e-08  eta: 0:11:55  time: 6.6322  data_time: 0.0162  memory: 14624  loss: 0.4356
2024/08/22 04:24:23 - mmengine - INFO - Iter(train) [5900/6000]  lr: 1.4858e-08  eta: 0:10:50  time: 6.5471  data_time: 0.0143  memory: 14627  loss: 0.4764
2024/08/22 04:25:28 - mmengine - INFO - Iter(train) [5910/6000]  lr: 1.2062e-08  eta: 0:09:45  time: 6.5247  data_time: 0.0167  memory: 14621  loss: 0.5624
2024/08/22 04:26:22 - mmengine - INFO - Iter(train) [5920/6000]  lr: 9.5571e-09  eta: 0:08:40  time: 5.4022  data_time: 0.0142  memory: 14620  loss: 0.6877
2024/08/22 04:27:22 - mmengine - INFO - Iter(train) [5930/6000]  lr: 7.3432e-09  eta: 0:07:35  time: 5.9815  data_time: 0.0137  memory: 14614  loss: 0.4565
2024/08/22 04:28:29 - mmengine - INFO - Iter(train) [5940/6000]  lr: 5.4206e-09  eta: 0:06:30  time: 6.7140  data_time: 0.0140  memory: 14618  loss: 0.5041
2024/08/22 04:29:32 - mmengine - INFO - Iter(train) [5950/6000]  lr: 3.7891e-09  eta: 0:05:25  time: 6.2728  data_time: 0.0144  memory: 14607  loss: 0.4104
2024/08/22 04:30:32 - mmengine - INFO - Iter(train) [5960/6000]  lr: 2.4489e-09  eta: 0:04:20  time: 6.0393  data_time: 0.0131  memory: 14615  loss: 0.3896
2024/08/22 04:31:35 - mmengine - INFO - Iter(train) [5970/6000]  lr: 1.4000e-09  eta: 0:03:15  time: 6.3093  data_time: 0.0140  memory: 14615  loss: 0.2181
2024/08/22 04:32:46 - mmengine - INFO - Iter(train) [5980/6000]  lr: 6.4248e-10  eta: 0:02:10  time: 7.0711  data_time: 0.0175  memory: 14614  loss: 0.2556
2024/08/22 04:33:50 - mmengine - INFO - Iter(train) [5990/6000]  lr: 1.7628e-10  eta: 0:01:05  time: 6.4128  data_time: 0.0141  memory: 14617  loss: 0.3989
2024/08/22 04:34:57 - mmengine - INFO - Exp name: internvl_v2_internlm2_2b_qlora_finetune_20240821_174406
2024/08/22 04:34:57 - mmengine - INFO - Iter(train) [6000/6000]  lr: 1.4569e-12  eta: 0:00:00  time: 6.6742  data_time: 0.0152  memory: 14608  loss: 0.1039
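The lr column in the log follows the two-stage schedule from PART 4 of the config: linear warmup for the first 3% of iterations, then cosine annealing down to zero. The function below is a closed-form sketch of that schedule (mmengine computes it step by step internally); the name `lr_at` and the 0-indexed step convention are choices made for this example, while the default values come from the config and the log.

```python
import math


def lr_at(step, base_lr=2e-5, total_steps=6000, warmup_ratio=0.03,
          start_factor=1e-5, eta_min=0.0):
    """Learning rate at a 0-indexed training step."""
    warmup_steps = round(warmup_ratio * total_steps)  # 180 for this run
    if step < warmup_steps:
        # Linear warmup from base_lr * start_factor up to base_lr.
        progress = step / (warmup_steps - 1)
        return base_lr * (start_factor + (1 - start_factor) * progress)
    # Cosine annealing from base_lr down to eta_min.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * progress))


# Log line "Iter(train) [10/6000]" corresponds to 0-indexed step 9.
for logged_iter in (10, 100, 3410, 6000):
    print(logged_iter, f"{lr_at(logged_iter - 1):.4e}")
```

The printed values can be compared with the lr entries at iterations 10, 100, 3410, and 6000 in the log above. The learning rate peaks at 2e-5 around iteration 180 and is close to zero by the last iteration.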
(3) Merge weights && convert the model
Merge the weights with the official script.
cd /root/InternLM/code/XTuner
python3 xtuner/configs/internvl/v1_5/convert_to_official.py xtuner/configs/internvl/v2/internvl_v2_internlm2_2b_qlora_finetune.py /root/InternLM/work_dir/internvl_ft_run_8_filter/iter_6000.pth /root/InternLM/InternVL2-2B/
flash_attn must be installed. If the installation hangs, install ninja first.
pip install ninja
pip install flash_attn
After merging, the model is in /root/InternLM/InternVL2-2B/, with these files:
tree /root/InternLM/InternVL2-2B
/root/InternLM/InternVL2-2B
├── added_tokens.json
├── config.json
├── configuration_intern_vit.py
├── configuration_internlm2.py
├── configuration_internvl_chat.py
├── conversation.py
├── generation_config.json
├── model.safetensors
├── modeling_intern_vit.py
├── modeling_internlm2.py
├── modeling_internvl_chat.py
├── special_tokens_map.json
├── tokenization_internlm2.py
├── tokenizer.model
└── tokenizer_config.json
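Comparing sizes: once the examples directory of the original model is left out, the two models differ by only 59 KB (the merged model is the slightly smaller one).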
du /root/InternLM/InternVL2-2B /root/model/InternVL2-2B -h -k
4309789 /root/InternLM/InternVL2-2B
2029    /root/model/InternVL2-2B/examples
4311877 /root/model/InternVL2-2B
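The comparison is easy to verify from the du figures above (all in KB). The merged model has no examples directory, so that directory is subtracted from the original before comparing; the variable names are only for illustration.

```python
# Sizes in KB, copied from the du output above.
merged_kb = 4309789     # /root/InternLM/InternVL2-2B
original_kb = 4311877   # /root/model/InternVL2-2B
examples_kb = 2029      # /root/model/InternVL2-2B/examples

difference_kb = (original_kb - examples_kb) - merged_kb
print(difference_kb)  # 59
```

A positive result means the original weights and code files are 59 KB larger than the merged ones, a negligible difference for a model of about 4.3 GB.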
4.6 Comparing Results After Fine-Tuning
Fine-tuning is done, so let's try the kitten picture again!

Swap in the new model and see how it does.
from lmdeploy import pipeline
from lmdeploy.vl import load_image
#pipe = pipeline('/root/model/InternVL2-2B')
pipe = pipeline('/root/InternLM/InternVL2-2B')
image = load_image('/root/InternLM/256321723775630_.pic.jpg')
# Ask the same question 20 times to see how varied the answers are
for i in range(20):
    response = pipe(('请你根据这张图片,讲一个脑洞大开的梗', image))
    print(response.text)
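The fine-tuned model's answers are indeed much funnier: instead of a long description it now produces short one-liners (in Chinese), for example "被粘住了无法逃脱的猫" ("a cat that is stuck and can't escape").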
(xtuner0121) root@intern-studio-50211982:~/InternLM/code# python ./test_lmdeploy.py
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING] gemm_config.in is not found; using default GEMM algo
被穿了外套
被粘住了无法逃脱的猫猫
被子和被子同时醒来
“哥哥,你别闹了!我刚刚只是迷路而已!”
被强行拉去体验打蚊子大赛
被粘住了,别过来!
被猫爪抓伤了
已经不是第一次了,果然还是不行啊!
说拜拜,稍等,我马上就到!
被强行穿上衣服后不肯脱下来的猫
被粘住了,爬不下来了
我因为上半身是虎的,所以就当成了虎哥
被猫欺负的兔子回家后吓唬父母
刚才不是已经解了印子月的印吗?
我因为指出打工好累所以叫你别闹了,没想到你真的以为我在闹?
对,非常好吃,真的非常好吃,我感动得热泪盈眶。
被穿了裙子的猫
我告诉你,别试图去拉扯我的头,会有点可怕的故事发生。
说要拜师学艺被当成愣头青教手艺人一直在教主做俯卧撑
刚才不是说了我不是阿婆吗!
(xtuner0121) root@intern-studio-50211982:~/InternLM/code# python ./test_lmdeploy.py
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING] gemm_config.in is not found; using default GEMM algo
被男朋友脱下了衣服
被逼着去揭发出轨的哥哥
被猫爪抓伤了,好可怕啊~
被强行穿上衣服后不肯脱的猫
被猫爪抓伤后,拼命擦去血渍说:“我一点也没擦伤,你给我闭嘴!”
被猫爪抓伤的肿了的位置
被子和被子混在一起打滚
我因为胸口好痛所以叫了!
被粘住无法逃脱的倒霉猫
被猫爪抓伤的肿起部分
被粘住了无法逃脱的猫
刚才不是说了我是第一名吗!
说:“我养的猫这么可爱,你怎么还不要呢?”
被猫爪抓伤的胳膊怎么还不上火
被强行换上新手服后猛地意识到是新手的猫
被子和被子一起被尿了
被猫爪抓伤了
被猫爪抓伤的腋下
被猫爪抓伤了
已经不是第一次了,果然还是没忍住~
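Several answers in the two runs above repeat. One simple way to quantify that is to count distinct responses. The sketch below is a hypothetical helper that uses `collections.Counter`; the sample list contains four lines copied from the second run, not the full output.

```python
from collections import Counter


def diversity(responses):
    """Return (share of distinct responses, the three most common ones)."""
    counts = Counter(text.strip() for text in responses)
    return len(counts) / len(responses), counts.most_common(3)


# Four answers copied from the second run above.
sample = [
    "被猫爪抓伤了",
    "被粘住了无法逃脱的猫",
    "被猫爪抓伤了",
    "被子和被子一起被尿了",
]
ratio, top = diversity(sample)
print(ratio)   # 0.75
print(top[0])  # ('被猫爪抓伤了', 2)
```

In test_lmdeploy.py you could collect each `response.text` into a list inside the loop and pass that list to `diversity`. A low ratio suggests the model has collapsed onto a few stock phrases.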
