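Advanced Island - InternVL Multimodal Model Deployment and Fine-tuning in Practice
1. Preface (What is InternVL)
InternVL is a deep learning model for multimodal tasks, designed to process and understand several types of input such as images and text. It combines a vision model with a language model and can perform complex cross-modal tasks such as image-text matching and image captioning. By integrating visual features with linguistic information, InternVL achieves stronger performance on multimodal tasks.
2. InternVL Model Overview
In InternVL, the vision module is a fine-tuned ViT (InternViT) and the LLM module is an InternLM model, with an MLP projector connecting the two. What sets the vision module apart is Dynamic High Resolution.
3. Dynamic High Resolution
Dynamic high resolution lets the ViT capture as much fine-grained image detail as possible, which improves the expressiveness of the visual features. An input image is first matched to the closest of a set of predefined aspect ratios and resized so that both sides are multiples of 448. It is then cropped into 448×448 tiles according to that ratio. Details are shown in the figure.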
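The tiling step can be sketched in plain Python. This is a minimal sketch modeled on the preprocessing in InternVL's published inference examples. It makes three assumptions: tiles are 448 pixels, an image uses at most 12 tiles, and one extra thumbnail view is added whenever more than one tile is produced. The function names are hypothetical. The sketch only computes the grid and the crop boxes and does not touch any pixels:

```python
def candidate_grids(max_num=12, min_num=1):
    """All (cols, rows) tile grids whose tile count lies in [min_num, max_num]."""
    grids = [
        (cols, rows)
        for cols in range(1, max_num + 1)
        for rows in range(1, max_num + 1)
        if min_num <= cols * rows <= max_num
    ]
    # Fewer tiles first; ties broken deterministically by the grid itself.
    return sorted(grids, key=lambda g: (g[0] * g[1], g))


def closest_grid(width, height, image_size=448, max_num=12):
    """Pick the grid whose aspect ratio (cols / rows) is closest to the image's."""
    aspect = width / height
    best, best_diff = (1, 1), float("inf")
    for cols, rows in candidate_grids(max_num):
        diff = abs(aspect - cols / rows)
        if diff < best_diff:
            best, best_diff = (cols, rows), diff
        elif diff == best_diff and width * height > 0.5 * image_size**2 * cols * rows:
            # Same aspect ratio but more tiles: only worth it for a large enough image.
            best = (cols, rows)
    return best


def tile_boxes(width, height, image_size=448, max_num=12, use_thumbnail=True):
    """Return the grid, the crop boxes on the resized image, and the number of views."""
    cols, rows = closest_grid(width, height, image_size, max_num)
    # The image is resized to (cols * image_size, rows * image_size) before cropping.
    boxes = [
        (c * image_size, r * image_size, (c + 1) * image_size, (r + 1) * image_size)
        for r in range(rows)
        for c in range(cols)
    ]
    n_views = len(boxes) + (1 if use_thumbnail and len(boxes) > 1 else 0)
    return (cols, rows), boxes, n_views


grid, boxes, n_views = tile_boxes(1000, 500)
print(grid, len(boxes), n_views)  # (2, 1) 2 3
```

A 2:1 image therefore becomes two side-by-side 448×448 tiles plus a thumbnail. A larger image with the same aspect ratio (for example 2000×1000) gets the finer 4×2 grid instead.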
Pixel Shuffle
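Pixel Shuffle is a common operation in super-resolution. PyTorch provides it as nn.PixelShuffle(upscale_factor), which rearranges the elements of a tensor. With r = upscale_factor, an input of shape [B, C·r², H, W] becomes [B, C, H·r, W·r], so the operation changes both the number of channels and the size of the feature map. InternVL applies the inverse rearrangement (scale factor 0.5) to the ViT output. This merges each 2×2 neighborhood of visual tokens into one token with 4× the channels, cutting the tokens per 448×448 tile from 1024 to 256.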
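A minimal PyTorch sketch of the shape changes follows. nn.PixelShuffle and nn.PixelUnshuffle are standard torch.nn modules. The [1, 1024, 32, 32] feature map is an assumption that stands in for one 448×448 tile (448 / 14 = 32 patches per side, 1024 channels). InternVL's own code uses its own reshape/permute, so its channel ordering may differ even though the shapes match:

```python
import torch
import torch.nn as nn

# PixelShuffle(r): [B, C*r*r, H, W] -> [B, C, H*r, W*r]  (fewer channels, larger map)
x = torch.arange(1 * 4 * 2 * 2, dtype=torch.float32).reshape(1, 4, 2, 2)
up = nn.PixelShuffle(upscale_factor=2)(x)
print(tuple(up.shape))  # (1, 1, 4, 4)

# PixelUnshuffle(r) is the exact inverse: [B, C, H*r, W*r] -> [B, C*r*r, H, W]
down = nn.PixelUnshuffle(downscale_factor=2)(up)
assert torch.equal(down, x)

# Token reduction in the spirit of InternVL (assumed ViT output for one tile:
# 32x32 patches with 1024 channels, laid out as [B, C, H, W]).
vit_feat = torch.randn(1, 1024, 32, 32)
merged = nn.PixelUnshuffle(downscale_factor=2)(vit_feat)  # [1, 4096, 16, 16]
tokens = merged.flatten(2).transpose(1, 2)                # [1, 256, 4096]
print(tuple(tokens.shape))  # (1, 256, 4096)
```

Under these assumptions, each tile contributes 256 visual tokens instead of 1024 to the language model's context.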
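4. InternVL Deployment and Fine-tuning in Practice
A task such as having InternVL2-2B generate text-to-image prompts requires the VLM to describe an image in a structured format and output it.
In this tutorial, we instead use a VLM to generate cold jokes (deadpan one-liners), so your model can tell genuinely funny ones. We fine-tune InternVL with XTuner and deploy it with LMDeploy.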
4.1 Prepare the InternVL Model
We use the InternVL2-2B model. It is already mounted under the share folder, so let's copy it out.
cd /root
mkdir -p model
# copy the model
cp -r /root/share/new_models/OpenGVLab/InternVL2-2B /root/model/
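4.2 Prepare the Environment
Here we set up XTuner manually.
- Set up the virtual environment
conda create --name xtuner python=3.10 -y
# activate the virtual environment (note: all subsequent steps must run inside it)
conda activate xtuner
# install the required libraries
conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=12.1 -c pytorch -c nvidia -y
# install other dependencies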
apt install libaio-dev
pip install transformers==4.39.3
pip install streamlit==1.36.0
- Install XTuner
# create a directory for the source code
mkdir -p /root/InternLM/code
cd /root/InternLM/code
git clone -b v0.1.23 https://github.com/InternLM/XTuner
# enter the XTuner directory
cd /root/InternLM/code/XTuner
pip install -e '.[deepspeed]'
- Install LMDeploy
pip install lmdeploy==0.5.3
- Verify the installation
xtuner version
xtuner help
Make sure your version numbers match ours (XTuner 0.1.23, LMDeploy 0.5.3).
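To check all the pinned versions at once, a short script helps. This is a minimal sketch that uses only the standard library's importlib.metadata, and the distribution names are assumed to be the ones installed above. A package can import fine yet still show as missing if its installer wrote no metadata:

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned in section 4.2.
EXPECTED = {
    "xtuner": "0.1.23",
    "lmdeploy": "0.5.3",
    "transformers": "4.39.3",
    "torch": "2.1.2",
}


def check_versions(expected):
    """Return {package: (expected, installed)} for every mismatch; installed is None if absent."""
    mismatches = {}
    for pkg, want in expected.items():
        try:
            have = version(pkg)
        except PackageNotFoundError:
            have = None
        # Drop local build tags such as "+cu121" before comparing.
        if have is None or have.split("+")[0] != want:
            mismatches[pkg] = (want, have)
    return mismatches


if __name__ == "__main__":
    print(check_versions(EXPECTED) or "all pinned versions match")
```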
4.3 Prepare the Fine-tuning Dataset
We use the zhongshsh/CLoT-Oogiri-GO dataset from Hugging Face. Special thanks to its authors:
@misc{zhong2023clot,
title={Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation},
author={Zhong, Shanshan and Huang, Zhongzhan and Gao, Shanghua and Wen, Wushao and Lin, Liang and Zitnik, Marinka and Zhou, Pan},
journal={arXiv preprint arXiv:2312.02439},
year={2023}
}
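We downloaded the dataset from its official source, deduplicated it, kept only the Chinese samples, and converted it into the format XTuner expects. It is already in the share folder, so let's copy it out.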
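## First, install the required packages
pip install datasets matplotlib Pillow timm
## Copy the dataset out (create the target directory first)
mkdir -p /root/InternLM/datasets
cp -r /root/share/new_models/datasets/CLoT_cn_2000 /root/InternLM/datasets/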
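Let's look at one image from the dataset: the one referenced by the first record in the annotation file. First, copy that image into the InternLM folder.
cp /root/InternLM/datasets/CLoT_cn_2000/ex_images/007aPnLRgy1hb39z0im50j30ci0el0wm.jpg /root/InternLM/
Ha, it's two cats fighting. So what reference joke does the dataset give for it?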
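To read that reference joke without opening the file by hand, a small reader helps. This is a minimal sketch with two assumptions: the annotation file is /root/InternLM/datasets/CLoT_cn_2000/ex_cn.json (the path used in the config in 4.5), and each record follows the id/image/conversations layout shown in 4.5. load_records and summarize are hypothetical helpers:

```python
import json
from pathlib import Path


def load_records(path):
    """Load annotation records from a .json file (a list) or a .jsonl file (one object per line)."""
    path = Path(path)
    text = path.read_text(encoding="utf-8")
    if path.suffix == ".jsonl":
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    return json.loads(text)


def summarize(record):
    """Return (image paths, human prompt, reference answer) of a single-turn record."""
    images = record.get("image") or []
    if isinstance(images, str):
        images = [images]
    turns = record["conversations"]
    question = next(t["value"] for t in turns if t["from"] == "human")
    answer = next(t["value"] for t in turns if t["from"] == "gpt")
    return images, question, answer
```

For example, `summarize(load_records("/root/InternLM/datasets/CLoT_cn_2000/ex_cn.json")[0])` returns the first sample. The image paths in a record are presumably relative to the dataset root, since the config sets image_folder = data_root.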
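4.4 InternVL Inference and Deployment Guide
Let's run inference on this image with LMDeploy and see whether the model can explain the meme.
Inference with pipeline
We use LMDeploy's built-in pipeline for out-of-the-box inference. First, create a new file: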
touch /root/InternLM/code/test_lmdeploy.py
cd /root/InternLM/code/
Then copy the following code into test_lmdeploy.py:
from lmdeploy import pipeline
from lmdeploy.vl import load_image
pipe = pipeline('/root/model/InternVL2-2B')
image = load_image('/root/InternLM/007aPnLRgy1hb39z0im50j30ci0el0wm.jpg')
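# prompt (kept in Chinese to match the training data): "Tell a wildly imaginative joke based on this image"
response = pipe(('请你根据这张图片,讲一个脑洞大开的梗', image))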
print(response.text)
Run the script to get the inference result:
python3 test_lmdeploy.py
Output:
[WARNING] gemm_config.in is not found; using default GEMM algo
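The cat in this image looks very cute and mischievous. It is wearing a “shirt”, as if imitating human attire. The scene brings to mind the “shirt cat” meme, in which cats wear human clothes and show a kind of contrast-cuteness.
This “shirt cat” meme comes from a humorous style of expression online, used to describe behavior that looks very cute but is actually a bit “unserious”. Such “unserious” behavior often makes people laugh, because the cat's behavior tends to look innocent and innocent, just like wearing ill-fitting clothes.
Specifically, the cat in this image is wearing a “shirt” and seems to be imitating human attire, especially the design of the shirt's neckline and the style of its collar. The outfit not only makes it look human but also highlights its cuteness and innocence, like a “naive” little child.
The “shirt cat” meme is very popular online and is often used to describe cats that behave cutely but a bit “unseriously”. Whether as comedy material or as a reference for dressing up cats, the “shirt cat” is loved by everyone.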
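The output shows that the base 2B model cannot tell a joke well: it describes the image instead of delivering a punchline. So next we fine-tune the 2B model.
4.5 InternVL Fine-tuning Guide
Prepare the dataset
The dataset format is:
# For efficient training, make sure the data follows this format: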
{
"id": "000000033471",
"image": ["coco/train2017/000000033471.jpg"], # for text-only samples, this field is None or absent
"conversations": [
{
"from": "human",
"value": "<image>\nWhat are the colors of the bus in the image?"
},
{
"from": "gpt",
"value": "The bus in the image is white and red."
}
]
}
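To turn your own image/joke pairs into this format, a short builder and checker can help. This is a minimal sketch: make_record and validate_record are hypothetical helpers, and the prompt is the one used for inference in 4.4. The check that the count of <image> placeholders equals the number of image paths is an assumption about what the loader expects, not a documented XTuner rule:

```python
import json

# Prompt used in 4.4: "Tell a wildly imaginative joke based on this image".
PROMPT = "请你根据这张图片,讲一个脑洞大开的梗"


def make_record(sample_id, image_path, answer, prompt=PROMPT):
    """Build one single-turn, single-image sample in the format shown above."""
    return {
        "id": str(sample_id),
        "image": [image_path],
        "conversations": [
            {"from": "human", "value": f"<image>\n{prompt}"},
            {"from": "gpt", "value": answer},
        ],
    }


def validate_record(rec):
    """Return a list of problems found in one sample; an empty list means it looks OK."""
    problems = []
    images = rec.get("image") or []
    if isinstance(images, str):
        images = [images]
    turns = rec.get("conversations") or []
    if not turns:
        problems.append("no conversations")
    for i, turn in enumerate(turns):
        expected = "human" if i % 2 == 0 else "gpt"  # turns must alternate human/gpt
        if turn.get("from") != expected:
            problems.append(f"turn {i} should come from {expected!r}")
    n_tags = sum(t.get("value", "").count("<image>") for t in turns if t.get("from") == "human")
    if n_tags != len(images):
        problems.append(f"{n_tags} <image> tag(s) but {len(images)} image path(s)")
    return problems


rec = make_record(1, "ex_images/example.jpg", "(reference joke)")
print(json.dumps(rec, ensure_ascii=False, indent=2))
print(validate_record(rec))  # []
```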
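We have also prepared a dataset that can be used for fine-tuning directly: the data we copied into /root/InternLM/datasets earlier.
(1) Configure the Fine-tuning Parameters
Let's modify XTuner's InternVL config, located at: /root/InternLM/code/XTuner/xtuner/configs/internvl/v2/internvl_v2_internlm2_2b_qlora_finetune.py
First, an overview of the fine-tuning config:
- Settings: basic model, data and training parameters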
#######################################################################
# PART 1 Settings #
#######################################################################
# Model
# model path
path = '/root/model/InternVL2-2B'
# Data
# data paths
data_root = '/root/data/'
# data_path = data_root + 'LLaVA-Instruct-150K/llava_v1_5_mix665k.json'
data_path = '/root/data/screenshot_od/layout_ocr_multi.json'
image_folder = data_root + 'screenshot_od/images'
prompt_template = PROMPT_TEMPLATE.internlm2_chat
# maximum token length of one training sample (not just the model's output)
max_length = 8192
# Scheduler & Optimizer
# batch size on each GPU
batch_size = 8 # per_device
# number of gradient accumulation steps
accumulative_counts = 2
# number of dataloader worker processes
dataloader_num_workers = 4
# number of training epochs
max_epochs = 1
# optimizer type
optim_type = AdamW
# official 1024 -> 4e-5
lr = 1e-6
betas = (0.9, 0.999)
weight_decay = 0.05
max_norm = 1 # grad clip
warmup_ratio = 0.03
# Save
save_steps = 1000
save_total_limit = 1 # Maximum checkpoints to keep (-1 means unlimited)
- Model, tokenizer, and data definitions
#######################################################################
# PART 2 Model & Tokenizer & Image Processor #
#######################################################################
model = dict(
type=InternVL_V1_5,
model_path=path,
freeze_llm=True,
freeze_visual_encoder=True,
quantization_llm=True, # or False
quantization_vit=False, # or True and uncomment visual_encoder_lora
# comment the following lines if you don't want to use Lora in llm
llm_lora=dict(
type=LoraConfig,
r=128,
lora_alpha=256,
lora_dropout=0.05,
target_modules=None,
task_type='CAUSAL_LM'),
# uncomment the following lines if you want to use LoRA in the visual encoder # noqa
# visual_encoder_lora=dict(
# type=LoraConfig, r=64, lora_alpha=16, lora_dropout=0.05,
# target_modules=['attn.qkv', 'attn.proj', 'mlp.fc1', 'mlp.fc2'])
)
#######################################################################
# PART 3 Dataset & Dataloader #
#######################################################################
llava_dataset = dict(
type=InternVL_V1_5_Dataset,
model_path=path,
data_paths=data_path,
image_folders=image_folder,
template=prompt_template,
max_length=max_length)
train_dataloader = dict(
batch_size=batch_size,
num_workers=dataloader_num_workers,
dataset=llava_dataset,
sampler=dict(
type=LengthGroupedSampler,
length_property='modality_length',
per_device_batch_size=batch_size * accumulative_counts),
collate_fn=dict(type=default_collate_fn))
- Scheduler, optimizer, and runtime definitions
#######################################################################
# PART 4 Scheduler & Optimizer #
#######################################################################
# optimizer
optim_wrapper = dict(
type=AmpOptimWrapper,
optimizer=dict(
type=optim_type, lr=lr, betas=betas, weight_decay=weight_decay),
clip_grad=dict(max_norm=max_norm, error_if_nonfinite=False),
accumulative_counts=accumulative_counts,
loss_scale='dynamic',
dtype='float16')
# learning policy
# More information: https://github.com/open-mmlab/mmengine/blob/main/docs/en/tutorials/param_scheduler.md # noqa: E501
param_scheduler = [
dict(
type=LinearLR,
start_factor=1e-5,
by_epoch=True,
begin=0,
end=warmup_ratio * max_epochs,
convert_to_iter_based=True),
dict(
type=CosineAnnealingLR,
eta_min=0.0,
by_epoch=True,
begin=warmup_ratio * max_epochs,
end=max_epochs,
convert_to_iter_based=True)
]
# train, val, test setting
train_cfg = dict(type=TrainLoop, max_epochs=max_epochs)
#######################################################################
# PART 5 Runtime #
#######################################################################
# Log the dialogue periodically during the training process, optional
tokenizer = dict(
type=AutoTokenizer.from_pretrained,
pretrained_model_name_or_path=path,
trust_remote_code=True)
custom_hooks = [
dict(type=DatasetInfoHook, tokenizer=tokenizer),
]
# configure default hooks
default_hooks = dict(
# record the time of every iteration.
timer=dict(type=IterTimerHook),
# print log every 10 iterations.
logger=dict(type=LoggerHook, log_metric_by_epoch=False, interval=10),
# enable the parameter scheduler.
param_scheduler=dict(type=ParamSchedulerHook),
# save checkpoint per `save_steps`.
checkpoint=dict(
type=CheckpointHook,
save_optimizer=False,
by_epoch=False,
interval=save_steps,
max_keep_ckpts=save_total_limit),
# set sampler seed in distributed environment.
sampler_seed=dict(type=DistSamplerSeedHook),
)
# configure environment
env_cfg = dict(
# whether to enable cudnn benchmark
cudnn_benchmark=False,
# set multi process parameters
mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
# set distributed parameters
dist_cfg=dict(backend='nccl'),
)
- Parts to modify
At a minimum, change the model path and the data paths (we also lower batch_size):
#######################################################################
# PART 1 Settings #
#######################################################################
# Model
path = '/root/model/InternVL2-2B'
# Data
#data_root = './data/llava_data/'
#data_path = data_root + 'LLaVA-Instruct-150K/llava_v1_5_mix665k.json'
#image_folder = data_root + 'llava_images'
#prompt_template = PROMPT_TEMPLATE.internlm2_chat
#max_length = 8192
data_root = '/root/InternLM/datasets/CLoT_cn_2000/'
data_path = data_root + 'ex_cn.json'
image_folder = data_root
prompt_template = PROMPT_TEMPLATE.internlm2_chat
max_length = 6656
# Scheduler & Optimizer
batch_size = 2 # per_device
- Full config file (copy as-is)
# Copyright (c) OpenMMLab. All rights reserved.
from mmengine.hooks import (CheckpointHook, DistSamplerSeedHook, IterTimerHook,
LoggerHook, ParamSchedulerHook)
from mmengine.optim import AmpOptimWrapper, CosineAnnealingLR, LinearLR
from peft import LoraConfig
from torch.optim import AdamW
from transformers import AutoTokenizer
from xtuner.dataset import InternVL_V1_5_Dataset
from xtuner.dataset.collate_fns import default_collate_fn
from xtuner.dataset.samplers import LengthGroupedSampler
from xtuner.engine.hooks import DatasetInfoHook
from xtuner.engine.runner import TrainLoop
from xtuner.model import InternVL_V1_5
from xtuner.utils import PROMPT_TEMPLATE
#######################################################################
# PART 1 Settings #
#######################################################################
# Model
path = '/root/model/InternVL2-2B'
# Data
data_root = '/root/InternLM/datasets/CLoT_cn_2000/'
data_path = data_root + 'ex_cn.json'
image_folder = data_root
prompt_template = PROMPT_TEMPLATE.internlm2_chat
max_length = 6656
# Scheduler & Optimizer
batch_size = 2 # per_device (4 ran out of GPU memory on our machine)
accumulative_counts = 4
dataloader_num_workers = 4
max_epochs = 1
optim_type = AdamW
# official 1024 -> 4e-5
lr = 2e-5
betas = (0.9, 0.999)
weight_decay = 0.05
max_norm = 1 # grad clip
warmup_ratio = 0.03
# Save
save_steps = 1000
save_total_limit = 1 # Maximum checkpoints to keep (-1 means unlimited)
#######################################################################
# PART 2 Model & Tokenizer & Image Processor #
#######################################################################
model = dict(
type=InternVL_V1_5,
model_path=path,
freeze_llm=True,
freeze_visual_encoder=True,
quantization_llm=True, # or False
quantization_vit=False, # or True and uncomment visual_encoder_lora
# comment the following lines if you don't want to use Lora in llm
llm_lora=dict(
type=LoraConfig,
r=128,
lora_alpha=256,
lora_dropout=0.05,
target_modules=None,
task_type='CAUSAL_LM'),
# uncomment the following lines if you want to use LoRA in the visual encoder # noqa
# visual_encoder_lora=dict(
# type=LoraConfig, r=64, lora_alpha=16, lora_dropout=0.05,
# target_modules=['attn.qkv', 'attn.proj', 'mlp.fc1', 'mlp.fc2'])
)
#######################################################################
# PART 3 Dataset & Dataloader #
#######################################################################
llava_dataset = dict(
type=InternVL_V1_5_Dataset,
model_path=path,
data_paths=data_path,
image_folders=image_folder,
template=prompt_template,
max_length=max_length)
train_dataloader = dict(
batch_size=batch_size,
num_workers=dataloader_num_workers,
dataset=llava_dataset,
sampler=dict(
type=LengthGroupedSampler,
length_property='modality_length',
per_device_batch_size=batch_size * accumulative_counts),
collate_fn=dict(type=default_collate_fn))
#######################################################################
# PART 4 Scheduler & Optimizer #
#######################################################################
# optimizer
optim_wrapper = dict(
type=AmpOptimWrapper,
optimizer=dict(
type=optim_type, lr=lr, betas=betas, weight_decay=weight_decay),
clip_grad=dict(max_norm=max_norm, error_if_nonfinite=False),
accumulative_counts=accumulative_counts,
loss_scale='dynamic',
dtype='float16')
# learning policy
# More information: https://github.com/open-mmlab/mmengine/blob/main/docs/en/tutorials/param_scheduler.md # noqa: E501
param_scheduler = [
dict(
type=LinearLR,
start_factor=1e-5,
by_epoch=True,
begin=0,
end=warmup_ratio * max_epochs,
convert_to_iter_based=True),
dict(
type=CosineAnnealingLR,
eta_min=0.0,
by_epoch=True,
begin=warmup_ratio * max_epochs,
end=max_epochs,
convert_to_iter_based=True)
]
# train, val, test setting
train_cfg = dict(type=TrainLoop, max_epochs=max_epochs)
#######################################################################
# PART 5 Runtime #
#######################################################################
# Log the dialogue periodically during the training process, optional
tokenizer = dict(
type=AutoTokenizer.from_pretrained,
pretrained_model_name_or_path=path,
trust_remote_code=True)
custom_hooks = [
dict(type=DatasetInfoHook, tokenizer=tokenizer),
]
# configure default hooks
default_hooks = dict(
# record the time of every iteration.
timer=dict(type=IterTimerHook),
# print log every 10 iterations.
logger=dict(type=LoggerHook, log_metric_by_epoch=False, interval=10),
# enable the parameter scheduler.
param_scheduler=dict(type=ParamSchedulerHook),
# save checkpoint per `save_steps`.
checkpoint=dict(
type=CheckpointHook,
save_optimizer=False,
by_epoch=False,
interval=save_steps,
max_keep_ckpts=save_total_limit),
# set sampler seed in distributed environment.
sampler_seed=dict(type=DistSamplerSeedHook),
)
# configure environment
env_cfg = dict(
# whether to enable cudnn benchmark
cudnn_benchmark=False,
# set multi process parameters
mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
# set distributed parameters
dist_cfg=dict(backend='nccl'),
)
# set visualizer
visualizer = None
# set log level
log_level = 'INFO'
# load from which checkpoint
load_from = None
# whether to resume training from the loaded checkpoint
resume = False
# Defaults to use random seed and disable `deterministic`
randomness = dict(seed=None, deterministic=False)
# set log processor
log_processor = dict(by_epoch=False)
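Several numbers in PART 1 and PART 4 interact with each other. The sketch below shows how, under stated assumptions: a single GPU (NPROC_PER_NODE=1), and a learning-rate curve that approximates mmengine's LinearLR (start_factor=1e-5) followed by CosineAnnealingLR (eta_min=0.0), both converted to per-iteration steps. The helper names are hypothetical, and mmengine's exact per-step values may differ slightly:

```python
import math


def effective_batch_size(per_device, accumulative_counts, num_gpus=1):
    """Samples that contribute to one optimizer update."""
    return per_device * accumulative_counts * num_gpus


def lr_at(it, total_iters, base_lr=2e-5, warmup_ratio=0.03, start_factor=1e-5, eta_min=0.0):
    """Approximate lr at 0-based iteration `it`: linear warmup, then cosine annealing."""
    warmup_iters = int(warmup_ratio * total_iters)
    if it < warmup_iters:
        # LinearLR: the factor grows linearly from start_factor to 1.0
        factor = start_factor + (1.0 - start_factor) * it / warmup_iters
        return base_lr * factor
    # CosineAnnealingLR: decay from base_lr to eta_min over the remaining iterations
    progress = min(1.0, (it - warmup_iters) / max(1, total_iters - warmup_iters))
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * progress))


print(effective_batch_size(2, 4))  # 8
for it in (0, 100, 180, 3000, 6000):
    print(it, f"{lr_at(it, 6000):.3e}")
```

With batch_size = 2 and accumulative_counts = 4 on one GPU, each optimizer update sees 2 × 4 × 1 = 8 samples. For the 6000-iteration run in the log, warmup covers int(0.03 × 6000) = 180 iterations. After that the learning rate decays along a cosine curve toward 0. This matches the shape of the logged lr values, which rise over the first hundred-odd iterations and reach about 1.5e-12 at iteration 6000.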
(2) Start Training
Train with the config prepared above. We need to lower the batch size and use QLoRA; otherwise half a GPU is not enough. QAQ
NPROC_PER_NODE=1 xtuner train /root/InternLM/code/XTuner/xtuner/configs/internvl/v2/internvl_v2_internlm2_2b_qlora_finetune.py --work-dir /root/InternLM/work_dir/internvl_ft_run_8_filter --deepspeed deepspeed_zero1
With the default batch_size=4, training still ran out of memory (OutOfMemoryError). After changing it to 2, it ran normally.
[rank0]: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 4.71 GiB. GPU
(xtuner0121) root@intern-studio-50211982:~/InternLM/code/XTuner#
Because the batch is small, fine-tuning took about 11 hours in total on a machine with 30% of an A100.
...
Tell a wildly imaginative joke based on this image<|im_end|><|im_start|> assistant
I knew it! Everyone wipes their boogers under the desk<|im_end|>
08/21 17:44:42 - mmengine - WARNING - "FileClient" will be deprecated in future. Please use io functions in https://mmengine.readthedocs.io/en/latest/api/fileio.html#file-io
08/21 17:44:42 - mmengine - WARNING - "HardDiskBackend" is the alias of "LocalBackend" and the former will be deprecated in future.
08/21 17:44:42 - mmengine - INFO - Checkpoints will be saved to /root/InternLM/work_dir/internvl_ft_run_8_filter.
/root/.conda/envs/xtuner0121/lib/python3.10/site-packages/torch/utils/checkpoint.py:464: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.4 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.
warnings.warn(
/root/.conda/envs/xtuner0121/lib/python3.10/site-packages/torch/utils/checkpoint.py:91: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn(
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3417
08/21 17:45:09 - mmengine - INFO - Iter(train) [ 10/6000] lr: 1.0058e-06 eta: 4:31:27 time: 2.7191 data_time: 0.0114 memory: 14652 loss: 5.1294
08/21 17:45:33 - mmengine - INFO - Iter(train) [ 20/6000] lr: 2.1231e-06 eta: 4:12:06 time: 2.3398 data_time: 0.0235 memory: 14651 loss: 5.7605
08/21 17:46:02 - mmengine - INFO - Iter(train) [ 30/6000] lr: 3.2404e-06 eta: 4:24:06 time: 2.9042 data_time: 0.0163 memory: 14646 loss: 5.9527
08/21 17:46:32 - mmengine - INFO - Iter(train) [ 40/6000] lr: 4.3577e-06 eta: 4:31:54 time: 2.9860 data_time: 0.0172 memory: 14632 loss: 6.1117
08/21 17:47:01 - mmengine - INFO - Iter(train) [ 50/6000] lr: 5.4750e-06 eta: 4:34:24 time: 2.8867 data_time: 0.0166 memory: 14593 loss: 5.4724
08/21 17:47:25 - mmengine - INFO - Iter(train) [ 60/6000] lr: 6.5923e-06 eta: 4:28:36 time: 2.4438 data_time: 0.0131 memory: 14632 loss: 5.4773
08/21 17:48:03 - mmengine - INFO - Iter(train) [ 70/6000] lr: 7.7096e-06 eta: 4:43:12 time: 3.7792 data_time: 0.0168 memory: 14630 loss: 5.7110
08/21 17:48:36 - mmengine - INFO - Iter(train) [ 80/6000] lr: 8.8269e-06 eta: 4:48:23 time: 3.3245 data_time: 0.0153 memory: 14637 loss: 5.7963
08/21 17:49:27 - mmengine - INFO - Iter(train) [ 90/6000] lr: 9.9442e-06 eta: 5:11:12 time: 5.0525 data_time: 0.0162 memory: 14625 loss: 5.3584
08/21 17:50:24 - mmengine - INFO - Iter(train) [ 100/6000] lr: 1.1062e-05 eta: 5:36:22 time: 5.7718 data_time: 0.0193 memory: 14622 loss: 5.1775
...
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3406
08/21 23:49:43 - mmengine - INFO - Iter(train) [3410/6000] lr: 8.2866e-06 eta: 4:37:14 time: 6.1520 data_time: 0.0189 memory: 14672 loss: 2.7470
08/21 23:50:49 - mmengine - INFO - Iter(train) [3420/6000] lr: 8.2334e-06 eta: 4:36:11 time: 6.6438 data_time: 0.0154 memory: 14648 loss: 2.4156
08/21 23:52:01 - mmengine - INFO - Iter(train) [3430/6000] lr: 8.1803e-06 eta: 4:35:12 time: 7.1605 data_time: 0.0180 memory: 14577 loss: 2.0693
08/21 23:53:12 - mmengine - INFO - Iter(train) [3440/6000] lr: 8.1272e-06 eta: 4:34:13 time: 7.0844 data_time: 0.0171 memory: 14624 loss: 1.7735
08/21 23:54:29 - mmengine - INFO - Iter(train) [3450/6000] lr: 8.0743e-06 eta: 4:33:18 time: 7.7216 data_time: 0.0191 memory: 14630 loss: 1.8995
08/21 23:55:41 - mmengine - INFO - Iter(train) [3460/6000] lr: 8.0213e-06 eta: 4:32:20 time: 7.2184 data_time: 0.0177 memory: 14631 loss: 1.7147
08/21 23:56:46 - mmengine - INFO - Iter(train) [3470/6000] lr: 7.9684e-06 eta: 4:31:16 time: 6.4690 data_time: 0.0154 memory: 14636 loss: 1.8168
08/21 23:57:51 - mmengine - INFO - Iter(train) [3480/6000] lr: 7.9156e-06 eta: 4:30:12 time: 6.5319 data_time: 0.0155 memory: 14644 loss: 1.8507
08/21 23:59:01 - mmengine - INFO - Iter(train) [3490/6000] lr: 7.8628e-06 eta: 4:29:12 time: 7.0294 data_time: 0.0159 memory: 14624 loss: 1.5084
08/22 00:00:07 - mmengine - INFO - Iter(train) [3500/6000] lr: 7.8101e-06 eta: 4:28:08 time: 6.5171 data_time: 0.0157 memory: 14620 loss: 1.7641
...
2024/08/22 04:23:17 - mmengine - INFO - Iter(train) [5890/6000] lr: 1.7945e-08 eta: 0:11:55 time: 6.6322 data_time: 0.0162 memory: 14624 loss: 0.4356
2024/08/22 04:24:23 - mmengine - INFO - Iter(train) [5900/6000] lr: 1.4858e-08 eta: 0:10:50 time: 6.5471 data_time: 0.0143 memory: 14627 loss: 0.4764
2024/08/22 04:25:28 - mmengine - INFO - Iter(train) [5910/6000] lr: 1.2062e-08 eta: 0:09:45 time: 6.5247 data_time: 0.0167 memory: 14621 loss: 0.5624
2024/08/22 04:26:22 - mmengine - INFO - Iter(train) [5920/6000] lr: 9.5571e-09 eta: 0:08:40 time: 5.4022 data_time: 0.0142 memory: 14620 loss: 0.6877
2024/08/22 04:27:22 - mmengine - INFO - Iter(train) [5930/6000] lr: 7.3432e-09 eta: 0:07:35 time: 5.9815 data_time: 0.0137 memory: 14614 loss: 0.4565
2024/08/22 04:28:29 - mmengine - INFO - Iter(train) [5940/6000] lr: 5.4206e-09 eta: 0:06:30 time: 6.7140 data_time: 0.0140 memory: 14618 loss: 0.5041
2024/08/22 04:29:32 - mmengine - INFO - Iter(train) [5950/6000] lr: 3.7891e-09 eta: 0:05:25 time: 6.2728 data_time: 0.0144 memory: 14607 loss: 0.4104
2024/08/22 04:30:32 - mmengine - INFO - Iter(train) [5960/6000] lr: 2.4489e-09 eta: 0:04:20 time: 6.0393 data_time: 0.0131 memory: 14615 loss: 0.3896
2024/08/22 04:31:35 - mmengine - INFO - Iter(train) [5970/6000] lr: 1.4000e-09 eta: 0:03:15 time: 6.3093 data_time: 0.0140 memory: 14615 loss: 0.2181
2024/08/22 04:32:46 - mmengine - INFO - Iter(train) [5980/6000] lr: 6.4248e-10 eta: 0:02:10 time: 7.0711 data_time: 0.0175 memory: 14614 loss: 0.2556
2024/08/22 04:33:50 - mmengine - INFO - Iter(train) [5990/6000] lr: 1.7628e-10 eta: 0:01:05 time: 6.4128 data_time: 0.0141 memory: 14617 loss: 0.3989
2024/08/22 04:34:57 - mmengine - INFO - Exp name: internvl_v2_internlm2_2b_qlora_finetune_20240821_174406
2024/08/22 04:34:57 - mmengine - INFO - Iter(train) [6000/6000] lr: 1.4569e-12 eta: 0:00:00 time: 6.6742 data_time: 0.0152 memory: 14608 loss: 0.1039
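To plot the loss curve instead of reading these lines, you can parse the Iter(train) lines with a regular expression. This is a minimal sketch that assumes the line layout shown above. parse_line and parse_log are hypothetical helpers:

```python
import re

# Matches mmengine progress lines such as
# "... - mmengine - INFO - Iter(train) [ 10/6000] lr: 1.0058e-06 eta: ... loss: 5.1294"
ITER_RE = re.compile(
    r"Iter\(train\)\s*\[\s*(\d+)/(\d+)\]\s+lr:\s*([0-9.eE+-]+).*?loss:\s*([0-9.]+)"
)


def parse_line(line):
    """Return (iteration, lr, loss) for a training-progress line, else None."""
    m = ITER_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), float(m.group(3)), float(m.group(4))


def parse_log(lines):
    """Collect (iteration, lr, loss) tuples from an iterable of log lines."""
    return [rec for rec in map(parse_line, lines) if rec is not None]
```

Feed it the lines of the text log that mmengine writes under --work-dir (the exact file name includes a timestamp). You can then plot iterations against losses with matplotlib, which was installed in 4.3.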
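(3) Merge Weights and Convert the Model
Merge the weights with the official script. Run it from the XTuner directory, because the paths are relative, and pass the same 2B config that was used for training:
cd /root/InternLM/code/XTuner
python3 xtuner/configs/internvl/v1_5/convert_to_official.py xtuner/configs/internvl/v2/internvl_v2_internlm2_2b_qlora_finetune.py /root/InternLM/work_dir/internvl_ft_run_8_filter/iter_6000.pth /root/InternLM/InternVL2-2B/
This requires flash_attn. If the flash_attn installation hangs, install ninja first: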
pip install ninja
pip install flash_attn
After merging, the model is in /root/InternLM/InternVL2-2B/, with the following files:
tree /root/InternLM/InternVL2-2B
/root/InternLM/InternVL2-2B
├── added_tokens.json
├── config.json
├── configuration_intern_vit.py
├── configuration_internlm2.py
├── configuration_internvl_chat.py
├── conversation.py
├── generation_config.json
├── model.safetensors
├── modeling_intern_vit.py
├── modeling_internlm2.py
├── modeling_internvl_chat.py
├── special_tokens_map.json
├── tokenization_internlm2.py
├── tokenizer.model
└── tokenizer_config.json
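Compare the sizes. Leaving aside the original model's 2029K examples folder (4311877 - 2029 = 4309848K), the merged model (4309789K) differs from the original by only 59K.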
du /root/InternLM/InternVL2-2B /root/model/InternVL2-2B -h -k
4309789 /root/InternLM/InternVL2-2B
2029 /root/model/InternVL2-2B/examples
4311877 /root/model/InternVL2-2B
4.6 Comparing Results After Fine-tuning
Now that fine-tuning is done, let's try the cat picture again!
Point the pipeline at the merged model and run it:
from lmdeploy import pipeline
from lmdeploy.vl import load_image
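# pipe = pipeline('/root/model/InternVL2-2B')  # base model, before fine-tuning
pipe = pipeline('/root/InternLM/InternVL2-2B')  # merged fine-tuned model
image = load_image('/root/InternLM/256321723775630_.pic.jpg')
# ask 20 times to see how varied the jokes are
# prompt: "Tell a wildly imaginative joke based on this image"
for i in range(20):
    response = pipe(('请你根据这张图片,讲一个脑洞大开的梗', image))
    print(response.text)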
Sure enough, the fine-tuned model's jokes are much more fun (output of two runs):
(xtuner0121) root@intern-studio-50211982:~/InternLM/code# python ./test_lmdeploy.py
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING] gemm_config.in is not found; using default GEMM algo
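Got put in a coat
A kitty stuck fast and unable to escape
The quilt and the quilt woke up at the same time
“Bro, stop messing around! I just got lost, that's all!”
Dragged by force into a mosquito-swatting contest
I'm stuck, don't come over!
Scratched by a cat's claw
It's not the first time, and it still doesn't work after all!
Say bye-bye, hold on, I'll be right there!
A cat that was forced into clothes and now refuses to take them off
Stuck, and can't climb down
Because my upper half is a tiger, I got treated as Big Brother Tiger
A rabbit bullied by a cat goes home and scares its parents
Didn't I just break Yinziyue's seal a moment ago?
I told you to stop messing around because I said working is exhausting, and you actually thought I was messing around?
Yes, it's delicious, truly delicious, I'm so moved I'm in tears.
A cat that got put in a skirt
I'm telling you, don't try to pull my head, or something a bit scary will happen.
Said I wanted to become an apprentice, got taken for a greenhorn, and the master just kept making me do push-ups
Didn't I just say I'm not Granny!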
(xtuner0121) root@intern-studio-50211982:~/InternLM/code# python ./test_lmdeploy.py
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING] gemm_config.in is not found; using default GEMM algo
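Got undressed by the boyfriend
Forced to expose the cheating older brother
Scratched by a cat's claw, so scary~
A cat forced into clothes that refuses to take them off
After being scratched by a cat's claw, frantically wiping away the blood: “I'm not hurt at all, shut up!”
The swollen spot where the cat scratched me
The quilt and the quilt rolling around together
I yelled because my chest hurts so much!
An unlucky cat stuck fast and unable to escape
The swollen part where the cat scratched
A cat stuck fast and unable to escape
Didn't I just say I'm number one!
Said: “My cat is this cute, why don't you want it?”
Why isn't the arm the cat scratched getting inflamed yet
A cat forced into a newbie outfit that suddenly realizes it's a newbie
The quilt and the quilt both got peed on
Scratched by a cat's claw
An armpit scratched by a cat's claw
Scratched by a cat's claw
It's not the first time, and once again I couldn't hold back~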