Technical Notes on MNN Support for DeepSeekVL

DeepSeekVL (https://modelscope.cn/models/deepseek-ai/deepseek-vl-7b-chat/) is a multimodal large language model developed by DeepSeek. The 7B model can run on high-end phones via MNN (https://github.com/alibaba/MNN/), so we adapted it accordingly.

I. Model Structure Analysis

First, set up the environment following the official documentation:

git clone https://github.com/deepseek-ai/DeepSeek-VL
cd DeepSeek-VL
pip install -e .

Then modify the example script slightly (just remove the cuda() calls so it runs on devices without CUDA):

import torch
from modelscope import AutoModelForCausalLM

from deepseek_vl.models import VLChatProcessor, MultiModalityCausalLM
from deepseek_vl.utils.io import load_pil_images

# specify the path to the model
model_path = "deepseek-ai/deepseek-vl-7b-chat"
vl_chat_processor: VLChatProcessor = VLChatProcessor.from_pretrained(model_path)
tokenizer = vl_chat_processor.tokenizer
print(vl_chat_processor)

vl_gpt: MultiModalityCausalLM = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
vl_gpt = vl_gpt.to(torch.bfloat16).eval()
print(vl_gpt)

conversation = [
    {
        "role": "User",
        "content": "<image_placeholder>Describe each stage of this image.",
        "images": ["./images/training_pipelines.png"]
    },
    {"role": "Assistant", "content": ""}
]

# load images and prepare for inputs
pil_images = load_pil_images(conversation)
prepare_inputs = vl_chat_processor(
    conversations=conversation,
    images=pil_images,
    force_batchify=True
).to(vl_gpt.device)

# run image encoder to get the image embeddings
inputs_embeds = vl_gpt.prepare_inputs_embeds(**prepare_inputs)

# run the model to get the response
outputs = vl_gpt.language_model.generate(
    inputs_embeds=inputs_embeds,
    attention_mask=prepare_inputs.attention_mask,
    pad_token_id=tokenizer.eos_token_id,
    bos_token_id=tokenizer.bos_token_id,
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=512,
    do_sample=False,
    use_cache=True
)

answer = tokenizer.decode(outputs[0].cpu().tolist(), skip_special_tokens=True)
print(f"{prepare_inputs['sft_format'][0]}", answer)

The vl_gpt object in the code is the model itself. Printing it gives the structure below (abridged: the low tower's 24 identical transformer blocks are collapsed):

MultiModalityCausalLM(
  (vision_model): HybridVisionTower(
    (vision_tower_high): CLIPVisionTower(
      (vision_tower): ImageEncoderViT(
        (patch_embed): PatchEmbed(
          (proj): Conv2d(3, 768, kernel_size=(16, 16), stride=(16, 16))
        )
        (blocks): ModuleList(
          (0-11): 12 x Block(
            (norm1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
            (attn): Attention(
              (qkv): Linear(in_features=768, out_features=2304, bias=True)
              (proj): Linear(in_features=768, out_features=768, bias=True)
            )
            (norm2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
            (mlp): MLPBlock(
              (lin1): Linear(in_features=768, out_features=3072, bias=True)
              (lin2): Linear(in_features=3072, out_features=768, bias=True)
              (act): GELU(approximate='none')
            )
          )
        )
        (neck): Sequential(
          (0): Conv2d(768, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): LayerNorm2d()
          (2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (3): LayerNorm2d()
        )
        (downsamples): Sequential(
          (0): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
          (1): Conv2d(512, 1024, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        )
        (neck_hd): Sequential(
          (0): Conv2d(768, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): LayerNorm2d()
          (2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (3): LayerNorm2d()
        )
      )
      (image_norm): Normalize(mean=[0.48145466, 0.4578275, 0.40821073], std=[0.26862954, 0.26130258, 0.27577711])
    )
    (vision_tower_low): CLIPVisionTower(
      (vision_tower): VisionTransformer(
        (patch_embed): PatchEmbed(
          (proj): Conv2d(3, 1024, kernel_size=(16, 16), stride=(16, 16))
          (norm): Identity()
        )
        (pos_drop): Dropout(p=0.0, inplace=False)
        (patch_drop): Identity()
        (norm_pre): Identity()
        (blocks): Sequential(
          (0-23): 24 x Block(
            (norm1): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
            (attn): Attention(
              (qkv): Linear(in_features=1024, out_features=3072, bias=True)
              (q_norm): Identity()
              (k_norm): Identity()
              (attn_drop): Dropout(p=0.0, inplace=False)
              (proj): Linear(in_features=1024, out_features=1024, bias=True)
              (proj_drop): Identity()
            )
            (ls1): Identity()
            (drop_path1): Identity()
            (norm2): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
            (mlp): Mlp(
              (fc1): Linear(in_features=1024, out_features=4096, bias=True)
              (act): GELU(approximate='none')
              (drop1): Dropout(p=0.0, inplace=False)
              (norm): Identity()
              (fc2): Linear(in_features=4096, out_features=1024, bias=True)
              (drop2): Dropout(p=0.0, inplace=False)
            )
            (ls2): Identity()
            (drop_path2): Identity()
          )
        )
        (norm): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
        (attn_pool): AttentionPoolLatent(
          (q): Linear(in_features=1024, out_features=1024, bias=True)
          (kv): Linear(in_features=1024, out_features=2048, bias=True)
          (q_norm): Identity()
          (k_norm): Identity()
          (proj): Linear(in_features=1024, out_features=1024, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (norm): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=1024, out_features=4096, bias=True)
            (act): GELU(approximate='none')
            (drop1): Dropout(p=0.0, inplace=False)
            (norm): Identity()
            (fc2): Linear(in_features=4096, out_features=1024, bias=True)
            (drop2): Dropout(p=0.0, inplace=False)
          )
        )
        (fc_norm): Identity()
        (head_drop): Dropout(p=0.0, inplace=False)
        (head): Identity()
      )
      (image_norm): Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
    )
    (high_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
    (low_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
    (resize): Resize(size=384, interpolation=bilinear, max_size=None, antialias=True)
  )
  (aligner): MlpProjector(
    (high_up_proj): Linear(in_features=1024, out_features=2048, bias=True)
    (low_up_proj): Linear(in_features=1024, out_features=2048, bias=True)
    (layers): Sequential(
      (0): GELU(approximate='none')
      (1): Linear(in_features=4096, out_features=4096, bias=True)
    )
  )
  (language_model): LlamaForCausalLM(
    (model): LlamaModel(
      (embed_tokens): Embedding(102400, 4096)
      (layers): ModuleList(
        (0-29): 30 x LlamaDecoderLayer(
          (self_attn): LlamaAttention(
            (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
            (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
            (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
            (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          )
          (mlp): LlamaMLP(
            (gate_proj): Linear(in_features=4096, out_features=11008, bias=False)
            (up_proj): Linear(in_features=4096, out_features=11008, bias=False)
            (down_proj): Linear(in_features=11008, out_features=4096, bias=False)
            (act_fn): SiLU()
          )
          (input_layernorm): LlamaRMSNorm((4096,), eps=1e-06)
          (post_attention_layernorm): LlamaRMSNorm((4096,), eps=1e-06)
        )
      )
      (norm): LlamaRMSNorm((4096,), eps=1e-06)
      (rotary_emb): LlamaRotaryEmbedding()
    )
    (lm_head): Linear(in_features=4096, out_features=102400, bias=False)
  )
)

It consists of the following core components:

  • Vision model: HybridVisionTower, a hybrid of two vision towers (a high-resolution SAM-style ImageEncoderViT and a low-resolution VisionTransformer) that extracts image features; the attribute name is vision_model
  • Aligner: MlpProjector, which maps the vision model's output features into the language model's embedding space; the attribute name is aligner
  • Language model: handles the text and multimodal context; the attribute name is language_model
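
How these pieces fit together can be seen in prepare_inputs_embeds from the script above: the vision model encodes the images, the aligner projects the features into the LLM embedding space, and the projected tokens are written into the text embedding sequence at the <image_placeholder> positions. Below is a condensed sketch of that logic (a paraphrase, not the repo's exact code; images_seq_mask is the mask the processor produces to mark placeholder positions):

import torch

def prepare_inputs_embeds_sketch(vl_gpt, input_ids, pixel_values, images_seq_mask):
    # [b, n, c, h, w] -> [b*n, c, h, w]: flatten the per-sample image dimension
    images = pixel_values.flatten(0, 1)
    # each image becomes a sequence of embedding tokens after the aligner
    images_embeds = vl_gpt.aligner(vl_gpt.vision_model(images))   # [b*n, 576, 4096]
    # embed the text tokens as usual
    inputs_embeds = vl_gpt.language_model.get_input_embeddings()(input_ids)
    # overwrite the <image_placeholder> positions with the image tokens
    inputs_embeds[images_seq_mask] = images_embeds.flatten(0, 1)
    return inputs_embeds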

II. Exporting the Language Model

1. Loading

Modify transformers/llm/export/llm_export.py to add a dedicated loading path for deepseek-vl models. Since the default model_type, multi_modality, is not distinctive enough to key on, it is rewritten to deepseek-vl:

elif 'deepseek-vl' in model_path:
    from deepseek_vl.models import VLChatProcessor, MultiModalityCausalLM
    vl_chat_processor = VLChatProcessor.from_pretrained(model_path)
    self.tokenizer = vl_chat_processor.tokenizer
    self.processor = vl_chat_processor
    vl_gpt: MultiModalityCausalLM = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True).eval()
    self.model = vl_gpt
    self.model.config.model_type = "deepseek-vl"

2. Mapping

Based on the model structure, add a deepseek-vl mapping table in transformers/llm/export/utils/model_mapper.py:

    def regist_deepseek_vl(self):
        deepseek_vlmap = {
            'config': {
                'hidden_size': 'language_config.hidden_size',
                'num_attention_heads': 'language_config.num_attention_heads',
                'num_hidden_layers': 'language_config.num_hidden_layers',
                'rope_theta': 'language_config.rope_theta',
                'head_dim': 'language_config.head_dim',
                'num_key_value_heads': 'language_config.num_key_value_heads',
            },
            'model': {
                'lm_': 'language_model.lm_head',
                'embed_': 'language_model.model.embed_tokens',
                'blocks_': 'language_model.model.layers',
                'final_layernorm_': 'language_model.model.norm',
                # 'visual': 'vision_model'
            },
            'decoder': {
                'self_attn': 'self_attn',
                'mlp': 'mlp',
                'input_layernorm': 'input_layernorm',
                'post_attention_layernorm': 'post_attention_layernorm'
            },
            'attention': {
                'q_proj': 'q_proj',
                'k_proj': 'k_proj',
                'v_proj': 'v_proj',
                'o_proj': 'o_proj'
            }
        }
        self.regist('deepseek-vl', deepseek_vlmap)

*** Disable the visual model's export for now (the 'visual' entry above is left commented out) ***
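
For orientation, the exporter presumably resolves the dotted paths above into submodules by walking attributes from the root model. A minimal sketch of that resolution (get_submodule here is a hypothetical helper, not necessarily the exporter's actual API):

from functools import reduce

def get_submodule(model, dotted_path):
    # walk 'language_model.model.embed_tokens' attribute by attribute
    return reduce(getattr, dotted_path.split('.'), model)

# e.g. the 'model' entries above would resolve as:
# lm_head      = get_submodule(vl_gpt, 'language_model.lm_head')
# embed_tokens = get_submodule(vl_gpt, 'language_model.model.embed_tokens')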

3. Prompt Template

The DeepSeek language model has its own chat template, defined in the build_prompt_template function of transformers/llm/export/llm_export.py:

if 'DeepSeek' in self.args.path or 'deepseek' in self.args.path:
    template['bos'] = '<|begin_of_sentence|>'
    template['system'] = '{content}\n'
    template['user'] = '\nUser: {content}\n'
    template['assistant'] = '\nAssistant: {content}\n<|end_of_sentence|>'
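
To illustrate how these fields compose into a full prompt (the render helper below is my own sketch, not the exporter's code):

template = {
    'bos': '<|begin_of_sentence|>',
    'system': '{content}\n',
    'user': '\nUser: {content}\n',
    'assistant': '\nAssistant: {content}\n<|end_of_sentence|>',
}

def render(system, user):
    # concatenate bos + system + user, then open the assistant turn
    # so generation continues after "Assistant:"
    return (template['bos']
            + template['system'].format(content=system)
            + template['user'].format(content=user)
            + '\nAssistant: ')

print(render('You are a helpful assistant.', 'Hello!'))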

4. Export and Testing

Run python llmexport.py --path deepseek-vl --export mnn to export the language model; a quick text-only chat test showed no problems.

III. Exporting the Vision Module

1. Processing Pipeline Analysis

pil_images = load_pil_images(conversation)
prepare_inputs = vl_chat_processor(
    conversations=conversation, images=pil_images, force_batchify=True
)
# ......
# inside VLChatProcessor::process_one
images_outputs = self.image_processor(images, return_tensors="pt")

Reading the code shows that image handling happens in two steps:

  • Preprocessing: VLMImageProcessor resizes the image and scales pixel values by 1.0 / 255.0.
  • Embedding computation: MultiModalityCausalLM
    • images = rearrange(pixel_values, "b n c h w -> (b n) c h w")
    • images_embeds = self.aligner(self.vision_model(images))
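
As I understand MNN's ImageProcess convention, preprocessing is applied as dst = (src - mean) * normal, so the 1.0 / 255.0 scaling can be reproduced on the MNN side by configuring a zero mean and a 1/255 normal (this is what init_config does in the DeepSeekVL class in section III.2 below). A quick check of that identity:

# assumption: MNN's ImageProcess computes dst = (src - mean) * normal
mean = [0.0, 0.0, 0.0]        # image_mean in init_config
normal = [1.0 / 255.0] * 3    # image_norm in init_config
src = 128.0                   # a raw uint8 pixel value
dst = (src - mean[0]) * normal[0]
assert abs(dst - 128.0 / 255.0) < 1e-9   # matches VLMImageProcessor's scaling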

Print the shapes of pixel_values and images_embeds:

print(pixel_values.shape)
images = rearrange(pixel_values, "b n c h w -> (b n) c h w")
print(images.shape)
# [b x n, T2, D]
images_embeds = self.aligner(self.vision_model(images))
print(images_embeds.shape)

The output:

torch.Size([1, 1, 3, 1024, 1024])
torch.Size([1, 3, 1024, 1024])
torch.Size([1, 576, 4096])
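
The 576 tokens per image are consistent with the printed structure: the low tower resizes the input to 384 and uses 16x16 patches, while the high tower's features are interpolated to a 96x96 grid (see Remaining Issues below) before two stride-2 convolutions, so both towers end up on a 24x24 grid. A quick arithmetic check (my inference from the structure, not stated in the source):

# low tower:  resize to 384, patch size 16       -> (384 // 16) ** 2 tokens
# high tower: 96x96 features, two stride-2 convs -> (96 // 2 // 2) ** 2 tokens
assert (384 // 16) ** 2 == 576
assert (96 // 2 // 2) ** 2 == 576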

Conclusions:

  • The rearrange step can be dropped, since MNN LLM currently does not support batched image input.
  • Two modules are needed: aligner and vision_model.
  • The embedding layout is (batch, seq_len, hidden_size) and must be transposed to (seq_len, batch, hidden_size).

2. Exporting the Vision Model

  • Modify model_mapper.py to add the mapping for the visual model.

  • Add a DeepSeekVL vision class in transformers/llm/export/vision.py:

class DeepSeekVL(Vision):
    def __init__(self, visual, base):
        super().__init__(visual, base)
        self.quant_bit = 8
        self.aligner = base.model.aligner
        self.vision_model = visual

    def load(self):
        self.image_size = 1024
        self.llm_config['is_visual'] = True
        self.llm_config['image_size'] = self.image_size
        # self.llm_config['vision_start'] = self.tokenizer.img_start_id
        # self.llm_config['vision_end'] = self.tokenizer.img_end_id
        # self.llm_config['image_pad'] = self.tokenizer.img_pad_id

    def init_config(self):
        self.llm_config['is_visual'] = True
        IMAGENET_MEAN = [0.0, 0.0, 0.0]
        IMAGENET_STD = [1.0, 1.0, 1.0]
        for i in range(3):
            IMAGENET_MEAN[i] = IMAGENET_MEAN[i] * 255.0
            IMAGENET_STD[i] = 1.0 / IMAGENET_STD[i] / 255.0
        self.llm_config['image_mean'] = IMAGENET_MEAN
        self.llm_config['image_norm'] = IMAGENET_STD
        self.llm_config['image_size_unit'] = 14

    def export(self, onnx_path):
        input_images = torch.randn((1, 3, self.image_size, self.image_size), dtype=torch.float32)
        onnx_model = f'{onnx_path}/visual.onnx'
        torch.onnx.export(
            self, (input_images),
            onnx_model,
            input_names=['input_images'],
            output_names=['image_embeds'],
            dynamic_axes={"input_images": {0: "size", 2: "height", 3: "width"}},
            do_constant_folding=True,
            verbose=False,
            opset_version=15)
        return onnx_model

    def forward(self, images):
        vit_embeds = self.aligner(self.vision_model(images))
        # For MNN's embedding, the order is (seq, batch, hidden)
        vit_embeds = vit_embeds.permute(1, 0, 2)
        return vit_embeds

ONNX Export Issue and Fix

When exporting the vision model to ONNX, an unsupported operator is encountered:

torch.onnx.errors.UnsupportedOperatorError: Exporting the operator 'aten::_upsample_bilinear2d_aa' to ONNX opset version 15 is not supported.
  • Fix: modify the Resize layer in HybridVisionTower
    # change antialias=True to antialias=False
    self.resize = torchvision.transforms.Resize(self.low_res_size, antialias=False)

With this change, the full export succeeds.
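
A quick way to sanity-check the exported graph is to run it with onnxruntime (a sketch; the file name visual.onnx follows the export code above, and the expected output shape follows the permute in forward):

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession('visual.onnx')
x = np.random.randn(1, 3, 1024, 1024).astype(np.float32)
(embeds,) = sess.run(['image_embeds'], {'input_images': x})
print(embeds.shape)   # expect (576, 1, 4096): (seq_len, batch, hidden_size)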

Remaining Issues

  • Regardless of the input image size, the output embedding size never changes.
  • The cause is a fixed-size interpolation in the vision model: in the neck module of sam.py, the feature map is forcibly interpolated to 96x96.
    x = F.interpolate(x.float(), size=(96, 96), mode="bilinear", align_corners=False)  # fixed size

IV. Testing

Test with the following prompt (the trailing Chinese text asks the model to describe the image content):

<img>https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg<hw>1024,1024</hw></img>介绍一下图片里的内容

The result:

The device supports: i8sdot:1, fp16:1, i8mm: 0, sve2: 0, sme2: 0
config path is /Users/xtjiang/alicnn/deepseek-vl-7b-chat-MNN/config.json
main, 227, cost time: 2062.623047 ms
Prepare for tuning opt Begin
Prepare for tuning opt End
main, 231, cost time: 1398.634033 ms
prompt file is /Users/xtjiang/alicnn/AliNNPrivate/build/pic2.txt
File has been downloaded successfully.
This image captures a warm moment on a beach. A woman and her dog are sitting on the sand, enjoying the glow of the setting sun. The woman, wearing a plaid shirt and jeans, sits with her legs crossed. She is petting a small dog sitting next to her. The dog, wearing a blue harness, responds to the affection and sticks out its tongue to lick the woman's face. The setting sun casts a warm golden light over the beach scene. The calm ocean in the background adds a serene atmosphere to this intimate moment.
#################################
prompt tokens num = 592
decode tokens num = 106
vision time = 3.35 s
audio time = 0.00 s
prefill time = 8.52 s
decode time = 5.64 s
sample time = 0.02 s
prefill speed = 69.45 tok/s
decode speed = 18.80 tok/s
##################################