
Downloading and Using Hugging Face CLIP-Family Models

Downloading a model to a local directory

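from huggingface_hub import snapshot_download

model_name = "google/siglip2-base-patch16-224"
model_path = "models/huggingface/siglip2-base-patch16-224"

# Download a full snapshot of the repository into model_path.
# Note: local_dir_use_symlinks and resume_download may be deprecated
# (and ignored) in recent huggingface_hub releases.
snapshot_download(
    repo_id=model_name,
    local_dir=model_path,
    local_dir_use_symlinks=False,
    revision="main",
    # use_auth_token="<YOUR_ACCESS_TOKEN>",
    resume_download=True,
)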

Extracting features with the image model

Method 1: the model ships an explicit VisionModel class

import os

import numpy as np
import torch
from PIL import Image
from transformers import CLIPVisionModel, CLIPImageProcessor

def _crop_and_resize_pad(image, height=480, width=720):
    # Pad the image with white to the target aspect ratio, then resize.
    image = np.array(image)
    image_height, image_width, _ = image.shape
    if image_height / image_width < height / width:
        # Too wide for the target: pad top and bottom.
        pad = int((((height / width) * image_width) - image_height) / 2.)
        padded_image = np.ones((image_height + pad * 2, image_width, 3), dtype=np.uint8) * 255
        # padded_image = np.zeros((image_height + pad * 2, image_width, 3), dtype=np.uint8)
        padded_image[pad:pad+image_height, :] = image
        image = Image.fromarray(padded_image).resize((width, height))
    else:
        # Too tall for the target: pad left and right.
        pad = int((((width / height) * image_height) - image_width) / 2.)
        padded_image = np.ones((image_height, image_width + pad * 2, 3), dtype=np.uint8) * 255
        # padded_image = np.zeros((image_height, image_width + pad * 2, 3), dtype=np.uint8)
        padded_image[:, pad:pad+image_width] = image
        image = Image.fromarray(padded_image).resize((width, height))
    return image

# Assumptions: the original snippet does not define pipeline_path or device.
# The path below is a guess based on the model named in the printed output.
pipeline_path = "models/huggingface/clip-vit-large-patch14-336"
device = "cuda" if torch.cuda.is_available() else "cpu"

image_encoder = CLIPVisionModel.from_pretrained(pipeline_path)
image_processor = CLIPImageProcessor.from_pretrained(pipeline_path)
image_encoder = image_encoder.to(torch.float32).to(device)

# Random test image, 236 pixels high and 621 pixels wide.
img = np.random.rand(236, 621, 3) * 255
image = Image.fromarray(img.astype(np.uint8))
image = _crop_and_resize_pad(image, height=512, width=512)
os.makedirs("tmp", exist_ok=True)
image.save("tmp/res.jpg")

key = "pixel_values"
pixel_values = image_processor(images=image, return_tensors="pt")[key].to(device)
# torch.Size([1, 3, 336, 336])
image_embeds = image_encoder(pixel_values=pixel_values, output_hidden_states=True)
# Features from the second-to-last layer.
res = image_embeds.hidden_states[-2]
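
To make the padding arithmetic in `_crop_and_resize_pad` easier to follow, here is a minimal sketch that reproduces only the size computation, with no image data involved. The helper name `letterbox_padding` is hypothetical and not part of the original code.

```python
def letterbox_padding(image_height, image_width, height, width):
    """Return (pad, padded_height, padded_width) using the same formulas
    as the padding step in _crop_and_resize_pad."""
    if image_height / image_width < height / width:
        # Image is too wide for the target ratio: pad top and bottom.
        pad = int(((height / width) * image_width - image_height) / 2.0)
        return pad, image_height + 2 * pad, image_width
    # Image is too tall (or already matches): pad left and right.
    pad = int(((width / height) * image_height - image_width) / 2.0)
    return pad, image_height, image_width + 2 * pad


# The 236 x 621 test image padded toward a square 512 x 512 target.
pad, padded_h, padded_w = letterbox_padding(236, 621, 512, 512)
print(pad, padded_h, padded_w)  # 192 620 621
```

Because `int()` truncates, the padded canvas can miss the target aspect ratio by one pixel (620 x 621 here), so the final resize stretches the image very slightly.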

Method 2: the Auto classes

from transformers import AutoProcessor, AutoModel

pipeline_path = "models/huggingface/siglip2-large-patch16-512"
model = AutoModel.from_pretrained(pipeline_path)
processor = AutoProcessor.from_pretrained(pipeline_path)

key = "pixel_values"
# `image` is a PIL image, e.g. the output of _crop_and_resize_pad.
image = processor(images=image, return_tensors="pt")
# print(image.keys())
image = image[key]  # .to(device)
# Expected for this 512-pixel checkpoint: torch.Size([1, 3, 512, 512])
print(image.shape)

image_embeds = model.vision_model(pixel_values=image, output_hidden_states=True)
for k, v in image_embeds.items():
    if k == "hidden_states":
        # hidden_states is a tuple of tensors, one per layer output.
        print(k)
        for e in v:
            print(e.shape)
    else:
        print(k, v.shape)
# Features from the second-to-last layer.
res = image_embeds.hidden_states[-2]
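
Both methods read `hidden_states[-2]`. As far as I know, transformers returns `hidden_states` as a tuple holding the embedding output followed by one entry per encoder layer, so index `-2` is the output of the second-to-last layer. The sketch below illustrates the indexing with plain strings instead of tensors. The depth of 24 layers is an assumption for ViT-L-sized encoders; in practice, read the value from the model config.

```python
# Assumed encoder depth (ViT-L style); check the model config for the real value.
num_hidden_layers = 24

# Stand-in for the hidden_states tuple: embedding output first, then one entry per layer.
hidden_states = ["embeddings"] + [f"layer_{i}" for i in range(1, num_hidden_layers + 1)]

print(len(hidden_states))  # 25
print(hidden_states[-1])   # layer_24
print(hidden_states[-2])   # layer_23
```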

Printed output

# clip-vit-large-patch14-336
last_hidden_state torch.Size([1, 577, 1024])
pooler_output torch.Size([1, 1024])
hidden_states
torch.Size([1, 577, 1024])

====-----====

# siglip2-large-patch16-512
last_hidden_state torch.Size([1, 1024, 1024])
pooler_output torch.Size([1, 1024])
hidden_states
torch.Size([1, 1024, 1024])
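
The sequence lengths in this output (577 and 1024) follow from the input resolution and the patch size. A minimal sketch of that arithmetic is below; it assumes square inputs, that CLIP's vision encoder prepends one class token, and that SigLIP-style encoders do not. The helper name `vision_token_count` is hypothetical.

```python
def vision_token_count(image_size, patch_size, has_class_token):
    """Number of tokens a ViT-style encoder produces for a square image."""
    patches_per_side = image_size // patch_size
    return patches_per_side ** 2 + (1 if has_class_token else 0)


# clip-vit-large-patch14-336: 24 x 24 patches plus one class token.
print(vision_token_count(336, 14, True))    # 577
# siglip2-large-patch16-512: 32 x 32 patches, no class token.
print(vision_token_count(512, 16, False))   # 1024
```

When per-patch features are needed from a CLIP model, this suggests dropping the first token along the sequence dimension. The SigLIP output has no class token to drop.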
