当前位置: 首页 > news >正文

聊聊Spring AI的Multimodality

本文主要研究一下Spring AI的Multimodality

示例

chatModel示例

var imageResource = new ClassPathResource("/multimodal.test.png");

var userMessage = new UserMessage(
	"Explain what do you see in this picture?", // content
	new Media(MimeTypeUtils.IMAGE_PNG, this.imageResource)); // media

ChatResponse response = chatModel.call(new Prompt(this.userMessage));

chatClient示例

String response = ChatClient.create(chatModel).prompt()
		.user(u -> u.text("Explain what do you see on this picture?")
				    .media(MimeTypeUtils.IMAGE_PNG, new ClassPathResource("/multimodal.test.png")))
		.call()
		.content();

目前是如下几种模型支持多模态

  • Anthropic Claude 3
  • AWS Bedrock Converse
  • Azure Open AI (e.g. GPT-4o models)
  • Mistral AI (e.g. Mistral Pixtral models)
  • Ollama (e.g. LLaVA, BakLLaVA, Llama3.2 models)
  • OpenAI (e.g. GPT-4 and GPT-4o models)
  • Vertex AI Gemini (e.g. gemini-1.5-pro-001, gemini-1.5-flash-001 models)

源码

UserMessage

org/springframework/ai/chat/messages/UserMessage.java

public class UserMessage extends AbstractMessage implements MediaContent {

	protected final List<Media> media;

	public UserMessage(String textContent) {
		this(MessageType.USER, textContent, new ArrayList<>(), Map.of());
	}

	public UserMessage(Resource resource) {
		super(MessageType.USER, resource, Map.of());
		this.media = new ArrayList<>();
	}

	public UserMessage(String textContent, List<Media> media) {
		this(MessageType.USER, textContent, media, Map.of());
	}

	public UserMessage(String textContent, Media... media) {
		this(textContent, Arrays.asList(media));
	}

	public UserMessage(String textContent, Collection<Media> mediaList, Map<String, Object> metadata) {
		this(MessageType.USER, textContent, mediaList, metadata);
	}

	public UserMessage(MessageType messageType, String textContent, Collection<Media> media,
			Map<String, Object> metadata) {
		super(messageType, textContent, metadata);
		Assert.notNull(media, "media data must not be null");
		this.media = new ArrayList<>(media);
	}

	@Override
	public String toString() {
		return "UserMessage{" + "content='" + getText() + '\'' + ", properties=" + this.metadata + ", messageType="
				+ this.messageType + '}';
	}

	@Override
	public List<Media> getMedia() {
		return this.media;
	}

	@Override
	public String getText() {
		return this.textContent;
	}

}

UserMessage实现了MediaContent的getMedia方法

Media

org/springframework/ai/model/Media.java

public class Media {

	private static final String NAME_PREFIX = "media-";

	/**
	 * An Id of the media object, usually defined when the model returns a reference to
	 * media it has been passed.
	 */
	@Nullable
	private String id;

	private final MimeType mimeType;

	private final Object data;

	/**
	 * The name of the media object that can be referenced by the AI model.
	 * <p>
	 * Important security note: This field is vulnerable to prompt injections, as the
	 * model might inadvertently interpret it as instructions. It is recommended to
	 * specify neutral names.
	 *
	 * <p>
	 * The name must only contain:
	 * <ul>
	 * <li>Alphanumeric characters
	 * <li>Whitespace characters (no more than one in a row)
	 * <li>Hyphens
	 * <li>Parentheses
	 * <li>Square brackets
	 * </ul>
	 */
	private String name;

	//......
}	

Media定义了id、mimeType、data、name属性

Format

	public static class Format {

		// -----------------
		// Document formats
		// -----------------
		/**
		 * Public constant mime type for {@code application/pdf}.
		 */
		public static final MimeType DOC_PDF = MimeType.valueOf("application/pdf");

		/**
		 * Public constant mime type for {@code text/csv}.
		 */
		public static final MimeType DOC_CSV = MimeType.valueOf("text/csv");

		/**
		 * Public constant mime type for {@code application/msword}.
		 */
		public static final MimeType DOC_DOC = MimeType.valueOf("application/msword");

		/**
		 * Public constant mime type for
		 * {@code application/vnd.openxmlformats-officedocument.wordprocessingml.document}.
		 */
		public static final MimeType DOC_DOCX = MimeType
			.valueOf("application/vnd.openxmlformats-officedocument.wordprocessingml.document");

		/**
		 * Public constant mime type for {@code application/vnd.ms-excel}.
		 */
		public static final MimeType DOC_XLS = MimeType.valueOf("application/vnd.ms-excel");

		/**
		 * Public constant mime type for
		 * {@code application/vnd.openxmlformats-officedocument.spreadsheetml.sheet}.
		 */
		public static final MimeType DOC_XLSX = MimeType
			.valueOf("application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");

		/**
		 * Public constant mime type for {@code text/html}.
		 */
		public static final MimeType DOC_HTML = MimeType.valueOf("text/html");

		/**
		 * Public constant mime type for {@code text/plain}.
		 */
		public static final MimeType DOC_TXT = MimeType.valueOf("text/plain");

		/**
		 * Public constant mime type for {@code text/markdown}.
		 */
		public static final MimeType DOC_MD = MimeType.valueOf("text/markdown");

		// -----------------
		// Video Formats
		// -----------------
		/**
		 * Public constant mime type for {@code video/x-matros}.
		 */
		public static final MimeType VIDEO_MKV = MimeType.valueOf("video/x-matros");

		/**
		 * Public constant mime type for {@code video/quicktime}.
		 */
		public static final MimeType VIDEO_MOV = MimeType.valueOf("video/quicktime");

		/**
		 * Public constant mime type for {@code video/mp4}.
		 */
		public static final MimeType VIDEO_MP4 = MimeType.valueOf("video/mp4");

		/**
		 * Public constant mime type for {@code video/webm}.
		 */
		public static final MimeType VIDEO_WEBM = MimeType.valueOf("video/webm");

		/**
		 * Public constant mime type for {@code video/x-flv}.
		 */
		public static final MimeType VIDEO_FLV = MimeType.valueOf("video/x-flv");

		/**
		 * Public constant mime type for {@code video/mpeg}.
		 */
		public static final MimeType VIDEO_MPEG = MimeType.valueOf("video/mpeg");

		/**
		 * Public constant mime type for {@code video/mpeg}.
		 */
		public static final MimeType VIDEO_MPG = MimeType.valueOf("video/mpeg");

		/**
		 * Public constant mime type for {@code video/x-ms-wmv}.
		 */
		public static final MimeType VIDEO_WMV = MimeType.valueOf("video/x-ms-wmv");

		/**
		 * Public constant mime type for {@code video/3gpp}.
		 */
		public static final MimeType VIDEO_THREE_GP = MimeType.valueOf("video/3gpp");

		// -----------------
		// Image Formats
		// -----------------
		/**
		 * Public constant mime type for {@code image/png}.
		 */
		public static final MimeType IMAGE_PNG = MimeType.valueOf("image/png");

		/**
		 * Public constant mime type for {@code image/jpeg}.
		 */
		public static final MimeType IMAGE_JPEG = MimeType.valueOf("image/jpeg");

		/**
		 * Public constant mime type for {@code image/gif}.
		 */
		public static final MimeType IMAGE_GIF = MimeType.valueOf("image/gif");

		/**
		 * Public constant mime type for {@code image/webp}.
		 */
		public static final MimeType IMAGE_WEBP = MimeType.valueOf("image/webp");

	}

Format定义了常用的几种MimeType

小结

Spring AI设计了各种message类型用于支持多模态,其中UserMessage有个media属性,类型List<Media>,支持传入图像、音频、视频,MimeType用于指定是哪种类型。

doc

  • multimodality

相关文章:

  • 汇编获取二进制
  • 穿梭在数字王国:Python进制转换奇遇记
  • JMeter的关联
  • 日常记录-设置新增pve的ct容器
  • 基于HTML + jQuery + Bootstrap 4实现(Web)地铁票价信息生成系统
  • Java中parallelStream并行流使用指南
  • PostgreSQL有类似oracle的move表吗
  • 哈希表系列一>存在重复元素II 存在重复元素I
  • Honeyview:快速浏览各类图像
  • 宝塔Mysql远程连接记录
  • 电感详解:定义、作用、分类与使用要点
  • 高中生学习数据隐私保护的“技术-制度-文化”协同机制研究
  • 说一说 Spring 中的事务
  • AI——seaborn使用
  • 【位运算】两整数之和
  • Verilog:LED呼吸灯
  • 【android bluetooth 框架分析 02】【Module详解 2】【gd_shim_module 模块介绍】
  • linux 内存踩踏导致的空指针问题分析纪要
  • 湖北建筑安全员C1证考试难度怎么样
  • Linux 第三讲 --- 基础指令(三)
  • 兰州企业网站排名优化/百度搜索关键词排名优化推广
  • 网站建设 橙/网站seo属于什么专业
  • 移动商城网站建设 深圳/武汉推广系统
  • 湖南中虹羽建设工程有限公司网站/竞价排名的优缺点
  • 学子网站建设/友链交换有什么作用
  • 克隆网站带后台/百度网站免费优化软件下载