当前位置: 首页 > news >正文

聊聊Spring AI的PgVectorStore

本文主要研究一下Spring AI的PgVectorStore

示例

pom.xml

		<dependency>
			<groupId>org.springframework.ai</groupId>
			<artifactId>spring-ai-starter-vector-store-pgvector</artifactId>
		</dependency>

pgvector

docker run -it --rm --name postgres -p 5432:5432 -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=postgres pgvector/pgvector:pg16

配置

spring:
  datasource:
    name: pgvector
    driverClassName: org.postgresql.Driver
    url: jdbc:postgresql://localhost:5432/postgres?currentSchema=public&connectTimeout=60&socketTimeout=60
    username: postgres
    password: postgres
  ai:
    vectorstore:
      type: pgvector
      pgvector:
        initialize-schema: true
        index-type: HNSW
        distance-type: COSINE_DISTANCE
        dimensions: 1024
        max-document-batch-size: 10000
        schema-name: public
        table-name: vector_store

设置initialize-schema为true,默认会执行如下初始化脚本:

CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS hstore;
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

CREATE TABLE IF NOT EXISTS vector_store (
	id uuid DEFAULT uuid_generate_v4() PRIMARY KEY,
	content text,
	metadata json,
	embedding vector(1536) // 1536 is the default embedding dimension
);

CREATE INDEX ON vector_store USING HNSW (embedding vector_cosine_ops);

脚本源码:
org/springframework/ai/vectorstore/pgvector/PgVectorStore.java

	public void afterPropertiesSet() {

		logger.info("Initializing PGVectorStore schema for table: {} in schema: {}", this.getVectorTableName(),
				this.getSchemaName());

		logger.info("vectorTableValidationsEnabled {}", this.schemaValidation);

		if (this.schemaValidation) {
			this.schemaValidator.validateTableSchema(this.getSchemaName(), this.getVectorTableName());
		}

		if (!this.initializeSchema) {
			logger.debug("Skipping the schema initialization for the table: {}", this.getFullyQualifiedTableName());
			return;
		}

		// Enable the PGVector, JSONB and UUID support.
		this.jdbcTemplate.execute("CREATE EXTENSION IF NOT EXISTS vector");
		this.jdbcTemplate.execute("CREATE EXTENSION IF NOT EXISTS hstore");

		if (this.idType == PgIdType.UUID) {
			this.jdbcTemplate.execute("CREATE EXTENSION IF NOT EXISTS \"uuid-ossp\"");
		}

		this.jdbcTemplate.execute(String.format("CREATE SCHEMA IF NOT EXISTS %s", this.getSchemaName()));

		// Remove existing VectorStoreTable
		if (this.removeExistingVectorStoreTable) {
			this.jdbcTemplate.execute(String.format("DROP TABLE IF EXISTS %s", this.getFullyQualifiedTableName()));
		}

		this.jdbcTemplate.execute(String.format("""
				CREATE TABLE IF NOT EXISTS %s (
					id %s PRIMARY KEY,
					content text,
					metadata json,
					embedding vector(%d)
				)
				""", this.getFullyQualifiedTableName(), this.getColumnTypeName(), this.embeddingDimensions()));

		if (this.createIndexMethod != PgIndexType.NONE) {
			this.jdbcTemplate.execute(String.format("""
					CREATE INDEX IF NOT EXISTS %s ON %s USING %s (embedding %s)
					""", this.getVectorIndexName(), this.getFullyQualifiedTableName(), this.createIndexMethod,
					this.getDistanceType().index));
		}
	}

代码

    @Test
    public void testAddAndSearch() {
        List<Document> documents = List.of(
                new Document("Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!!", Map.of("meta1", "meta1")),
                new Document("The World is Big and Salvation Lurks Around the Corner"),
                new Document("You walk forward facing the past and you turn back toward the future.", Map.of("meta2", "meta2")));

        // Add the documents to Milvus Vector Store
        pgVectorStore.add(documents);

        // Retrieve documents similar to a query
        List<Document> results = this.pgVectorStore.similaritySearch(SearchRequest.builder().query("Spring").topK(5).build());
        log.info("results:{}", JSON.toJSONString(results));
    }

输出如下:

results:[{"contentFormatter":{"excludedEmbedMetadataKeys":[],"excludedInferenceMetadataKeys":[],"metadataSeparator":"\n","metadataTemplate":"{key}: {value}","textTemplate":"{metadata_string}\n\n{content}"},"formattedContent":"distance: 0.43509135\nmeta1: meta1\n\nSpring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!!","id":"9dbce9af-0451-4bdb-8f03-1f8b8c4d696f","metadata":{"distance":0.43509135,"meta1":"meta1"},"score":0.5649086534976959,"text":"Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!!"},{"contentFormatter":{"$ref":"$[0].contentFormatter"},"formattedContent":"distance: 0.57093126\n\nThe World is Big and Salvation Lurks Around the Corner","id":"92a45683-11fc-48b7-8676-dcca3b518dd4","metadata":{"distance":0.57093126},"score":0.42906874418258667,"text":"The World is Big and Salvation Lurks Around the Corner"},{"contentFormatter":{"$ref":"$[0].contentFormatter"},"formattedContent":"distance: 0.5936024\nmeta2: meta2\n\nYou walk forward facing the past and you turn back toward the future.","id":"298f6565-bcc7-4cbc-8552-4c0e2d021dbf","metadata":{"distance":0.5936024,"meta2":"meta2"},"score":0.40639758110046387,"text":"You walk forward facing the past and you turn back toward the future."}]

源码

PgVectorStoreAutoConfiguration

org/springframework/ai/vectorstore/pgvector/autoconfigure/PgVectorStoreAutoConfiguration.java

@AutoConfiguration(after = JdbcTemplateAutoConfiguration.class)
@ConditionalOnClass({ PgVectorStore.class, DataSource.class, JdbcTemplate.class })
@EnableConfigurationProperties(PgVectorStoreProperties.class)
@ConditionalOnProperty(name = SpringAIVectorStoreTypes.TYPE, havingValue = SpringAIVectorStoreTypes.PGVECTOR,
		matchIfMissing = true)
public class PgVectorStoreAutoConfiguration {

	@Bean
	@ConditionalOnMissingBean(BatchingStrategy.class)
	BatchingStrategy pgVectorStoreBatchingStrategy() {
		return new TokenCountBatchingStrategy();
	}

	@Bean
	@ConditionalOnMissingBean
	public PgVectorStore vectorStore(JdbcTemplate jdbcTemplate, EmbeddingModel embeddingModel,
			PgVectorStoreProperties properties, ObjectProvider<ObservationRegistry> observationRegistry,
			ObjectProvider<VectorStoreObservationConvention> customObservationConvention,
			BatchingStrategy batchingStrategy) {

		var initializeSchema = properties.isInitializeSchema();

		return PgVectorStore.builder(jdbcTemplate, embeddingModel)
			.schemaName(properties.getSchemaName())
			.idType(properties.getIdType())
			.vectorTableName(properties.getTableName())
			.vectorTableValidationsEnabled(properties.isSchemaValidation())
			.dimensions(properties.getDimensions())
			.distanceType(properties.getDistanceType())
			.removeExistingVectorStoreTable(properties.isRemoveExistingVectorStoreTable())
			.indexType(properties.getIndexType())
			.initializeSchema(initializeSchema)
			.observationRegistry(observationRegistry.getIfUnique(() -> ObservationRegistry.NOOP))
			.customObservationConvention(customObservationConvention.getIfAvailable(() -> null))
			.batchingStrategy(batchingStrategy)
			.maxDocumentBatchSize(properties.getMaxDocumentBatchSize())
			.build();
	}

}

PgVectorStoreAutoConfiguration在spring.ai.vectorstore.typepgvector时会自动装配PgVectorStore,它依赖PgVectorStoreProperties及JdbcTemplateAutoConfiguration

PgVectorStoreProperties

org/springframework/ai/vectorstore/pgvector/autoconfigure/PgVectorStoreProperties.java

@ConfigurationProperties(PgVectorStoreProperties.CONFIG_PREFIX)
public class PgVectorStoreProperties extends CommonVectorStoreProperties {

	public static final String CONFIG_PREFIX = "spring.ai.vectorstore.pgvector";

	private int dimensions = PgVectorStore.INVALID_EMBEDDING_DIMENSION;

	private PgIndexType indexType = PgIndexType.HNSW;

	private PgDistanceType distanceType = PgDistanceType.COSINE_DISTANCE;

	private boolean removeExistingVectorStoreTable = false;

	// Dynamically generate table name in PgVectorStore to allow backward compatibility
	private String tableName = PgVectorStore.DEFAULT_TABLE_NAME;

	private String schemaName = PgVectorStore.DEFAULT_SCHEMA_NAME;

	private PgVectorStore.PgIdType idType = PgVectorStore.PgIdType.UUID;

	private boolean schemaValidation = PgVectorStore.DEFAULT_SCHEMA_VALIDATION;

	private int maxDocumentBatchSize = PgVectorStore.MAX_DOCUMENT_BATCH_SIZE;

	//......
}	

PgVectorStoreProperties继承了CommonVectorStoreProperties的initializeSchema配置,它提供了spring.ai.vectorstore.pgvector的配置,主要有dimensions、indexType、distanceType、removeExistingVectorStoreTable、tableName、schemaName、idType、schemaValidation、maxDocumentBatchSize这几个属性

JdbcTemplateAutoConfiguration

org/springframework/boot/autoconfigure/jdbc/JdbcTemplateAutoConfiguration.java

@AutoConfiguration(after = DataSourceAutoConfiguration.class)
@ConditionalOnClass({ DataSource.class, JdbcTemplate.class })
@ConditionalOnSingleCandidate(DataSource.class)
@EnableConfigurationProperties(JdbcProperties.class)
@Import({ DatabaseInitializationDependencyConfigurer.class, JdbcTemplateConfiguration.class,
		NamedParameterJdbcTemplateConfiguration.class })
public class JdbcTemplateAutoConfiguration {

}

JdbcTemplateAutoConfiguration引入了DatabaseInitializationDependencyConfigurer、JdbcTemplateConfiguration、NamedParameterJdbcTemplateConfiguration

小结

Spring AI提供了spring-ai-starter-vector-store-pgvector用于自动装配PgVectorStore。除了spring.ai.vectorstore.pgvector的配置,还需要配置spring.datasource

doc

  • vectordbs/pgvector

相关文章:

  • OpenCV 图形API(17)计算输入矩阵 src 中每个元素的平方根函数sqrt()
  • oklink js逆向(入口定位)
  • 1.2 测试设计阶段:打造高质量的测试用例
  • c++ 函数后面加const 作用
  • kaggle竞赛——房价预测
  • 轨迹预测Physical Plausibility-aware Trajectory Prediction via Locomotion Embodiment
  • 基于 Vue 3 + html2canvas 实现网页任意区域截图组件
  • 抓wifi无线空口包之Ubuntu抓包(二)
  • Linux-CentOS-7—— 安装MySQL 8
  • Kafka 中的幂等机制
  • SQLI打靶
  • 【嵌入式学习6】多任务版TCP服务器
  • 玄机-第六章-哥斯拉4.0流量分析的测试报告
  • 盛水最多的容器
  • Kafka负载均衡挑战解决
  • Jupyter Notebook不能自动打开默认浏览器怎么办?
  • IDEA快速入门
  • Airflow集成Lark机器人
  • 深入理解PCA降维:原理、实现与应用
  • 【Introduction to Reinforcement Learning】翻译解读2
  • 建立应用网站/市场营销方案怎么写