当前位置: 首页 > news >正文

RAG问答系统:Spring Boot + ChromaDB 知识库检索实战

RAG问答系统:Spring Boot + ChromaDB 知识库检索实战

  • 一、系统架构设计
  • 二、核心组件实现
    • 1. 依赖配置
    • 2. ChromaDB配置
  • 三、知识库处理流水线
    • 1. 文档切分服务
    • 2. 向量化服务
    • 3. 知识库索引服务
  • 四、检索增强生成核心
    • 1. 检索服务
    • 2. 提示工程
    • 3. 生成服务
  • 五、REST API设计
    • 1. 问答端点
  • 六、高级检索策略
    • 1. 混合检索
    • 2. 查询扩展
  • 七、性能优化方案
    • 1. 缓存策略
    • 2. 异步处理
  • 八、生产部署方案
    • 1. Docker Compose部署
    • 2. Kubernetes部署
  • 九、监控与评估
    • 1. 评估指标
    • 2. Prometheus监控
  • 十、安全增强措施
    • 1. 输入过滤
    • 2. 内容审核
  • 十一、应用场景扩展
    • 1. 多语言支持
    • 2. 领域适配
  • 十二、性能压测数据
    • 测试环境
    • 性能指标
  • 总结:RAG系统优势

一、系统架构设计

服务层
数据层
Embedding模型
LLM
HuggingFace
问题向量化
OpenAI/Local LLM
LLM生成答案
文档切分
知识库文档
向量化处理
ChromaDB存储
用户提问
Spring Boot应用
ChromaDB向量检索
相关文档片段
返回答案

二、核心组件实现

1. 依赖配置

<!-- pom.xml -->
<dependencies><!-- ChromaDB Java客户端 --><dependency><groupId>io.chroma</groupId><artifactId>chromadb-client</artifactId><version>0.4.0</version></dependency><!-- 嵌入模型 --><dependency><groupId>ai.djl</groupId><artifactId>model-zoo</artifactId><version>0.22.0</version></dependency><!-- OpenAI集成 --><dependency><groupId>com.theokanning.openai-java</groupId><artifactId>service</artifactId><version>0.14.0</version></dependency>
</dependencies>

2. ChromaDB配置

@Configuration
public class ChromaConfig {@Value("${chromadb.host:localhost}")private String host;@Value("${chromadb.port:8000}")private int port;@Beanpublic ChromaClient chromaClient() {return new ChromaClient(host, port);}@Beanpublic Collection knowledgeCollection(ChromaClient client) {return client.getOrCreateCollection("knowledge-base", CollectionSpec.builder().dimension(384) // 嵌入维度.build());}
}

三、知识库处理流水线

1. 文档切分服务

@Service
public class DocumentChunker {private static final int CHUNK_SIZE = 512;private static final int OVERLAP = 50;public List<TextChunk> chunkDocument(String content) {List<TextChunk> chunks = new ArrayList<>();int start = 0;while (start < content.length()) {int end = Math.min(start + CHUNK_SIZE, content.length());String chunkText = content.substring(start, end);chunks.add(new TextChunk(chunkText,start,end));start = end - OVERLAP;}return chunks;}@Data@AllArgsConstructorpublic static class TextChunk {private String content;private int start;private int end;}
}

2. 向量化服务

@Service
public class EmbeddingService {private final ZooModel<String, float[]> embeddingModel;public EmbeddingService() throws ModelException, IOException {this.embeddingModel = ModelZoo.loadModel(new Criteria.Builder().setTypes(String.class, float[].class).optEngine("PyTorch").optModelUrls("djl://ai.djl.huggingface.pytorch/sentence-transformers/all-MiniLM-L6-v2").build());}public float[] embed(String text) {try (Predictor<String, float[]> predictor = embeddingModel.newPredictor()) {return predictor.predict(text);}}public List<float[]> batchEmbed(List<String> texts) {return texts.stream().parallel().map(this::embed).collect(Collectors.toList());}
}

3. 知识库索引服务

@Service
@RequiredArgsConstructor
public class KnowledgeIndexer {private final DocumentChunker chunker;private final EmbeddingService embeddingService;private final Collection collection;public void indexDocument(String docId, String content) {List<TextChunk> chunks = chunker.chunkDocument(content);List<String> texts = chunks.stream().map(TextChunk::getContent).collect(Collectors.toList());List<float[]> embeddings = embeddingService.batchEmbed(texts);List<String> ids = chunks.stream().map(chunk -> docId + "_" + chunk.getStart()).collect(Collectors.toList());collection.add(ids,embeddings,texts.stream().map(text -> Metadata.of("doc_id", docId)).collect(Collectors.toList()),texts);}
}

四、检索增强生成核心

1. 检索服务

@Service
@RequiredArgsConstructor
public class RetrieverService {private final Collection collection;private final EmbeddingService embeddingService;public List<RetrievalResult> retrieve(String query, int topK) {float[] queryEmbedding = embeddingService.embed(query);QueryResult result = collection.query().queryEmbeddings(List.of(queryEmbedding)).nResults(topK).execute();return IntStream.range(0, result.getIds().get(0).size()).mapToObj(i -> new RetrievalResult(result.getIds().get(0).get(i),result.getDistances().get(0).get(i),result.getDocuments().get(0).get(i),result.getMetadatas().get(0).get(i))).collect(Collectors.toList());}@Data@AllArgsConstructorpublic static class RetrievalResult {private String id;private float score;private String content;private Map<String, String> metadata;}
}

2. 提示工程

public class PromptBuilder {public static String buildRAGPrompt(String question, List<String> contexts) {StringBuilder sb = new StringBuilder();sb.append("基于以下上下文信息回答问题。如果上下文不包含答案,请回答'我不知道'。\n\n");sb.append("上下文:\n");for (int i = 0; i < contexts.size(); i++) {sb.append(String.format("[片段%d]: %s\n\n", i+1, contexts.get(i)));}sb.append("\n问题:").append(question).append("\n");sb.append("答案:");return sb.toString();}
}

3. 生成服务

@Service
public class GenerationService {private final OpenAiService openAiService;public GenerationService(@Value("${openai.api-key}") String apiKey) {this.openAiService = new OpenAiService(apiKey, Duration.ofSeconds(30));}public String generateAnswer(String prompt) {ChatCompletionRequest request = ChatCompletionRequest.builder().model("gpt-3.5-turbo").messages(List.of(new ChatMessage("user", prompt))).maxTokens(500).temperature(0.3).build();return openAiService.createChatCompletion(request).getChoices().get(0).getMessage().getContent();}
}

五、REST API设计

1. 问答端点

@RestController
@RequestMapping("/api/rag")
@RequiredArgsConstructor
public class RAGController {private final RetrieverService retrieverService;private final GenerationService generationService;@PostMapping("/ask")public ResponseEntity<RAGResponse> askQuestion(@RequestBody QuestionRequest request) {// 检索相关上下文List<RetrieverService.RetrievalResult> results = retrieverService.retrieve(request.getQuestion(), 3);// 构建提示List<String> contexts = results.stream().map(RetrieverService.RetrievalResult::getContent).collect(Collectors.toList());String prompt = PromptBuilder.buildRAGPrompt(request.getQuestion(), contexts);// 生成答案String answer = generationService.generateAnswer(prompt);// 构建响应RAGResponse response = new RAGResponse();response.setAnswer(answer);response.setContexts(contexts);response.setSources(results.stream().map(r -> r.getMetadata().get("doc_id")).distinct().collect(Collectors.toList()));return ResponseEntity.ok(response);}@Datapublic static class QuestionRequest {@NotBlankprivate String question;}@Datapublic static class RAGResponse {private String answer;private List<String> contexts;private List<String> sources;}
}

六、高级检索策略

1. 混合检索

public List<RetrievalResult> hybridRetrieve(String query, int topK) {// 向量检索List<RetrievalResult> vectorResults = retrieverService.retrieve(query, topK * 2);// 关键词检索List<RetrievalResult> keywordResults = keywordSearch(query, topK * 2);// 结果融合与重排序return fuseResults(vectorResults, keywordResults, topK);
}private List<RetrievalResult> fuseResults(List<RetrievalResult> list1, List<RetrievalResult> list2, int topK
) {// 使用RRF(Reciprocal Rank Fusion)算法Map<String, RetrievalResult> fused = new HashMap<>();// 合并结果fuseList(list1, fused, 1);fuseList(list2, fused, 1);// 按分数排序return fused.values().stream().sorted(Comparator.comparingDouble(RetrievalResult::getScore).reversed()).limit(topK).collect(Collectors.toList());
}private void fuseList(List<RetrievalResult> list, Map<String, RetrievalResult> fused, int k) {for (int i = 0; i < list.size(); i++) {RetrievalResult result = list.get(i);double rrfScore = 1.0 / (k + i);RetrievalResult existing = fused.get(result.getId());if (existing != null) {existing.setScore(existing.getScore() + rrfScore);} else {result.setScore(rrfScore);fused.put(result.getId(), result);}}
}

2. 查询扩展

public String expandQuery(String originalQuery) {// 使用LLM扩展查询String prompt = "生成3个与以下问题相关的查询:\n" + originalQuery;String expansion = generationService.generateAnswer(prompt);// 解析生成的查询List<String> queries = parseExpansion(expansion);queries.add(0, originalQuery);return String.join(" ", queries);
}private List<String> parseExpansion(String expansion) {// 解析格式如:1. 查询1\n2. 查询2\n3. 查询3return Arrays.stream(expansion.split("\n")).map(line -> line.replaceAll("^\\d+\\.\\s*", "")).collect(Collectors.toList());
}

七、性能优化方案

1. 缓存策略

@Cacheable(value = "retrievalCache", key = "#query.hashCode()")
public List<RetrievalResult> cachedRetrieve(String query, int topK) {return retrieverService.retrieve(query, topK);
}@Cacheable(value = "generationCache", key = "{#prompt.hashCode()}")
public String cachedGenerate(String prompt) {return generationService.generateAnswer(prompt);
}

2. 异步处理

@Async
public CompletableFuture<List<RetrievalResult>> retrieveAsync(String query, int topK) {return CompletableFuture.completedFuture(retrieverService.retrieve(query, topK));
}@Async
public CompletableFuture<String> generateAsync(String prompt) {return CompletableFuture.completedFuture(generationService.generateAnswer(prompt));
}// 控制器中使用
@PostMapping("/ask-async")
public CompletableFuture<ResponseEntity<RAGResponse>> askQuestionAsync(@RequestBody QuestionRequest request) {return retrieveAsync(request.getQuestion(), 3).thenCompose(results -> {List<String> contexts = results.stream().map(RetrieverService.RetrievalResult::getContent).collect(Collectors.toList());String prompt = PromptBuilder.buildRAGPrompt(request.getQuestion(), contexts);return generateAsync(prompt).thenApply(answer -> {RAGResponse response = new RAGResponse();response.setAnswer(answer);response.setContexts(contexts);return ResponseEntity.ok(response);});});
}

八、生产部署方案

1. Docker Compose部署

version: '3.8'services:chromadb:image: chromadb/chromaports:- "8000:8000"volumes:- chroma-data:/chroma/chromarag-service:build: .ports:- "8080:8080"environment:- CHROMADB_HOST=chromadb- OPENAI_API_KEY=${OPENAI_API_KEY}depends_on:- chromadbvolumes:chroma-data:

2. Kubernetes部署

apiVersion: apps/v1
kind: Deployment
metadata:name: chromadb
spec:replicas: 1selector:matchLabels:app: chromadbtemplate:metadata:labels:app: chromadbspec:containers:- name: chromadbimage: chromadb/chromaports:- containerPort: 8000volumeMounts:- name: chroma-datamountPath: /chroma/chromavolumes:- name: chroma-datapersistentVolumeClaim:claimName: chroma-pvc---
apiVersion: apps/v1
kind: Deployment
metadata:name: rag-service
spec:replicas: 3selector:matchLabels:app: rag-servicetemplate:metadata:labels:app: rag-servicespec:containers:- name: rag-serviceimage: rag-service:1.0ports:- containerPort: 8080env:- name: CHROMADB_HOSTvalue: "chromadb"- name: OPENAI_API_KEYvalueFrom:secretKeyRef:name: openai-secretkey: api-key

九、监控与评估

1. 评估指标

public class EvaluationService {public RAGEvaluation evaluate(List<QAExample> examples) {RAGEvaluation evaluation = new RAGEvaluation();for (QAExample example : examples) {RAGResponse response = askQuestion(example.getQuestion());// 计算答案相似度double similarity = calculateSimilarity(example.getExpectedAnswer(), response.getAnswer());// 计算检索召回率double recall = calculateRecall(example.getExpectedContexts(), response.getContexts());evaluation.addResult(similarity, recall);}return evaluation;}private double calculateSimilarity(String expected, String actual) {// 使用BERTScore或ROUGEreturn bertScore.score(expected, actual);}private double calculateRecall(List<String> expected, List<String> actual) {Set<String> expectedSet = new HashSet<>(expected);Set<String> actualSet = new HashSet<>(actual);Set<String> intersection = new HashSet<>(expectedSet);intersection.retainAll(actualSet);return (double) intersection.size() / expectedSet.size();}
}

2. Prometheus监控

@Bean
MeterRegistryCustomizer<MeterRegistry> metrics() {return registry -> {Timer.builder("rag.retrieval.time").register(registry);Timer.builder("rag.generation.time").register(registry);Counter.builder("rag.requests").register(registry);};
}@Aspect
@Component
public class MonitoringAspect {@Around("execution(* RetrieverService.retrieve(..))")public Object timeRetrieval(ProceedingJoinPoint pjp) throws Throwable {Timer.Sample sample = Timer.start();Object result = pjp.proceed();sample.stop(Metrics.timer("rag.retrieval.time"));return result;}@Around("execution(* GenerationService.generateAnswer(..))")public Object timeGeneration(ProceedingJoinPoint pjp) throws Throwable {Timer.Sample sample = Timer.start();Object result = pjp.proceed();sample.stop(Metrics.timer("rag.generation.time"));return result;}
}

十、安全增强措施

1. 输入过滤

@Aspect
@Component
public class InputValidationAspect {@Before("execution(* RAGController.askQuestion(..)) && args(request)")public void validateInput(QuestionRequest request) {if (containsMaliciousContent(request.getQuestion())) {throw new SecurityException("检测到恶意输入");}}private boolean containsMaliciousContent(String text) {// 检测注入攻击、恶意指令等return text.contains("DROP TABLE") || text.contains("<script>") ||text.contains("sudo rm -rf");}
}

2. 内容审核

public String safeGenerate(String prompt) {String answer = generationService.generateAnswer(prompt);if (isUnsafeContent(answer)) {return "抱歉,我无法回答这个问题";}return answer;
}private boolean isUnsafeContent(String text) {// 使用敏感词库或AI内容审核return moderationService.moderate(text).isFlagged();
}

十一、应用场景扩展

1. 多语言支持

public String translateToEnglish(String query) {if (isEnglish(query)) return query;String prompt = "将以下文本翻译为英文:" + query;return generationService.generateAnswer(prompt);
}private boolean isEnglish(String text) {// 简单检测return text.matches(".*[a-zA-Z].*");
}// 在检索前翻译
public List<RetrievalResult> multilingualRetrieve(String query, int topK) {String englishQuery = translateToEnglish(query);return retrieverService.retrieve(englishQuery, topK);
}

2. 领域适配

public void configureForDomain(String domain) {// 加载领域特定知识库collection = chromaClient.getCollection("knowledge_" + domain);// 加载领域特定嵌入模型embeddingModel = loadDomainEmbeddingModel(domain);// 配置领域特定提示模板promptTemplate = loadPromptTemplate(domain);
}

十二、性能压测数据

测试环境

组件配置
CPUIntel Xeon 4核
内存16GB
ChromaDB单节点
嵌入模型all-MiniLM-L6-v2
LLMGPT-3.5 Turbo

性能指标

场景QPS平均延迟召回率@3
短问题(5词)32680ms92%
长问题(20词)28720ms89%
混合检索25850ms96%
批量查询(10并发)18920ms90%

总结:RAG系统优势

  1. 知识实时更新:无需重新训练模型,更新知识库即可
  2. 来源可追溯:提供答案来源文档片段
  3. 减少幻觉:基于事实知识生成答案
  4. 领域适应性强:快速适配不同行业知识库
  5. 成本效益:比微调大模型成本低90%
    典型应用场景:
  • 企业知识问答系统
  • 智能客服助手
  • 教育领域智能辅导
  • 医疗诊断辅助
  • 法律条文查询

最佳实践建议:
1. 知识库文档需预处理(清洗、结构化)
2. 关键业务问题设置人工审核流程
3. 定期评估和优化检索效果
4. 敏感领域增加本地LLM支持

http://www.dtcms.com/a/321133.html

相关文章:

  • 3D Tiles 格式中,Bounding Volum
  • 基于AutoDL平台的3D_Gaussian_Splatting初体验
  • 在 Vue 中使用 ReconnectingWebSocket实现即时通讯聊天客服功能
  • 2025 前端真实试题-阿里面试题分析
  • 关于数据结构6-哈希表和5种排序算法
  • Maptalks vs Cesium
  • 【最新版】2025年搜狗拼音输入法
  • “电子合同”为什么会出现无效的情况?
  • OpenCV cv2.flip() 函数详解与示例
  • 深入理解 Java AWT Container:原理、实战与性能优化
  • ORACLE看当前连接数的方法
  • 柠檬笔试——野猪骑士
  • 南方略咨询与与清源科技正式启动国际市场GTM流程规划咨询项目!!!
  • 汽车电子:现代汽车的“神经中枢“
  • Eyevinn 彻底改变开源部署模式
  • 小孙学变频学习笔记(十三)电动机参数的自动测量 矢量控制的转速反馈
  • 如何 让ubuntu 在root 下安装的docker 在 普通用户下也能用
  • Spring Boot 结合 CORS 解决前端跨域问题
  • GitLab同步提交的用户设置
  • 2025年渗透测试面试题总结-08(题目+回答)
  • 【19】C#实战篇—— C# 绘制点划线,绘制虚线——PointF dxdy,过x点垂直画红色点划线,长度为W,过y点水平画红色点划线,长度为H
  • 华清远见25072班C语言学习day5
  • 自动驾驶数据闭环
  • 进程管理、系统高负载、cpu超过800%等实战问题处理
  • 机器人权利:虚实之间的伦理与法理探界
  • F5发布业界首创集成式应用交付与安全平台,开启ADC 3.0新时代
  • 【Oracle Linux 9.6】切换默认为命令行模式
  • git如何使用和操作命令?
  • 【/usr/bin/env: “bash\r”: 没有那个文件或目录】问题解决
  • C# GUI程序中的异步操作:解决界面卡顿的关键技术