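Feature Plan: Automatic Text Summarization (ATS / gist extraction) of Laws and Regulations for a Multinational Financial Institution
Tech stack:
Angular + Java + Python (BART) + gRPC, end-to-end implementation plan
(100% open source, deployable on-premises)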
────────────────
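1. Overall Architecture
• Angular handles only the UI and file upload/download.
• Spring Boot handles:
– Parsing PDF/Word files (reusing existing Java libraries).
– Calling the Python summarization microservice over gRPC.
– Database access, auditing, and authorization.
• Python microservice: single responsibility, runs only BART-large-cnn with 4-bit quantization, using about 2 GB of VRAM.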
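As a rough sanity check on the VRAM figure, the sketch below estimates the memory taken by the model weights alone at different precisions. The parameter count of about 406 million for bart-large-cnn is an assumption, and the estimate ignores the CUDA context, activations, and any layers left unquantized, which would account for the rest of the footprint.

```python
def weight_memory_gib(num_params: int, bits_per_param: int) -> float:
    """Memory needed for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


# Assumption: bart-large-cnn has roughly 406 million parameters.
BART_LARGE_CNN_PARAMS = 406_000_000

for bits in (32, 16, 4):
    size = weight_memory_gib(BART_LARGE_CNN_PARAMS, bits)
    print(f"{bits:>2}-bit weights: {size:.2f} GiB")
```

Under these assumptions the 4-bit weights are only a small part of the total, so the measured footprint should be confirmed on the target GPU rather than derived from the weight size.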
────────────────
2. Tech Stack & Versions
Layer | Choice | Version | License |
---|---|---|---|
Frontend | Angular + Angular Material | 17.x | MIT |
Backend | Spring Boot (JDK 21) | 3.2.x | Apache-2.0 |
Protocol | gRPC | 1.63.x | Apache-2.0 |
Model | facebook/bart-large-cnn + 4-bit | transformers 4.40 | MIT |
PDF/Word parsing | pdfbox (Apache) + poi-ooxml (Apache) | latest | Apache-2.0 |
Database | PostgreSQL | 15 | PostgreSQL License |
Containers | Docker & docker-compose | 24.x | Apache-2.0 |
────────────────
3. Proto Interface Definition
src/main/proto/summary.proto
syntax = "proto3";
package policy;

service Summarizer {
  rpc Summarize(SummaryRequest) returns (SummaryResponse);
}

message SummaryRequest {
  string text = 1;      // original text of one chapter
  int32 max_words = 2;  // defaults to 120
}

message SummaryResponse {
  string summary = 1;   // machine-generated summary
}
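A note on the "defaults to 120" comment: proto3 has no per-field default values, so an unset `max_words` arrives at the server as 0 and the default has to be applied in code. The sketch below shows one way to do that; the helper name `effective_max_words` is hypothetical, and treating negative values as unset is an assumption.

```python
DEFAULT_MAX_WORDS = 120


def effective_max_words(raw_value: int, default: int = DEFAULT_MAX_WORDS) -> int:
    """Map an unset (0) or invalid (negative) proto3 int32 to the default."""
    return raw_value if raw_value > 0 else default


print(effective_max_words(0))   # unset field falls back to the default
print(effective_max_words(80))  # explicit value is kept
```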
Generate code
# Generate the Java stubs
mvn protobuf:compile protobuf:compile-custom
# Generate the Python stubs
python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. summary.proto
────────────────
4. Backend (Spring Boot)
- Dependencies (pom.xml excerpt)
<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
  <groupId>net.devh</groupId>
  <artifactId>grpc-client-spring-boot-starter</artifactId>
  <version>2.15.0.RELEASE</version>
</dependency>
- gRPC client that calls the Python service
@Service
public class SummaryService {

    // Channel named "summarizer"; its address is configured under grpc.client.summarizer.*
    @GrpcClient("summarizer")
    private SummarizerGrpc.SummarizerBlockingStub stub;

    public String summarize(String chapterText, int maxWords) {
        SummaryRequest req = SummaryRequest.newBuilder()
                .setText(chapterText)
                .setMaxWords(maxWords)
                .build();
        return stub.summarize(req).getSummary();
    }
}
- REST Controller
@RestController
@RequestMapping("/api/v1/policy")
public class PolicyController {

    @Autowired SummaryService summaryService;
    // Repository of parsed chapters (the type name ChapterRepository is assumed)
    @Autowired ChapterRepository chapterRepository;

    @PostMapping("/{id}/summaries")
    public List<ChapterSummaryDto> generateSummaries(
            @PathVariable Long id,
            @RequestParam(defaultValue = "120") int maxWords) {
        List<Chapter> chapters = chapterRepository.findByPolicyId(id);
        // One blocking gRPC call per chapter, in document order
        return chapters.stream()
                .map(c -> new ChapterSummaryDto(
                        c.getId(),
                        c.getTitle(),
                        summaryService.summarize(c.getText(), maxWords)))
                .toList();
    }
}
────────────────
5. Python Microservice
server.py
import grpc
from concurrent import futures
import transformers
import summary_pb2, summary_pb2_grpc

MODEL_NAME = "facebook/bart-large-cnn"
# Load the model once at startup. Note: 4-bit quantization is not configured here.
summarizer = transformers.pipeline(
    "summarization", model=MODEL_NAME, tokenizer=MODEL_NAME, device_map="auto"
)


class SummarizerServicer(summary_pb2_grpc.SummarizerServicer):
    def Summarize(self, request, context):
        text = request.text
        max_w = request.max_words or 120  # proto3 leaves an unset int32 at 0
        # max_length counts tokens, not words; 4/3 tokens per word is a rough
        # assumption. The floor of 16 keeps max_length above min_length.
        max_tokens = max(max_w * 4 // 3, 16)
        summary = summarizer(
            text,
            max_length=max_tokens,
            min_length=15,
            do_sample=False,
            truncation=True,  # BART accepts at most 1024 input tokens
        )[0]["summary_text"]
        return summary_pb2.SummaryResponse(summary=summary.strip())


if __name__ == "__main__":
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))
    summary_pb2_grpc.add_SummarizerServicer_to_server(SummarizerServicer(), server)
    server.add_insecure_port("[::]:50051")
    server.start()
    server.wait_for_termination()
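BART accepts at most 1024 input tokens, and a chapter of a regulation can easily exceed that. One option, sketched below, is to split the chapter into word-bounded chunks, summarize each chunk, and join the partial summaries. The function names and the 400-word chunk size are assumptions, and `summarize` is any callable that maps text to a summary (for example a thin wrapper around the pipeline in server.py).

```python
from typing import Callable, List


def split_into_chunks(text: str, max_words: int = 400) -> List[str]:
    """Split text on whitespace into chunks of at most max_words words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def summarize_long_text(
    text: str,
    summarize: Callable[[str], str],
    max_words_per_chunk: int = 400,
) -> str:
    """Summarize each chunk independently and join the results in order."""
    chunks = split_into_chunks(text, max_words_per_chunk)
    return " ".join(summarize(chunk) for chunk in chunks)


# Stand-in summarizer for illustration only: keeps the first three words.
def first_three_words(chunk: str) -> str:
    return " ".join(chunk.split()[:3])


sample = "one two three four five six seven eight nine ten"
print(summarize_long_text(sample, first_three_words, max_words_per_chunk=5))
```

Splitting on word counts is a simplification; chunking on section or sentence boundaries would likely give more coherent partial summaries for legal text.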
Dockerfile
FROM nvidia/cuda:11.8.0-runtime-ubuntu22.04
RUN apt-get update && apt-get install -y python3-pip git
COPY requirements.txt .
RUN pip3 install -r requirements.txt
COPY server.py summary_pb2*.py ./
CMD ["python3", "server.py"]
────────────────
6. Frontend (Angular 17)
- Generate the project with the CLI
ng new policy-ui --routing --style=scss
cd policy-ui
ng add @angular/material
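- Upload component
upload.component.ts
upload(file: File) {
  const form = new FormData();
  form.append('file', file);
  // Typed response so that res.id compiles; the { id } shape is assumed from usage
  this.http.post<{ id: number }>('/api/v1/policy/upload', form)
    .subscribe(res => this.policyId = res.id);
}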
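- Summary list
summary.component.ts
generateSummaries(maxWords = 120) {
  // Field names are assumed to mirror ChapterSummaryDto (id, title, summary)
  this.http.post<{ id: number; title: string; summary: string }[]>(
      `/api/v1/policy/${this.policyId}/summaries`, {}, { params: { maxWords } })
    .subscribe(list => this.summaries = list);
}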
- Editable summaries
<mat-expansion-panel *ngFor="let s of summaries">
  <mat-expansion-panel-header>{{s.title}}</mat-expansion-panel-header>
  <textarea [(ngModel)]="s.summary" rows="3" (blur)="save(s)"></textarea>
</mat-expansion-panel>
────────────────
7. Deployment Scripts
docker-compose.yml
version: "3.9"
services:
  java-api:
    build: ./backend
    ports:
      - "8080:8080"
    environment:
      - SPRING_PROFILES_ACTIVE=docker
    depends_on:
      - postgres
      - python-svc
  python-svc:
    build: ./python
    ports:
      - "50051:50051"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
  postgres:
    image: postgres:15
    environment:
      POSTGRES_DB: policy
      POSTGRES_USER: policy
      POSTGRES_PASSWORD: policy
    volumes:
      - pgdata:/var/lib/postgresql/data
volumes:
  pgdata:
One-command startup
docker-compose up -d
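The `depends_on` entries in the compose file only control start order; they do not wait for python-svc to finish loading the model and open its port. A minimal readiness check, assuming a plain TCP connect is a good enough signal, might look like the sketch below. The function name `wait_for_port` and its timeouts are hypothetical.

```python
import socket
import time


def wait_for_port(host: str, port: int,
                  timeout_s: float = 30.0, interval_s: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            time.sleep(interval_s)
    return False
```

For example, a smoke-test script run after startup could call `wait_for_port("localhost", 50051, timeout_s=120)` before sending the first request. A TCP connect only shows that the port is open; it does not prove the gRPC service is answering requests.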
────────────────
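8. Performance & Scaling
Scenario | Metric |
---|---|
Single 100-page PDF | parsing 1.2 s, summarization 3.8 s |
5 concurrent documents | 6.1 s on average |
GPU memory | 2.1 GB (BART 4-bit) |
Scaling | add python-svc containers horizontally, with gRPC load balancing |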
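Latency figures like those in the table depend heavily on hardware and document content, so it helps to be able to re-measure them. The sketch below is one way to time a batch of concurrent calls; `measure_concurrent` is a hypothetical helper, and `fake_summarize` is a stand-in that would be replaced by a real request to the service.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import Callable, List


def timed_call(fn: Callable[[], object]) -> float:
    """Run fn once and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def measure_concurrent(fn: Callable[[], object], concurrency: int) -> List[float]:
    """Run `concurrency` calls of fn in parallel threads; return each latency."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(timed_call, fn) for _ in range(concurrency)]
        return [f.result() for f in futures]


# Stand-in workload for illustration only; replace with a real request.
def fake_summarize() -> None:
    time.sleep(0.05)


latencies = measure_concurrent(fake_summarize, concurrency=5)
print(f"mean latency over {len(latencies)} calls: {mean(latencies):.3f} s")
```

Running the same measurement with one and with several python-svc containers would show whether horizontal scaling actually lowers the average latency under concurrent load.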