Elasticsearch Power Guide
The Complete Guide to Elasticsearch
Contents
- Introduction to Elasticsearch
- Core Concepts
- Usage Tips
- Key Challenges Explained
- Spring Boot Integration
- Best Practices
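Introduction to Elasticsearch
What Is Elasticsearch
Elasticsearch is a distributed search and analytics engine built on Apache Lucene that provides near-real-time search and analysis. It is the core component of the Elastic Stack (formerly known as the ELK Stack) and is widely used for log analytics, full-text search, and monitoring.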
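Key Features
- Distributed architecture: scales horizontally, with automatic sharding and replication
- Near-real-time search: newly indexed documents become searchable after the next refresh (once per second by default)
- Full-text search: powerful full-text search built on Lucene
- RESTful API: a simple HTTP/JSON interface
- Multi-tenancy: one cluster can host many independent indices (mapping types, once used to keep several kinds of document in one index, were removed in 8.0)
- Aggregations: powerful data aggregation and analytics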
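Typical Use Cases
- Enterprise search
- Log analytics and monitoring
- E-commerce product search
- Content management systems
- Real-time data analytics
- Security information analysis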
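Core Concepts
Cluster Architecture
Elasticsearch cluster
├── Node - a single Elasticsearch instance
│ ├── Master node - cluster management
│ ├── Data node - data storage and search
│ └── Coordinating node - request routing
├── Index - logical data container
│ ├── Shard - physical partition of the data
│ │ ├── Primary shard - receives writes
│ │ └── Replica shard - copy of a primary, kept for redundancy
│ └── Mapping - field type definitions
└── Document - smallest unit of data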
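Data Types
- Core types: text, keyword, long, integer, double, boolean, date
- Complex types: object, nested (there is no dedicated array type; any field can hold multiple values of the same type)
- Geo types: geo_point, geo_shape
- Specialized types: ip, completion, token_count, percolator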
Usage Tips
1. Index Management
// Create an index with a custom analyzer and explicit mappings
PUT /my_index
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "stop"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "my_analyzer",
        "fields": { "keyword": { "type": "keyword" } }
      },
      "content": { "type": "text", "analyzer": "my_analyzer" },
      "tags": { "type": "keyword" },
      "created_at": { "type": "date" }
    }
  }
}
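To make the custom analyzer above more concrete, here is a rough plain-Java approximation of what a standard tokenizer followed by the lowercase and stop filters does to a string. It is only a sketch: the real tokenizer follows Unicode text segmentation rules, the stop-word set below is a small assumed subset, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class AnalyzerSketch {
    // Assumption: a tiny subset of English stop words, for illustration only.
    private static final Set<String> STOP_WORDS =
            Set.of("a", "an", "and", "the", "to", "of", "is");

    /** Splits on non-letter/non-digit characters, lowercases, then drops stop words. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // leading delimiters produce an empty first element
            }
            String token = raw.toLowerCase(Locale.ROOT); // "lowercase" filter
            if (!STOP_WORDS.contains(token)) {           // "stop" filter
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Complete Guide to Elasticsearch"));
    }
}
```

The title.keyword sub-field in the mapping skips this analysis step entirely, which is why it suits exact matching, sorting, and aggregations.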
2. Search Queries
// Compound search: scored full-text match, non-scoring filters, aggregations, sorting, and paging
GET /my_index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "elasticsearch tutorial",
            "fields": ["title^2", "content"],
            "type": "best_fields",
            "fuzziness": "AUTO"
          }
        }
      ],
      "filter": [
        { "range": { "created_at": { "gte": "2023-01-01", "lte": "2023-12-31" } } },
        { "terms": { "tags": ["search", "tutorial"] } }
      ]
    }
  },
  "aggs": {
    "tag_counts": {
      "terms": { "field": "tags", "size": 10 }
    },
    "date_histogram": {
      "date_histogram": { "field": "created_at", "calendar_interval": "month" }
    }
  },
  "sort": [
    { "_score": { "order": "desc" } },
    { "created_at": { "order": "desc" } }
  ],
  "from": 0,
  "size": 20
}
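The query above sets "fuzziness": "AUTO". With the default thresholds (equivalent to AUTO:3,6), the allowed edit distance depends on the length of each term. The sketch below mirrors that rule in plain Java; the class name is hypothetical and it does not perform the fuzzy matching itself.

```java
final class AutoFuzziness {
    /** Maximum edit distance permitted by "fuzziness": "AUTO" with default thresholds 3 and 6. */
    static int maxEdits(String term) {
        int length = term.codePointCount(0, term.length());
        if (length < 3) {
            return 0; // 0..2 characters: must match exactly
        }
        if (length < 6) {
            return 1; // 3..5 characters: one edit allowed
        }
        return 2;     // more than 5 characters: two edits allowed
    }

    public static void main(String[] args) {
        for (String term : new String[] {"es", "guide", "tutorial"}) {
            System.out.println(term + " -> " + maxEdits(term));
        }
    }
}
```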
3. Aggregations
// Nested aggregations: per-category totals, averages, and monthly trend, plus global statistics
GET /orders/_search
{
  "size": 0,
  "aggs": {
    "sales_by_category": {
      "terms": { "field": "category", "size": 10 },
      "aggs": {
        "total_sales": { "sum": { "field": "amount" } },
        "avg_order_value": { "avg": { "field": "amount" } },
        "sales_trend": {
          "date_histogram": { "field": "order_date", "calendar_interval": "month" },
          "aggs": {
            "monthly_sales": { "sum": { "field": "amount" } }
          }
        }
      }
    },
    "global_sales_stats": {
      "global": {},
      "aggs": {
        "total_revenue": { "sum": { "field": "amount" } },
        "avg_order_value": { "avg": { "field": "amount" } }
      }
    }
  }
}
4. Bulk Operations
// Bulk-index documents (each action line is followed by its source line; the body must end with a newline)
POST /_bulk
{"index":{"_index":"my_index","_id":"1"}}
{"title":"Elasticsearch Guide","content":"Complete guide to ES","tags":["search","guide"]}
{"index":{"_index":"my_index","_id":"2"}}
{"title":"Spring Boot Integration","content":"How to integrate ES with Spring Boot","tags":["spring","integration"]}

// Bulk-update documents with partial docs
POST /_bulk
{"update":{"_index":"my_index","_id":"1"}}
{"doc":{"views":100,"updated_at":"2023-12-01"}}
{"update":{"_index":"my_index","_id":"2"}}
{"doc":{"views":50,"updated_at":"2023-12-01"}}
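The bulk format is newline-delimited JSON, which is easy to get wrong when assembled by hand. The following is a minimal sketch of building such a body in plain Java. BulkBodyBuilder is a hypothetical helper, and it assumes the index name and IDs need no JSON escaping and that each source document is already serialized on a single line.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class BulkBodyBuilder {
    /** Builds an NDJSON _bulk body: one action line plus one source line per document. */
    static String indexActions(String index, Map<String, String> jsonById) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> entry : jsonById.entrySet()) {
            body.append("{\"index\":{\"_index\":\"").append(index)
                .append("\",\"_id\":\"").append(entry.getKey()).append("\"}}\n");
            // The source must stay on one line; every line, including the last, ends with \n.
            body.append(entry.getValue()).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("1", "{\"title\":\"Elasticsearch Guide\"}");
        docs.put("2", "{\"title\":\"Spring Boot Integration\"}");
        System.out.print(indexActions("my_index", docs));
    }
}
```

When sending the body, use the Content-Type application/x-ndjson. A bulk request is not atomic, so the per-item results in the response should be checked even when the HTTP status is 200.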
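Key Challenges Explained
1. Sharding Strategy
// Estimating the number of primary shards
// shards ≈ total data size / target shard size (30-50 GB per shard is the suggested range)
// A rougher heuristic: shards = CPU cores * 2; treat it as a starting point, not a rule

// Shard allocation filtering: keep this index's shards on nodes whose box_type attribute is "hot"
PUT /logs/_settings
{
  "index.routing.allocation.require.box_type": "hot"
}

// Force-merge down to a single segment; best reserved for indices that no longer receive writes
POST /logs/_forcemerge?max_num_segments=1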
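The sizing rule above (data volume divided by a target shard size) can be sketched as a small helper. The class name and the sample figures are illustrative assumptions; real sizing should also account for growth, replicas, and node count.

```java
final class ShardSizing {
    /** Returns ceil(expectedDataGb / targetShardGb), with a minimum of one primary shard. */
    static int primaryShards(double expectedDataGb, double targetShardGb) {
        if (expectedDataGb <= 0 || targetShardGb <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (int) Math.max(1, Math.ceil(expectedDataGb / targetShardGb));
    }

    public static void main(String[] args) {
        // Assumed example: 200 GB of data with a 40 GB target shard size.
        System.out.println(primaryShards(200, 40));
    }
}
```

Because the number of primary shards is fixed when the index is created, it is worth running this estimate against the expected data volume rather than the current one.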
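2. Performance Optimization
// Query optimization
// - constant_score with a filter skips relevance scoring
// - _source returns only the fields that are needed
// - size caps the page; on its own it does not solve deep pagination (use search_after for that)
GET /my_index/_search
{
  "query": {
    "constant_score": {
      "filter": { "term": { "status": "active" } }
    }
  },
  "_source": ["title", "content"],
  "size": 1000
}

// Index settings for heavy write loads, such as an initial bulk import
// - refresh_interval: refresh less often
// - number_of_replicas: 0 while loading; restore the replica count afterwards
// - translog.durability async: fsync in the background; recently acknowledged writes can be lost if a node crashes
PUT /my_index/_settings
{
  "index.refresh_interval": "30s",
  "index.number_of_replicas": 0,
  "index.translog.durability": "async"
}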
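3. Cluster Management
// Cluster health
GET /_cluster/health?pretty

// Node statistics
GET /_nodes/stats?pretty

// Index statistics
GET /_stats?pretty

// Shard allocation: enable allocation for all shard types
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": "all"
  }
}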
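The status field returned by the cluster health API takes one of three values. The sketch below shows one way an application might interpret it; the class is hypothetical, and treating anything other than green as needing attention is an assumed policy, not a recommendation from Elasticsearch.

```java
import java.util.Locale;

final class ClusterHealth {
    enum Status { GREEN, YELLOW, RED }

    /** Parses the "status" value from a cluster health response, e.g. "yellow". */
    static Status parse(String status) {
        return Status.valueOf(status.trim().toUpperCase(Locale.ROOT));
    }

    static String describe(Status status) {
        switch (status) {
            case GREEN:
                return "all primary and replica shards are allocated";
            case YELLOW:
                return "all primary shards are allocated, but at least one replica is not";
            default:
                return "at least one primary shard is unallocated";
        }
    }

    /** Assumed policy: anything other than GREEN deserves a look. */
    static boolean needsAttention(Status status) {
        return status != Status.GREEN;
    }

    public static void main(String[] args) {
        Status status = parse("yellow");
        System.out.println(status + ": " + describe(status));
    }
}
```

A single-node development cluster with number_of_replicas set to 1 typically reports yellow, because a replica is never placed on the same node as its primary.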
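4. Data Modeling
// Parent-child modeling with a join field
// (a child document must be indexed with routing set to its parent's ID so both live on the same shard)
PUT /company
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "join_field": {
        "type": "join",
        "relations": { "company": "employee" }
      }
    }
  }
}

// Nested objects: each variant is indexed as its own hidden document, so its fields are matched together
PUT /products
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "variants": {
        "type": "nested",
        "properties": {
          "color": { "type": "keyword" },
          "size": { "type": "keyword" },
          "price": { "type": "double" }
        }
      }
    }
  }
}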
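Why variants is mapped as nested rather than as a plain object is easiest to see with a small simulation. With the object type, an array of objects is flattened into independent multi-valued fields, so conditions on color and size can be satisfied by different variants. The plain-Java sketch below imitates both behaviors; the classes are hypothetical and nothing here talks to Elasticsearch.

```java
import java.util.List;

final class NestedVsFlattened {
    static final class Variant {
        final String color;
        final String size;

        Variant(String color, String size) {
            this.color = color;
            this.size = size;
        }
    }

    /** "object" semantics: color and size are checked independently across all variants. */
    static boolean matchesFlattened(List<Variant> variants, String color, String size) {
        boolean anyColor = variants.stream().anyMatch(v -> v.color.equals(color));
        boolean anySize = variants.stream().anyMatch(v -> v.size.equals(size));
        return anyColor && anySize;
    }

    /** "nested" semantics: both conditions must hold within the same variant. */
    static boolean matchesNested(List<Variant> variants, String color, String size) {
        return variants.stream().anyMatch(v -> v.color.equals(color) && v.size.equals(size));
    }

    public static void main(String[] args) {
        List<Variant> variants = List.of(new Variant("red", "S"), new Variant("blue", "L"));
        // No red variant in size L exists, yet the flattened check reports a match.
        System.out.println("flattened: " + matchesFlattened(variants, "red", "L"));
        System.out.println("nested:    " + matchesNested(variants, "red", "L"));
    }
}
```

The trade-off is cost: nested fields have to be searched with a nested query, and every nested object adds a hidden document to the index.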
Spring Boot Integration
1. Dependencies
<!-- Maven -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>

// Gradle
implementation 'org.springframework.boot:spring-boot-starter-data-elasticsearch'
2. Configuration File
# application.yml
spring:
  elasticsearch:
    uris: http://localhost:9200
    connection-timeout: 1s
    socket-timeout: 30s
    # Security
    # username: elastic
    # password: changeme
    # Multiple nodes
    # uris: http://es-node1:9200,http://es-node2:9200,http://es-node3:9200
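3. Entity Class
import java.time.LocalDateTime;
import java.util.List;

import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;
import org.springframework.data.elasticsearch.annotations.Setting;

@Document(indexName = "articles")
@Setting(settingPath = "es-settings.json")
public class Article {

    @Id
    private String id;

    // ik_max_word comes from the IK analysis plugin, which must be installed on every node
    @Field(type = FieldType.Text, analyzer = "ik_max_word")
    private String title;

    @Field(type = FieldType.Text, analyzer = "ik_max_word")
    private String content;

    @Field(type = FieldType.Keyword)
    private List<String> tags;

    @Field(type = FieldType.Keyword)
    private String category;

    // Depending on the Spring Data Elasticsearch version, a LocalDateTime date field
    // may also need an explicit format attribute or a custom converter
    @Field(type = FieldType.Date)
    private LocalDateTime createdAt;

    @Field(type = FieldType.Long)
    private Long views;

    // Constructors, getters, and setters
}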
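4. Repository Interface
import java.time.LocalDateTime;
import java.util.List;

import org.springframework.data.domain.Page;
import org.springframework.data.domain.Pageable;
import org.springframework.data.elasticsearch.annotations.Query;
import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;
import org.springframework.stereotype.Repository;

@Repository
public interface ArticleRepository extends ElasticsearchRepository<Article, String> {

    // Derived query methods
    List<Article> findByTitleContaining(String title);

    List<Article> findByCategoryAndTagsIn(String category, List<String> tags);

    Page<Article> findByCreatedAtBetween(LocalDateTime start, LocalDateTime end, Pageable pageable);

    // @Query takes the JSON that goes inside the "query" element of a search request
    @Query("{\"bool\": {\"must\": [{\"match\": {\"title\": \"?0\"}}]}}")
    List<Article> searchByTitle(String title);

    // Aggregations cannot be expressed through @Query, because the annotation value is sent
    // as the query clause only; run them through the template instead (see getArticleStats
    // in the service layer)
}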
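5. Service Layer
// Written against the Spring Data Elasticsearch 4.x API (RestHighLevelClient based);
// later releases deprecate these classes in favor of the newer Elasticsearch Java client.
import java.time.LocalDateTime;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

import org.elasticsearch.common.unit.Fuzziness;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.MultiMatchQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.histogram.DateHistogramAggregationBuilder;
import org.elasticsearch.search.aggregations.bucket.histogram.DateHistogramInterval;
import org.elasticsearch.search.aggregations.bucket.terms.TermsAggregationBuilder;
import org.elasticsearch.search.sort.SortBuilders;
import org.elasticsearch.search.sort.SortOrder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.domain.Pageable;
import org.springframework.data.elasticsearch.core.ElasticsearchRestTemplate;
import org.springframework.data.elasticsearch.core.SearchHitSupport;
import org.springframework.data.elasticsearch.core.SearchHits;
import org.springframework.data.elasticsearch.core.SearchPage;
import org.springframework.data.elasticsearch.core.mapping.IndexCoordinates;
import org.springframework.data.elasticsearch.core.query.IndexQuery;
import org.springframework.data.elasticsearch.core.query.IndexQueryBuilder;
import org.springframework.data.elasticsearch.core.query.NativeSearchQuery;
import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;
import org.springframework.data.elasticsearch.core.query.UpdateQuery;
import org.springframework.stereotype.Service;
import org.springframework.util.StringUtils;

@Service
public class ArticleService {

    @Autowired
    private ArticleRepository articleRepository;

    @Autowired
    private ElasticsearchRestTemplate elasticsearchTemplate;

    // Basic CRUD operations
    public Article createArticle(Article article) {
        article.setCreatedAt(LocalDateTime.now());
        article.setViews(0L);
        return articleRepository.save(article);
    }

    public Article findById(String id) {
        // ArticleNotFoundException is an application-defined exception
        return articleRepository.findById(id)
                .orElseThrow(() -> new ArticleNotFoundException("Article not found"));
    }

    // Compound search. SearchRequest is the application's own DTO, not Elasticsearch's class.
    public SearchPage<Article> searchArticles(SearchRequest request, Pageable pageable) {
        BoolQueryBuilder queryBuilder = QueryBuilders.boolQuery();

        // Full-text match on title (boosted) and content
        if (StringUtils.hasText(request.getKeyword())) {
            queryBuilder.must(QueryBuilders.multiMatchQuery(request.getKeyword())
                    .field("title", 2.0f)
                    .field("content")
                    .type(MultiMatchQueryBuilder.Type.BEST_FIELDS)
                    .fuzziness(Fuzziness.AUTO));
        }

        // Category filter
        if (StringUtils.hasText(request.getCategory())) {
            queryBuilder.filter(QueryBuilders.termQuery("category", request.getCategory()));
        }

        // Tag filter
        if (request.getTags() != null && !request.getTags().isEmpty()) {
            queryBuilder.filter(QueryBuilders.termsQuery("tags", request.getTags()));
        }

        // Date range filter
        if (request.getStartDate() != null && request.getEndDate() != null) {
            queryBuilder.filter(QueryBuilders.rangeQuery("createdAt")
                    .gte(request.getStartDate())
                    .lte(request.getEndDate()));
        }

        // Build the search query; sorting uses Elasticsearch SortBuilders
        NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
                .withQuery(queryBuilder)
                .withSort(SortBuilders.scoreSort().order(SortOrder.DESC))
                .withSort(SortBuilders.fieldSort("createdAt").order(SortOrder.DESC))
                .withPageable(pageable)
                .build();

        // search(...) returns SearchHits; wrap it to get a SearchPage
        SearchHits<Article> hits = elasticsearchTemplate.search(searchQuery, Article.class);
        return SearchHitSupport.searchPageFor(hits, pageable);
    }

    // Aggregations
    public Map<String, Object> getArticleStats() {
        // Documents per category
        TermsAggregationBuilder categoryAgg = AggregationBuilders.terms("category_stats")
                .field("category")
                .size(10);

        // Documents per tag
        TermsAggregationBuilder tagAgg = AggregationBuilders.terms("tag_stats")
                .field("tags")
                .size(20);

        // Monthly trend
        DateHistogramAggregationBuilder timeAgg = AggregationBuilders.dateHistogram("time_trend")
                .field("createdAt")
                .calendarInterval(DateHistogramInterval.MONTH);

        NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
                .withQuery(QueryBuilders.matchAllQuery())
                .addAggregation(categoryAgg)
                .addAggregation(tagAgg)
                .addAggregation(timeAgg)
                .build();

        SearchHits<Article> searchHits = elasticsearchTemplate.search(searchQuery, Article.class);

        // The return type of getAggregations() differs between 4.x minor versions;
        // this form assumes it exposes the Elasticsearch Aggregations object directly
        Map<String, Object> stats = new HashMap<>();
        stats.put("category_stats", searchHits.getAggregations().get("category_stats"));
        stats.put("tag_stats", searchHits.getAggregations().get("tag_stats"));
        stats.put("time_trend", searchHits.getAggregations().get("time_trend"));
        return stats;
    }

    // Bulk indexing. Elasticsearch has no transactions, so @Transactional is not used here.
    public void bulkIndexArticles(List<Article> articles) {
        List<IndexQuery> queries = articles.stream()
                .map(article -> {
                    article.setCreatedAt(LocalDateTime.now());
                    article.setViews(0L);
                    return new IndexQueryBuilder()
                            .withId(article.getId())
                            .withObject(article)
                            .build();
                })
                .collect(Collectors.toList());
        elasticsearchTemplate.bulkIndex(queries, IndexCoordinates.of("articles"));
    }

    // Scripted update: increment the view counter
    public void updateArticleViews(String id) {
        UpdateQuery updateQuery = UpdateQuery.builder(id)
                .withScript("ctx._source.views += 1")
                .withLang("painless")
                .build();
        elasticsearchTemplate.update(updateQuery, IndexCoordinates.of("articles"));
    }
}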
6. Controller Layer
import java.util.List;
import java.util.Map;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.domain.Pageable;
import org.springframework.data.domain.Sort;
import org.springframework.data.elasticsearch.core.SearchPage;
import org.springframework.data.web.PageableDefault;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/api/articles")
public class ArticleController {

    @Autowired
    private ArticleService articleService;

    @PostMapping
    public ResponseEntity<Article> createArticle(@RequestBody Article article) {
        Article createdArticle = articleService.createArticle(article);
        return ResponseEntity.status(HttpStatus.CREATED).body(createdArticle);
    }

    @GetMapping("/{id}")
    public ResponseEntity<Article> getArticleById(@PathVariable String id) {
        Article article = articleService.findById(id);
        return ResponseEntity.ok(article);
    }

    @GetMapping("/search")
    public ResponseEntity<SearchPage<Article>> searchArticles(
            @ModelAttribute SearchRequest request,
            @PageableDefault(sort = "createdAt", direction = Sort.Direction.DESC) Pageable pageable) {
        SearchPage<Article> articles = articleService.searchArticles(request, pageable);
        return ResponseEntity.ok(articles);
    }

    @GetMapping("/stats")
    public ResponseEntity<Map<String, Object>> getArticleStats() {
        Map<String, Object> stats = articleService.getArticleStats();
        return ResponseEntity.ok(stats);
    }

    @PostMapping("/bulk")
    public ResponseEntity<Void> bulkIndexArticles(@RequestBody List<Article> articles) {
        articleService.bulkIndexArticles(articles);
        return ResponseEntity.ok().build();
    }

    @PutMapping("/{id}/views")
    public ResponseEntity<Void> incrementViews(@PathVariable String id) {
        articleService.updateArticleViews(id);
        return ResponseEntity.ok().build();
    }
}
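7. Configuration Class
import java.time.Duration;
import java.util.Arrays;

import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.elasticsearch.client.ClientConfiguration;
import org.springframework.data.elasticsearch.client.RestClients;
import org.springframework.data.elasticsearch.config.AbstractElasticsearchConfiguration;
import org.springframework.data.elasticsearch.core.ElasticsearchRestTemplate;
import org.springframework.data.elasticsearch.core.convert.ElasticsearchCustomConversions;
import org.springframework.data.elasticsearch.repository.config.EnableElasticsearchRepositories;

@Configuration
@EnableElasticsearchRepositories(basePackages = "com.example.repository")
public class ElasticsearchConfig extends AbstractElasticsearchConfiguration {

    @Value("${spring.elasticsearch.uris}")
    private String elasticsearchUrl;

    @Override
    @Bean
    public RestHighLevelClient elasticsearchClient() {
        // connectedTo expects host:port, so the scheme is stripped here;
        // this simple replace assumes a single http:// URI
        ClientConfiguration clientConfiguration = ClientConfiguration.builder()
                .connectedTo(elasticsearchUrl.replace("http://", ""))
                .withConnectTimeout(Duration.ofSeconds(5))
                .withSocketTimeout(Duration.ofSeconds(30))
                .build();
        return RestClients.create(clientConfiguration).rest();
    }

    // AbstractElasticsearchConfiguration already exposes an ElasticsearchOperations bean;
    // this extra bean lets the service inject ElasticsearchRestTemplate by its concrete type
    @Bean
    public ElasticsearchRestTemplate elasticsearchRestTemplate() {
        return new ElasticsearchRestTemplate(elasticsearchClient());
    }

    // Custom converters, registered by overriding the hook the base class provides.
    // LocalDateTimeToDateConverter and DateToLocalDateTimeConverter are application-defined classes.
    @Override
    @Bean
    public ElasticsearchCustomConversions elasticsearchCustomConversions() {
        return new ElasticsearchCustomConversions(Arrays.asList(
                new LocalDateTimeToDateConverter(),
                new DateToLocalDateTimeConverter()));
    }
}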
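The configuration class turns the configured URI into a host:port value with a plain string replace, which breaks for https:// URIs and for the comma-separated multi-node form shown in application.yml. A more tolerant conversion could look like the sketch below. EsUris is a hypothetical helper; it assumes every entry carries a scheme and falls back to port 9200 when none is given.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class EsUris {
    /** Converts "http://a:9200,https://b" style values into host:port strings. */
    static List<String> toHostAndPorts(String uris) {
        List<String> result = new ArrayList<>();
        for (String part : uris.split(",")) {
            URI uri = URI.create(part.trim());
            // Assumption: default to 9200 when the URI has no explicit port.
            int port = uri.getPort() != -1 ? uri.getPort() : 9200;
            result.add(uri.getHost() + ":" + port);
        }
        return result;
    }

    /** True when every configured URI uses https. */
    static boolean usesTls(String uris) {
        return Arrays.stream(uris.split(","))
                .allMatch(part -> part.trim().startsWith("https://"));
    }

    public static void main(String[] args) {
        String configured = "http://es-node1:9200,http://es-node2:9200";
        System.out.println(toHostAndPorts(configured));
        System.out.println(usesTls(configured));
    }
}
```

The resulting list can be passed to connectedTo(...) as an array, and the TLS flag can decide whether to call usingSsl() on the client configuration builder.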
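Best Practices
1. Index Design
- Shard count: size it to the data volume and the available hardware
- Replica count: keep at least one replica in production
- Mapping design: choose field types and analyzers deliberately
- Index lifecycle: use ILM (index lifecycle management) to manage indices as they age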
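2. Query Optimization
- Use filter context to skip score calculation
- Use _source filtering to reduce network transfer
- Avoid deep pagination; use search_after instead
- Let aggregations do the computation rather than the application layer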
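As an illustration of the search_after advice above, the sketch below builds the request body for each page. The sort values of the last hit on one page become the search_after values of the next request. The helper is hypothetical, and doc_id is an assumed unique keyword field used as a tiebreaker; any unique, sortable field would do.

```java
final class SearchAfterBody {
    /**
     * Builds a search body sorted by created_at (descending) with a tiebreaker field.
     * Pass null for both "last" arguments to request the first page.
     */
    static String nextPage(int size, String tiebreakerField,
                           Long lastCreatedAtMillis, String lastTiebreaker) {
        StringBuilder body = new StringBuilder();
        body.append("{\"size\":").append(size)
            .append(",\"sort\":[{\"created_at\":\"desc\"},{\"")
            .append(tiebreakerField).append("\":\"asc\"}]");
        if (lastCreatedAtMillis != null && lastTiebreaker != null) {
            // Values must appear in the same order as the sort clauses.
            body.append(",\"search_after\":[").append(lastCreatedAtMillis)
                .append(",\"").append(lastTiebreaker).append("\"]");
        }
        return body.append('}').toString();
    }

    public static void main(String[] args) {
        System.out.println(nextPage(20, "doc_id", null, null));
        System.out.println(nextPage(20, "doc_id", 1701388800000L, "42"));
    }
}
```

Unlike from and size, this approach does not make each shard collect and discard all preceding hits, so the cost of a page does not grow with its depth.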
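3. Performance Tuning
- Tune refresh_interval to balance freshness against indexing throughput
- Use the bulk API for batch operations
- Keep shard sizes reasonable (30-50 GB)
- Monitor cluster health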
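When using the bulk API for large imports, it usually helps to send several moderately sized requests rather than one huge one. A minimal sketch of splitting a list into batches follows; the class is hypothetical, and the batch size of 1000 in the example is an assumed starting point that should be tuned against document size and cluster capacity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class BulkBatcher {
    /** Splits items into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> ids = IntStream.range(0, 2500).boxed().collect(Collectors.toList());
        for (List<Integer> batch : partition(ids, 1000)) {
            // Each batch would become one bulk request.
            System.out.println("batch of " + batch.size());
        }
    }
}
```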
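4. Security
- Enable the Elasticsearch security features (formerly X-Pack Security)
- Configure TLS/SSL encryption
- Define users, roles, and permissions
- Back up data regularly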
5. Monitoring and Maintenance
- Monitor cluster health
- Set up alerting
- Clean up unused indices regularly
- Monitor query performance
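Summary
Elasticsearch is a powerful search and analytics platform built around a distributed architecture, near-real-time search, and rich aggregations. With sound index design, query optimization, and Spring Boot integration, it can serve as the foundation for high-performance search and analytics applications.
Key takeaways:
- Understand the distributed architecture and core concepts of Elasticsearch
- Know how to design indices and configure mappings
- Be fluent in the query DSL and the aggregations API
- Configure the Spring Boot integration correctly
- Follow the best practices to keep the cluster fast and stable
After working through this guide, you should be able to use Elasticsearch confidently and integrate it into a Spring Boot project to build capable search and analytics features.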