Elasticsearch: The Complete Guide

Contents

  • Introduction to Elasticsearch
  • Core Concepts
  • Usage Tips
  • Key Challenges Explained
  • Spring Boot Integration
  • Best Practices

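Introduction to Elasticsearch

What Is Elasticsearch

Elasticsearch is a distributed search engine built on Apache Lucene that provides near-real-time search and analytics. It is the core component of the Elastic Stack (formerly the ELK Stack) and is widely used for log analysis, full-text search, and monitoring and analytics.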

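Key Features

  • Distributed architecture: scales horizontally, with automatic sharding and replication
  • Near-real-time search: newly indexed documents become searchable after the next refresh (every 1 second by default)
  • Full-text search: powerful full-text search built on Lucene
  • RESTful API: a simple JSON-over-HTTP interface
  • Multi-tenancy: many independent indices can share one cluster (since 7.0 each index holds a single mapping type)
  • Aggregations: powerful data aggregation and analytics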

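Use Cases

  • Enterprise search engines
  • Log analysis and monitoring
  • E-commerce product search
  • Content management systems
  • Real-time data analytics
  • Security information analytics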

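Core Concepts

Cluster Architecture

Elasticsearch Cluster
├── Node - a single ES instance
│   ├── Master Node - manages cluster state
│   ├── Data Node - stores data and executes searches
│   └── Coordinating Node - routes requests and merges results
├── Index - a logical data container
│   ├── Shard - a physical partition of the data
│   │   ├── Primary Shard - receives writes first
│   │   └── Replica Shard - a copy of a primary (failover and read throughput)
│   └── Mapping - field type definitions
└── Document - the smallest unit of data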
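Why the number of primary shards is fixed when an index is created follows from how documents are routed. The Elasticsearch documentation gives the simplified formula shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document _id. The Java sketch below models that formula; as an assumption, it uses String.hashCode() in place of the Murmur3 hash Elasticsearch really uses, so the concrete shard numbers differ from a real cluster.

```java
/**
 * Simplified model of Elasticsearch document routing:
 * shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document _id.
 */
final class RoutingSketch {
    private RoutingSketch() {}

    // Assumption: String.hashCode() stands in for the Murmur3 hash Elasticsearch actually uses.
    static int shardFor(String routing, int numberOfPrimaryShards) {
        if (numberOfPrimaryShards <= 0) {
            throw new IllegalArgumentException("numberOfPrimaryShards must be positive");
        }
        // floorMod keeps the result in [0, numberOfPrimaryShards) even when hashCode() is negative
        return Math.floorMod(routing.hashCode(), numberOfPrimaryShards);
    }
}
```

In this model document "1" lands on shard 1 of 3 but on shard 4 of 5: changing the shard count would move existing documents, which is why resizing an index goes through _split, _shrink, or a reindex instead of a settings change.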

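Data Types

  • Core types: text, keyword, long, integer, double, boolean, date
  • Complex types: object, nested (arrays need no dedicated type: any field can hold multiple values of its type)
  • Geo types: geo_point, geo_shape
  • Specialized types: ip, completion, token_count, percolator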

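Usage Tips

1. Index Management

// Create an index with a custom analyzer (standard tokenizer + lowercase + stop words)
// title is analyzed for full-text search; title.keyword keeps the exact value for sorting and aggregations
PUT /my_index
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "stop"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "my_analyzer",
        "fields": {
          "keyword": { "type": "keyword" }
        }
      },
      "content": { "type": "text", "analyzer": "my_analyzer" },
      "tags": { "type": "keyword" },
      "created_at": { "type": "date" }
    }
  }
}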
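To see what my_analyzer does to a title value, and why title.keyword behaves differently, here is a rough Java model of the pipeline. Assumptions: splitting on non-letter, non-digit characters only approximates the Unicode-aware standard tokenizer, and the stop list is a subset of Lucene's default English stop words, which the stop filter uses when no list is configured.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Rough model of my_analyzer (standard tokenizer + lowercase + stop) versus a keyword field. */
final class AnalyzerSketch {
    private AnalyzerSketch() {}

    // Assumption: a subset of Lucene's default English stop words
    private static final Set<String> STOP_WORDS =
            Set.of("a", "an", "and", "the", "to", "of", "in", "is", "for", "with");

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // Approximation of the standard tokenizer: split on runs of non-letter, non-digit characters
        for (String raw : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (raw.isEmpty()) {
                continue; // a leading separator produces one empty string
            }
            String token = raw.toLowerCase(Locale.ROOT); // lowercase filter
            if (!STOP_WORDS.contains(token)) {           // stop filter
                tokens.add(token);
            }
        }
        return tokens;
    }

    // A keyword field is not analyzed: the whole value is indexed as one exact term
    static List<String> keyword(String value) {
        return List.of(value);
    }
}
```

With this model, "The Complete Guide to Elasticsearch" is indexed in title as the terms complete, guide, elasticsearch, so a match query for "complete guide" finds it, while a term query on title.keyword only matches the exact original string.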

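2. Search Queries

// Compound search: full-text matching in "must" (scored), exact conditions in "filter" (not scored, cacheable)
// title^2 doubles the weight of title matches; fuzziness AUTO tolerates small typos
// The terms filter on tags matches documents that have at least one of the listed tags
GET /my_index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "elasticsearch tutorial",
            "fields": ["title^2", "content"],
            "type": "best_fields",
            "fuzziness": "AUTO"
          }
        }
      ],
      "filter": [
        { "range": { "created_at": { "gte": "2023-01-01", "lte": "2023-12-31" } } },
        { "terms": { "tags": ["search", "tutorial"] } }
      ]
    }
  },
  "aggs": {
    "tag_counts": { "terms": { "field": "tags", "size": 10 } },
    "date_histogram": { "date_histogram": { "field": "created_at", "calendar_interval": "month" } }
  },
  "sort": [
    { "_score": { "order": "desc" } },
    { "created_at": { "order": "desc" } }
  ],
  "from": 0,
  "size": 20
}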
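fuzziness: AUTO picks the allowed edit distance from the length of each query term. With the default AUTO:3,6, terms of 0 to 2 characters must match exactly, 3 to 5 characters allow one edit, and longer terms allow two. The Java sketch below models that rule; Elasticsearch's transposition handling (fuzzy_transpositions, true by default) is approximated here with the optimal string alignment distance, and the helper names are hypothetical.

```java
/** Model of fuzziness AUTO (= AUTO:3,6) plus an edit distance that counts adjacent swaps as one edit. */
final class FuzzinessSketch {
    private FuzzinessSketch() {}

    // Maximum edits allowed for a query term under AUTO:3,6
    static int autoMaxEdits(String term) {
        int len = term.length();
        if (len < 3) return 0; // 0..2 characters: exact match only
        if (len < 6) return 1; // 3..5 characters: one edit
        return 2;              // more than 5 characters: two edits
    }

    // Optimal string alignment distance: insert, delete, substitute, or swap two adjacent characters
    static int editDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1), d[i - 1][j - 1] + cost);
                boolean swapped = i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1);
                if (swapped) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[a.length()][b.length()];
    }

    static boolean fuzzyMatches(String queryTerm, String indexedTerm) {
        return editDistance(queryTerm, indexedTerm) <= autoMaxEdits(queryTerm);
    }
}
```

For example, the typo "tutorail" is one transposition away from the indexed term "tutorial" and is allowed two edits, so it still matches, whereas a two-letter term such as "es" must match exactly.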

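3. Aggregations

// Multi-level aggregation: per-category sales with a monthly trend, plus overall statistics
// "size": 0 returns only aggregation results, no hits
// "global" ignores the query scope; it only differs from a top-level aggregation when the request has a query
GET /orders/_search
{
  "size": 0,
  "aggs": {
    "sales_by_category": {
      "terms": { "field": "category", "size": 10 },
      "aggs": {
        "total_sales": { "sum": { "field": "amount" } },
        "avg_order_value": { "avg": { "field": "amount" } },
        "sales_trend": {
          "date_histogram": { "field": "order_date", "calendar_interval": "month" },
          "aggs": {
            "monthly_sales": { "sum": { "field": "amount" } }
          }
        }
      }
    },
    "global_sales_stats": {
      "global": {},
      "aggs": {
        "total_revenue": { "sum": { "field": "amount" } },
        "avg_order_value": { "avg": { "field": "amount" } }
      }
    }
  }
}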
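In SQL terms, sales_by_category is roughly a GROUP BY category with SUM and AVG per group, returning the top size groups by document count. This Java 17 sketch computes the same numbers in memory over a hypothetical Order record to make the bucket structure concrete; it leaves out the date histogram and the global aggregation.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** In-memory model of terms(category) with sum/avg(amount) sub-aggregations. */
final class AggregationSketch {
    private AggregationSketch() {}

    // Hypothetical document and bucket shapes
    record Order(String category, double amount) {}
    record Bucket(String key, long docCount, double totalSales, double avgOrderValue) {}

    static List<Bucket> salesByCategory(List<Order> orders, int size) {
        // terms: one group per distinct category value
        Map<String, List<Order>> groups = orders.stream()
                .collect(Collectors.groupingBy(Order::category));
        return groups.entrySet().stream()
                .map(e -> {
                    List<Order> docs = e.getValue();
                    double total = docs.stream().mapToDouble(Order::amount).sum();    // sum sub-aggregation
                    return new Bucket(e.getKey(), docs.size(), total, total / docs.size()); // avg = sum / count
                })
                // Order buckets by doc count, descending (ties broken by key here), and keep the top `size`
                .sorted(Comparator.comparingLong(Bucket::docCount).reversed()
                        .thenComparing(Bucket::key))
                .limit(size)
                .toList();
    }
}
```

One difference worth knowing: on a multi-shard index Elasticsearch builds terms buckets per shard and then merges them, so doc_count values for the top buckets can be approximate, whereas this in-memory version is exact.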

4. Bulk Operations

// Bulk-index documents (NDJSON: an action line, then a source line; the body must end with a newline)
POST /_bulk
{"index":{"_index":"my_index","_id":"1"}}
{"title":"Elasticsearch Guide","content":"Complete guide to ES","tags":["search","guide"]}
{"index":{"_index":"my_index","_id":"2"}}
{"title":"Spring Boot Integration","content":"How to integrate ES with Spring Boot","tags":["spring","integration"]}

// Bulk partial updates
POST /_bulk
{"update":{"_index":"my_index","_id":"1"}}
{"doc":{"views":100,"updated_at":"2023-12-01"}}
{"update":{"_index":"my_index","_id":"2"}}
{"doc":{"views":50,"updated_at":"2023-12-01"}}
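The _bulk body is newline-delimited JSON (NDJSON): every index action line is followed by its source line (delete actions have none), and the body must end with a newline. The Java sketch below builds such a body and a java.net.http request for it without sending it. Assumptions: localhost:9200 is the cluster address, and index names, ids, and sources need no JSON escaping.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Map;

/** Builds (but does not send) an NDJSON _bulk request. */
final class BulkBodySketch {
    private BulkBodySketch() {}

    // Assumption: index names, ids and sources need no JSON escaping, and each source is single-line JSON
    static String indexBody(String index, Map<String, String> sourcesById) {
        StringBuilder body = new StringBuilder();
        sourcesById.forEach((id, source) -> body
                // action line
                .append("{\"index\":{\"_index\":\"").append(index)
                .append("\",\"_id\":\"").append(id).append("\"}}\n")
                // source line
                .append(source).append('\n'));
        return body.toString(); // ends with '\n', as the _bulk API requires
    }

    // Assumption: the cluster listens on localhost:9200 without authentication
    static HttpRequest bulkRequest(String body) {
        return HttpRequest.newBuilder(URI.create("http://localhost:9200/_bulk"))
                .header("Content-Type", "application/x-ndjson")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```

Sending it would be HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()). A 200 response does not mean every item succeeded, so check the top-level "errors" flag in the response body.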

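Key Challenges Explained

1. Shard Strategy

// Choosing the number of primary shards (rules of thumb, not hard limits):
// primary shards ≈ total data size / target shard size (30-50 GB per shard is a common recommendation)
// Some guides also suggest about 2 × the number of CPU cores; benchmark before settling on either figure.
// number_of_shards is fixed when the index is created; changing it requires _split, _shrink, or a reindex.

// Shard allocation filtering: keep the shards of "logs" on nodes started with node.attr.box_type: hot
PUT /logs/_settings
{
  "index.routing.allocation.require.box_type": "hot"
}

// Force merge: once an index no longer receives writes, merge it down to one segment to speed up searches
POST /logs/_forcemerge?max_num_segments=1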
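A worked version of the data-volume rule of thumb above (data size divided by a 30-50 GB target shard size) as a small Java helper: the number of primaries is the expected data size divided by the target shard size, rounded up, and each replica adds another full copy the cluster must host. The 40 GB target used in the example is an assumption within that range, and the input should be the size the index is expected to reach over its lifetime, not just today's data.

```java
/** Back-of-the-envelope shard sizing; the method names are hypothetical. */
final class ShardSizingSketch {
    private ShardSizingSketch() {}

    // primary shards = ceil(total data / target shard size)
    static int primaryShards(double totalDataGb, double targetShardGb) {
        if (totalDataGb <= 0 || targetShardGb <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (int) Math.ceil(totalDataGb / targetShardGb);
    }

    // Shard copies the cluster must host: every primary plus `replicas` copies of it
    static int totalShardCopies(int primaries, int replicas) {
        return primaries * (1 + replicas);
    }
}
```

For 200 GB at a 40 GB target this gives 5 primaries, and with number_of_replicas: 1 the cluster hosts 10 shard copies in total.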

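2. Performance Optimization

// Query optimization:
// - constant_score + filter skips relevance scoring and lets the filter be cached
// - _source filtering returns only the fields you need
// - one large page is cheaper than paging deep with "from", but from + size may not exceed
//   index.max_result_window (10,000 by default); use search_after for deep paging
GET /my_index/_search
{
  "query": {
    "constant_score": {
      "filter": { "term": { "status": "active" } }
    }
  },
  "_source": ["title", "content"],
  "size": 1000
}

// Index settings for a bulk load (restore them afterwards):
// - refresh_interval 30s: refresh less often than the 1s default
// - number_of_replicas 0: skip replication during the initial load
// - translog.durability async: fsync the translog in the background; writes since the last sync can be lost on a crash
PUT /my_index/_settings
{
  "index.refresh_interval": "30s",
  "index.number_of_replicas": 0,
  "index.translog.durability": "async"
}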
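With from/size paging, Elasticsearch rejects a request whose from + size exceeds index.max_result_window (10,000 by default). The Java helper below shows where that limit bites for zero-based page numbers; the method name is hypothetical.

```java
/** Checks whether from/size paging stays within index.max_result_window. */
final class ResultWindowSketch {
    private ResultWindowSketch() {}

    // Default value of the index.max_result_window setting
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    // page is zero-based, so from = page * size
    static boolean withinResultWindow(int page, int size, int maxResultWindow) {
        long from = (long) page * size; // long avoids int overflow for very large pages
        return from + size <= maxResultWindow;
    }
}
```

With size 20, page 499 (from 9,980) is the last page that fits the default window; page 500 would need from + size = 10,020 and is rejected.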

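3. Cluster Management

// Cluster health (green / yellow / red)
GET /_cluster/health?pretty

// Node statistics
GET /_nodes/stats?pretty

// Index statistics
GET /_stats?pretty

// Re-enable shard allocation for all shards (e.g. after a rolling restart)
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": "all"
  }
}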
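The status reported by GET /_cluster/health is derived from shard allocation: red means at least one primary shard is unassigned, yellow means all primaries are assigned but at least one replica is not, and green means every shard is assigned. The sketch encodes those rules plus one common cause of yellow: Elasticsearch never places a replica on the same node as its primary, so a single-node cluster cannot assign any replicas. Method names are hypothetical, and other allocation constraints (disk watermarks, allocation filtering) are ignored.

```java
/** Model of cluster health status; ignores allocation rules other than "one copy per node". */
final class ClusterHealthSketch {
    private ClusterHealthSketch() {}

    enum Status { GREEN, YELLOW, RED }

    static Status status(int unassignedPrimaries, int unassignedReplicas) {
        if (unassignedPrimaries > 0) return Status.RED;   // some data is not searchable
        if (unassignedReplicas > 0) return Status.YELLOW; // all data available, redundancy missing
        return Status.GREEN;
    }

    // Copies of one shard must sit on different nodes, so at most (nodes - 1) replicas can be assigned
    static int unassignableReplicasPerShard(int nodes, int replicas) {
        return Math.max(0, replicas - (nodes - 1));
    }
}
```

For example, a one-node cluster holding an index with 3 primaries and number_of_replicas: 1 leaves 3 replicas unassigned and reports yellow; adding a second node lets them be allocated.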

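4. Data Modeling

// Parent-child modeling with a join field: "company" is the parent, "employee" the child.
// Child documents must be indexed with routing set to the parent's ID so both live on the same shard.
PUT /company
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "join_field": {
        "type": "join",
        "relations": { "company": "employee" }
      }
    }
  }
}

// Nested objects: each variant is indexed as a separate hidden document,
// so conditions on color/size/price are matched within the same variant
PUT /products
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "variants": {
        "type": "nested",
        "properties": {
          "color": { "type": "keyword" },
          "size": { "type": "keyword" },
          "price": { "type": "double" }
        }
      }
    }
  }
}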
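Why nested matters for variants: with a plain object mapping, Elasticsearch flattens an array of objects into per-field value lists, so color = red AND size = L can match a product whose red variant is size S and whose blue variant is size L. With nested, each variant is indexed as its own hidden document and both conditions must hold on the same variant. This Java 17 sketch models the two behaviors over a hypothetical Variant record; it illustrates the semantics, not how Lucene evaluates queries.

```java
import java.util.List;

/** Object vs nested matching semantics for an array of variants. */
final class NestedVsObjectSketch {
    private NestedVsObjectSketch() {}

    record Variant(String color, String size) {}

    // object mapping: fields are flattened (variants.color = [...], variants.size = [...]),
    // so each condition is checked independently and the pairing is lost
    static boolean matchesAsObject(List<Variant> variants, String color, String size) {
        boolean anyColor = variants.stream().anyMatch(v -> v.color().equals(color));
        boolean anySize = variants.stream().anyMatch(v -> v.size().equals(size));
        return anyColor && anySize;
    }

    // nested mapping: both conditions must hold on the same variant
    static boolean matchesAsNested(List<Variant> variants, String color, String size) {
        return variants.stream().anyMatch(v -> v.color().equals(color) && v.size().equals(size));
    }
}
```

For a product with variants (red, S) and (blue, L), the object model matches "red and L" even though no such variant exists; the nested model does not.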

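Spring Boot Integration

The code in this section uses the RestHighLevelClient-based API of Spring Data Elasticsearch 4.x (Spring Boot 2.x): ElasticsearchRestTemplate, NativeSearchQueryBuilder, and the org.elasticsearch.* query builders. Spring Data Elasticsearch 5.x (Spring Boot 3.x) removed these classes in favor of the new Elasticsearch Java client, so the service and configuration code must be adapted there.

1. Dependency Configuration

<!-- Maven -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>

// Gradle
implementation 'org.springframework.boot:spring-boot-starter-data-elasticsearch'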

2. Configuration File

# application.yml
spring:
  elasticsearch:
    uris: http://localhost:9200
    connection-timeout: 1s
    socket-timeout: 30s
    # Security
    # username: elastic
    # password: changeme
    # Cluster: list several nodes, comma-separated
    # uris: http://es-node1:9200,http://es-node2:9200,http://es-node3:9200

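3. Entity Class

import java.time.LocalDateTime;
import java.util.List;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;
import org.springframework.data.elasticsearch.annotations.Setting;

// es-settings.json is loaded from the classpath (e.g. src/main/resources/es-settings.json)
@Document(indexName = "articles")
@Setting(settingPath = "es-settings.json")
public class Article {

    @Id
    private String id;

    // ik_max_word comes from the IK Analysis plugin (Chinese text), which must be installed on every node
    @Field(type = FieldType.Text, analyzer = "ik_max_word")
    private String title;

    @Field(type = FieldType.Text, analyzer = "ik_max_word")
    private String content;

    @Field(type = FieldType.Keyword)
    private List<String> tags;

    @Field(type = FieldType.Keyword)
    private String category;

    @Field(type = FieldType.Date)
    private LocalDateTime createdAt;

    @Field(type = FieldType.Long)
    private Long views;

    // Constructors, getters and setters omitted
}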

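4. Repository Interface

import java.time.LocalDateTime;
import java.util.List;
import org.springframework.data.domain.Page;
import org.springframework.data.domain.Pageable;
import org.springframework.data.elasticsearch.annotations.Query;
import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;
import org.springframework.stereotype.Repository;

@Repository
public interface ArticleRepository extends ElasticsearchRepository<Article, String> {

    // Derived query methods: the query is generated from the method name
    List<Article> findByTitleContaining(String title);

    List<Article> findByCategoryAndTagsIn(String category, List<String> tags);

    Page<Article> findByCreatedAtBetween(LocalDateTime start, LocalDateTime end, Pageable pageable);

    // @Query holds only the "query" part of the search body; ?0 is replaced by the first argument
    @Query("{\"bool\": {\"must\": [{\"match\": {\"title\": \"?0\"}}]}}")
    List<Article> searchByTitle(String title);

    // Aggregations cannot be expressed with @Query, because it only sets the query clause;
    // run them through the template instead (see getArticleStats in the service layer)
}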

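5. Service Layer

import java.time.LocalDateTime;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.elasticsearch.common.unit.Fuzziness;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.MultiMatchQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.Aggregations;
import org.elasticsearch.search.aggregations.bucket.histogram.DateHistogramAggregationBuilder;
import org.elasticsearch.search.aggregations.bucket.histogram.DateHistogramInterval;
import org.elasticsearch.search.aggregations.bucket.terms.TermsAggregationBuilder;
import org.elasticsearch.search.sort.SortBuilders;
import org.elasticsearch.search.sort.SortOrder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.domain.Pageable;
import org.springframework.data.elasticsearch.core.ElasticsearchOperations;
import org.springframework.data.elasticsearch.core.SearchHitSupport;
import org.springframework.data.elasticsearch.core.SearchHits;
import org.springframework.data.elasticsearch.core.SearchPage;
import org.springframework.data.elasticsearch.core.mapping.IndexCoordinates;
import org.springframework.data.elasticsearch.core.query.NativeSearchQuery;
import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;
import org.springframework.data.elasticsearch.core.query.UpdateQuery;
import org.springframework.stereotype.Service;
import org.springframework.util.StringUtils;

// SearchRequest (a request DTO) and ArticleNotFoundException are application classes, not Elasticsearch ones
@Service
public class ArticleService {

    @Autowired
    private ArticleRepository articleRepository;

    // ElasticsearchOperations is implemented by ElasticsearchRestTemplate; the field name matches
    // the "elasticsearchTemplate" bean Spring registers
    @Autowired
    private ElasticsearchOperations elasticsearchTemplate;

    // Basic CRUD
    public Article createArticle(Article article) {
        article.setCreatedAt(LocalDateTime.now());
        article.setViews(0L);
        return articleRepository.save(article);
    }

    public Article findById(String id) {
        return articleRepository.findById(id)
                .orElseThrow(() -> new ArticleNotFoundException("Article not found: " + id));
    }

    // Full-text search with optional filters
    public SearchPage<Article> searchArticles(SearchRequest request, Pageable pageable) {
        BoolQueryBuilder queryBuilder = QueryBuilders.boolQuery();

        // Keyword search over title (boosted 2x) and content
        if (StringUtils.hasText(request.getKeyword())) {
            queryBuilder.must(QueryBuilders.multiMatchQuery(request.getKeyword())
                    .field("title", 2.0f)
                    .field("content")
                    .type(MultiMatchQueryBuilder.Type.BEST_FIELDS)
                    .fuzziness(Fuzziness.AUTO));
        }
        // Category filter (filter context: not scored, cacheable)
        if (StringUtils.hasText(request.getCategory())) {
            queryBuilder.filter(QueryBuilders.termQuery("category", request.getCategory()));
        }
        // Tag filter: matches documents that have any of the given tags
        if (request.getTags() != null && !request.getTags().isEmpty()) {
            queryBuilder.filter(QueryBuilders.termsQuery("tags", request.getTags()));
        }
        // Date range filter
        if (request.getStartDate() != null && request.getEndDate() != null) {
            queryBuilder.filter(QueryBuilders.rangeQuery("createdAt")
                    .gte(request.getStartDate())
                    .lte(request.getEndDate()));
        }

        NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
                .withQuery(queryBuilder)
                .withSort(SortBuilders.scoreSort().order(SortOrder.DESC))
                .withSort(SortBuilders.fieldSort("createdAt").order(SortOrder.DESC))
                .withPageable(pageable)
                .build();

        // search() returns SearchHits; wrap them into a SearchPage for the requested Pageable
        SearchHits<Article> searchHits = elasticsearchTemplate.search(searchQuery, Article.class);
        return SearchHitSupport.searchPageFor(searchHits, pageable);
    }

    // Aggregations: category counts, tag counts, and a monthly trend
    public Map<String, Object> getArticleStats() {
        TermsAggregationBuilder categoryAgg = AggregationBuilders.terms("category_stats").field("category").size(10);
        TermsAggregationBuilder tagAgg = AggregationBuilders.terms("tag_stats").field("tags").size(20);
        DateHistogramAggregationBuilder timeAgg = AggregationBuilders.dateHistogram("time_trend")
                .field("createdAt")
                .calendarInterval(DateHistogramInterval.MONTH);

        NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
                .withQuery(QueryBuilders.matchAllQuery())
                .addAggregation(categoryAgg)
                .addAggregation(tagAgg)
                .addAggregation(timeAgg)
                .build();

        SearchHits<Article> searchHits = elasticsearchTemplate.search(searchQuery, Article.class);
        // In Spring Data Elasticsearch 4.0-4.2, getAggregations() returns org.elasticsearch Aggregations;
        // from 4.3 on it returns an AggregationsContainer that has to be unwrapped first
        Aggregations aggregations = searchHits.getAggregations();
        Map<String, Object> stats = new HashMap<>();
        stats.put("category_stats", aggregations.get("category_stats"));
        stats.put("tag_stats", aggregations.get("tag_stats"));
        stats.put("time_trend", aggregations.get("time_trend"));
        return stats;
    }

    // Bulk indexing: saveAll() uses the bulk API under the hood.
    // Elasticsearch has no transactions, so @Transactional would have no effect; a bulk request can partially fail.
    public void bulkIndexArticles(List<Article> articles) {
        articles.forEach(article -> {
            article.setCreatedAt(LocalDateTime.now());
            article.setViews(0L);
        });
        articleRepository.saveAll(articles);
    }

    // Partial update with a Painless script that increments views on the server
    public void updateArticleViews(String id) {
        UpdateQuery updateQuery = UpdateQuery.builder(id)
                .withScript("ctx._source.views += 1")
                .withLang("painless")
                .build();
        elasticsearchTemplate.update(updateQuery, IndexCoordinates.of("articles"));
    }
}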
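The service and controller rely on two application classes the guide does not show: the SearchRequest DTO (not Elasticsearch's own org.elasticsearch.action.search.SearchRequest) and ArticleNotFoundException. A minimal hypothetical version, with fields inferred from the getters that searchArticles calls, could look like this. When binding startDate and endDate from query parameters, Spring typically also needs @DateTimeFormat(iso = DateTimeFormat.ISO.DATE_TIME) on those fields, omitted here to keep the sketch free of Spring dependencies.

```java
import java.time.LocalDateTime;
import java.util.List;

// Hypothetical request DTO bound from query parameters by the controller
class SearchRequest {
    private String keyword;
    private String category;
    private List<String> tags;
    private LocalDateTime startDate;
    private LocalDateTime endDate;

    public String getKeyword() { return keyword; }
    public void setKeyword(String keyword) { this.keyword = keyword; }
    public String getCategory() { return category; }
    public void setCategory(String category) { this.category = category; }
    public List<String> getTags() { return tags; }
    public void setTags(List<String> tags) { this.tags = tags; }
    public LocalDateTime getStartDate() { return startDate; }
    public void setStartDate(LocalDateTime startDate) { this.startDate = startDate; }
    public LocalDateTime getEndDate() { return endDate; }
    public void setEndDate(LocalDateTime endDate) { this.endDate = endDate; }
}

// Hypothetical unchecked exception thrown by findById, so callers need no throws clause
class ArticleNotFoundException extends RuntimeException {
    ArticleNotFoundException(String message) {
        super(message);
    }
}
```

In the Spring application the exception would usually also carry @ResponseStatus(HttpStatus.NOT_FOUND) so that a missing article produces a 404 instead of a 500.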

6. Controller Layer

import java.util.List;
import java.util.Map;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.domain.Pageable;
import org.springframework.data.domain.Sort;
import org.springframework.data.elasticsearch.core.SearchPage;
import org.springframework.data.web.PageableDefault;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/api/articles")
public class ArticleController {

    @Autowired
    private ArticleService articleService;

    @PostMapping
    public ResponseEntity<Article> createArticle(@RequestBody Article article) {
        Article createdArticle = articleService.createArticle(article);
        return ResponseEntity.status(HttpStatus.CREATED).body(createdArticle);
    }

    @GetMapping("/{id}")
    public ResponseEntity<Article> getArticleById(@PathVariable String id) {
        return ResponseEntity.ok(articleService.findById(id));
    }

    // Query parameters are bound to the SearchRequest DTO; paging comes from page/size/sort parameters
    @GetMapping("/search")
    public ResponseEntity<SearchPage<Article>> searchArticles(
            @ModelAttribute SearchRequest request,
            @PageableDefault(sort = "createdAt", direction = Sort.Direction.DESC) Pageable pageable) {
        return ResponseEntity.ok(articleService.searchArticles(request, pageable));
    }

    @GetMapping("/stats")
    public ResponseEntity<Map<String, Object>> getArticleStats() {
        return ResponseEntity.ok(articleService.getArticleStats());
    }

    @PostMapping("/bulk")
    public ResponseEntity<Void> bulkIndexArticles(@RequestBody List<Article> articles) {
        articleService.bulkIndexArticles(articles);
        return ResponseEntity.ok().build();
    }

    @PutMapping("/{id}/views")
    public ResponseEntity<Void> incrementViews(@PathVariable String id) {
        articleService.updateArticleViews(id);
        return ResponseEntity.ok().build();
    }
}
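To call the search endpoint, the query string carries both the DTO fields and Spring Data's paging parameters: page (zero-based), size, and sort in the form property,direction. This sketch only builds the URI with the JDK; the base URL http://localhost:8080 and the helper name are assumptions.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Builds a request URI for GET /api/articles/search (hypothetical client helper). */
final class SearchClientSketch {
    private SearchClientSketch() {}

    static URI searchUri(String baseUrl, String keyword, String category, int page, int size) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order stable
        params.put("keyword", keyword);
        params.put("category", category);
        params.put("page", String.valueOf(page)); // zero-based page index
        params.put("size", String.valueOf(size));
        params.put("sort", "createdAt,desc");     // property,direction
        String query = params.entrySet().stream()
                .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
        return URI.create(baseUrl + "/api/articles/search?" + query);
    }
}
```

URLEncoder applies form encoding, so spaces become + and the comma in the sort value becomes %2C; servlet containers decode both back when Spring reads the parameters.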

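7. Configuration Class

import java.time.Duration;
import java.util.Arrays;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.elasticsearch.client.ClientConfiguration;
import org.springframework.data.elasticsearch.client.RestClients;
import org.springframework.data.elasticsearch.config.AbstractElasticsearchConfiguration;
import org.springframework.data.elasticsearch.core.convert.ElasticsearchCustomConversions;
import org.springframework.data.elasticsearch.repository.config.EnableElasticsearchRepositories;

// Optional: Spring Boot already configures the client from spring.elasticsearch.*; define this class only to customize it
@Configuration
@EnableElasticsearchRepositories(basePackages = "com.example.repository")
public class ElasticsearchConfig extends AbstractElasticsearchConfiguration {

    @Value("${spring.elasticsearch.uris}")
    private String elasticsearchUris;

    @Override
    @Bean
    public RestHighLevelClient elasticsearchClient() {
        // ClientConfiguration expects host:port without a scheme; accept a comma-separated list of URIs
        String[] hostAndPorts = Arrays.stream(elasticsearchUris.split(","))
                .map(String::trim)
                .map(uri -> uri.replaceFirst("^https?://", ""))
                .toArray(String[]::new);

        ClientConfiguration clientConfiguration = ClientConfiguration.builder()
                .connectedTo(hostAndPorts)
                .withConnectTimeout(Duration.ofSeconds(5))
                .withSocketTimeout(Duration.ofSeconds(30))
                .build();
        return RestClients.create(clientConfiguration).rest();
    }

    // No separate ElasticsearchRestTemplate bean is needed: the base class already registers one
    // (bean names "elasticsearchOperations" and "elasticsearchTemplate").

    // Custom converters: override the base class bean instead of declaring a second one.
    // LocalDateTimeToDateConverter and DateToLocalDateTimeConverter are application-defined converters.
    @Override
    @Bean
    public ElasticsearchCustomConversions elasticsearchCustomConversions() {
        return new ElasticsearchCustomConversions(Arrays.asList(
                new LocalDateTimeToDateConverter(),
                new DateToLocalDateTimeConverter()));
    }
}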

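Best Practices

1. Index Design

  • Shard count: size it from data volume and hardware resources
  • Replica count: at least 1 replica in production
  • Mapping design: choose field types and analyzers deliberately
  • Index lifecycle: use ILM (index lifecycle management) to roll over and delete indices over time

2. Query Optimization

  • Put non-scoring conditions in filter context to skip relevance scoring
  • Use _source filtering to reduce network transfer
  • Avoid deep pagination; use search_after instead
  • Use aggregations instead of computing results in the application layer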
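search_after replaces an offset with a cursor: each request passes the sort values of the last hit from the previous page, so the cost does not grow with page depth. It needs a deterministic total order, which is why a unique tiebreaker is added to the sort (Elasticsearch also recommends pairing it with a point in time, PIT, for a consistent view across pages). The Java 17 sketch below models the cursor logic in memory over a hypothetical Doc record, sorting by createdAt descending and id ascending.

```java
import java.util.Comparator;
import java.util.List;

/** In-memory model of search_after paging. */
final class SearchAfterSketch {
    private SearchAfterSketch() {}

    record Doc(String id, long createdAt) {}

    // createdAt desc, then id asc as a unique tiebreaker: every pair of docs has a defined order
    static final Comparator<Doc> ORDER =
            Comparator.comparingLong(Doc::createdAt).reversed().thenComparing(Doc::id);

    // Returns the next `size` docs that sort strictly after `after` (null requests the first page)
    static List<Doc> nextPage(List<Doc> docs, Doc after, int size) {
        return docs.stream()
                .sorted(ORDER)
                .filter(d -> after == null || ORDER.compare(d, after) > 0)
                .limit(size)
                .toList();
    }
}
```

The caller keeps the last document of each page as the cursor for the next request and stops when a page comes back empty; without the id tiebreaker, documents sharing a createdAt value could be skipped or repeated at page boundaries.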

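3. Performance Tuning

  • Tune refresh_interval to balance search freshness against indexing throughput
  • Use the bulk API for batch operations
  • Keep shard sizes reasonable (30-50 GB)
  • Monitor cluster health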
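Very large bulk requests put pressure on the cluster, so bulk loads are usually split into batches whose size is tuned by benchmarking. A generic Java helper for that split could look like this; the batch size is a parameter the caller must choose, and splitting by document count rather than by request size in bytes is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Splits documents into consecutive batches, one per _bulk request (hypothetical helper). */
final class BulkBatcher {
    private BulkBatcher() {}

    static <T> List<List<T>> batches(List<T> docs, int maxDocsPerBatch) {
        if (maxDocsPerBatch <= 0) {
            throw new IllegalArgumentException("maxDocsPerBatch must be positive");
        }
        List<List<T>> result = new ArrayList<>();
        for (int start = 0; start < docs.size(); start += maxDocsPerBatch) {
            int end = Math.min(start + maxDocsPerBatch, docs.size());
            result.add(new ArrayList<>(docs.subList(start, end))); // copy so batches do not alias the input
        }
        return result;
    }
}
```

Each batch then becomes one _bulk request; sending batches sequentially (or with a small, fixed concurrency) keeps the load on the cluster predictable.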

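4. Security

  • Enable the Elasticsearch security features (formerly X-Pack security; enabled by default since 8.0)
  • Configure TLS/SSL encryption
  • Set up users, roles, and privileges
  • Back up data regularly with snapshots

5. Monitoring and Maintenance

  • Monitor cluster health
  • Set up alerting
  • Regularly delete indices that are no longer needed
  • Monitor query performance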

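Summary

As a search engine and analytics platform, Elasticsearch offers a distributed architecture, near-real-time search, and rich aggregation capabilities. With sound index design, query optimization, and a correctly configured Spring Boot integration, you can build high-performance search and analytics applications.

Key takeaways:

  1. Understand Elasticsearch's distributed architecture and core concepts
  2. Master index design and mapping configuration
  3. Become fluent with the Query DSL and the aggregation API
  4. Configure the Spring Boot integration correctly
  5. Follow best practices to ensure performance and stability

After working through this guide, you should be able to use Elasticsearch confidently and integrate it into Spring Boot projects to build search and analytics features.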
