Elasticsearch Knowledge Summary: Elasticsearch Monitoring Solution
8 Elasticsearch Monitoring Solution
8.1 Elasticsearch Monitoring Metrics
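The metrics below are the production monitoring items of the Panji (磐基) platform. For each item, the table gives the monitored item name, the metric name, and the formula used: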
Monitored item | Metric name | Formula |
Elasticsearch cluster health status | Elastic_Cluster_Health | elasticsearch_cluster_health_status{job="$job",instance=~"$instance",cluster="$cluster",color="red"}==1 or (elasticsearch_cluster_health_status{job="$job",instance=~"$instance",cluster="$cluster",color="green"}==1)+4 or (elasticsearch_cluster_health_status{job="$job",instance=~"$instance",cluster="$cluster",color="yellow"}==1)+22 |
Elasticsearch cluster node count | elasticsearch_cluster_health_number_of_nodes | elasticsearch_cluster_health_number_of_nodes{job="$job",instance=~"$instance",cluster="$cluster"} |
Elasticsearch cluster data node count | elasticsearch_cluster_health_number_of_data_nodes | elasticsearch_cluster_health_number_of_data_nodes{job="$job",instance=~"$instance",cluster="$cluster"} |
Elasticsearch JVM heap memory usage (ratio of used to max) | elasticsearch_jvm_memory_used | elasticsearch_jvm_memory_used_bytes{area="heap"} / elasticsearch_jvm_memory_max_bytes{area="heap"} |
Elasticsearch CPU usage | elasticsearch_process_cpu_percent | elasticsearch_process_cpu_percent{} |
Elasticsearch disk space usage (%) | elasticsearch_filesystem_data_used_percent | 100 * (elasticsearch_filesystem_data_size_bytes - elasticsearch_filesystem_data_free_bytes) / elasticsearch_filesystem_data_size_bytes |
Elasticsearch system load (1-minute average) | elasticsearch_os_load | elasticsearch_os_load1{job="$job",instance=~"$instance",cluster="$cluster",name=~"$name"} |
Elasticsearch cluster unassigned shards | elasticsearch_cluster_health_unassigned_shards | elasticsearch_cluster_health_unassigned_shards{} |
Elasticsearch cluster pending (blocked) tasks | elasticsearch_cluster_health_number_of_pending_tasks | elasticsearch_cluster_health_number_of_pending_tasks{} |
Elasticsearch thread pool rejections (per-second rate) | elasticsearch_thread_pool_rejected_count | rate(elasticsearch_thread_pool_rejected_count{type!="management"}[5m]) |
Elasticsearch open file descriptors (% of maximum) | elasticsearch_process_open_files_count | elasticsearch_process_open_files_count/elasticsearch_process_max_files_descriptors * 100 |
Elasticsearch thread pool active threads | elasticsearch_thread_pool_active_count | elasticsearch_thread_pool_active_count{} |
Elasticsearch JVM GC collection count (per-second rate) | elasticsearch_jvm_gc_collection_seconds_count | irate(elasticsearch_jvm_gc_collection_seconds_count{}[5m]) |
Elasticsearch thread pool completed tasks (per-second rate) | elasticsearch_thread_pool_completed_count | irate(elasticsearch_thread_pool_completed_count{}[5m]) |
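The arithmetic behind a few of the formulas in the table can be checked by hand. The following is a minimal Python sketch, not part of the original monitoring setup, that mirrors three of them: heap usage, disk usage, and the encoded cluster health value. The function names and sample numbers are hypothetical and chosen only for illustration.

```python
def heap_used_ratio(used_bytes: float, max_bytes: float) -> float:
    """Mirror of jvm_memory_used_bytes / jvm_memory_max_bytes for area="heap"."""
    return used_bytes / max_bytes


def disk_used_percent(size_bytes: float, free_bytes: float) -> float:
    """Mirror of 100 * (size_bytes - free_bytes) / size_bytes."""
    return 100 * (size_bytes - free_bytes) / size_bytes


def health_value(color: str) -> int:
    """Value the cluster health formula yields for the currently active color.

    Each branch of the formula keeps only the series whose value is 1,
    then adds an offset: red -> 1, green -> 1 + 4, yellow -> 1 + 22.
    """
    offsets = {"red": 0, "green": 4, "yellow": 22}
    return 1 + offsets[color]


# Hypothetical sample values: 512 MiB of a 1 GiB heap, 250 GB free of 1000 GB.
print(heap_used_ratio(512 * 2**20, 1024 * 2**20))  # 0.5
print(disk_used_percent(1000e9, 250e9))            # 75.0
print([health_value(c) for c in ("red", "green", "yellow")])  # [1, 5, 23]
```

The health formula therefore produces one of three distinct numbers (1, 5, or 23). Presumably this lets a dashboard map each number back to a color, although the source does not state how the values are used.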
8.2 Prometheus Monitoring
Prometheus is used to collect the Elasticsearch monitoring metrics. The figure shows an excerpt of the collected metrics.
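Besides the Prometheus web UI, the same metrics can be read programmatically through the Prometheus HTTP API (`/api/v1/query`). The following is a minimal sketch, assuming Prometheus is reachable at the hypothetical address `http://localhost:9090`. The helper names are illustrative and not part of the original setup.

```python
import json
import urllib.parse
import urllib.request

# Assumption: adjust this to the address of your own Prometheus server.
PROMETHEUS_URL = "http://localhost:9090"


def build_query_url(expr: str, base_url: str = PROMETHEUS_URL) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def parse_instant_vector(payload: dict) -> list:
    """Extract (labels, value) pairs from an instant-query JSON response."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample looks like {"metric": {...labels...}, "value": [timestamp, "string value"]}.
    return [(s["metric"], float(s["value"][1])) for s in data["result"]]


def query(expr: str, base_url: str = PROMETHEUS_URL) -> list:
    """Run an instant query against Prometheus and return (labels, value) pairs."""
    with urllib.request.urlopen(build_query_url(expr, base_url), timeout=10) as resp:
        return parse_instant_vector(json.load(resp))
```

With a running server, `query("elasticsearch_cluster_health_number_of_nodes")` would return one `(labels, value)` pair per matching series.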
8.3 Grafana Dashboards
Grafana is used to visualize and monitor these metrics, as shown in the figure.
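Several formulas in section 8.1 contain placeholders such as `$job`, `$instance`, `$cluster`, and `$name`. These are Grafana dashboard variables, which Grafana fills in before sending the query to Prometheus. As a rough illustration only (this is not Grafana's implementation, and the variable values are hypothetical), the substitution can be imitated in Python with `string.Template`, which happens to use the same `$name` syntax:

```python
from string import Template


def expand_variables(expr: str, **variables: str) -> str:
    """Replace $name placeholders in a query; unknown placeholders are left untouched."""
    return Template(expr).safe_substitute(**variables)


expr = 'elasticsearch_os_load1{job="$job",instance=~"$instance",cluster="$cluster",name=~"$name"}'

# Hypothetical variable values, as they might be selected in a dashboard.
print(expand_variables(expr, job="elasticsearch", instance=".*", cluster="prod", name=".*"))
# elasticsearch_os_load1{job="elasticsearch",instance=~".*",cluster="prod",name=~".*"}
```

Expanding a formula this way is also a convenient way to paste it into the Prometheus expression browser for testing outside of Grafana.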