Monitoring a Redis Cluster with redis_exporter

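Environment overview:

We have an existing Redis Cluster deployed as 6 instances across 3 hosts. We need to collect its metrics to support anomaly alerting and performance analysis.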

Prerequisites

The following components must be deployed in advance:
Redis Cluster
Prometheus
Alertmanager
Grafana

Deploying redis_exporter

We install the exporter with Docker Compose.
The redis_exporter used is: https://github.com/oliver006/redis_exporter

  redis-exporter:
    image: docker.m.daocloud.io/oliver006/redis_exporter:v1.74.0-alpine
    command:
      - '--redis.addr=redis://redisIP:7001'
      - '--redis.password=redisPassword'
      - '--is-cluster'
    ports:
      - "9121:9121"

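With the parameters above, you only need to set --is-cluster and point --redis.addr at one node of the cluster; data for all nodes can then be collected.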

Prometheus scrape configuration:

Add the following scrape job to Prometheus:

  - job_name: 'redis_sjzt_prod'
    http_sd_configs:
      - url: http://redisExporterIP:9121/discover-cluster-nodes
        refresh_interval: 10m
    metrics_path: /scrape
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: redisExporter:9121
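To make the relabeling easier to follow, here is a minimal Python sketch of what the three relabel rules do to each discovered target. It is an illustration only, not how Prometheus is implemented. The sample response follows the generic Prometheus HTTP service-discovery JSON shape (a list of groups with targets and labels); the node addresses and the function name plan_scrapes are hypothetical.

```python
import json
from urllib.parse import urlencode

# Hypothetical sample of an HTTP service-discovery response; the node
# addresses are made up for illustration.
SAMPLE_SD_RESPONSE = """
[{"targets": ["redis://10.0.0.1:7001", "redis://10.0.0.2:7002"], "labels": {}}]
"""


def plan_scrapes(sd_json, exporter_address, metrics_path="/scrape"):
    """Mimic the three relabel rules for every discovered target."""
    scrapes = []
    for group in json.loads(sd_json):
        for target in group["targets"]:
            # Rule 1: __address__ is copied to __param_target,
            # which is sent as the ?target=... query parameter.
            query = urlencode({"target": target})
            # Rule 2: __param_target is copied to the instance label.
            labels = dict(group.get("labels", {}), instance=target)
            # Rule 3: __address__ is replaced by the exporter's address,
            # so every request goes to the exporter, not to Redis itself.
            url = f"http://{exporter_address}{metrics_path}?{query}"
            scrapes.append({"url": url, "labels": labels})
    return scrapes


if __name__ == "__main__":
    for scrape in plan_scrapes(SAMPLE_SD_RESPONSE, "redisExporter:9121"):
        print(scrape["labels"]["instance"], "->", scrape["url"])
```

The point of the design is that Prometheus always talks to the single exporter, while the instance label still identifies the individual Redis node.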

Checking the metrics:
The Prometheus Targets page now lists the new scrape job, with every node of the cluster as a target.

Dashboard:

Dashboard template: https://grafana.com/grafana/dashboards/763-redis-dashboard-for-prometheus-redis-exporter-1-x/
Download it, import it into Grafana, and select the corresponding data source.

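Alerting:

Next, create alerting rules so that anomalies are reported to Alertmanager.
The alert rules used are: https://raw.githubusercontent.com/samber/awesome-prometheus-alerts/master/dist/rules/redis/oliver006-redis-exporter.yml
Download the file and add it to Prometheus.
Because this is a cluster, which by design runs several masters, some of the rules need to be adjusted. Delete the two rules that do not apply, RedisTooManyMasters and RedisDisconnectedSlaves. The modified file looks like this:
vim redis.yml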

groups:
- name: Oliver006RedisExporter
  rules:
  - alert: RedisDown
    expr: 'redis_up == 0'
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: Redis down (instance {{ $labels.instance }})
      description: "Redis instance is down\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: RedisMissingMaster
    expr: '(count(redis_instance_info{role="master"}) or vector(0)) < 1'
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: Redis missing master (instance {{ $labels.instance }})
      description: "Redis cluster has no node marked as master.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: RedisReplicationBroken
    expr: 'delta(redis_connected_slaves[1m]) < 0'
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: Redis replication broken (instance {{ $labels.instance }})
      description: "Redis instance lost a slave\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: RedisClusterFlapping
    expr: 'changes(redis_connected_slaves[1m]) > 1'
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: Redis cluster flapping (instance {{ $labels.instance }})
      description: "Changes have been detected in Redis replica connection. This can occur when replica nodes lose connection to the master and reconnect (a.k.a flapping).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: RedisMissingBackup
    expr: 'time() - redis_rdb_last_save_timestamp_seconds > 60 * 60 * 24'
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: Redis missing backup (instance {{ $labels.instance }})
      description: "Redis has not been backuped for 24 hours\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: RedisOutOfSystemMemory
    expr: 'redis_memory_used_bytes / redis_total_system_memory_bytes * 100 > 90'
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: Redis out of system memory (instance {{ $labels.instance }})
      description: "Redis is running out of system memory (> 90%)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: RedisOutOfConfiguredMaxmemory
    expr: 'redis_memory_used_bytes / redis_memory_max_bytes * 100 > 90 and on(instance) redis_memory_max_bytes > 0'
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: Redis out of configured maxmemory (instance {{ $labels.instance }})
      description: "Redis is running out of configured maxmemory (> 90%)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: RedisTooManyConnections
    expr: 'redis_connected_clients / redis_config_maxclients * 100 > 90'
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: Redis too many connections (instance {{ $labels.instance }})
      description: "Redis is running out of connections (> 90% used)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: RedisNotEnoughConnections
    expr: 'redis_connected_clients < 5'
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: Redis not enough connections (instance {{ $labels.instance }})
      description: "Redis instance should have more connections (> 5)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: RedisRejectedConnections
    expr: 'increase(redis_rejected_connections_total[1m]) > 0'
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: Redis rejected connections (instance {{ $labels.instance }})
      description: "Some connections to Redis has been rejected\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
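As a worked example of one of these expressions, the following Python sketch mirrors the arithmetic of RedisOutOfConfiguredMaxmemory for a single instance. It shows why the expression carries the extra `redis_memory_max_bytes > 0` condition: when maxmemory is 0 (no limit configured), the division yields +Inf, which would otherwise exceed the threshold. The function names and byte values are hypothetical, and this only approximates PromQL's floating-point behavior.

```python
import math


def maxmemory_usage_percent(used_bytes, max_bytes):
    """used / max * 100, imitating PromQL float division by zero."""
    if max_bytes == 0:
        # In PromQL, x / 0 is +Inf for positive x and NaN for 0 / 0.
        return math.inf if used_bytes > 0 else math.nan
    return used_bytes / max_bytes * 100


def out_of_configured_maxmemory(used_bytes, max_bytes, threshold=90):
    """True when usage exceeds the threshold AND a maxmemory limit is set."""
    over_threshold = maxmemory_usage_percent(used_bytes, max_bytes) > threshold
    limit_configured = max_bytes > 0
    return over_threshold and limit_configured


if __name__ == "__main__":
    print(out_of_configured_maxmemory(950, 1000))  # limit set, 95% used
    print(out_of_configured_maxmemory(500, 1000))  # limit set, 50% used
    print(out_of_configured_maxmemory(950, 0))     # no limit configured
```

Without the guard, every instance with no maxmemory limit would fire this alert permanently.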

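Reload the Prometheus configuration (the reload endpoint is only enabled when Prometheus is started with --web.enable-lifecycle):
curl -X POST http://localhost:9090/-/reload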

Check that the alert rules have been loaded: open the Prometheus web UI and click Alerts.
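The same check can be scripted. The sketch below parses a response in the shape returned by the Prometheus HTTP API endpoint /api/v1/rules and reports whether either of the two removed rules is still loaded. The sample payload is a trimmed, hypothetical example (a real response carries many more fields), and the function names are my own.

```python
import json

# Trimmed, hypothetical example of a /api/v1/rules response body.
SAMPLE_RULES_RESPONSE = """
{"status": "success", "data": {"groups": [
  {"name": "Oliver006RedisExporter", "rules": [
    {"name": "RedisDown", "type": "alerting"},
    {"name": "RedisMissingMaster", "type": "alerting"}
  ]}
]}}
"""

# The two rules deleted for the cluster setup.
REMOVED = {"RedisTooManyMasters", "RedisDisconnectedSlaves"}


def loaded_alert_names(rules_json):
    """Return the names of all alerting rules in the response."""
    payload = json.loads(rules_json)
    return {
        rule["name"]
        for group in payload["data"]["groups"]
        for rule in group["rules"]
        if rule.get("type") == "alerting"
    }


def leftover_removed_rules(rules_json, removed=REMOVED):
    """Return removed rules that are still loaded (should be empty)."""
    return sorted(loaded_alert_names(rules_json) & set(removed))


if __name__ == "__main__":
    print("loaded:", sorted(loaded_alert_names(SAMPLE_RULES_RESPONSE)))
    print("unexpected:", leftover_removed_rules(SAMPLE_RULES_RESPONSE))
```

In practice the JSON would come from a request to http://localhost:9090/api/v1/rules, assuming Prometheus listens on the same address used for the reload call.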
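Note: choose the metrics and thresholds to monitor carefully, based on what your project actually needs. The rules above are for reference only.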
