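Deploying Prometheus with Docker Compose

Contents

      • Deploying Prometheus with Docker Compose
        • 1. Environment Preparation
        • 2. Preparing Configuration Files
        • 3. Writing the Docker Compose File
        • 4. Starting the Services
        • 5. Verifying the Deployment
        • 6. Common Operations
        • 7. Hardening for Production
        • 8. Monitoring Additional Targets

Deploying Prometheus with Docker Compose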

1. Environment Preparation
  • Install Docker (version ≥ 20.10) and Docker Compose (version ≥ 1.29)
  • Create the project directory:
    mkdir prometheus && cd prometheus

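2. Preparing Configuration Files
  • Create the Prometheus configuration file
    prometheus.yml (basic configuration):

    global:
      scrape_interval: 15s
      evaluation_interval: 15s

    # Rule files are loaded only when listed here (paths inside the container).
    # Uncomment once alerts.yml exists and is mounted.
    # rule_files:
    #   - /etc/prometheus/alerts.yml

    scrape_configs:
      - job_name: "prometheus"
        static_configs:
          - targets: ["localhost:9090"]  # monitor Prometheus itself

      # Example: add Node Exporter (must be deployed separately)
      # - job_name: "node"
      #   static_configs:
      #     - targets: ["node-exporter:9100"]
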
  • Create alert rule files (optional)
    alerts.yml

    groups:
      - name: example
        rules:
          - alert: InstanceDown
            expr: up == 0
            for: 1m
            labels:
              severity: critical
            annotations:
              summary: "Instance {{ $labels.instance }} down"


    linux_rules.yml

    groups:
      - name: linux-system-rules
        rules:
          # CPU rules
          - alert: HighCpuLoad
            expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
            for: 10m
            labels:
              severity: warning
            annotations:
              summary: "High CPU load on {{ $labels.instance }}"
              description: "CPU usage is {{ $value }}% for last 10 minutes"

          # Memory rules
          - alert: HighMemoryUsage
            expr: (node_memory_MemTotal_bytes - node_memory_MemFree_bytes - node_memory_Buffers_bytes - node_memory_Cached_bytes) / node_memory_MemTotal_bytes * 100 > 5  # threshold lowered to trigger the alert during testing
            for: 10m
            labels:
              severity: warning
            annotations:
              summary: "High memory usage on {{ $labels.instance }}"
              description: "Memory usage is {{ $value }}% for last 10 minutes"

          # Swap rules
          - alert: HighSwapUsage
            expr: (node_memory_SwapTotal_bytes - node_memory_SwapFree_bytes) / node_memory_SwapTotal_bytes * 100 > 50
            for: 15m
            labels:
              severity: warning
            annotations:
              summary: "High swap usage on {{ $labels.instance }}"
              description: "Swap usage is {{ $value }}% for last 15 minutes"

          # Disk space rules
          - alert: LowDiskSpace
            expr: (node_filesystem_avail_bytes{mountpoint!~"^(/run|/var/lib/docker).*",fstype!="tmpfs"} / node_filesystem_size_bytes * 100) < 15
            for: 10m
            labels:
              severity: warning
            annotations:
              summary: "Low disk space on {{ $labels.instance }} ({{ $labels.mountpoint }})"
              description: "Only {{ $value }}% free space left on {{ $labels.mountpoint }}"

          # Disk I/O rules
          - alert: HighDiskIoLoad
            expr: rate(node_disk_io_time_seconds_total[1m]) * 100 > 80
            for: 10m
            labels:
              severity: warning
            annotations:
              summary: "High disk I/O load on {{ $labels.instance }} ({{ $labels.device }})"
              description: "Disk I/O load is {{ $value }}% for last 10 minutes"

          # Network rules
          - alert: HighNetworkErrors
            expr: increase(node_network_receive_errs_total[5m]) > 10 or increase(node_network_transmit_errs_total[5m]) > 10
            for: 5m
            labels:
              severity: warning
            annotations:
              summary: "High network errors on {{ $labels.instance }} ({{ $labels.device }})"
              description: "Network errors detected on interface {{ $labels.device }}"

          # System load rules
          # on(instance) is needed: node_load5 also carries a job label, the count does not
          - alert: HighSystemLoad
            expr: node_load5 / on(instance) count by(instance)(node_cpu_seconds_total{mode="system"}) > 1.5
            for: 15m
            labels:
              severity: warning
            annotations:
              summary: "High system load on {{ $labels.instance }}"
              description: "5-minute load average is {{ $value }} (relative to CPU count)"

          # Node down rules
          - alert: InstanceDown
            expr: up{job="node"} == 0
            for: 5m
            labels:
              severity: critical
            annotations:
              summary: "Instance {{ $labels.instance }} down"
              description: "{{ $labels.instance }} has been down for more than 5 minutes"

          # File descriptor rules
          - alert: HighFileDescriptorUsage
            expr: node_filefd_allocated / node_filefd_maximum * 100 > 80
            for: 10m
            labels:
              severity: warning
            annotations:
              summary: "High file descriptor usage on {{ $labels.instance }}"
              description: "File descriptor usage is {{ $value }}% of maximum"
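
To make the thresholds in the Linux rules easier to reason about, the arithmetic behind the memory and load expressions can be reproduced by hand. The following is a minimal sketch, not part of Prometheus: the function names and the sample host values are made up for illustration.

```python
def memory_usage_percent(total: float, free: float, buffers: float, cached: float) -> float:
    """Mirror the HighMemoryUsage expression: (total - free - buffers - cached) / total * 100."""
    if total <= 0:
        raise ValueError("total must be positive")
    return (total - free - buffers - cached) / total * 100


def load_per_cpu(load5: float, cpu_count: int) -> float:
    """Mirror the HighSystemLoad expression: 5-minute load average divided by CPU count."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return load5 / cpu_count


GIB = 1024 ** 3

# Hypothetical host: 16 GiB RAM, 2 GiB free, 1 GiB buffers, 5 GiB page cache.
usage = memory_usage_percent(16 * GIB, 2 * GIB, 1 * GIB, 5 * GIB)
print(f"memory usage: {usage:.1f}%")
print(f"fires with '> 5' (test threshold): {usage > 5}")
print(f"fires with '> 80': {usage > 80}")

# Hypothetical host: load5 of 6.0 on 4 CPUs.
print(f"load per CPU: {load_per_cpu(6.0, 4)}")
```

With these sample values the test threshold of 5 fires on a host that is only half full, which is why it should be raised again after testing.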

    windows_rules.yml

    groups:
      - name: windows-system-rules
        rules:
          # CPU rules
          - alert: HighCpuUsageWindows
            expr: 100 - (avg by(instance) (rate(windows_cpu_time_total{mode="idle"}[5m])) * 100) > 85
            for: 10m
            labels:
              severity: warning
            annotations:
              summary: "High CPU usage on {{ $labels.instance }}"
              description: "CPU usage is {{ $value }}% for last 10 minutes"

          # Memory rules
          - alert: HighMemoryUsageWindows
            expr: (windows_os_physical_memory_total_bytes - windows_os_physical_memory_free_bytes) / windows_os_physical_memory_total_bytes * 100 > 90
            for: 10m
            labels:
              severity: warning
            annotations:
              summary: "High memory usage on {{ $labels.instance }}"
              description: "Memory usage is {{ $value }}% for last 10 minutes"

          # Disk space rules
          - alert: LowDiskSpaceWindows
            expr: (windows_logical_disk_free_bytes / windows_logical_disk_size_bytes * 100) < 95  # threshold raised to trigger the alert during testing
            for: 10m
            labels:
              severity: warning
            annotations:
              summary: "Low disk space on {{ $labels.instance }} ({{ $labels.volume }})"
              description: "Only {{ $value }}% free space left on {{ $labels.volume }}"

          # Disk I/O rules
          - alert: HighDiskIoWindows
            expr: rate(windows_logical_disk_read_seconds_total[5m]) * 100 > 80 or rate(windows_logical_disk_write_seconds_total[5m]) * 100 > 80
            for: 10m
            labels:
              severity: warning
            annotations:
              summary: "High disk I/O on {{ $labels.instance }} ({{ $labels.volume }})"
              description: "Disk I/O utilization is {{ $value }}% for last 10 minutes"

          # Service status rules
          - alert: CriticalServiceDown
            expr: windows_service_status{status!="running"} == 1
            for: 2m
            labels:
              severity: critical
            annotations:
              summary: "Critical service down on {{ $labels.instance }}"
              description: "Service {{ $labels.service }} is not running"

          # System boot time rules
          - alert: SystemRebooted
            expr: time() - windows_system_system_up_time < 300  # uptime under 5 minutes means a recent reboot
            for: 0m
            labels:
              severity: info
            annotations:
              summary: "System rebooted on {{ $labels.instance }}"
              description: "System was rebooted, uptime is {{ $value }} seconds"

          # Network rules
          - alert: HighNetworkUtilizationWindows
            expr: rate(windows_net_bytes_total[5m]) / windows_net_speed_bits * 8 * 100 > 80
            for: 10m
            labels:
              severity: warning
            annotations:
              summary: "High network utilization on {{ $labels.instance }} ({{ $labels.interface }})"
              description: "Network utilization is {{ $value }}% for last 10 minutes"

          # Process memory leak detection
          - alert: ProcessMemoryLeakWindows
            expr: predict_linear(windows_process_private_bytes[1h], 3600) / 1024 / 1024 / 1024 > 2
            for: 30m
            labels:
              severity: warning
            annotations:
              summary: "Possible memory leak in {{ $labels.process }} on {{ $labels.instance }}"
              description: "Process {{ $labels.process }} is predicted to exceed 2GB memory in 1 hour"

          # System log error rules
          - alert: SystemLogErrorsWindows
            expr: rate(windows_event_log_errors_total[5m]) > 5
            for: 5m
            labels:
              severity: warning
            annotations:
              summary: "High system log errors on {{ $labels.instance }}"
              description: "{{ $value }} errors per second in system logs"

    linux_recording_rules.yml

    groups:
      - name: linux-recording-rules
        interval: 1m
        rules:
          # CPU usage (compatible with multiple Node Exporter versions)
          - record: instance:node_cpu_usage:rate5m
            expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle",job=~".*"}[5m])) * 100)

          # Memory usage (excluding cache/buffers)
          - record: instance:node_memory_usage:ratio
            expr: >
              (node_memory_MemTotal_bytes - node_memory_MemFree_bytes
              - node_memory_Buffers_bytes - node_memory_Cached_bytes)
              / node_memory_MemTotal_bytes * 100

          # Disk space usage (filter out irrelevant mount points)
          - record: instance:node_filesystem_usage:ratio
            expr: >
              (node_filesystem_size_bytes{fstype!~"tmpfs|squashfs",mountpoint!~"/run|/snap"}
              - node_filesystem_avail_bytes{fstype!~"tmpfs|squashfs",mountpoint!~"/run|/snap"})
              / node_filesystem_size_bytes{fstype!~"tmpfs|squashfs",mountpoint!~"/run|/snap"} * 100

          # Network traffic (filter out virtual interfaces)
          - record: instance:node_network_receive_mbps:rate5m
            expr: sum by(instance)(rate(node_network_receive_bytes_total{device!~"lo|veth.*"}[5m])) * 8 / 1048576

          # System load (normalized by CPU count)
          # on(instance) is needed: node_load5 also carries a job label, the count does not
          - record: instance:node_load_ratio:rate5m
            expr: node_load5 / on(instance) count by(instance)(node_cpu_seconds_total{mode="system"})

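
The load rule divides node_load5 by a per-instance CPU count. In PromQL, a binary operator between two vectors pairs samples whose label sets are identical unless on(...) or ignoring(...) narrows the comparison. node_load5 carries both instance and job, while count by(instance)(...) keeps only instance, so the division needs on(instance) to pair anything. The following is a deliberately simplified Python model of one-to-one matching, written for illustration only: the `divide` helper and the sample label values are hypothetical, and it ignores many-to-one matching and PromQL's Inf/NaN handling.

```python
from typing import Dict, FrozenSet, Optional, Tuple

Labels = FrozenSet[Tuple[str, str]]
Vector = Dict[Labels, float]


def labels(**kv: str) -> Labels:
    """Build a hashable label set from keyword arguments."""
    return frozenset(kv.items())


def divide(lhs: Vector, rhs: Vector, on: Optional[Tuple[str, ...]] = None) -> Vector:
    """One-to-one division of two instant vectors (simplified model).

    on=None compares the full label sets, like PromQL's default.
    on=("instance",) compares only the listed labels, like `/ on(instance)`.
    """
    def key(label_set: Labels) -> Labels:
        if on is None:
            return label_set
        return frozenset((k, v) for k, v in label_set if k in on)

    rhs_by_key = {key(label_set): value for label_set, value in rhs.items()}
    result: Vector = {}
    for label_set, value in lhs.items():
        k = key(label_set)
        if k in rhs_by_key:
            result[k] = value / rhs_by_key[k]
    return result


# Sample data shaped like the two sides of the load rule (values are made up).
node_load5 = {labels(instance="host1:9100", job="node"): 6.0}
cpu_count = {labels(instance="host1:9100"): 4.0}  # shape of count by(instance)(...)

print(divide(node_load5, cpu_count))                    # no pair: label sets differ
print(divide(node_load5, cpu_count, on=("instance",)))  # pairs on instance only
```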
3. Writing the Docker Compose File

docker-compose.yml

    version: '3.8'

    services:
      prometheus:
        image: prom/prometheus:latest
        container_name: prometheus
        volumes:
          - ./prometheus.yml:/etc/prometheus/prometheus.yml
          - ./alerts.yml:/etc/prometheus/alerts.yml  # mount the alert rules
          - prometheus-data:/prometheus  # persist data
        command:
          - '--config.file=/etc/prometheus/prometheus.yml'
          - '--storage.tsdb.path=/prometheus'
          - '--web.enable-lifecycle'  # allow hot-reloading the configuration
        ports:
          - "9090:9090"
        restart: unless-stopped
        networks:
          - monitor-net

      # Optional: add Grafana for visualization
      grafana:
        image: grafana/grafana:latest
        container_name: grafana
        volumes:
          - grafana-data:/var/lib/grafana
        ports:
          - "3000:3000"
        restart: unless-stopped
        networks:
          - monitor-net

      # Optional: add Node Exporter to monitor the host
      # With host networking the exporter is reached via the host's address,
      # not via the service name node-exporter.
      # node-exporter:
      #   image: prom/node-exporter:latest
      #   container_name: node-exporter
      #   restart: unless-stopped
      #   network_mode: host  # requires host mode
      #   pid: host
      #   volumes:
      #     - /:/host:ro,rslave
      #   command:
      #     - '--path.rootfs=/host'

    volumes:
      prometheus-data:
      grafana-data:

    networks:
      monitor-net:
        driver: bridge

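
The Compose file bind-mounts prometheus.yml and alerts.yml from the project directory. If one of these files is missing when the stack starts, Docker typically creates a directory with that name on the host, and Prometheus then fails to read its configuration. A small preflight check can catch this before starting anything. This is a sketch under the assumption that it runs from the project directory; `missing_bind_sources` and `REQUIRED_FILES` are illustrative names, not part of Docker or Prometheus.

```python
from pathlib import Path
from typing import Iterable, List


def missing_bind_sources(project_dir: str, sources: Iterable[str]) -> List[str]:
    """Return the bind-mount sources that are not regular files under project_dir."""
    base = Path(project_dir)
    return [src for src in sources if not (base / src).is_file()]


# Host-side files bind-mounted by the prometheus service in docker-compose.yml.
REQUIRED_FILES = ["prometheus.yml", "alerts.yml"]

missing = missing_bind_sources(".", REQUIRED_FILES)
if missing:
    print("create these files before starting the stack:", ", ".join(missing))
else:
    print("all bind-mounted files are present")
```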
4. Starting the Services
    docker-compose up -d  # start in the background
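5. Verifying the Deployment
  • Prometheus UI: open http://<server-IP>:9090
    • Check targets: Status → Targets
    • Query metrics: Graph → enter up to see the state of each target
  • Grafana UI (if deployed): http://<server-IP>:3000 (default login admin/admin)
    • Add the Prometheus data source: http://prometheus:9090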
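
The same target check can be scripted against the Prometheus HTTP API, which serves the data behind the Targets page at /api/v1/targets. The sketch below uses only the Python standard library; the helper names are illustrative and the sample payload is hand-written in the shape of that response, not captured from a real server.

```python
import json
import urllib.request
from typing import Dict, List


def fetch_targets(base_url: str, timeout: float = 5.0) -> dict:
    """GET /api/v1/targets from a Prometheus server and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=timeout) as resp:
        return json.load(resp)


def unhealthy_targets(payload: dict) -> List[Dict[str, str]]:
    """List active targets whose health is anything other than "up"."""
    problems = []
    for target in payload.get("data", {}).get("activeTargets", []):
        if target.get("health") != "up":
            target_labels = target.get("labels", {})
            problems.append({
                "job": target_labels.get("job", ""),
                "instance": target_labels.get("instance", ""),
                "health": target.get("health", "unknown"),
                "lastError": target.get("lastError", ""),
            })
    return problems


# Hand-written sample in the shape of the API response (not real output).
sample = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "prometheus", "instance": "localhost:9090"},
             "health": "up", "lastError": ""},
            {"labels": {"job": "node", "instance": "node-exporter:9100"},
             "health": "down", "lastError": "connection refused"},
        ]
    },
}

for problem in unhealthy_targets(sample):
    print(problem)
```

Against a running server, `unhealthy_targets(fetch_targets("http://localhost:9090"))` would return the same structure for the live targets.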
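6. Common Operations
  • Reload the configuration (without restarting)
    curl -X POST http://localhost:9090/-/reload

  • View logs
    docker-compose logs -f prometheus

  • Stop the services
    docker-compose down

  • Back up data: back up the prometheus-data volume (default location: /var/lib/docker/volumes/...)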
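
One common way to back up a named volume without touching /var/lib/docker directly is to mount it read-only into a throwaway container and archive it from there. The sketch below only builds and prints such a command. Several things in it are assumptions: the busybox image, the archive name, the /tmp/backup destination, and the volume name prometheus_prometheus-data (Compose normally prefixes volume names with the project name, so check `docker volume ls` for the real one).

```python
import shlex
from typing import List


def volume_backup_command(volume: str, dest_dir: str,
                          archive: str = "prometheus-data.tar.gz") -> List[str]:
    """Build a docker command that archives a named volume into dest_dir.

    dest_dir must be an absolute host path so Docker treats it as a bind mount.
    """
    return [
        "docker", "run", "--rm",
        "-v", f"{volume}:/data:ro",    # the volume to back up, mounted read-only
        "-v", f"{dest_dir}:/backup",   # host directory that receives the archive
        "busybox",                     # assumed helper image that ships tar
        "tar", "czf", f"/backup/{archive}", "-C", "/data", ".",
    ]


cmd = volume_backup_command("prometheus_prometheus-data", "/tmp/backup")
print(shlex.join(cmd))
```

Stopping Prometheus before running the printed command is the safer choice, since copying the TSDB directory while it is being written may produce an inconsistent archive.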
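7. Hardening for Production
  1. Security hardening
    • Enable basic authentication by passing --web.config.file to Prometheus
    • Restrict the Grafana login policy
  2. Persistence
    volumes:
      prometheus-data:
        driver_opts:
          type: nfs
          o: addr=<nfs_server>,rw
          device: ":/path/to/nfs"

  3. Resource limits
    # docker-compose 1.x applies deploy limits only when run with --compatibility
    prometheus:
      deploy:
        resources:
          limits:
            cpus: '2'
            memory: 4G

  4. High availability
    • Deploy multiple Prometheus instances + Thanos
    • Run Alertmanager as a cluster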
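8. Monitoring Additional Targets

Add the following under scrape_configs in prometheus.yml:

    # Monitor Docker containers
    - job_name: "docker"
      static_configs:
        - targets: ["docker-host:9323"]  # requires the Docker daemon to expose metrics

    # Monitor MySQL
    - job_name: "mysql"
      static_configs:
        - targets: ["mysql-exporter:9104"]  # requires deploying mysqld-exporter

Note: for the complete configuration, see the official Prometheus documentation.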
