Tutorial: Deploying Prometheus + Grafana + Alertmanager in One Step with Docker Compose
1. Prerequisites
Make sure the following components are installed:
# Install Docker
curl -fsSL https://get.docker.com | bash
# Install Docker Compose
sudo curl -L "https://github.com/docker/compose/releases/download/1.29.2/docker-compose-$(uname -s)-$(uname -m)" \
  -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
docker-compose --version
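Recent versions of the get.docker.com script also install the Compose V2 plugin, which you run as "docker compose". The 1.29.2 download above is the older standalone V1 binary. Both accept the subcommands used in this tutorial.

The small POSIX sh sketch below picks whichever one is present. The COMPOSE variable and the detect_compose function are illustrative names, not part of Docker.

```shell
#!/bin/sh
# Prefer the Compose V2 plugin ("docker compose"); fall back to the
# standalone V1 binary ("docker-compose") installed above.
detect_compose() {
  if command -v docker >/dev/null 2>&1 && docker compose version >/dev/null 2>&1; then
    echo "docker compose"
  elif command -v docker-compose >/dev/null 2>&1; then
    echo "docker-compose"
  else
    return 1
  fi
}

if COMPOSE=$(detect_compose); then
  echo "Using: $COMPOSE"
else
  COMPOSE=""
  echo "Neither 'docker compose' nor 'docker-compose' was found" >&2
fi
```

You can then run step 4 as $COMPOSE up -d. Leave the variable unquoted on purpose, so that "docker compose" splits into two words.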
2. Create the project directory structure
mkdir -p ~/prometheus-stack
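cd ~/prometheus-stack
mkdir -p prometheus     # alerting rule files; mounted into the Prometheus container
mkdir -p alertmanager   # optional: docker-compose.yml only mounts ./alertmanager.yml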
3. Create the configuration files
nano docker-compose.yml
Paste in the following content:
version: '3.3'
services:
  prometheus:
    image: prom/prometheus
    container_name: prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - ./prometheus:/etc/prometheus/rules
    restart: unless-stopped

  grafana:
    image: grafana/grafana-oss
    container_name: grafana
    ports:
      - "3001:3000"
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - grafana-storage:/var/lib/grafana
    restart: unless-stopped

  alertmanager:
    image: prom/alertmanager
    container_name: alertmanager
    ports:
      - "9093:9093"
    volumes:
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml
    command:
      - '--config.file=/etc/alertmanager/alertmanager.yml'
    restart: unless-stopped

volumes:
  grafana-storage:
nano prometheus.yml
Use the following content:
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            # Alertmanager address; from this Compose network, alertmanager:9093 also works
            - "xxxxx:9093"

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "/etc/prometheus/rules/*.yml"
  # - "first_rules.yml"
  # - "second_rules.yml"

# Scrape configurations; the first job scrapes Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9090"]
        # The label name is added as a label `label_name=<label_value>` to any timeseries scraped from this config.
        labels:
          app: "prometheus"

  - job_name: "agent_windows"
    static_configs:
      - targets: ["xxxx:9182"]
        labels:
          app: "windows"
          instance: "Virtual machine"

  - job_name: "alertmanager"
    static_configs:
      # Inside the Prometheus container, localhost is Prometheus itself,
      # so reach Alertmanager through its Compose service name.
      - targets: ["alertmanager:9093"]
        labels:
          app: "alertmanager"

  - job_name: "node_exporter"
    static_configs:
      - targets: ["xxxx:9100"]
        labels:
          app: "node_exporter"
          instance: "Alibaba Cloud test server"

  - job_name: "process_exporter"
    static_configs:
      - targets: ["xxxx:9256"]
        labels:
          app: "process"
          instance: "Alibaba Cloud test server"

  - job_name: 'http_probe'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://XXXXXX
        labels:
          instance: "Company homepage"
          app: webapp
      - targets:
          - http://xxxxxxxx:9997/docs#/
        labels:
          instance: "PDF merge"
          app: pdfservice
    relabel_configs:
      # Pass each listed URL to blackbox_exporter as the ?target= parameter.
      - source_labels: [__address__]
        target_label: __param_target
      # Optional: also copy the probed URL into a separate "url" label,
      # which leaves the instance label from static_configs untouched.
      #- source_labels: [__param_target]
      #  target_label: url
      # The usual blackbox recipe also copies the URL into "instance"; it stays
      # commented out so it does not overwrite the instance labels set above.
      # - source_labels: [__param_target]
      #   target_label: instance
      # Send the actual scrape request to blackbox_exporter.
      - target_label: __address__
        replacement: xxxxxxx:9115
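To check a single http_probe target by hand, it helps to see what the relabel_configs above do:

- the listed URL becomes the target parameter;
- params.module becomes the module parameter;
- the request itself goes to blackbox_exporter.

The sketch below rebuilds an equivalent URL and percent-encodes the values roughly the way Prometheus encodes query parameters. The helper names urlencode and probe_url are hypothetical, and the input is assumed to be ASCII.

```shell
#!/bin/sh
# urlencode STRING
# Percent-encodes everything except unreserved characters (ASCII input only).
urlencode() {
  local s="$1" out="" rest c
  while [ -n "$s" ]; do
    rest=${s#?}            # string without its first character
    c=${s%"$rest"}         # the first character
    s=$rest
    case $c in
      [a-zA-Z0-9.~_-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;
    esac
  done
  printf '%s' "$out"
}

# probe_url BLACKBOX_ADDR MODULE TARGET
# Mirrors the http_probe job: metrics_path /probe, ?module=..., ?target=<listed URL>.
probe_url() {
  printf 'http://%s/probe?module=%s&target=%s\n' \
    "$1" "$(urlencode "$2")" "$(urlencode "$3")"
}
```

For example, run the following on the server:

curl -s "$(probe_url xxxxxxx:9115 http_2xx https://XXXXXX)" | grep probe_success

It should report probe_success 1 when the site answers with a 2xx status.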
nano alertmanager.yml
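route:
  group_by: ['alertname']       # Group alerts by the alertname label; alerts in one group share a notification
  group_wait: 30s               # Wait 30s before the first notification for a new group, so alerts firing together are batched
  group_interval: 5m            # Wait at least 5m before notifying about new alerts added to an already-notified group
  repeat_interval: 1h           # Re-send the notification every 1h while an alert keeps firing
  receiver: 'webhook_receiver'  # Default receiver

receivers:
  - name: 'webhook_receiver'
    webhook_configs:
      - url: 'http://xxxx:5012/alertmanager_to_feishu'
        send_resolved: true     # Also notify when an alert is resolved

inhibit_rules:
  # Mute warning alerts while a critical alert with the same alertname, dev and instance is firing
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']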
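Once the stack is running (step 4), you can test the route and webhook receiver without waiting for a real alert. Push an alert straight into Alertmanager's v2 API (POST /api/v2/alerts).

The sketch below assumes Alertmanager is reachable on localhost:9093, as published in docker-compose.yml. The function names and label values are hypothetical.

```shell
#!/bin/sh
# test_alert_payload NAME SEVERITY INSTANCE
# Prints a one-alert JSON array in the shape POST /api/v2/alerts accepts.
test_alert_payload() {
  printf '[{"labels":{"alertname":"%s","severity":"%s","instance":"%s"},"annotations":{"summary":"manual test alert"}}]\n' \
    "$1" "$2" "$3"
}

# send_test_alert [ALERTMANAGER_URL]
send_test_alert() {
  local am="${1:-http://localhost:9093}"
  test_alert_payload "TestAlert" "warning" "manual-test" |
    curl -fsS -X POST -H 'Content-Type: application/json' \
      --data-binary @- "$am/api/v2/alerts"
}
```

After send_test_alert, the webhook at http://xxxx:5012/alertmanager_to_feishu should be called after about group_wait (30s).

The payload sets no endsAt, so Alertmanager treats the alert as resolved after its resolve_timeout (5m by default). With send_resolved: true, a resolved notification follows.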
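Put the remaining configuration files (alerting rule files) in the prometheus/ folder. docker-compose.yml mounts that folder at /etc/prometheus/rules, and rule_files loads every *.yml file in it.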
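As an illustration, the sketch below writes a hypothetical rule file, prometheus/example_rules.yml, with two example alerts:

- InstanceDown, for any target whose up metric is 0;
- HttpProbeFailed, for failing probes in the http_probe job.

Both use the severity values (critical, warning) that the inhibit rule in alertmanager.yml refers to. The sketch also defines a hypothetical check_configs helper, which validates the files with promtool and amtool from the official images.

```shell
#!/bin/sh
# Write an example rule file into the folder mounted at /etc/prometheus/rules.
mkdir -p prometheus
cat > prometheus/example_rules.yml <<'EOF'
groups:
  - name: example
    rules:
      # Any scrape target that has been unreachable for 1 minute
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.instance }} ({{ $labels.job }}) is down"
      # HTTP probes from the http_probe job failing for 2 minutes
      - alert: HttpProbeFailed
        expr: probe_success{job="http_probe"} == 0
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "HTTP probe failed for {{ $labels.instance }}"
EOF

# check_configs: run from ~/prometheus-stack; promtool also checks the rule files
# referenced by rule_files, amtool checks alertmanager.yml.
check_configs() {
  docker run --rm --entrypoint promtool \
    -v "$PWD/prometheus.yml:/etc/prometheus/prometheus.yml:ro" \
    -v "$PWD/prometheus:/etc/prometheus/rules:ro" \
    prom/prometheus check config /etc/prometheus/prometheus.yml &&
  docker run --rm --entrypoint amtool \
    -v "$PWD/alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro" \
    prom/alertmanager check-config /etc/alertmanager/alertmanager.yml
}
```

After changing rule files, restart Prometheus with docker-compose restart prometheus. The compose file does not pass --web.enable-lifecycle, so the HTTP reload endpoint is not available.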
4. Start the services
docker-compose up -d
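The containers can take a few seconds to become ready. The sketch below polls the health endpoints on the host ports published in docker-compose.yml:

- Prometheus and Alertmanager expose /-/ready;
- Grafana exposes /api/health.

wait_for_url and check_stack are hypothetical helper names.

```shell
#!/bin/sh
# wait_for_url URL [TRIES] [DELAY_SECONDS]
# Polls URL until curl succeeds (curl -f treats HTTP status >= 400 as failure).
wait_for_url() {
  local url="$1" tries="${2:-30}" delay="${3:-2}" i=1
  while [ "$i" -le "$tries" ]; do
    if curl -fsS -o /dev/null "$url"; then
      echo "OK   $url"
      return 0
    fi
    if [ "$i" -lt "$tries" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "FAIL $url" >&2
  return 1
}

# check_stack: returns non-zero if any service did not become ready.
check_stack() {
  local rc=0
  wait_for_url http://localhost:9090/-/ready || rc=1     # Prometheus
  wait_for_url http://localhost:9093/-/ready || rc=1     # Alertmanager
  wait_for_url http://localhost:3001/api/health || rc=1  # Grafana (host 3001 -> container 3000)
  return "$rc"
}
```

Once check_stack passes, Grafana is on port 3001 of the host, with the admin/admin credentials set in docker-compose.yml; change the password after the first login.

Grafana runs on the same Compose network as Prometheus, so its Prometheus data source URL can be http://prometheus:9090.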