[Linux Ops] Prometheus + Grafana + Alertmanager Monitoring Stack Deployment Guide (CentOS & Ubuntu)
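Author: 爱技术的小伙子
Published: 2025-10-15
Abstract: This guide walks through deploying a Prometheus + Grafana + Alertmanager monitoring stack on CentOS 7/8 or Ubuntu 18.04+. It targets a test environment (for example, two virtual machines) and can later be extended to 48 servers. Prometheus collects metrics, Grafana visualizes them, and Alertmanager handles alerts. One VM acts as the central server (VM1) and the other as the monitored node (VM2).
Audience: operations engineers and DevOps beginners.
Prerequisites: network connectivity between the two machines and sudo privileges. The examples use Prometheus 2.54.0, Node Exporter 1.8.2, and Alertmanager 0.26.0; substitute the latest releases from the official sites when downloading.
References: the official Prometheus documentation and the Grafana website.
Keywords: Prometheus, Grafana, Alertmanager, monitoring deployment, CentOS, Ubuntu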
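1. Environment Preparation
Run the steps in this section on both VM1 (central server) and VM2 (monitored node).
Commands differ by distribution:
- CentOS: use yum or dnf (dnf on CentOS 8+) and the firewalld firewall.
- Ubuntu: use apt and the ufw firewall.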
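Because every later step branches on the distribution, a script that automates this guide needs to pick the package manager first. The following is a minimal sketch, assuming the host provides /etc/os-release; the function name detect_pkg_manager is hypothetical, and the rocky entry is an added assumption beyond the two distributions covered here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the package manager for the current distribution.
# The os-release path can be overridden, which makes the function easy to test.
detect_pkg_manager() {
  local os_release="${1:-/etc/os-release}"
  local id version_id
  # Source the file inside command substitutions (subshells) so that ID and
  # VERSION_ID do not leak into the caller's environment.
  id="$(. "$os_release" && printf '%s' "${ID:-}")"
  version_id="$(. "$os_release" && printf '%s' "${VERSION_ID:-}")"
  case "$id" in
    ubuntu) echo apt ;;
    centos|rocky)
      # CentOS 8+ uses dnf; CentOS 7 uses yum.
      if [ "${version_id%%.*}" -ge 8 ] 2>/dev/null; then echo dnf; else echo yum; fi ;;
    *) echo unknown; return 1 ;;
  esac
}

# Usage:
#   PKG="$(detect_pkg_manager)" && sudo "$PKG" install -y wget tar curl
```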
1.1 System Update and Dependencies
- CentOS:
    sudo yum update -y                  # or: sudo dnf update -y (CentOS 8+)
    sudo yum install -y wget tar curl   # or: sudo dnf install -y wget tar curl
- Ubuntu:
    sudo apt update && sudo apt upgrade -y
    sudo apt install -y wget tar curl
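1.2 Create a Dedicated User
The services in this guide run as an unprivileged prometheus account with no home directory and no login shell:
    sudo useradd --no-create-home --shell /bin/false prometheus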
1.3 Firewall Configuration
Open the Prometheus, Alertmanager, and Grafana ports on VM1 and the Node Exporter port on VM2.
- CentOS (firewalld):
    sudo systemctl start firewalld
    sudo systemctl enable firewalld
    # On VM1
    sudo firewall-cmd --permanent --add-port=9090/tcp   # Prometheus
    sudo firewall-cmd --permanent --add-port=9093/tcp   # Alertmanager
    sudo firewall-cmd --permanent --add-port=3000/tcp   # Grafana
    # On VM2
    sudo firewall-cmd --permanent --add-port=9100/tcp   # Node Exporter
    # On both machines, apply the permanent rules
    sudo firewall-cmd --reload
- Ubuntu (ufw):
    sudo ufw allow 9090/tcp   # VM1: Prometheus
    sudo ufw allow 9093/tcp   # VM1: Alertmanager
    sudo ufw allow 3000/tcp   # VM1: Grafana
    sudo ufw allow 9100/tcp   # VM2: Node Exporter
    sudo ufw reload
Note: make sure VM1 can reach port 9100 on VM2. Once Node Exporter is running there (section 2), test from VM1 with curl <VM2_IP>:9100.
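Before Node Exporter is installed, curl cannot tell a blocked port from a port with nothing listening. A plain TCP probe is sometimes easier to script. The sketch below assumes bash (for its built-in /dev/tcp redirection) and the coreutils timeout command; port_open is a hypothetical helper name, not part of any tool used in this guide.

```shell
#!/usr/bin/env bash
# port_open HOST PORT: succeed if a TCP connection can be opened within 3 seconds.
# Relies on bash's /dev/tcp pseudo-device, so it will not work in plain sh.
port_open() {
  local host="$1" port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Usage from VM1 (<VM2_IP> is the same placeholder used throughout this guide):
#   port_open "<VM2_IP>" 9100 && echo "9100 reachable" || echo "9100 not reachable"
```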
2. Install Node Exporter on VM2 (Metrics Exporter)
Node Exporter exposes system metrics such as CPU and memory usage.
- Download and install:
    cd /tmp
    wget https://github.com/prometheus/node_exporter/releases/download/v1.8.2/node_exporter-1.8.2.linux-amd64.tar.gz
    tar xvfz node_exporter-1.8.2.linux-amd64.tar.gz
    sudo mv node_exporter-1.8.2.linux-amd64/node_exporter /usr/local/bin/
    sudo chown prometheus:prometheus /usr/local/bin/node_exporter
- Create the systemd service:
    sudo nano /etc/systemd/system/node-exporter.service
  Contents:
    [Unit]
    Description=Node Exporter
    After=network.target

    [Service]
    User=prometheus
    Group=prometheus
    Type=simple
    ExecStart=/usr/local/bin/node_exporter

    [Install]
    WantedBy=multi-user.target
- Start the service:
    sudo systemctl daemon-reload
    sudo systemctl start node-exporter
    sudo systemctl enable node-exporter
    sudo systemctl status node-exporter
- Test: curl http://localhost:9100/metrics (it should return metric data).
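The /metrics endpoint returns several hundred lines, so a quick check usually looks at one metric. The helper below is a small sketch that reads the Prometheus text format on stdin and prints the value of the first sample with a given name; metric_value is a hypothetical name, and the code assumes samples carry no trailing timestamp (which is how Node Exporter normally emits them).

```shell
#!/usr/bin/env bash
# metric_value NAME: print the value of the first sample named NAME.
metric_value() {
  local name="$1"
  awk -v name="$name" '
    /^#/ { next }                    # skip "# HELP" and "# TYPE" lines
    {
      metric = $1
      sub(/[{].*/, "", metric)       # drop the {label="..."} part, if present
      if (metric == name) { print $NF; exit }
    }'
}

# Usage on VM2:
#   curl -s http://localhost:9100/metrics | metric_value node_load1
```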
3. Install Prometheus on VM1 (Core Collector)
- Create the directories:
    sudo mkdir /etc/prometheus /var/lib/prometheus
    sudo chown -R prometheus:prometheus /etc/prometheus /var/lib/prometheus
- Download and install:
    cd /tmp
    wget https://github.com/prometheus/prometheus/releases/download/v2.54.0/prometheus-2.54.0.linux-amd64.tar.gz
    tar xvfz prometheus-2.54.0.linux-amd64.tar.gz
    sudo mv prometheus-2.54.0.linux-amd64/prometheus /usr/local/bin/
    sudo mv prometheus-2.54.0.linux-amd64/promtool /usr/local/bin/
    sudo chown prometheus:prometheus /usr/local/bin/prometheus /usr/local/bin/promtool
    sudo mv prometheus-2.54.0.linux-amd64/consoles /etc/prometheus/
    sudo mv prometheus-2.54.0.linux-amd64/console_libraries /etc/prometheus/
    sudo chown -R prometheus:prometheus /etc/prometheus/consoles /etc/prometheus/console_libraries
- Configure prometheus.yml:
    sudo nano /etc/prometheus/prometheus.yml
  Contents (replace <VM2_IP>):
    global:
      scrape_interval: 15s
      evaluation_interval: 15s

    alerting:
      alertmanagers:
        - static_configs:
            - targets: ["localhost:9093"]

    rule_files:
      - "/etc/prometheus/*.rules.yml"

    scrape_configs:
      - job_name: "prometheus"
        static_configs:
          - targets: ["localhost:9090"]
      - job_name: "node"
        static_configs:
          - targets: ["localhost:9100", "<VM2_IP>:9100"]   # extend this list when adding nodes
  Tip: for VM1 to monitor itself, install Node Exporter on it as well (repeat section 2 on VM1); otherwise the localhost:9100 target will show as DOWN.
- Create the systemd service:
    sudo nano /etc/systemd/system/prometheus.service
  Contents:
    [Unit]
    Description=Prometheus
    After=network.target

    [Service]
    User=prometheus
    Group=prometheus
    Type=simple
    ExecStart=/usr/local/bin/prometheus \
        --config.file=/etc/prometheus/prometheus.yml \
        --storage.tsdb.path=/var/lib/prometheus/ \
        --web.console.templates=/etc/prometheus/consoles \
        --web.console.libraries=/etc/prometheus/console_libraries
    ExecReload=/bin/kill -HUP $MAINPID
    Restart=on-failure

    [Install]
    WantedBy=multi-user.target
- Start the service:
    sudo systemctl daemon-reload
    sudo systemctl start prometheus
    sudo systemctl enable prometheus
    sudo systemctl status prometheus
- Test: open http://<VM1_IP>:9090 in a browser and check the Targets page (each target should be UP).
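A YAML indentation mistake in prometheus.yml prevents the service from starting, so it is worth validating the file before each start or reload. The sketch below wraps promtool check config, which ships in the same tarball installed above, and then reloads through the ExecReload line of the unit file. The function name check_and_reload is hypothetical.

```shell
#!/usr/bin/env bash
# check_and_reload [CONFIG]: validate the Prometheus config, then reload the service.
check_and_reload() {
  local config="${1:-/etc/prometheus/prometheus.yml}"
  if ! command -v promtool >/dev/null 2>&1; then
    echo "promtool not found in PATH" >&2
    return 127
  fi
  # promtool also checks the rule files referenced under rule_files.
  promtool check config "$config" || return 1
  # The unit file defines ExecReload (SIGHUP), so a reload re-reads the config
  # without restarting the process.
  sudo systemctl reload prometheus
}

# Usage on VM1:
#   check_and_reload
```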
4. Install Alertmanager on VM1 (Alert Management)
- Download and install:
    cd /tmp
    wget https://github.com/prometheus/alertmanager/releases/download/v0.26.0/alertmanager-0.26.0.linux-amd64.tar.gz
    tar xvfz alertmanager-0.26.0.linux-amd64.tar.gz
    sudo mv alertmanager-0.26.0.linux-amd64/alertmanager /usr/local/bin/
    sudo mv alertmanager-0.26.0.linux-amd64/amtool /usr/local/bin/
    sudo chown prometheus:prometheus /usr/local/bin/alertmanager /usr/local/bin/amtool
    sudo mkdir /etc/alertmanager /var/lib/alertmanager
    sudo mv alertmanager-0.26.0.linux-amd64/alertmanager.yml /etc/alertmanager/
    sudo chown -R prometheus:prometheus /etc/alertmanager /var/lib/alertmanager
- Configure alertmanager.yml (an email example; replace the details, and add Slack if needed):
    sudo nano /etc/alertmanager/alertmanager.yml
  Contents:
    global:
      resolve_timeout: 1m
      smtp_smarthost: 'smtp.example.com:587'
      smtp_from: 'alert@example.com'
      smtp_auth_username: 'username'
      smtp_auth_password: 'password'

    route:
      receiver: 'email-notifications'

    receivers:
      - name: 'email-notifications'
        email_configs:
          - to: 'your@email.com'
            send_resolved: true
- Create the systemd service:
    sudo nano /etc/systemd/system/alertmanager.service
  Contents:
    [Unit]
    Description=Alertmanager
    After=network.target

    [Service]
    User=prometheus
    Group=prometheus
    Type=simple
    ExecStart=/usr/local/bin/alertmanager \
        --config.file=/etc/alertmanager/alertmanager.yml \
        --storage.path=/var/lib/alertmanager/

    [Install]
    WantedBy=multi-user.target
- Start the service:
    sudo systemctl daemon-reload
    sudo systemctl start alertmanager
    sudo systemctl enable alertmanager
    sudo systemctl status alertmanager
- Test: open http://<VM1_IP>:9093 in a browser.
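Loading the web UI shows that Alertmanager is running, but not that notifications are delivered. One way to exercise the email receiver without waiting for a real alert is to post a hand-made alert to the Alertmanager v2 API. The sketch below only builds the JSON payload; build_test_alert is a hypothetical helper, the label values are illustrative, and the naive formatting assumes the arguments contain no quotes or backslashes.

```shell
#!/usr/bin/env bash
# build_test_alert [ALERTNAME] [SEVERITY]: print a one-element alert array
# in the shape accepted by POST /api/v2/alerts.
build_test_alert() {
  local alertname="${1:-TestAlert}" severity="${2:-warning}"
  printf '[{"labels":{"alertname":"%s","severity":"%s"},"annotations":{"summary":"Manual test alert"}}]\n' \
    "$alertname" "$severity"
}

# Usage on VM1:
#   build_test_alert TestAlert warning |
#     curl -s -X POST -H 'Content-Type: application/json' \
#          --data @- http://localhost:9093/api/v2/alerts
```

The alert should then appear in the Alertmanager UI and be routed to the email-notifications receiver.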
5. Install Grafana on VM1 (Visualization Dashboards)
- Add the repository and install:
  - CentOS:
      sudo nano /etc/yum.repos.d/grafana.repo
    Contents:
      [grafana]
      name=grafana
      baseurl=https://packages.grafana.com/oss/rpm
      repo_gpgcheck=1
      enabled=1
      gpgcheck=1
      gpgkey=https://packages.grafana.com/gpg.key
      sslverify=1
      sslcacert=/etc/pki/tls/certs/ca-bundle.crt
    Then install:
      sudo yum install grafana -y   # or dnf
  - Ubuntu (the mkdir is needed on releases where /etc/apt/keyrings does not exist yet):
      sudo apt install -y apt-transport-https software-properties-common wget
      sudo mkdir -p /etc/apt/keyrings
      wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
      echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
      sudo apt update
      sudo apt install grafana -y
- Start the service:
    sudo systemctl daemon-reload
    sudo systemctl start grafana-server
    sudo systemctl enable grafana-server
    sudo systemctl status grafana-server
- Configure the data source: open http://<VM1_IP>:3000 in a browser (default login admin/admin; change the password on first login).
  - Add data source > Prometheus > URL: http://localhost:9090 > Save & Test.
- Import a dashboard: Dashboards > Import > ID 1860 (Node Exporter Full), then select the Prometheus data source.
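The data source step above can also be done without the browser, through Grafana's file-based provisioning, which is convenient when the setup has to be repeated. The sketch below prints a provisioning file for the Prometheus data source. The function name render_datasource and the target file name prometheus.yaml are assumptions; the directory is the default provisioning path of the packaged install.

```shell
#!/usr/bin/env bash
# render_datasource [URL]: print a Grafana data source provisioning file.
render_datasource() {
  local url="${1:-http://localhost:9090}"
  cat <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: ${url}
    isDefault: true
EOF
}

# Usage on VM1:
#   render_datasource | sudo tee /etc/grafana/provisioning/datasources/prometheus.yaml
#   sudo systemctl restart grafana-server
```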
6. Configure Alert Rules
- Create the rule file:
    sudo nano /etc/prometheus/cpu.rules.yml
    sudo chown prometheus:prometheus /etc/prometheus/cpu.rules.yml
  Contents (high CPU example):
    groups:
      - name: cpu-alerts
        rules:
          - alert: HighCPUUsage
            expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
            for: 1m
            labels:
              severity: critical
            annotations:
              summary: "High CPU on {{ $labels.instance }}"
- Restart Prometheus so it loads the rule file:
    sudo systemctl restart prometheus
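- Test the alert: install stress on VM2 (CentOS: sudo yum install -y epel-release && sudo yum install -y stress; Ubuntu: sudo apt install -y stress), then run stress --cpu "$(nproc)" --timeout 600s. The rule averages idle time over a 5-minute window and must hold for 1 minute, so all cores have to stay busy for several minutes before the alert fires; a 120-second burst is too short. Check the Alerts page in Prometheus and the Alertmanager UI.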
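The arithmetic behind the HighCPUUsage expression and the length of the stress run can be checked with two small functions. This is a sketch under a simplifying assumption: the host is otherwise idle and stress saturates every core while it runs. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# cpu_usage_percent IDLE_FRACTION: mirrors "100 - (idle_rate * 100)" from the rule,
# where IDLE_FRACTION is the average idle rate across CPUs (0.0 to 1.0).
cpu_usage_percent() {
  awk -v idle="$1" 'BEGIN { printf "%.1f\n", 100 - idle * 100 }'
}

# peak_window_usage BUSY_SECONDS WINDOW_SECONDS: the highest average usage a
# full-load burst of BUSY_SECONDS can reach inside one rate() window.
peak_window_usage() {
  awk -v busy="$1" -v win="$2" 'BEGIN {
    if (busy > win) busy = win       # a burst cannot fill more than the window
    printf "%.1f\n", busy / win * 100
  }'
}

cpu_usage_percent 0.15       # idle rate 0.15 -> 85.0, above the 80 threshold
peak_window_usage 120 300    # 120 s of load in a 5m window -> 40.0, never fires
peak_window_usage 600 300    # 600 s of load -> 100.0, fires after the 1m "for" period
```

Under these assumptions the load must cover more than 240 of the 300 seconds in the window before the expression exceeds 80, and then persist for the extra minute required by for: 1m.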
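7. Verification, Scaling, and Troubleshooting
- Verification: view the charts in Grafana, or run a PromQL query such as node_cpu_seconds_total.
- Scaling to 48 servers: install Node Exporter on each machine, then add the targets to prometheus.yml or use service discovery (file_sd or Consul).
- Common issues:
  - SELinux (CentOS): sudo setenforce 0 switches to permissive mode until the next reboot.
  - Logs: journalctl -u prometheus, or /var/log/messages (CentOS) / /var/log/syslog (Ubuntu).
  - CentOS 7 EOL: consider migrating to Rocky Linux.
- Security: in production, add HTTPS and authentication (for example, Grafana OAuth), and back up your data.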
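For the scale-out item above, file-based service discovery avoids editing prometheus.yml for every new node: Prometheus watches a JSON file and picks up changes on its own. The sketch below turns a plain list of hosts into such a file. The function name render_file_sd, the hosts.txt input, and the /etc/prometheus/targets/node.json path are all hypothetical.

```shell
#!/usr/bin/env bash
# render_file_sd [PORT]: read one host or IP per line on stdin and print a
# file_sd target group as JSON. Blank lines and "#" comments are ignored.
render_file_sd() {
  local port="${1:-9100}"
  awk -v port="$port" '
    NF == 0 || /^#/ { next }
    { targets[n++] = "\"" $1 ":" port "\"" }
    END {
      printf "[{\"targets\":["
      for (i = 0; i < n; i++) printf "%s%s", (i ? "," : ""), targets[i]
      printf "]}]\n"
    }'
}

# Usage on VM1 (hosts.txt holds one IP per line):
#   sudo mkdir -p /etc/prometheus/targets
#   render_file_sd < hosts.txt | sudo tee /etc/prometheus/targets/node.json
#
# Matching scrape job in prometheus.yml, replacing the static target list:
#   - job_name: "node"
#     file_sd_configs:
#       - files:
#           - /etc/prometheus/targets/node.json
```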
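8. Conclusion
With this deployment in place you have a complete monitoring stack: metric collection, visualization, and alerting. It suits small and medium environments and scales out by adding targets. Questions are welcome in the comments. A Docker/K8s version or additional exporters (such as MySQL) are possible follow-ups.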
Reference links:
- Prometheus: https://prometheus.io/docs/introduction/overview/
- Grafana: https://grafana.com/docs/grafana/latest/
- GitHub downloads: https://github.com/prometheus