Setting up Prometheus + Grafana
Deploying Prometheus
Installation
# 1. Download (the release tag must match the version in the file name)
wget https://github.com/prometheus/prometheus/releases/download/v3.5.0/prometheus-3.5.0.linux-amd64.tar.gz
# 2. Unpack and deploy
tar -zxvf prometheus-3.5.0.linux-amd64.tar.gz -C /opt/
cd /opt/
mv ./prometheus-3.5.0.linux-amd64 prometheus
# 3. Verify
[root@prometheus prometheus]# cd /opt/prometheus
[root@prometheus prometheus]# ./prometheus --version
prometheus, version 3.5.0 (branch: HEAD, revision: 8be3a9560fbdd18a94dedec4b747c35178177202)
  build user:       root@4451b64cb451
  build date:       20250714-16:15:23
  go version:       go1.24.5
  platform:         linux/amd64
  tags:             netgo,builtinassets
# 4. Create the service user
groupadd prometheus
useradd -g prometheus -s /sbin/nologin prometheus
chown -R prometheus:prometheus /opt/prometheus/
# 5. Create the Prometheus data directory
mkdir -p /opt/prometheus/data
chown -R prometheus:prometheus /opt/prometheus/data
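The release tag in the download URL and the version in the tarball name have to agree, otherwise the download fails. One way to keep them in sync is to derive both from a single version string. The sketch below assumes the GitHub release layout used in the download step; `prom_release_url` is a made-up helper name, not something shipped with Prometheus.

```shell
#!/usr/bin/env bash
# Build the release URL from one version string so the tag (v3.5.0)
# and the file name (prometheus-3.5.0...) cannot drift apart.
# prom_release_url is a hypothetical helper for illustration only.
prom_release_url() {
  local version="$1"
  local arch="${2:-linux-amd64}"
  printf 'https://github.com/prometheus/prometheus/releases/download/v%s/prometheus-%s.%s.tar.gz\n' \
    "$version" "$version" "$arch"
}

# Usage (performs the download, so it is left commented out here):
#   wget "$(prom_release_url 3.5.0)"
```

Changing the version then only requires touching one argument.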
Configuration file
[root@prometheus prometheus]# cat prometheus.yml
# my global config
global:
  scrape_interval: 15s # Global interval between scrapes, set to 15s here. The built-in default is 1 minute.
  evaluation_interval: 15s # Interval between rule evaluations, set to 15s here. The built-in default is 1 minute.
  # scrape_timeout is set to the global default (10s). This is the per-scrape timeout.

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs: # default scrape job
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090"]
        # The label name is added as a label `label_name=<label_value>` to any timeseries scraped from this config.
        labels:
          app: "prometheus"
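YAML indentation mistakes in this file are easy to make, so it can help to validate the file before starting the service. The release tarball ships `promtool` next to the `prometheus` binary, and `promtool check config` reports syntax and schema errors. A minimal sketch, assuming the /opt/prometheus layout from this guide; `check_prom_config` is a hypothetical wrapper name.

```shell
#!/usr/bin/env bash
# Validate a Prometheus config with promtool before (re)starting the service.
# check_prom_config is a hypothetical wrapper; the default paths follow
# this guide's /opt/prometheus layout.
check_prom_config() {
  local promtool="${1:-/opt/prometheus/promtool}"
  local config="${2:-/opt/prometheus/prometheus.yml}"
  if [ ! -x "$promtool" ]; then
    echo "promtool not found at $promtool" >&2
    return 127
  fi
  # 'promtool check config' exits non-zero when the file is invalid.
  "$promtool" check config "$config"
}

# Usage:
#   check_prom_config && systemctl restart prometheus
```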
Creating a systemd unit to manage the service
vim /usr/lib/systemd/system/prometheus.service

[Unit]
Description=Prometheus
After=network.target

[Service]
Type=simple
User=prometheus
Group=prometheus
ExecStart=/opt/prometheus/prometheus \
--config.file=/opt/prometheus/prometheus.yml \
--storage.tsdb.path=/opt/prometheus/data \
--storage.tsdb.retention.time=15d \
--web.console.templates=/opt/prometheus/consoles \
--web.console.libraries=/opt/prometheus/console_libraries \
--web.max-connections=512 \
--web.external-url "http://<your-server-IP>:9090" \
--web.listen-address "0.0.0.0:9090" \
--web.enable-admin-api \
--web.enable-lifecycle
Restart=on-failure

[Install]
WantedBy=multi-user.target
Starting and verifying
systemctl daemon-reload
systemctl enable prometheus
systemctl start prometheus
systemctl status prometheus
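# Check that the service is listening on its port
ss -tunlp | grep 9090
Open http://<your-server-IP>:9090 in a browser.
On the Targets page, click the target's Endpoint value to see the metrics it actually exposes. Copy any one metric name, for example go_gc_pauses_seconds_count, and query it in the Prometheus UI to confirm that data is arriving.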
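The same check can be scripted from the command line. Prometheus exposes a readiness endpoint at `/-/ready`, which returns a success status once the server is able to serve traffic. The sketch below polls it a few times; `wait_prom_ready` is a hypothetical helper, and the base URL, attempt count and delay are assumed defaults.

```shell
#!/usr/bin/env bash
# Poll Prometheus' readiness endpoint until it answers or attempts run out.
# wait_prom_ready is a hypothetical helper; arguments are
# base URL, number of attempts, and delay in seconds between attempts.
wait_prom_ready() {
  local base="${1:-http://localhost:9090}"
  local attempts="${2:-10}"
  local delay="${3:-2}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl fail on HTTP errors, -s silences progress output.
    if curl -fs -o /dev/null --max-time 3 "${base%/}/-/ready"; then
      return 0
    fi
    # Only wait when another attempt will follow.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage:
#   wait_prom_ready http://localhost:9090 && echo "Prometheus is ready"
```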
Deploying node_exporter
node_exporter collects system-level metrics from the host. This guide uses the official exporter provided by the Prometheus project.
Installation
# Install node_exporter
wget https://github.com/prometheus/node_exporter/releases/download/v1.6.1/node_exporter-1.6.1.linux-amd64.tar.gz
tar -zxvf node_exporter-1.6.1.linux-amd64.tar.gz -C /opt/
cd /opt/
mv node_exporter-1.6.1.linux-amd64/ node_exporter
# Add the user (skip these two commands if the prometheus user already exists on this host)
groupadd prometheus
useradd -g prometheus -s /sbin/nologin prometheus
chown -R prometheus:prometheus /opt/node_exporter
# Set up start on boot
vim /lib/systemd/system/node_exporter.service
[Unit]
Description=Prometheus Node_exporter
After=network.target prometheus.service

[Service]
Type=simple
User=prometheus
Group=prometheus
ExecStart=/opt/node_exporter/node_exporter --web.listen-address=0.0.0.0:9101
Restart=on-failure

[Install]
WantedBy=multi-user.target
Enabling and starting the service
systemctl daemon-reload
systemctl enable node_exporter
systemctl start node_exporter
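Adding node_exporter to the configuration file
cat >> /opt/prometheus/prometheus.yml <<EOF
  - job_name: 'node'
    static_configs:
      - targets: ['<IP-address>:9101']
EOF
Note: the job must be appended to /opt/prometheus/prometheus.yml, with the indentation shown so that it lands under scrape_configs. Otherwise Prometheus will not list the node target and the Grafana dashboard will show no data.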
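Running the append twice would add the job twice, which Prometheus rejects as a duplicate job name. A small guard makes the step safe to repeat. This is only a sketch: `add_node_job` is a hypothetical helper, and it assumes `scrape_configs` is the last top-level section of the file, as in the default prometheus.yml.

```shell
#!/usr/bin/env bash
# Append a 'node' scrape job to a Prometheus config, but only once.
# add_node_job is a hypothetical helper; it assumes scrape_configs is the
# last top-level section of the file, as in the default prometheus.yml.
add_node_job() {
  local config="$1"
  local target="$2"
  # Skip if the job is already present (matches the exact form written below).
  if grep -qF "job_name: 'node'" "$config"; then
    echo "node job already present in $config"
    return 0
  fi
  cat >> "$config" <<EOF
  - job_name: 'node'
    static_configs:
      - targets: ['$target']
EOF
}

# Usage (the address is a placeholder):
#   add_node_job /opt/prometheus/prometheus.yml '192.0.2.10:9101'
```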
Restart the Prometheus service
systemctl restart prometheus.service
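Because the unit file in this guide starts Prometheus with `--web.enable-lifecycle`, the configuration can also be reloaded over HTTP without restarting the process, by sending a POST request to `/-/reload`. A sketch, assuming Prometheus listens on localhost:9090; both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Ask a running Prometheus to re-read its configuration file.
# Works only when Prometheus was started with --web.enable-lifecycle.
# prom_reload_url and prom_reload are hypothetical helper names.
prom_reload_url() {
  # Strip a trailing slash from the base URL before appending the path.
  printf '%s/-/reload\n' "${1%/}"
}

prom_reload() {
  local base="${1:-http://localhost:9090}"
  # -f fails on HTTP errors, -sS stays quiet but still prints errors.
  curl -fsS -X POST "$(prom_reload_url "$base")"
}

# Usage:
#   prom_reload http://localhost:9090
```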
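Verification
View the exported metrics at http://<IP-address>:9101/metrics
As with the Prometheus start-up check, pick any metric at random and confirm it has a value.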
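Picking one metric out of the /metrics page can also be done in the shell. The page uses the Prometheus text format, where each sample line is a metric name followed by its value, and lines starting with `#` are help and type comments. The sketch below extracts the value of a metric that has no labels; `metric_value` is a hypothetical helper, and `node_load1` is used only as an example metric name.

```shell
#!/usr/bin/env bash
# Print the value of an unlabelled metric from Prometheus text-format input.
# metric_value is a hypothetical helper; it reads the exposition text on stdin.
# Comment lines start with '#', so their first field never equals the name.
metric_value() {
  awk -v name="$1" '$1 == name { print $2; exit }'
}

# Usage against a live exporter (the address is a placeholder):
#   curl -s http://localhost:9101/metrics | metric_value node_load1
```

An empty result means the metric was not found, which usually points at the exporter not running or listening on a different port.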
Check the targets: the node job should now be listed on the Prometheus Targets page.
Deploying Grafana
Installation and configuration
# Install
wget https://dl.grafana.com/enterprise/release/grafana-enterprise-10.2.0-1.x86_64.rpm
yum -y install grafana-enterprise-10.2.0-1.x86_64.rpm
# Enable start on boot
systemctl enable grafana-server
systemctl start grafana-server
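Logging in
Open http://<IP-address>:3000. The default username/password is admin/admin; you are required to change the default administrator password on first login.
Adding a data source
For the URL, either http://localhost:9090/ or http://<IP-address>:9090/ works.
Then click Save.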
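As an alternative to clicking through the UI, Grafana can load data sources from provisioning files at start-up. The sketch below writes such a file for the Prometheus instance from this guide. It assumes the default provisioning directory of the RPM package (/etc/grafana/provisioning/datasources) and the data source name "Prometheus"; `write_prom_datasource` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Write a Grafana data source provisioning file for Prometheus.
# write_prom_datasource is a hypothetical helper; the default directory is
# assumed to be the one used by the Grafana RPM package.
write_prom_datasource() {
  local dir="${1:-/etc/grafana/provisioning/datasources}"
  local url="${2:-http://localhost:9090}"
  mkdir -p "$dir"
  cat > "$dir/prometheus.yaml" <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: $url
    isDefault: true
EOF
}

# Usage (Grafana reads provisioning files when it starts):
#   write_prom_datasource && systemctl restart grafana-server
```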
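Importing a dashboard
New -> Import
Field 1 takes the ID of a published dashboard; either 11074 or 16098 works.
The name can be anything. Leave field 2 at its default. For field 3, click it and select the Prometheus entry that appears. Then click Import.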
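Problems you may run into
Xshell cannot transfer files to the remote host
Cause: the lrzsz package, which provides the file-transfer commands Xshell relies on, is not installed on the remote server.
Fix: install it on the server: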
yum install lrzsz
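Download times out
Cause: the server's connection to GitHub is slow or blocked.
Fix: adjust the server's network configuration, or download the file on another machine and upload it to the server with a remote tool such as Xshell.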
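Grafana dashboard shows no data
Check whether the node target appeared during the verification step of the node_exporter deployment. If it did not, check that prometheus.yml was edited correctly when node_exporter was added to the configuration file. Both jobs must sit under the scrape_configs: key:
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["<IP-address>:9090"]
        # The label name is added as a label `label_name=<label_value>` to any timeseries scraped from this config.
        labels:
          app: "prometheus"
  - job_name: 'node'
    static_configs:
      - targets: ['<IP-address>:9101']
Also check that the Prometheus URL configured in Grafana is correct.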
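Cannot open the Prometheus or Grafana web pages
If you are running this on an Alibaba Cloud server, check that the security group allows the relevant ports, such as 9090 and 9101 (and 3000 for Grafana).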