
[Cloud Platform Monitoring] Prometheus Monitoring Platform Deployment and Application

Table of Contents

  • Prometheus Monitoring System
    • Overview
    • TSDB Storage Engine Characteristics
    • Core Features
    • Ecosystem Components
    • Workflow
    • Limitations
  • Deploying Prometheus
    • 1. Prometheus Server Deployment
    • 2. Deploying Exporters
    • 3. Deploying Grafana
    • 4. Service Discovery
  • Complete Guide to Deploying Prometheus and Grafana on a Kubernetes Cluster
    • 1. Environment Preparation
    • 2. Deploy Node Exporter
      • Function: collect node resource metrics (CPU, memory, disk, etc.)
      • Steps:
    • 3. Deploy Prometheus
      • Function: time-series database + alerting rule engine
      • Steps:
    • 4. Deploy Grafana
      • Function: visualize the monitoring data
      • Steps:
    • 5. Deploy Alertmanager (email alerts)
      • Function: receive Prometheus alerts and send notifications
      • Steps:
    • 6. Key Configuration Notes
    • 7. Verification and Testing
    • Summary
  • Configuration
  • Deploying Prometheus Outside the Kubernetes Cluster with API Server-Based Service Discovery
    • Steps
      • Create RBAC Authorization
      • Obtain the ServiceAccount Token
      • Configure Prometheus
      • Verification
    • Key Points

Prometheus Monitoring System

Overview

  • Definition: an open-source service monitoring system and time-series database that provides a generic data model together with efficient collection, storage, and query interfaces.
  • Core component: Prometheus Server
    • Data collection: periodically pulls metrics from statically configured or dynamically discovered targets (HTTP pull, every 15 s by default).
    • Data storage: samples are buffered in memory and persisted to disk (blocks are written out roughly every 2 hours by default), with a WAL for crash recovery.
    • Alert handling: evaluates rules over the metrics and sends fired alerts to Alertmanager.
  • Data sources
    • Exporter: runs on the monitored host and exposes an HTTP endpoint from which Prometheus pulls metrics.
    • Pushgateway: accepts metrics pushed by short-lived jobs and holds them for Prometheus to pull (see the push example after this list).
  • Service discovery: supports static configuration or dynamic discovery (e.g. Kubernetes API integration).
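For short-lived batch jobs that cannot be scraped directly, metrics can be pushed to a Pushgateway over its HTTP API and collected from there. A minimal sketch (the Pushgateway address 192.168.80.30:9091 and the job name are assumptions for illustration):

    # push one gauge sample for job "backup_job"; Prometheus then scrapes the Pushgateway itself
    echo "backup_last_success_timestamp $(date +%s)" | \
      curl --data-binary @- http://192.168.80.30:9091/metrics/job/backup_job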

TSDB Storage Engine Characteristics

  • Suitable workloads
    • Huge volumes of time-series data written mostly append-only (sequential writes).
    • Data ordered by time and rarely updated after being written.
    • Deletion happens in blocks (by time range) rather than as random deletes.
    • Large data volumes read mostly sequentially, where caching brings little benefit.

Core Features

  1. Multi-dimensional data model: a time series is identified by a metric name plus key-value labels.
  2. Built-in time-series database (TSDB): efficient local storage for short-term data (15 days by default).
  3. PromQL query language: supports complex multi-dimensional queries (see the examples after this list).
  4. Hybrid collection modes
    • HTTP pull by default.
    • Push supported via the Pushgateway.
  5. Dynamic service discovery: integrates with Kubernetes and other systems to manage monitoring targets automatically.
  6. Visualization integration: works seamlessly with Grafana for dashboards.
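As a quick illustration of PromQL's label filtering and aggregation (these queries assume node_exporter metrics are already being scraped):

    # per-instance CPU usage (%) over the last 5 minutes
    100 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100

    # instances whose root filesystem is more than 80% full
    (1 - node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 > 80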

Ecosystem Components

  • Prometheus Server: core component; handles data collection, storage, alert rule evaluation, and PromQL queries.
  • Client Library: SDKs that let applications expose built-in metrics (Go, Java, and other languages).
  • Exporters: collect metrics from third-party systems and convert them to the Prometheus format. Common exporters:
    • Node Exporter: host resource monitoring (CPU, memory, disk, etc.).
    • cAdvisor: container resource monitoring.
    • kube-state-metrics: state of Kubernetes resource objects (Pods, Deployments, etc.).
    • Blackbox Exporter: network probing (HTTP, TCP, ICMP, etc.).
  • Service Discovery: dynamically discovers monitoring targets (Kubernetes, Consul, DNS, etc.).
  • Alertmanager: alert routing, deduplication, and silencing; supports e-mail, DingTalk, WeCom, and other notification channels.
  • Pushgateway: temporarily stores metrics pushed by short-lived jobs so Prometheus can pull them.
  • Grafana: visualization platform that uses Prometheus as a data source and provides rich monitoring dashboards.

Workflow

  1. Data collection: Prometheus Server obtains targets via service discovery or static configuration and pulls metrics from their exporters.
  2. Data storage: collected samples are written to local storage by the TSDB.
  3. Alert generation: configured alerting rules are evaluated, and fired alerts are sent to Alertmanager.
  4. Alert notification: Alertmanager delivers notifications to the configured receivers, such as e-mail, DingTalk, or WeCom.
  5. Data query: monitoring data can be queried in the built-in Web UI using PromQL.
  6. Visualization: Grafana uses Prometheus as a data source and presents the data graphically.

Limitations

  • Data scope: suited to metrics monitoring, not to storing logs or events.

  • Long-term storage: local storage is designed for short retention; external solutions such as InfluxDB are needed for long-term data.

  • Clustering: the native clustering mechanism is limited; Thanos or Cortex can be used for high availability.

  • Official site: prometheus.io

  • GitHub: github.com/prometheus

  • Node Exporter metrics reference: (link)


Deploying Prometheus

1. Prometheus Server Deployment

  1. Install Prometheus

    • Upload and extract the Prometheus tarball:
      cd /opt/
      tar xf prometheus-2.35.0.linux-amd64.tar.gz
      mv prometheus-2.35.0.linux-amd64 /usr/local/prometheus
      
    • The configuration file prometheus.yml, annotated:
      global:
        scrape_interval: 15s      # how often targets are scraped
        evaluation_interval: 15s  # how often alerting/recording rules are evaluated
        scrape_timeout: 10s       # per-scrape timeout
      
      alerting:
        alertmanagers:
          - static_configs:
              - targets:
                # - alertmanager:9093
      
      rule_files:
        # - "first_rules.yml"
        # - "second_rules.yml"
      
      scrape_configs:
        - job_name: "prometheus"
          static_configs:
            - targets: ["localhost:9090"]
      
  2. Configure the systemd unit file

    • Create prometheus.service:
      cat > /usr/lib/systemd/system/prometheus.service <<'EOF'
      [Unit]
      Description=Prometheus Server
      Documentation=https://prometheus.io
      After=network.target
      
      [Service]
      Type=simple
      ExecStart=/usr/local/prometheus/prometheus \
      --config.file=/usr/local/prometheus/prometheus.yml \
      --storage.tsdb.path=/usr/local/prometheus/data/ \
      --storage.tsdb.retention.time=15d \
      --web.enable-lifecycle
      
      ExecReload=/bin/kill -HUP $MAINPID
      Restart=on-failure
      
      [Install]
      WantedBy=multi-user.target
      EOF
      
  3. Start Prometheus

    systemctl start prometheus
    systemctl enable prometheus
    netstat -natp | grep :9090
    
    • Visit http://<Prometheus_IP>:9090 to open the Web UI. (A configuration-validation sketch with promtool follows this subsection.)
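Before restarting or hot-reloading, the configuration can be validated with promtool, which ships in the same tarball as the prometheus binary (the path below assumes the install layout used above). Note that the hot-reload endpoint used later (curl -X POST .../-/reload) only works because the unit file passes --web.enable-lifecycle.

    /usr/local/prometheus/promtool check config /usr/local/prometheus/prometheus.yml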

2. Deploying Exporters

  1. Node Exporter (system-level monitoring)

    • Install Node Exporter:
      cd /opt/
      tar xf node_exporter-1.3.1.linux-amd64.tar.gz
      mv node_exporter-1.3.1.linux-amd64/node_exporter /usr/local/bin
      
    • Create the systemd unit file:
      cat > /usr/lib/systemd/system/node_exporter.service <<'EOF'
      [Unit]
      Description=node_exporter
      Documentation=https://prometheus.io/
      After=network.target
      
      [Service]
      Type=simple
      ExecStart=/usr/local/bin/node_exporter \
      --collector.ntp \
      --collector.mountstats \
      --collector.systemd \
      --collector.tcpstat
      
      ExecReload=/bin/kill -HUP $MAINPID
      Restart=on-failure
      
      [Install]
      WantedBy=multi-user.target
      EOF
      
    • Start Node Exporter:
      systemctl start node_exporter
      systemctl enable node_exporter
      netstat -natp | grep :9100
      
    • Edit the Prometheus configuration and add the Node Exporter targets:
      - job_name: nodes
        metrics_path: "/metrics"
        static_configs:
          - targets:
              - 192.168.80.30:9100
              - 192.168.80.11:9100
              - 192.168.80.12:9100
            labels:
              service: kubernetes
      
    • Reload the configuration:
      curl -X POST http://192.168.80.30:9090/-/reload
      
  2. MySQL Exporter

    • Install MySQL Exporter:
      cd /opt/
      tar xf mysqld_exporter-0.14.0.linux-amd64.tar.gz
      mv mysqld_exporter-0.14.0.linux-amd64/mysqld_exporter /usr/local/bin/
      
    • Grant the MySQL user the required privileges:
      GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'localhost' IDENTIFIED BY 'abc123';
      
    • Start MySQL Exporter (this assumes a mysqld_exporter systemd unit and a credentials file exist; see the sketch at the end of this section):
      systemctl start mysqld_exporter
      systemctl enable mysqld_exporter
      netstat -natp | grep :9104
      
    • Edit the Prometheus configuration and add the MySQL Exporter target:
      - job_name: mysqld
        metrics_path: "/metrics"
        static_configs:
          - targets:
              - 192.168.80.15:9104
            labels:
              service: mysqld
      
  3. Nginx Exporter

    • Build Nginx with the nginx-module-vts module:
      ./configure --prefix=/usr/local/nginx \
      --add-module=/usr/local/nginx-module-vts
      make && make install
      
    • Configure Nginx:
      http {
          vhost_traffic_status_zone;
          vhost_traffic_status_filter_by_host on;
          server {
              listen 8080;
              location /status {
                  vhost_traffic_status_display;
                  vhost_traffic_status_display_format html;
              }
          }
      }
      
    • Start the Nginx (VTS) exporter (assumes the exporter binary and a systemd unit analogous to the node_exporter one above):
      systemctl start nginx-exporter
      systemctl enable nginx-exporter
      netstat -natp | grep :9913
      
    • Edit the Prometheus configuration and add the Nginx Exporter target:
      - job_name: nginx
        metrics_path: "/metrics"
        static_configs:
          - targets:
              - 192.168.80.15:9913
            labels:
              service: nginx
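
The MySQL and Nginx exporter steps above start their services with systemctl without showing how the unit files are created or how mysqld_exporter authenticates to MySQL. A minimal sketch for mysqld_exporter, assuming the credentials live in /usr/local/mysqld_exporter/.my.cnf and are passed via the exporter's --config.my-cnf flag (adjust the paths and password to match the GRANT statement above):

    cat > /usr/local/mysqld_exporter/.my.cnf <<'EOF'
    [client]
    user=exporter
    password=abc123
    host=localhost
    EOF
    
    cat > /usr/lib/systemd/system/mysqld_exporter.service <<'EOF'
    [Unit]
    Description=mysqld_exporter
    After=network.target mysqld.service
    
    [Service]
    Type=simple
    ExecStart=/usr/local/bin/mysqld_exporter --config.my-cnf=/usr/local/mysqld_exporter/.my.cnf
    Restart=on-failure
    
    [Install]
    WantedBy=multi-user.target
    EOF
    systemctl daemon-reload

An analogous unit file is assumed for the VTS exporter binary that serves port 9913.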
      

3. Deploying Grafana

  1. Install Grafana

    yum install -y grafana-7.4.0-1.x86_64.rpm
    systemctl start grafana-server
    systemctl enable grafana-server
    netstat -natp | grep :3000
    
    • Visit http://<Grafana_IP>:3000; the default credentials are admin/admin
  2. Configure the data source

    • Add a Prometheus data source with the URL http://<Prometheus_IP>:9090
  3. Import monitoring dashboards

    • Download dashboard templates from Grafana Dashboards and import them into Grafana.

4. Service Discovery

  1. File-based service discovery

    • Create the target file:
      - targets:
          - 192.168.80.30:9100
          - 192.168.80.15:9100
        labels:
          app: node-exporter
          job: node
      
    • Edit the Prometheus configuration:
      - job_name: nodes
        file_sd_configs:
          - files:
              - targets/node*.yaml
            refresh_interval: 2m
      
  2. Consul-based service discovery

    • Start Consul:
      consul agent -server -bootstrap -ui -data-dir=/usr/local/consul/data -bind=192.168.80.14 -client=0.0.0.0 -node=consul-server01 &
      
    • Register the service (see the loading sketch after this subsection):
      {
        "services": [
          {
            "id": "node_exporter-node01",
            "name": "node01",
            "address": "192.168.80.30",
            "port": 9100,
            "tags": ["nodes"],
            "checks": [{
              "http": "http://192.168.80.30:9100/metrics",
              "interval": "5s"
            }]
          }
        ]
      }
      
    • Edit the Prometheus configuration:
      - job_name: nodes
        consul_sd_configs:
          - server: 192.168.80.14:8500   # Consul HTTP API (the agent above binds to 192.168.80.14)
            tags:
              - nodes
            refresh_interval: 2m
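
The service-definition JSON above still has to be loaded into Consul. A minimal sketch, assuming the file is saved as /usr/local/consul/nodes.json on the Consul server (placing it in the agent's configuration directory and running consul reload works as well):

    # register the service definition with the local agent
    consul services register /usr/local/consul/nodes.json
    
    # confirm the registration through the HTTP API
    curl http://192.168.80.14:8500/v1/catalog/service/node01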
      

Complete Guide to Deploying Prometheus and Grafana on a Kubernetes Cluster

1. Environment Preparation

  • Cluster nodes
    • Control-plane node / master01: 192.168.80.10
    • Worker node / node01: 192.168.80.11
    • Worker node / node02: 192.168.80.12
  • Namespace: monitor-sa (for the monitoring components)

2. Deploy Node Exporter

Function: collect node resource metrics (CPU, memory, disk, etc.)

Steps:

  1. Create the namespace

    kubectl create ns monitor-sa
    
  2. Deploy the DaemonSet

    # node-export.yaml
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: node-exporter
      namespace: monitor-sa
      labels:
        name: node-exporter
    spec:
      selector:
        matchLabels:
          name: node-exporter
      template:
        metadata:
          labels:
            name: node-exporter
        spec:
          hostPID: true
          hostIPC: true
          hostNetwork: true
          containers:
          - name: node-exporter
            image: prom/node-exporter:v0.16.0
            ports:
            - containerPort: 9100
            args:
            - --path.procfs=/host/proc
            - --path.sysfs=/host/sys
            volumeMounts:
            - name: proc
              mountPath: /host/proc
            - name: sys
              mountPath: /host/sys
          volumes:
          - name: proc
            hostPath:
              path: /proc
          - name: sys
            hostPath:
              path: /sys
    
    kubectl apply -f node-export.yaml
    
  3. Verify

    kubectl get pods -n monitor-sa -o wide
    curl http://<NodeIP>:9100/metrics  # check that metrics are exposed
    

3. Deploy Prometheus

Function: time-series database + alerting rule engine

Steps:

  1. Create a ServiceAccount and RBAC authorization (binding cluster-admin is convenient for a lab; a scoped ClusterRole like the one in the out-of-cluster section below is preferable in production)

    kubectl create serviceaccount monitor -n monitor-sa
    kubectl create clusterrolebinding monitor-clusterrolebinding \
      --clusterrole=cluster-admin \
      --serviceaccount=monitor-sa:monitor
    
  2. Create the Prometheus configuration ConfigMap

    # prometheus-cfg.yaml
    kind: ConfigMap
    apiVersion: v1
    metadata:
      name: prometheus-config
      namespace: monitor-sa
    data:
      prometheus.yml: |
        global:
          scrape_interval: 15s
          evaluation_interval: 15s
        scrape_configs:
        - job_name: 'kubernetes-node'
          kubernetes_sd_configs:
          - role: node
          relabel_configs:
          - source_labels: [__address__]
            regex: '(.*):10250'
            replacement: '${1}:9100'
            target_label: __address__
        - job_name: 'kubernetes-apiserver'
          kubernetes_sd_configs:
          - role: endpoints
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    
    kubectl apply -f prometheus-cfg.yaml
    
  3. Deploy Prometheus (see the storage note after this subsection)

    # prometheus-deploy.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: prometheus-server
      namespace: monitor-sa
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: prometheus
      template:
        metadata:
          labels:
            app: prometheus
        spec:
          serviceAccountName: monitor
          containers:
          - name: prometheus
            image: prom/prometheus:v2.35.0
            args:
            - --config.file=/etc/prometheus/prometheus.yml
            - --storage.tsdb.path=/prometheus
            ports:
            - containerPort: 9090
            volumeMounts:
            - name: config
              mountPath: /etc/prometheus
          volumes:
          - name: config
            configMap:
              name: prometheus-config
    
    kubectl apply -f prometheus-deploy.yaml
    
  4. Create a Service to expose the port

    # prometheus-svc.yaml
    apiVersion: v1
    kind: Service
    metadata:
      name: prometheus
      namespace: monitor-sa
    spec:
      type: NodePort
      ports:
      - port: 9090
        nodePort: 31000
      selector:
        app: prometheus
    
    kubectl apply -f prometheus-svc.yaml
    
  5. Access the Web UI

    http://<NodeIP>:31000
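
One caveat about the Deployment above: --storage.tsdb.path points at /prometheus, but no volume is mounted there, so the TSDB data lives in the container's writable layer and is lost whenever the Pod restarts. A minimal sketch of the extra volume (an emptyDir still disappears when the Pod is rescheduled; a PersistentVolumeClaim is the durable option):

    # fragment to merge into prometheus-deploy.yaml (illustrative)
            volumeMounts:
            - name: config
              mountPath: /etc/prometheus
            - name: data
              mountPath: /prometheus
          volumes:
          - name: config
            configMap:
              name: prometheus-config
          - name: data
            emptyDir: {}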
    

4. Deploy Grafana

Function: visualize the monitoring data

Steps:

  1. Deploy Grafana

    # grafana.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: grafana
      namespace: monitor-sa
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: grafana
      template:
        metadata:
          labels:
            app: grafana
        spec:
          containers:
          - name: grafana
            image: grafana/grafana:9.0.0
            ports:
            - containerPort: 3000
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: grafana
      namespace: monitor-sa
    spec:
      type: NodePort
      ports:
      - port: 3000
        nodePort: 32000
      selector:
        app: grafana
    
    kubectl apply -f grafana.yaml
    
  2. Configure the data source

    • Visit http://<NodeIP>:32000; the default credentials are admin/admin
    • Add a Prometheus data source: http://prometheus.monitor-sa.svc:9090
  3. Import a dashboard

    • Search Grafana Dashboards for a template (e.g. 3119315) and import its JSON.

5. Deploy Alertmanager (email alerts)

Function: receive Prometheus alerts and send notifications

Steps:

  1. Create the Alertmanager configuration

    # alertmanager-cm.yaml
    kind: ConfigMap
    apiVersion: v1
    metadata:
      name: alertmanager
      namespace: monitor-sa
    data:
      alertmanager.yml: |
        global:
          smtp_smarthost: 'smtp.qq.com:465'
          smtp_from: 'your-email@qq.com'
          smtp_auth_username: 'your-email@qq.com'
          smtp_auth_password: 'your-smtp-token'  # QQ Mail SMTP authorization code (not the account password)
        route:
          group_by: [alertname]
          receiver: 'default'
        receivers:
        - name: 'default'
          email_configs:
          - to: 'alert-receiver@example.com'
    
    kubectl apply -f alertmanager-cm.yaml
    
  2. Deploy Alertmanager

    # alertmanager-deploy.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: alertmanager
      namespace: monitor-sa
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: alertmanager
      template:
        metadata:
          labels:
            app: alertmanager
        spec:
          containers:
          - name: alertmanager
            image: prom/alertmanager:v0.24.0
            args:
            - --config.file=/etc/alertmanager/alertmanager.yml
            volumeMounts:
            - name: config
              mountPath: /etc/alertmanager
          volumes:
          - name: config
            configMap:
              name: alertmanager
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: alertmanager
      namespace: monitor-sa
    spec:
      type: NodePort
      ports:
      - port: 9093
        nodePort: 30066
      selector:
        app: alertmanager
    
    kubectl apply -f alertmanager-deploy.yaml
    
  3. Configure Prometheus alerting rules (a sample rule file is sketched after this step)

    # reference the rule file in prometheus-cfg.yaml
    rule_files:
      - "alert-rules.yml"
    

6. Key Configuration Notes

  1. Service discovery

    • Native Kubernetes support: kubernetes_sd_configs automatically discovers nodes, Pods, and Services.
    • Relabel configuration: rewrites labels so that discovered targets match what should be scraped.
  2. Alert routing

    • Grouping and throttling: group_by and repeat_interval keep alert floods under control (see the route sketch after this list).
    • E-mail templates: custom HTML templates can make notifications more readable.
  3. High-availability suggestions

    • Prometheus federation: aggregate data across clusters.
    • Thanos/Cortex: long-term storage and query optimization.
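
As an illustration of the grouping knobs mentioned above, a typical route block in alertmanager.yml might look like this (the interval values are illustrative, not taken from the guide):

    route:
      group_by: [alertname, namespace]
      group_wait: 30s        # wait before sending the first notification for a new group
      group_interval: 5m     # wait before notifying about new alerts added to an existing group
      repeat_interval: 4h    # resend a still-firing alert at most this often
      receiver: 'default'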

7. Verification and Testing

  1. Trigger a test alert

    • Manually stop node-exporter on one node and watch its state on the Prometheus Targets page.
    • Check the Alertmanager UI (http://<NodeIP>:30066) to confirm the alert arrived. (A curl-based synthetic alert is also sketched after this list.)
  2. E-mail delivery

    • Make sure the mailbox settings are correct, and check the spam folder.
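
Besides stopping node-exporter, a synthetic alert can be pushed straight to Alertmanager's v2 API to verify the notification path end to end (the labels below are arbitrary test values):

    curl -X POST http://<NodeIP>:30066/api/v2/alerts \
      -H 'Content-Type: application/json' \
      -d '[{"labels":{"alertname":"TestAlert","severity":"warning"},"annotations":{"summary":"manual test alert"}}]'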

Summary

With the steps above, a complete Prometheus + Grafana + Alertmanager monitoring and alerting stack can be brought up quickly on a Kubernetes cluster. The key points are:

  • Automatic discovery: Kubernetes-native mechanisms keep the set of monitored resources up to date.

  • Visualization: Grafana dashboards present the data in real time.

  • Closed alerting loop: Alertmanager delivers e-mail notifications so that problems are handled promptly.


Configuration

A fuller example of the Prometheus scrape configuration for in-cluster service discovery, packaged as a ConfigMap (note that this example uses the prom namespace):

kind: ConfigMap
apiVersion: v1
metadata:
  labels:
    app: prometheus
  name: prometheus-config
  namespace: prom
data:
  prometheus.yml: |
    # A scrape configuration for running Prometheus on a Kubernetes cluster.
    # This uses separate scrape configs for cluster components (i.e. API server, node)
    # and services to allow each to use different authentication configs.
    #
    # Kubernetes labels will be added as Prometheus labels on metrics via the
    # `labelmap` relabeling action.
    #
    # If you are using Kubernetes 1.7.2 or earlier, please take note of the comments
    # for the kubernetes-cadvisor job; you will need to edit or remove this job.
    # Scrape config for API servers.
    #
    # Kubernetes exposes API servers as endpoints to the default/kubernetes
    # service so this uses `endpoints` role and uses relabelling to only keep
    # the endpoints associated with the default/kubernetes service using the
    # default named port `https`. This works for single API server deployments as
    # well as HA API server deployments.
    global:
      scrape_interval: 15s
      scrape_timeout: 10s
      evaluation_interval: 1m
    scrape_configs:
    - job_name: 'kubernetes-apiservers'
      kubernetes_sd_configs:
      - role: endpoints
      # Default to scraping over https. If required, just disable this or change to
      # `http`.
      scheme: https
      # This TLS & bearer token file config is used to connect to the actual scrape
      # endpoints for cluster components. This is separate to discovery auth
      # configuration because discovery & scraping are two separate concerns in
      # Prometheus. The discovery auth config is automatic if Prometheus runs inside
      # the cluster. Otherwise, more config options have to be provided within the
      # <kubernetes_sd_config>.
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        # If your node certificates are self-signed or use a different CA to the
        # master CA, then disable certificate verification below. Note that
        # certificate verification is an integral part of a secure infrastructure
        # so this should only be disabled in a controlled environment. You can
        # disable certificate verification by uncommenting the line below.
        #
        # insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      # Keep only the default/kubernetes service endpoints for the https port. This
      # will add targets for each API server which Kubernetes adds an endpoint to
      # the default/kubernetes service.
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https
    # Scrape config for nodes (kubelet).
    #
    # Rather than connecting directly to the node, the scrape is proxied though the
    # Kubernetes apiserver.  This means it will work if Prometheus is running out of
    # cluster, or can't connect to nodes for some other reason (e.g. because of
    # firewalling).
    - job_name: 'kubernetes-nodes'
      # Default to scraping over https. If required, just disable this or change to
      # `http`.
      scheme: https
      # This TLS & bearer token file config is used to connect to the actual scrape
      # endpoints for cluster components. This is separate to discovery auth
      # configuration because discovery & scraping are two separate concerns in
      # Prometheus. The discovery auth config is automatic if Prometheus runs inside
      # the cluster. Otherwise, more config options have to be provided within the
      # <kubernetes_sd_config>.
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics
    # Scrape config for Kubelet cAdvisor.
    #
    # This is required for Kubernetes 1.7.3 and later, where cAdvisor metrics
    # (those whose names begin with 'container_') have been removed from the
    # Kubelet metrics endpoint.  This job scrapes the cAdvisor endpoint to
    # retrieve those metrics.
    #
    # In Kubernetes 1.7.0-1.7.2, these metrics are only exposed on the cAdvisor
    # HTTP endpoint; use "replacement: /api/v1/nodes/${1}:4194/proxy/metrics"
    # in that case (and ensure cAdvisor's HTTP server hasn't been disabled with
    # the --cadvisor-port=0 Kubelet flag).
    #
    # This job is not necessary and should be removed in Kubernetes 1.6 and
    # earlier versions, or it will cause the metrics to be scraped twice.
    - job_name: 'kubernetes-cadvisor'
      # Default to scraping over https. If required, just disable this or change to
      # `http`.
      scheme: https
      # This TLS & bearer token file config is used to connect to the actual scrape
      # endpoints for cluster components. This is separate to discovery auth
      # configuration because discovery & scraping are two separate concerns in
      # Prometheus. The discovery auth config is automatic if Prometheus runs inside
      # the cluster. Otherwise, more config options have to be provided within the
      # <kubernetes_sd_config>.
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    # Scrape config for service endpoints.
    #
    # The relabeling allows the actual service scrape endpoint to be configured
    # via the following annotations:
    #
    # * `prometheus.io/scrape`: Only scrape services that have a value of `true`
    # * `prometheus.io/scheme`: If the metrics endpoint is secured then you will need
    # to set this to `https` & most likely set the `tls_config` of the scrape config.
    # * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
    # * `prometheus.io/port`: If the metrics are exposed on a different port to the
    # service then set this appropriately.
    - job_name: 'kubernetes-service-endpoints'
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name
    # Example scrape config for pods
    #
    # The relabeling allows the actual pod scrape endpoint to be configured via the
    # following annotations:
    #
    # * `prometheus.io/scrape`: Only scrape pods that have a value of `true`
    # * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
    # * `prometheus.io/port`: Scrape the pod on the indicated port instead of the
    # pod's declared ports (default is a port-free target if none are declared).
    - job_name: 'kubernetes-pods'
      # if you want to use metrics on jobs, set the below field to
      # true to prevent Prometheus from setting the `job` label
      # automatically.
      honor_labels: false
      kubernetes_sd_configs:
      - role: pod
      # skip verification so you can do HTTPS to pods
      tls_config:
        insecure_skip_verify: true
      # make sure your labels are in order
      relabel_configs:
      # these labels tell Prometheus to automatically attach source
      # pod and namespace information to each collected sample, so
      # that they'll be exposed in the custom metrics API automatically.
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod
      # these labels tell Prometheus to look for
      # prometheus.io/{scrape,path,port} annotations to configure
      # how to scrape
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (.+)
  

Deploying Prometheus Outside the Kubernetes Cluster with API Server-Based Service Discovery

  • Scenario: Prometheus runs outside the Kubernetes cluster but needs to monitor resources inside it.
  • Challenge: an out-of-cluster Prometheus cannot reach in-cluster resources directly, so service discovery has to go through the Kubernetes API Server.

Steps

Create RBAC Authorization

  1. Create a Namespace

    apiVersion: v1
    kind: Namespace
    metadata:
      name: monitoring
    
  2. Create a ServiceAccount

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: outside-prometheus
      namespace: monitoring
    
  3. Create a ClusterRole

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: outside-prometheus
    rules:
    - apiGroups: [""]
      resources: ["nodes", "services", "endpoints", "pods", "nodes/proxy"]
      verbs: ["get", "list", "watch"]
    - apiGroups: ["networking.k8s.io"]
      resources: ["ingresses"]
      verbs: ["get", "list", "watch"]
    - apiGroups: [""]
      resources: ["configmaps", "nodes/metrics"]
      verbs: ["get"]
    - nonResourceURLs: ["/metrics"]
      verbs: ["get"]
    
  4. Create a ClusterRoleBinding

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: outside-prometheus
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: outside-prometheus
    subjects:
    - kind: ServiceAccount
      name: outside-prometheus
      namespace: monitoring
    
  5. Apply the RBAC configuration

    kubectl apply -f rbac.yaml
    

Obtain the ServiceAccount Token

  1. Get the token (on Kubernetes 1.24 and later, see the note after this list)

    TOKEN=$(kubectl get secret $(kubectl -n monitoring get secret | awk '/outside-prometheus/{print $1}') -n monitoring -o jsonpath={.data.token} | base64 -d)
    echo $TOKEN
    
  2. Save the token on the Prometheus node

    echo $TOKEN > /usr/local/prometheus/kubernetes-api-token
    
  3. Copy the Kubernetes CA certificate to the Prometheus node

    scp /etc/kubernetes/pki/ca.crt <prometheus_node>:/usr/local/prometheus/
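
On Kubernetes 1.24 and later, a ServiceAccount no longer gets a long-lived token Secret created automatically, so the kubectl get secret pipeline above may find nothing. Two common workarounds, both assuming the monitoring/outside-prometheus ServiceAccount created earlier:

    # request a token from the TokenRequest API (its lifetime is capped by the API server)
    kubectl -n monitoring create token outside-prometheus --duration=24h
    
    # or create a long-lived token Secret bound to the ServiceAccount, then extract .data.token as above
    kubectl -n monitoring apply -f - <<'EOF'
    apiVersion: v1
    kind: Secret
    metadata:
      name: outside-prometheus-token
      annotations:
        kubernetes.io/service-account.name: outside-prometheus
    type: kubernetes.io/service-account-token
    EOF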
    

Configure Prometheus

  1. Edit the Prometheus configuration file

    scrape_configs:
      # auto-discover the cluster API server
      - job_name: 'kubernetes-apiserver'
        kubernetes_sd_configs:
        - role: endpoints
          api_server: https://192.168.80.10:6443  # API Server address
          tls_config:
            ca_file: /usr/local/prometheus/ca.crt  # Kubernetes CA certificate
          authorization:
            credentials_file: /usr/local/prometheus/kubernetes-api-token  # token file
        scheme: https
        tls_config:
          ca_file: /usr/local/prometheus/ca.crt
        authorization:
          credentials_file: /usr/local/prometheus/kubernetes-api-token
        relabel_configs:
        - source_labels: ["__meta_kubernetes_namespace", "__meta_kubernetes_endpoints_name", "__meta_kubernetes_endpoint_port_name"]
          regex: default;kubernetes;https
          action: keep
    
      # auto-discover cluster nodes
      - job_name: 'kubernetes-nodes'
        kubernetes_sd_configs:
        - role: node
          api_server: https://192.168.80.10:6443
          tls_config:
            ca_file: /usr/local/prometheus/ca.crt
          authorization:
            credentials_file: /usr/local/prometheus/kubernetes-api-token
        relabel_configs:
        - source_labels: ["__address__"]
          regex: (.*):10250
          action: replace
          target_label: __address__
          replacement: $1:9100
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
    
  2. Reload the Prometheus configuration (a command-line check follows this step)

    curl -X POST http://localhost:9090/-/reload
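
After the reload, target health can also be checked from the command line through the HTTP API instead of the Web UI:

    # list the health of all discovered targets
    curl -s http://localhost:9090/api/v1/targets | grep -o '"health":"[^"]*"'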
    

Verification

  1. Open the Prometheus Web UI

    • Go to http://<Prometheus_IP>:9090/targets and check that the targets' state is UP
  2. Check service discovery

    • Make sure Prometheus correctly discovers the cluster's nodes, Pods, Services, and other resources.

Key Points

  1. RBAC authorization

    • Grant Prometheus sufficient permissions to read the resources it needs from the Kubernetes API Server.
  2. Token and CA certificate

    • The token authenticates Prometheus; the CA certificate verifies the API Server's TLS certificate.
  3. Relabel configuration

    • relabel_configs rewrites labels so that Prometheus scrapes the right targets.
  4. Service discovery roles

    • role: endpoints: discovers the Endpoints of Kubernetes Services.
    • role: node: discovers the cluster's nodes.

Notes

  • Network reachability: the Prometheus node must be able to reach the Kubernetes API Server.
  • Security: keep the token and CA certificate safe and avoid leaking them.
  • Performance: in large clusters, service discovery and scrape performance may need tuning.
