
Manually Deploying a Prometheus Monitoring System on Kubernetes: Installation Guide

Preface: the configuration in the "☆ Practical example" sections can be split up and applied piece by piece as needed.

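I. Prerequisites
  1. Kubernetes cluster
    A Kubernetes cluster (version ≥ 1.20) is deployed, and kubectl is configured to access it.
  2. Image registry
    The image harbor.fq.com/prometheus/node-exporter:v1.8.2 and the Prometheus images are available in the private registry.
  3. Namespace
    The generic steps use the default namespace. To use monitoring instead (as the practical examples do), change the namespace field in every YAML file, including the ServiceAccount namespace in the ClusterRoleBinding subjects.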
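Moving the manifests into another namespace touches more than metadata.namespace: cluster-scoped objects (Namespace, ClusterRole, ClusterRoleBinding) take no namespace, while the ServiceAccount subject inside a ClusterRoleBinding carries its own namespace field. The following is a minimal Python sketch of that rule, assuming the manifests have already been parsed into dicts (parsing is not shown). The function name `set_namespace` and the short list of cluster-scoped kinds are illustrative and cover only the kinds used in this guide.

```python
import copy

# Cluster-scoped kinds that appear in this guide; they never get metadata.namespace.
# Assumption: this is not a complete list of cluster-scoped Kubernetes kinds.
CLUSTER_SCOPED = {"Namespace", "ClusterRole", "ClusterRoleBinding"}


def set_namespace(manifests, namespace):
    """Return copies of the manifests moved into `namespace`.

    Namespaced objects get metadata.namespace; ServiceAccount subjects of a
    ClusterRoleBinding are pointed at the new namespace as well.
    """
    result = []
    for doc in manifests:
        doc = copy.deepcopy(doc)  # leave the caller's dicts untouched
        kind = doc.get("kind")
        if kind not in CLUSTER_SCOPED:
            doc.setdefault("metadata", {})["namespace"] = namespace
        if kind == "ClusterRoleBinding":
            for subject in doc.get("subjects", []):
                if subject.get("kind") == "ServiceAccount":
                    subject["namespace"] = namespace
        result.append(doc)
    return result


if __name__ == "__main__":
    docs = [
        {"kind": "ServiceAccount", "metadata": {"name": "prometheus"}},
        {"kind": "ClusterRoleBinding", "metadata": {"name": "prometheus"},
         "subjects": [{"kind": "ServiceAccount", "name": "prometheus", "namespace": "default"}]},
    ]
    for d in set_namespace(docs, "monitoring"):
        print(d["kind"], d["metadata"].get("namespace"), d.get("subjects"))
```

Forgetting the subject namespace is the usual cause of the `forbidden` error listed under troubleshooting: the Pod runs as a ServiceAccount in one namespace while the binding names another.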

II. Create RBAC Permissions

Goal: grant Prometheus permission to access the Kubernetes API.

1. Create the ServiceAccount
# prometheus-serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
secrets:
- name: prometheus-token

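Explanation

  • The prometheus ServiceAccount is the identity Prometheus uses to authenticate to the API server.
  • The secrets field references a Secret (prometheus-token, created in step 4) that stores the access credential. Since Kubernetes 1.24 such token Secrets are no longer generated automatically, which is why step 4 creates it explicitly.
2. Create the ClusterRole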
# prometheus-clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources: ["nodes", "nodes/proxy", "nodes/metrics", "services", "endpoints", "pods"]
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]

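Explanation

  • Grants Prometheus read access (get, list, watch) to nodes, Services, Endpoints and Pods, which Kubernetes service discovery needs, plus get on ConfigMaps.
  • Allows reading the /metrics endpoint (a non-resource URL).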
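To see what these rules grant, here is a simplified Python model of how RBAC matches a request against a ClusterRole: a resource request is allowed when some rule lists its API group, resource and verb (or `*`), and a non-resource request when some rule lists its path and verb. This is an illustrative sketch only; the function name `allows` is hypothetical, and it ignores resourceNames, aggregation and the other details of the real Kubernetes authorizer.

```python
# The rules of the ClusterRole above, written as Python dicts.
RULES = [
    {"apiGroups": [""],
     "resources": ["nodes", "nodes/proxy", "nodes/metrics", "services", "endpoints", "pods"],
     "verbs": ["get", "list", "watch"]},
    {"apiGroups": [""], "resources": ["configmaps"], "verbs": ["get"]},
    {"nonResourceURLs": ["/metrics"], "verbs": ["get"]},
]


def _matches(allowed, value):
    """True if `value` is listed in `allowed` or `allowed` contains the wildcard '*'."""
    return "*" in allowed or value in allowed


def _url_matches(allowed, path):
    """Non-resource URLs: exact match, or prefix match for entries ending in '*'."""
    return any(p == path or (p.endswith("*") and path.startswith(p[:-1])) for p in allowed)


def allows(rules, verb, *, api_group=None, resource=None, path=None):
    """Simplified RBAC check: does any rule grant this request?"""
    for rule in rules:
        if not _matches(rule.get("verbs", []), verb):
            continue
        if path is not None:  # non-resource request such as GET /metrics
            if _url_matches(rule.get("nonResourceURLs", []), path):
                return True
        elif (_matches(rule.get("apiGroups", []), api_group)
              and _matches(rule.get("resources", []), resource)):
            return True
    return False


if __name__ == "__main__":
    print(allows(RULES, "list", api_group="", resource="pods"))        # True
    print(allows(RULES, "list", api_group="", resource="configmaps"))  # False: only "get" is granted
    print(allows(RULES, "get", path="/metrics"))                       # True
```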
3. Create the ClusterRoleBinding
# prometheus-clusterrolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default
roleRef:
  kind: ClusterRole
  name: prometheus
  apiGroup: rbac.authorization.k8s.io

Explanation

  • Binds the prometheus ClusterRole to the prometheus ServiceAccount so that the permissions take effect.
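Kubernetes authenticates a ServiceAccount as the user `system:serviceaccount:<namespace>:<name>`, so a binding only takes effect when its subject's namespace matches the namespace the ServiceAccount (and the Prometheus Pod) actually lives in. The Python sketch below derives those user names from a ClusterRoleBinding held as a dict; the helper names `service_account_user` and `bound_users` are hypothetical.

```python
def service_account_user(namespace, name):
    """User name the API server assigns to a ServiceAccount."""
    return f"system:serviceaccount:{namespace}:{name}"


def bound_users(binding):
    """User names that receive the binding's role through ServiceAccount subjects."""
    return {
        service_account_user(s["namespace"], s["name"])
        for s in binding.get("subjects", [])
        if s.get("kind") == "ServiceAccount"
    }


# The ClusterRoleBinding from this step, as a dict.
binding = {
    "kind": "ClusterRoleBinding",
    "metadata": {"name": "prometheus"},
    "subjects": [{"kind": "ServiceAccount", "name": "prometheus", "namespace": "default"}],
    "roleRef": {"kind": "ClusterRole", "name": "prometheus",
                "apiGroup": "rbac.authorization.k8s.io"},
}

if __name__ == "__main__":
    # A Prometheus Pod running in "monitoring" would NOT be covered by this binding.
    print(service_account_user("monitoring", "prometheus") in bound_users(binding))  # False
    print(service_account_user("default", "prometheus") in bound_users(binding))     # True
```

The same user name can be checked against the live cluster with `kubectl auth can-i list pods --as=system:serviceaccount:default:prometheus`.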
4. Create the ServiceAccount token Secret
# prometheus-token.yaml
apiVersion: v1
kind: Secret
metadata:
  name: prometheus-token
  annotations:
    kubernetes.io/service-account.name: prometheus
type: kubernetes.io/service-account-token

Apply the RBAC configuration

kubectl apply -f prometheus-serviceaccount.yaml
kubectl apply -f prometheus-clusterrole.yaml
kubectl apply -f prometheus-clusterrolebinding.yaml
kubectl apply -f prometheus-token.yaml
☆ Practical example

cat prometheus-rabc0227.yaml

---
# 1. Create the monitoring namespace
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
---
# 2. Create the ServiceAccount used by Prometheus
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitoring
---
# 3. Create the ClusterRole that defines Prometheus's permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/metrics
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
- apiGroups: [""]
  resources:
  - nodes/proxy
  verbs: ["get", "list", "watch"]
- apiGroups: ["networking.k8s.io"]
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
# 4. Bind the ClusterRole to the ServiceAccount
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: monitoring
---

III. Deploy Node Exporter

Goal: run Node Exporter on every node to collect node resource metrics.

# node-exporter-daemonset.yml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      hostNetwork: true
      containers:
      - name: node-exporter
        image: harbor.fq.com/prometheus/node-exporter:v1.8.2
        args:
        - --path.rootfs=/host
        volumeMounts:
        - name: rootfs
          mountPath: /host
      volumes:
      - name: rootfs
        hostPath:
          path: /

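Explanation

  • The DaemonSet runs one Node Exporter Pod on each node. Nodes with taints (such as control-plane nodes) additionally need the tolerations shown in the practical example.
  • hostNetwork: true puts the Pod on the node's network, so the metrics are exposed directly on the node's IP (port 9100 by default).
  • The hostPath volume mounts the host's root filesystem so that node-level data can be collected.

Deployment command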

kubectl apply -f node-exporter-daemonset.yml
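Node Exporter serves its metrics on port 9100 in the Prometheus text exposition format: one sample per line as `name{labels} value`, plus `# HELP` and `# TYPE` comment lines. The minimal Python parser below is handy for quick checks on output fetched from http://<NodeIP>:9100/metrics. It is a sketch that assumes label values contain no commas, escaped quotes or `}` and it ignores optional timestamps, so it is not a substitute for the official client libraries; `parse_metrics` is a hypothetical name and the sample text is invented.

```python
import re

# metric name, optional {labels}, then the value; anything after the value is ignored.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text):
    """Return a list of (name, labels_dict, float_value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = SAMPLE_RE.match(line)
        if not m:
            continue
        name, raw_labels, value = m.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


EXAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.25
node_filesystem_avail_bytes{device="/dev/sda1",fstype="ext4",mountpoint="/"} 1.2e+10
"""

if __name__ == "__main__":
    for name, labels, value in parse_metrics(EXAMPLE):
        print(name, labels.get("mountpoint"), value)
```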
☆ Practical example

cat node-exporter-daemonset.yml

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring  # use the "monitoring" namespace
  labels:
    k8s-app: node-exporter
spec:
  selector:
    matchLabels:
      k8s-app: node-exporter
  template:
    metadata:
      labels:
        k8s-app: node-exporter
      annotations:
        prometheus.io/scrape: "true"  # allow Prometheus to scrape this Pod
        prometheus.io/port: "9100"    # Node Exporter port
    spec:
      hostNetwork: true  # the Pod uses the host network
      hostPID: true      # the Pod can see the host's process IDs
      tolerations:
      - effect: NoSchedule  # allow scheduling onto tainted nodes
        operator: Exists
      - effect: NoExecute
        operator: Exists
      securityContext:
        runAsNonRoot: true  # do not run as root
        runAsUser: 65534    # run as the "nobody" user
      containers:
      - name: node-exporter
        image: harbor.fq.com/prometheus/node-exporter:v1.8.2  # replace with an image from a trusted registry
        args:
        - --path.rootfs=/host/root   # host root filesystem path
        - --path.procfs=/host/proc   # host procfs path
        - --path.sysfs=/host/sys     # host sysfs path
        - --no-collector.wifi        # disable the WiFi collector
        - --no-collector.hwmon       # disable the hardware-monitoring collector
        ports:
        - containerPort: 9100
          protocol: TCP
        resources:  # resource requests and limits
          requests:
            memory: "30Mi"
            cpu: "100m"
          limits:
            memory: "50Mi"
            cpu: "200m"
        volumeMounts:  # host directories mounted into the container
        - name: proc
          mountPath: /host/proc
          readOnly: true
        - name: sys
          mountPath: /host/sys
          readOnly: true
        - name: rootfs
          mountPath: /host/root
          readOnly: true
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: sys
        hostPath:
          path: /sys
      - name: rootfs
        hostPath:
          path: /
---
apiVersion: v1
kind: Service
metadata:
  name: node-exporter
  namespace: monitoring
  labels:
    k8s-app: node-exporter
  annotations:
    prometheus.io/scrape: 'true'  # allow Prometheus to scrape
    prometheus.io/port: '9100'    # scrape port
spec:
  selector:
    k8s-app: node-exporter
  ports:
  - name: metrics
    port: 9100
    protocol: TCP
    targetPort: 9100
  type: ClusterIP  # reachable only from inside the cluster
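Prometheus's Kubernetes service discovery exposes each annotation as a `__meta_kubernetes_<role>_annotation_<name>` label, replacing every character that is not valid in a label name with an underscore. That is why the `prometheus.io/scrape` and `prometheus.io/port` annotations above show up in the scrape configuration as `__meta_kubernetes_service_annotation_prometheus_io_scrape` and `__meta_kubernetes_service_annotation_prometheus_io_port`. The Python sketch below reproduces only that name mapping; the helper names are hypothetical.

```python
import re

# Label names may only contain [a-zA-Z0-9_]; everything else becomes "_".
_INVALID = re.compile(r"[^a-zA-Z0-9_]")


def annotation_meta_label(role, annotation):
    """Meta label name used for an annotation on a Pod ("pod") or Service ("service")."""
    return f"__meta_kubernetes_{role}_annotation_{_INVALID.sub('_', annotation)}"


def meta_labels(role, annotations):
    """Map an object's annotations to service-discovery meta labels."""
    return {annotation_meta_label(role, k): v for k, v in annotations.items()}


if __name__ == "__main__":
    svc = {"prometheus.io/scrape": "true", "prometheus.io/port": "9100"}
    for name, value in meta_labels("service", svc).items():
        print(name, "=", value)
    # __meta_kubernetes_service_annotation_prometheus_io_scrape = true
    # __meta_kubernetes_service_annotation_prometheus_io_port = 9100
```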

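IV. Deploy Prometheus

Goal: deploy the main Prometheus server and configure its scrape rules and persistent storage.

1. Create persistent storage (PV/PVC)

Depending on the cluster's storage type (NFS, local PV, cloud storage, etc.), create a PVC and mount it into Prometheus. The claim below sets no storageClassName, so it uses the cluster's default StorageClass; without one, a matching PV has to be created manually.
Example (adjust to your environment):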

# prometheus-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
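To size the claim, the Prometheus storage documentation gives a rough rule: needed disk space ≈ retention time in seconds × ingested samples per second × bytes per sample, with roughly 1 to 2 bytes per sample after compression. The Python sketch below applies that rule. The 100,000 active series are an assumed input rather than a measurement, 15s matches the scrape_interval used in this guide, and 15 days is Prometheus's default retention when no retention flag is set.

```python
def estimate_disk_bytes(active_series, scrape_interval_s, retention_days, bytes_per_sample=2.0):
    """Rough TSDB size: retention_seconds * samples_per_second * bytes_per_sample."""
    samples_per_second = active_series / scrape_interval_s  # each series yields one sample per scrape
    retention_seconds = retention_days * 24 * 3600
    return retention_seconds * samples_per_second * bytes_per_sample


if __name__ == "__main__":
    # Assumed workload: 100k active series, 15s scrape interval, 15-day retention.
    size = estimate_disk_bytes(active_series=100_000, scrape_interval_s=15, retention_days=15)
    print(f"{size / 2**30:.1f} GiB")
```

Treat the result as a lower bound and leave headroom for the write-ahead log and for compaction.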
2. Create the Prometheus Deployment
# prometheus-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      containers:
      - name: prometheus
        image: prom/prometheus:v2.42.0
        args:
        - "--config.file=/etc/prometheus/prometheus.yml"
        ports:
        - containerPort: 9090
        volumeMounts:
        - name: config-volume
          mountPath: /etc/prometheus
        - name: data-volume
          mountPath: /prometheus
      volumes:
      - name: config-volume
        configMap:
          name: prometheus-config
      - name: data-volume
        persistentVolumeClaim:
          claimName: prometheus-data
☆ Practical example
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitoring  # target namespace
  labels:
    app: prometheus
spec:
  replicas: 1  # a single instance is typical; use remote storage to improve availability
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus  # ServiceAccount used for RBAC access
      containers:
      - name: prometheus
        image: harbor.fq.com/prometheus/prometheus:v3.1.0  # image from the private registry
        args:
        - --config.file=/etc/prometheus/prometheus.yml  # Prometheus configuration file
        - --storage.tsdb.path=/prometheus  # where TSDB data is stored
        - --web.console.templates=/etc/prometheus/consoles
        - --web.console.libraries=/etc/prometheus/console_libraries
        ports:
        - containerPort: 9090  # Prometheus web UI port
        resources:  # cap CPU and memory so Prometheus cannot exhaust node resources
          requests:
            cpu: "500m"
            memory: "1Gi"
          limits:
            cpu: "1"
            memory: "2Gi"
        volumeMounts:
        - name: prometheus-config
          mountPath: /etc/prometheus  # configuration mount point
        - name: prometheus-storage
          mountPath: /prometheus  # TSDB data path
        - name: file-sd
          mountPath: /apps/prometheus/file-sd.yaml  # file-based service discovery targets
          # the hostPath volume below is a single file (type: File), so it is mounted
          # directly; a subPath on a file volume would point below the file and fail
      volumes:
      - name: prometheus-config
        configMap:
          name: prometheus-config  # Prometheus configuration from the ConfigMap
      - name: prometheus-storage
        # persistentVolumeClaim:  # use a PVC for persistent storage in production
        #   claimName: prometheus-pvc
        emptyDir: {}  # an empty directory is fine for testing; data is lost when the Pod is deleted
      - name: file-sd
        hostPath:
          path: /root/file-sd.yaml  # discovery file on the host; it must exist on the node running the Pod
          type: File
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitoring
  labels:
    app: prometheus
spec:
  type: NodePort  # in production, a LoadBalancer or an Ingress is recommended
  ports:
  - port: 9090
    targetPort: 9090
    nodePort: 30090  # the web UI is reachable on this NodePort
  selector:
    app: prometheus
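The practical Deployment mounts /root/file-sd.yaml from the node at /apps/prometheus/file-sd.yaml, which the file_sd job in the practical ConfigMap reads with refresh_interval: 1m. The file is a list of target groups, each with `targets` and optional `labels`. The Python sketch below writes such a file; it emits JSON, which is also valid YAML, so the .yaml name still parses. The function name `write_file_sd`, the 192.0.2.x addresses and the `env` label are placeholders, not values from the original setup.

```python
import json
import os
import tempfile


def write_file_sd(path, groups):
    """Write a file_sd target file in place.

    `groups` is a list of {"targets": [...], "labels": {...}} dicts.
    JSON output is valid YAML, so Prometheus can read it from a .yaml file.
    """
    for group in groups:  # validate first so a bad call never truncates the file
        if not group.get("targets"):
            raise ValueError("every target group needs at least one target")
    # Rewrite in place instead of write-and-rename: a single-file bind mount keeps
    # pointing at the original inode, so a renamed replacement would not reach the container.
    with open(path, "w") as f:
        json.dump(groups, f, indent=2)
        f.write("\n")


if __name__ == "__main__":
    groups = [{"targets": ["192.0.2.10:9100", "192.0.2.11:9100"],  # placeholder hosts
               "labels": {"env": "test"}}]                          # placeholder label
    path = os.path.join(tempfile.mkdtemp(), "file-sd.yaml")
    write_file_sd(path, groups)
    with open(path) as f:
        print(f.read())
```

The trade-off of writing in place is that Prometheus could briefly read a partially written file; for a small file refreshed once a minute this is usually acceptable.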
3. Create the Prometheus ConfigMap
# prometheus-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    alerting:
      alertmanagers:
      - static_configs:
        - targets: ['alertmanager:9093']
    rule_files:
    - '/etc/prometheus/alert_rules.yml'
    scrape_configs:
    - job_name: 'prometheus'
      static_configs:
      - targets: ['localhost:9090']
    - job_name: 'node-exporter'
      static_configs:
      - targets: ['node-exporter:9100']
    - job_name: 'cadvisor'
      static_configs:
      - targets: ['cadvisor:8080']
    - job_name: 'pushgateway'
      static_configs:
      - targets: ['pushgateway:9091']
    - job_name: 'node-linux'
      static_configs:
      - targets: ['10.255.209.40:9100']
    # Inside the cluster, Kubernetes service discovery uses the Pod's ServiceAccount
    # automatically, so no kubeconfig_file is needed (that file does not exist in the Pod).
    - job_name: 'kubernetes-apiservers'
      kubernetes_sd_configs:
      - role: endpoints
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https
    - job_name: 'kubernetes-nodes'
      kubernetes_sd_configs:
      - role: node
      # Node Exporter serves plain HTTP, so this job needs no scheme/TLS/token settings
      relabel_configs:
      - source_labels: [__address__]
        regex: '(.*):10250'  # for the node role, __address__ defaults to <node address>:<kubelet port 10250>
        replacement: '${1}:9100'  # Node Exporter listening port
        target_label: __address__
        action: replace
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
    - job_name: 'kubernetes-pods'
      kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
          - kube-system
          - default
      # HTTP by default; a Pod can switch to HTTPS with the prometheus.io/scheme annotation
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (.+)
      # use the port from the prometheus.io/port annotation instead of a fixed port
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
    - job_name: 'kubernetes-service-endpoints'
      kubernetes_sd_configs:
      - role: endpoints
      # HTTP by default; a Service can switch to HTTPS with the prometheus.io/scheme annotation
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_service_name

Apply the configuration

kubectl apply -f prometheus-pvc.yaml
kubectl apply -f prometheus-configmap.yaml
kubectl apply -f prometheus-deployment.yaml
☆ Practical example

cat prometheus-configmap0227.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitoring
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
      scrape_timeout: 10s  # explicit timeout so a hung target cannot stall a scrape (10s is also the default)
    alerting:
      alertmanagers:
      - static_configs:
        - targets: ['alertmanager:9093']
    rule_files:
    - '/etc/prometheus/alert_rules.yml'
    scrape_configs:
    # Prometheus's own metrics
    - job_name: 'prometheus'
      static_configs:
      - targets: ['localhost:9090']
    # Node Exporter metrics
    - job_name: 'node-exporter'
      static_configs:
      - targets: ['node-exporter:9100']
    # cAdvisor metrics
    - job_name: 'cadvisor'
      static_configs:
      - targets: ['cadvisor:8080']
    # Pushgateway metrics
    - job_name: 'pushgateway'
      static_configs:
      - targets: ['pushgateway:9091']
    # Node Exporter on a specific host
    - job_name: 'node-linux'
      static_configs:
      - targets: ['10.255.209.40:9100']
    # Kubernetes API server metrics
    - job_name: 'kubernetes-apiservers'
      kubernetes_sd_configs:
      - role: endpoints
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true  # turn this off in production and rely on the correct CA certificate
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https
    # Kubernetes node metrics (via Node Exporter)
    - job_name: 'kubernetes-nodes'
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - source_labels: [__address__]
        regex: '(.*):10250'
        replacement: '${1}:9100'  # replace the kubelet port with the Node Exporter port
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
    # Kubernetes Pod metrics
    - job_name: 'kubernetes-pods'
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # the regex expects "<address>;<port>", so __address__ must be listed with the port annotation
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
    # Kubernetes Service endpoint metrics
    - job_name: 'kubernetes-service-endpoints'
      kubernetes_sd_configs:
      - role: endpoints
      #scheme: https
      #tls_config:
      #  ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      #  insecure_skip_verify: true  # turn this off in production and rely on the correct CA certificate
      #bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_service_name
    - job_name: 'kubernetes-nginx-endpoints'  # job name
      kubernetes_sd_configs:
      - role: endpoints  # discover Kubernetes Endpoints automatically
      relabel_configs:
      # only scrape Services annotated with `prometheus.io/scrape: "true"`
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      # override the scrape scheme (http or https)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      # override the metrics path (default /metrics)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # override the scrape address and port
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      # map Kubernetes Service labels to Prometheus labels
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      # add the Kubernetes namespace label
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      # add the Kubernetes Service name label
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_service_name
      # add the Kubernetes Pod name label
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
      # add the Kubernetes node name label
      - source_labels: [__meta_kubernetes_pod_node_name]
        action: replace
        target_label: kubernetes_node_name
      # to scrape HTTPS endpoints, uncomment the following job-level settings
      # scheme: https
      # tls_config:
      #   ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      #   insecure_skip_verify: true  # turn this off in production and rely on the correct CA certificate
      # bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    - job_name: 'kube-state-metrics'
      kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names:
          - kube-system
          - monitoring
          - default
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name]
        action: keep
        regex: kube-state-metrics
      - source_labels: [__meta_kubernetes_endpoint_port_name]
        action: keep
        regex: http-metrics
      metrics_path: /metrics
      scheme: http
    - job_name: "file_sd"
      file_sd_configs:
      - files:
        - /apps/prometheus/file-sd.yaml
        refresh_interval: 1m
    - job_name: 'redis'
      kubernetes_sd_configs:
      - role: endpoints  # discover services from Kubernetes Endpoints
      relabel_configs:
      # only scrape Services annotated with `prometheus.io/scrape: "true"`
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      # keep only endpoints backed by Pods whose name contains "redis"
      - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
        action: keep
        regex: Pod;(.*redis.*)
      # set the target address to the Pod IP and the Redis Exporter port (9121)
      - source_labels: [__meta_kubernetes_pod_ip]
        action: replace
        target_label: __address__
        replacement: $1:9121
      # add the app label of the Kubernetes Service
      - source_labels: [__meta_kubernetes_service_label_app]
        action: replace
        target_label: app
      # add the Kubernetes namespace label
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace
      # add the Kubernetes Service name label
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: service
      # add the Kubernetes Pod name label
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod
      # add the Kubernetes node name label
      - source_labels: [__meta_kubernetes_pod_node_name]
        action: replace
        target_label: node
      # add an instance label (distinguishes individual Redis instances)
      - source_labels: [__meta_kubernetes_pod_ip]
        action: replace
        target_label: instance
    - job_name: 'mysql'
      kubernetes_sd_configs:
      - role: endpoints  # discover services from Kubernetes Endpoints
      relabel_configs:
      # only scrape Services annotated with `prometheus.io/scrape: "true"`
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      # keep only endpoints backed by Pods whose name contains "mysql-exporter"
      - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
        action: keep
        regex: Pod;(.*mysql-exporter.*)
      # set the target address to the Pod IP and the MySQL Exporter port (9104)
      - source_labels: [__meta_kubernetes_pod_ip]
        action: replace
        target_label: __address__
        replacement: $1:9104
      # add the app label of the Kubernetes Service
      - source_labels: [__meta_kubernetes_service_label_app]
        action: replace
        target_label: app
      # add the Kubernetes namespace label
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace
      # add the Kubernetes Service name label
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: service
      # add the Kubernetes Pod name label
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod
      # add the Kubernetes node name label
      - source_labels: [__meta_kubernetes_pod_node_name]
        action: replace
        target_label: node
      # add an instance label (distinguishes individual MySQL instances)
      - source_labels: [__meta_kubernetes_pod_ip]
        action: replace
        target_label: instance
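The kubernetes-service-endpoints job chains `keep`, `replace` and `labelmap` rules. The self-contained Python sketch below runs a trimmed copy of that chain over one discovered target to show the net effect: the target is dropped unless its Service is annotated `prometheus.io/scrape: "true"`, the port annotation rewrites `__address__`, and Service labels are copied over. It is a simplified model under stated assumptions: anchored Python regexes instead of RE2, only these three actions, and it skips the final step in which Prometheus derives the `instance` label from `__address__`. The sample target labels are invented and `relabel` is a hypothetical name.

```python
import re

# Trimmed copy of the kubernetes-service-endpoints relabel_configs.
RULES = [
    {"action": "keep",
     "source_labels": ["__meta_kubernetes_service_annotation_prometheus_io_scrape"],
     "regex": "true"},
    {"action": "replace",
     "source_labels": ["__address__", "__meta_kubernetes_service_annotation_prometheus_io_port"],
     "target_label": "__address__",
     "regex": r"([^:]+)(?::\d+)?;(\d+)", "replacement": "$1:$2"},
    {"action": "labelmap", "regex": "__meta_kubernetes_service_label_(.+)"},
    {"action": "replace", "source_labels": ["__meta_kubernetes_namespace"],
     "target_label": "kubernetes_namespace"},
]

# Hypothetical discovered target (values invented for the example).
TARGET = {
    "__address__": "10.244.2.3:8080",
    "__meta_kubernetes_namespace": "monitoring",
    "__meta_kubernetes_service_label_app": "demo",
    "__meta_kubernetes_service_annotation_prometheus_io_scrape": "true",
    "__meta_kubernetes_service_annotation_prometheus_io_port": "9100",
}


def _template(replacement):
    """$1 / ${1} -> \\g<1> so that Python's Match.expand understands it."""
    return re.sub(r"\$\{(\w+)\}|\$(\w+)",
                  lambda m: "\\g<" + (m.group(1) or m.group(2)) + ">", replacement)


def relabel(labels, rules):
    """Run keep/replace/labelmap rules in order; None means the target was dropped."""
    labels = dict(labels)
    for rule in rules:
        action = rule.get("action", "replace")
        regex = rule.get("regex", "(.*)")
        replacement = _template(rule.get("replacement", "$1"))
        if action == "labelmap":  # copy matching label names to new names
            for name, value in list(labels.items()):
                m = re.fullmatch(regex, name)
                if m:
                    labels[m.expand(replacement)] = value
            continue
        joined = ";".join(labels.get(n, "") for n in rule["source_labels"])
        m = re.fullmatch(regex, joined)
        if action == "keep" and m is None:
            return None
        if action == "replace" and m is not None:
            new_value = m.expand(replacement)
            if new_value:
                labels[rule["target_label"]] = new_value
            else:
                labels.pop(rule["target_label"], None)
    # __meta_* labels are discarded once relabeling is finished.
    return {k: v for k, v in labels.items() if not k.startswith("__meta_")}


if __name__ == "__main__":
    print(relabel(TARGET, RULES))
    # {'__address__': '10.244.2.3:9100', 'app': 'demo', 'kubernetes_namespace': 'monitoring'}
```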
4. Expose the Prometheus Service
# prometheus-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus
spec:
  type: NodePort
  ports:
  - port: 9090
    targetPort: 9090
    nodePort: 30090
  selector:
    app: prometheus

Apply the Service

kubectl apply -f prometheus-service.yaml

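V. Verify the Deployment
  1. Check Pod status

    kubectl get pods -l app=prometheus -n default
    kubectl get pods -n kube-system -l app=node-exporter

    Expected output: every Pod is in the Running state. If you followed the practical examples, use -n monitoring for Prometheus and kubectl get pods -n monitoring -l k8s-app=node-exporter for Node Exporter.

  2. Open the Prometheus UI
    Browse to http://<NodeIP>:30090 to open the Prometheus console.

    • On the Status > Targets page, confirm that the kubernetes-nodes and kubernetes-pods jobs are UP.
    • Run the query up{job="kubernetes-nodes"} to confirm that metrics are being scraped; a value of 1 means the last scrape of that target succeeded.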
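The same check can be scripted against Prometheus's HTTP API: `GET /api/v1/query?query=...` returns JSON whose `data.result` holds one entry per series, with its labels in `metric` and a `[timestamp, "value"]` pair in `value`. The Python sketch below builds the URL for `up{job="kubernetes-nodes"}` and summarizes a response. `<NodeIP>` stays a placeholder, the helper names are hypothetical, and the sample response is illustrative rather than captured from a real cluster.

```python
import json
from urllib.parse import urlencode


def query_url(base, promql):
    """URL for an instant query against the Prometheus HTTP API."""
    return f"{base.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def up_by_instance(response_body):
    """Map instance -> True/False from the response to an `up` instant query."""
    doc = json.loads(response_body)
    if doc.get("status") != "success":
        raise RuntimeError(doc.get("error", "query failed"))
    return {r["metric"].get("instance", "?"): r["value"][1] == "1"
            for r in doc["data"]["result"]}


# Illustrative response body (not real output).
SAMPLE = json.dumps({
    "status": "success",
    "data": {"resultType": "vector", "result": [
        {"metric": {"__name__": "up", "job": "kubernetes-nodes", "instance": "10.0.0.5:9100"},
         "value": [1700000000.0, "1"]},
        {"metric": {"__name__": "up", "job": "kubernetes-nodes", "instance": "10.0.0.6:9100"},
         "value": [1700000000.0, "0"]},
    ]},
})

if __name__ == "__main__":
    print(query_url("http://<NodeIP>:30090", 'up{job="kubernetes-nodes"}'))
    print(up_by_instance(SAMPLE))  # {'10.0.0.5:9100': True, '10.0.0.6:9100': False}
    # With network access, the body could be fetched with urllib.request.urlopen(url).read().
```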

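VI. Troubleshooting
  1. Permission errors

    • Example error: Failed to list *v1.Pod: forbidden
    • Fix: check that the ClusterRoleBinding references the correct ServiceAccount name and namespace.
  2. Node Exporter not running

    • Check that the DaemonSet has a Pod on every node and that the image was pulled without errors.
  3. Prometheus cannot scrape metrics

    • Check that the targets in scrape_configs point at the correct port (Node Exporter listens on 9100 by default).
    • Test connectivity from inside the Prometheus Pod: kubectl exec -it <prometheus-pod> -n <namespace> -- wget -qO- http://<NodeIP>:9100/metrics (the official Prometheus image is BusyBox-based and ships wget rather than curl).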
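For the "cannot scrape" case, the Targets page is backed by `GET /api/v1/targets` (for example http://<NodeIP>:30090/api/v1/targets), whose `data.activeTargets` entries carry `scrapePool` (the job), `scrapeUrl`, `health` and `lastError`. The Python sketch below lists the unhealthy targets from such a response so the error messages appear in one place. The function name is hypothetical and the sample body is illustrative, not real output.

```python
import json


def unhealthy_targets(response_body):
    """Return (scrapePool, scrapeUrl, lastError) for every target whose health is not 'up'."""
    doc = json.loads(response_body)
    return [
        (t.get("scrapePool"), t.get("scrapeUrl"), t.get("lastError", ""))
        for t in doc["data"]["activeTargets"]
        if t.get("health") != "up"
    ]


# Illustrative response body (not real output).
SAMPLE = json.dumps({"status": "success", "data": {"activeTargets": [
    {"scrapePool": "kubernetes-nodes", "scrapeUrl": "http://10.0.0.5:9100/metrics",
     "health": "up", "lastError": ""},
    {"scrapePool": "kubernetes-nodes", "scrapeUrl": "http://10.0.0.6:9100/metrics",
     "health": "down", "lastError": "context deadline exceeded"},
]}})

if __name__ == "__main__":
    for pool, url, err in unhealthy_targets(SAMPLE):
        print(f"{pool}: {url} -> {err}")
```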

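VII. Next Steps
  1. Configure Alertmanager: add alerting rules and integrate Alertmanager to send alert notifications.
  2. Harden persistent storage: use a highly available storage backend (such as Ceph or Longhorn) to protect the data.
  3. Dashboards: deploy Grafana, add Prometheus as a data source, and build monitoring dashboards.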
