
K8s Cloud-Native Monitoring with Prometheus + Grafana

Contents

1. Overview

1.1 System Architecture

1.1.1 Architecture Diagram

1.2 Environment

2. Deploying Prometheus

2.1 Create the Namespace

2.2 Create the ConfigMap

2.3 Create the ServiceAccount, ClusterRole, ClusterRoleBinding, Service, Deployment, Ingress, and PersistentVolumeClaim

3. Deploying Node Exporter

3.1 Create the DaemonSet

4. Deploying kube-state-metrics

4.1 Create the ServiceAccount, ClusterRole, ClusterRoleBinding, Deployment, and Service

5. Deploying Grafana

5.1 Create the PersistentVolumeClaim, Deployment, and Service

6. Deployment Commands

7. Accessing the Services

8. Grafana Dashboards

8.1 Configure the Data Source

8.2 Import Dashboards

8.3 Dashboard Examples


1. Overview

Prometheus is an open-source monitoring and alerting system that is particularly well suited to cloud-native environments. This post walks through deploying a complete Prometheus monitoring stack on a Kubernetes cluster, including Prometheus Server, Node Exporter, kube-state-metrics, and Grafana.

1.1 System Architecture

The Prometheus monitoring stack consists of the following components:

  • Prometheus Server: the core monitoring server, responsible for scraping and storing metrics

  • Node Exporter: node-level metrics collector

  • Kube-state-metrics: collector for Kubernetes cluster-state metrics

  • Grafana: data visualization and dashboards

1.1.1 Architecture Diagram

1.2 Environment

IP             Hostname  Notes
192.168.48.11  master1   master node, k8s 1.32.7
192.168.48.12  master2   master node, k8s 1.32.7
192.168.48.13  master3   master node, k8s 1.32.7
192.168.48.14  node01    worker node, k8s 1.32.7
192.168.48.15  node02    worker node, k8s 1.32.7
192.168.48.16  node03    worker node, k8s 1.32.7
192.168.48.19  database  Harbor registry, NFS server

This guide uses a highly available k8s cluster, and all images are pulled from Chinese mirror registries, so the deployment works even without a Harbor registry. If an image pull times out, leave a comment and I will follow up promptly. An NFS server is required; if you use a different storage backend (Ceph, hostPath, etc.), adjust the YAML configuration accordingly.

For setting up NFS shared storage for k8s, see my earlier post:

Setting up NFS shared storage for k8s

For building a k8s cluster, see my earlier post:

Deploying a k8s 1.32.7 cluster on openEuler 24.03 (one master, two workers)

For building a highly available k8s cluster, see my earlier post:

Deploying a highly available k8s 1.32.7 cluster on openEuler 24.03 (three masters, three workers)

2. Deploying Prometheus

2.1 Create the Namespace

vim prometheus-namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: monitor
  labels:
    name: monitor
    purpose: monitoring

2.2 Create the ConfigMap

vim prometheus-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitor
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    scrape_configs:
      # Scrape Prometheus itself
      - job_name: 'prometheus'
        kubernetes_sd_configs:
          - role: endpoints
            namespaces:
              names: [monitor]
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_name]
            regex: prometheus-svc
            action: keep
          - source_labels: [__meta_kubernetes_endpoint_port_name]
            regex: web
            action: keep

      # Scrape CoreDNS
      - job_name: 'coredns'
        kubernetes_sd_configs:
          - role: endpoints
            namespaces:
              names: [kube-system]
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_name]
            regex: kube-dns
            action: keep
          - source_labels: [__meta_kubernetes_endpoint_port_name]
            regex: metrics
            action: keep

      # Scrape kube-apiserver
      - job_name: 'kube-apiserver'
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: false
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: endpoints
            namespaces:
              names: [default, kube-system]
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_name]
            regex: kubernetes
            action: keep
          - source_labels: [__meta_kubernetes_endpoint_port_name]
            regex: https
            action: keep

      # Scrape node-exporter
      - job_name: 'node-exporter'
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - source_labels: [__address__]
            regex: '(.*):10250'
            replacement: '${1}:9100'
            target_label: __address__
            action: replace

      # Scrape cAdvisor
      - job_name: 'cadvisor'
        kubernetes_sd_configs:
          - role: node
        scheme: https
        tls_config:
          insecure_skip_verify: true
          ca_file: '/var/run/secrets/kubernetes.io/serviceaccount/ca.crt'
        bearer_token_file: '/var/run/secrets/kubernetes.io/serviceaccount/token'
        relabel_configs:
          - target_label: __metrics_path__
            replacement: /metrics/cadvisor
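The node-exporter job above rewrites each node address discovered on the kubelet port (10250) to the node-exporter port (9100) via its relabel rule. The rewrite is an anchored regex substitution; an equivalent sed one-liner makes the behavior easy to see (the sample address is illustrative only):

```shell
# Prometheus anchors the regex and replaces __address__ with '${1}:9100';
# sed performs the equivalent substitution on a sample kubelet address.
addr="192.168.48.14:10250"
echo "$addr" | sed -E 's/^(.*):10250$/\1:9100/'
# prints 192.168.48.14:9100
```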

2.3 Create the ServiceAccount, ClusterRole, ClusterRoleBinding, Service, Deployment, Ingress, and PersistentVolumeClaim

vim prometheus.yaml
# Create the ServiceAccount
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitor
---
# Create the ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
      - services
      - endpoints
      - pods
      - nodes/proxy
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - "networking.k8s.io"
    resources:
      - ingresses
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - configmaps
      - nodes/metrics
    verbs:
      - get
  - nonResourceURLs:
      - /metrics
    verbs:
      - get
---
# Create the ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: monitor
---
# Create the Service
apiVersion: v1
kind: Service
metadata:
  name: prometheus-svc
  namespace: monitor
  labels:
    app: prometheus
  annotations:
    prometheus_io_scrape: "true"  # annotation that allows Prometheus to discover this service
spec:
  selector:
    app: prometheus
  type: NodePort
  ports:
    - name: web
      nodePort: 32224
      port: 9090
      targetPort: http
---
# Create the Ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prometheus-ingress
  namespace: monitor
spec:
  ingressClassName: nginx
  rules:
    - host: www.myprometheus.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: prometheus-svc
                port:
                  number: 9090
---
# Create the PersistentVolumeClaim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-pvc  # PVC name
  namespace: monitor
spec:
  accessModes:
    - ReadWriteOnce  # access mode (options: ReadWriteOnce/ReadOnlyMany/ReadWriteMany)
  resources:
    requests:
      storage: 2Gi  # requested storage capacity
  storageClassName: nfs-client  # StorageClass (adjust for your cluster)
---
# Create the Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitor
  labels:
    app: prometheus
spec:
  selector:
    matchLabels:
      app: prometheus
  replicas: 1
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      initContainers:
        - name: "change-permission-of-directory"
          image: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/quay.io/prometheus/busybox:latest
          command: ["/bin/sh"]
          args: ["-c", "chown -R 65534:65534 /prometheus"]
          securityContext:
            privileged: true
          volumeMounts:
            - mountPath: "/etc/prometheus"
              name: config-volume
            - mountPath: "/prometheus"
              name: data
      containers:
        - image: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/prom/prometheus:latest
          name: prometheus
          args:
            - "--config.file=/etc/prometheus/prometheus.yml"  # Prometheus config file path
            - "--storage.tsdb.path=/prometheus"  # TSDB storage path
            - "--web.enable-lifecycle"  # allow hot reload via: curl -X POST localhost:9090/-/reload
            - "--web.console.libraries=/usr/share/prometheus/console_libraries"
            - "--web.console.templates=/usr/share/prometheus/consoles"
          ports:
            - containerPort: 9090
              name: http
          volumeMounts:
            - mountPath: "/etc/prometheus"
              name: config-volume
            - mountPath: "/prometheus"
              name: data
          resources:
            requests:
              cpu: 100m
              memory: 512Mi
            limits:
              cpu: 100m
              memory: 512Mi
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: prometheus-pvc
        - name: config-volume
          configMap:
            name: prometheus-config

3. Deploying Node Exporter

3.1 Create the DaemonSet

vim node-exportet-daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitor
  labels:
    app: node-exporter
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      hostPID: true
      hostIPC: true
      hostNetwork: true
      nodeSelector:
        kubernetes.io/os: linux
      containers:
        - name: node-exporter
          image: docker.io/prom/node-exporter:latest
          args:
            - --web.listen-address=$(HOSTIP):9100
            - --path.procfs=/host/proc
            - --path.sysfs=/host/sys
            - --path.rootfs=/host/root
            - --collector.filesystem.ignored-mount-points=^/(dev|proc|sys|var/lib/docker/.+)($|/)
            - --collector.filesystem.ignored-fs-types=^(autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$
          ports:
            - containerPort: 9100
          env:
            - name: HOSTIP
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP
          resources:
            requests:
              cpu: 150m
              memory: 180Mi
            limits:
              cpu: 150m
              memory: 180Mi
          securityContext:
            runAsNonRoot: true
            runAsUser: 65534
          volumeMounts:
            - name: proc
              mountPath: /host/proc
            - name: sys
              mountPath: /host/sys
            - name: root
              mountPath: /host/root
              mountPropagation: HostToContainer
              readOnly: true
      tolerations:
        - operator: "Exists"
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: dev
          hostPath:
            path: /dev
        - name: sys
          hostPath:
            path: /sys
        - name: root
          hostPath:
            path: /
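The long `--collector.filesystem.ignored-mount-points` flag above is an anchored regex deciding which mount points the filesystem collector skips. A quick way to sanity-check it is to run sample paths through grep -E (the paths below are illustrative, not read from a real node):

```shell
# Mounts matching the regex are excluded from node-exporter's filesystem metrics.
regex='^/(dev|proc|sys|var/lib/docker/.+)($|/)'
for mp in / /proc /sys /var/lib/docker/overlay2 /home; do
  if printf '%s\n' "$mp" | grep -Eq "$regex"; then
    echo "$mp excluded"
  else
    echo "$mp kept"
  fi
done
```

Running this shows that `/` and `/home` are kept while `/proc`, `/sys`, and Docker overlay mounts are excluded.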

Create the Service

vim node-exportet-svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: node-exporter
  namespace: monitor
  labels:
    app: node-exporter
spec:
  selector:
    app: node-exporter
  ports:
    - name: metrics
      port: 9100
      targetPort: 9100
  clusterIP: None  # headless Service (targets are reached directly via Pod IPs)

4. Deploying kube-state-metrics

4.1 Create the ServiceAccount, ClusterRole, ClusterRoleBinding, Deployment, and Service

vim kube-state-metrics.yaml
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-state-metrics
  namespace: monitor
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics
rules:
  - apiGroups: [""]
    resources: ["nodes", "pods", "services", "resourcequotas", "replicationcontrollers", "limitranges", "persistentvolumeclaims", "persistentvolumes", "namespaces", "endpoints"]
    verbs: ["list", "watch"]
  - apiGroups: ["apps"]
    resources: ["daemonsets", "deployments", "replicasets"]
    verbs: ["list", "watch"]
  - apiGroups: ["apps"]
    resources: ["statefulsets"]
    verbs: ["list", "watch"]
  - apiGroups: ["batch"]
    resources: ["cronjobs", "jobs"]
    verbs: ["list", "watch"]
  - apiGroups: ["autoscaling"]
    resources: ["horizontalpodautoscalers"]
    verbs: ["list", "watch"]
  - apiGroups: ["networking.k8s.io"]
    resources: ["ingresses"]
    verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
  - kind: ServiceAccount
    name: kube-state-metrics
    namespace: monitor
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-state-metrics
  namespace: monitor
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kube-state-metrics
  template:
    metadata:
      labels:
        app: kube-state-metrics
    spec:
      serviceAccountName: kube-state-metrics
      containers:
        - name: kube-state-metrics
          image: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.9.2
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  name: kube-state-metrics
  namespace: monitor
  labels:
    app: kube-state-metrics
spec:
  ports:
    - name: kube-state-metrics
      port: 8080
      protocol: TCP
  selector:
    app: kube-state-metrics
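kube-state-metrics serves plain-text metrics in the Prometheus exposition format on port 8080. Each line has the shape `name{labels} value`; a small shell sketch that pulls the value out of a sample line (the metric line is hard-coded for illustration, not fetched from a cluster):

```shell
# A sample kube-state-metrics metric line: metric name + labels, then the value.
line='kube_pod_status_phase{namespace="monitor",phase="Running"} 1'
# The value is the last whitespace-separated field.
echo "$line" | awk '{print $NF}'
# prints 1
```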

5. Deploying Grafana

5.1 Create the PersistentVolumeClaim, Deployment, and Service

vim grafana.yaml
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-pvc  # PVC name
  namespace: monitor
spec:
  accessModes:
    - ReadWriteOnce  # access mode (options: ReadWriteOnce/ReadOnlyMany/ReadWriteMany)
  resources:
    requests:
      storage: 2Gi  # requested storage capacity
  storageClassName: nfs-client  # StorageClass (adjust for your cluster)
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana-server
  namespace: monitor
spec:
  replicas: 1
  selector:
    matchLabels:
      task: monitoring
      k8s-app: grafana
  template:
    metadata:
      labels:
        task: monitoring
        k8s-app: grafana
    spec:
      containers:
        - name: grafana
          image: grafana/grafana:latest
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 3000
              protocol: TCP
          volumeMounts:
            - mountPath: /var/lib/grafana/
              name: grafana-data
          env:
            - name: INFLUXDB_HOST
              value: monitoring-influxdb
            - name: GF_SERVER_HTTP_PORT
              value: "3000"
            - name: GF_AUTH_BASIC_ENABLED
              value: "false"
            - name: GF_AUTH_ANONYMOUS_ENABLED
              value: "true"
            - name: GF_AUTH_ANONYMOUS_ORG_ROLE
              value: Admin
            - name: GF_SERVER_ROOT_URL
              value: /
      volumes:
        - name: grafana-data
          persistentVolumeClaim:
            claimName: grafana-pvc
      affinity:  # scheduling preference (optional)
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              preference:
                matchExpressions:
                  - key: node-role.kubernetes.io/monitoring
                    operator: Exists
---
apiVersion: v1
kind: Service
metadata:
  labels:
    kubernetes.io/cluster-service: 'true'
    kubernetes.io/name: monitoring-grafana
  name: grafana-svc
  namespace: monitor
spec:
  ports:
    - port: 80
      targetPort: 3000
      nodePort: 31091
  selector:
    k8s-app: grafana
  type: NodePort

6. Deployment Commands

Deploy the components in the following order:

# 1. Create the namespace
kubectl apply -f prometheus-namespace.yaml

# 2. Apply the Prometheus configuration
kubectl apply -f prometheus-configmap.yaml

# 3. Deploy the Prometheus server
kubectl apply -f prometheus.yaml

# 4. Deploy kube-state-metrics
kubectl apply -f kube-state-metrics.yaml

# 5. Deploy Node Exporter
kubectl apply -f node-exportet-daemonset.yaml
kubectl apply -f node-exportet-svc.yaml

# 6. Deploy Grafana
kubectl apply -f grafana.yaml

Check the pod status:

[root@master1 prometheus]# kubectl get pod -n monitor 
NAME                                 READY   STATUS    RESTARTS   AGE
grafana-server-64c9777c7b-drgdd      1/1     Running   0          110m
kube-state-metrics-6db447664-6r2wp   1/1     Running   0          110m
node-exporter-ccwk8                  1/1     Running   0          110m
node-exporter-fbq22                  1/1     Running   0          110m
node-exporter-hbtm6                  1/1     Running   0          110m
node-exporter-ndbhh                  1/1     Running   0          110m
node-exporter-sbb4p                  1/1     Running   0          110m
node-exporter-xd467                  1/1     Running   0          110m
prometheus-7cd9944dc4-lbjwx          1/1     Running   0          110m

7. Accessing the Services

Once everything is deployed, the services are reachable at:

  • Prometheus: http://<node-ip>:32224 or http://www.myprometheus.com (requires name resolution for the Ingress host)

  • Grafana: http://<node-ip>:31091
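The Ingress host needs name resolution on the client machine; without real DNS, a hosts-file entry pointing at a node IP (or the cluster VIP) is enough. A sketch, assuming the VIP from this lab:

```
# /etc/hosts (Linux/macOS) or C:\Windows\System32\drivers\etc\hosts
192.168.48.10  www.myprometheus.com
```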

Note: 192.168.48.10 is the VIP of my highly available k8s cluster; if your cluster is not HA, use the IP of the node where the Pod runs instead.

Access Prometheus: http://192.168.48.10:32224

Access Grafana: http://192.168.48.10:31091/
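Once the UIs are reachable, a few quick queries in the Prometheus web UI (Graph tab) confirm that every scrape job is returning data. These metric names come with the default node-exporter and kube-state-metrics metric sets:

```promql
up                                        # one series per scrape target; value 1 means the scrape succeeded
node_load1                                # node-exporter: 1-minute load average per node
kube_pod_status_phase{phase="Running"}    # kube-state-metrics: pod phase series
```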

8. Grafana Dashboards

8.1 Configure the Data Source

Click "Save & test" at the bottom; the message "Successfully queried the Prometheus API." means the data source is working.
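Instead of adding the data source by hand in the UI, Grafana can also provision it from a YAML file mounted under /etc/grafana/provisioning/datasources/. A minimal sketch (the file name is arbitrary; the URL assumes the prometheus-svc Service in the monitor namespace defined earlier):

```yaml
# e.g. /etc/grafana/provisioning/datasources/prometheus.yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus-svc.monitor.svc:9090
    isDefault: true
```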

8.2 Import Dashboards

Dashboard IDs:

  • Node monitoring: 16098

  • Kubernetes cluster monitoring: 14249

8.3 Dashboard Examples
