Cloud-Native Kubernetes Monitoring with Prometheus + Grafana
Table of contents
1. Overview
1.1 System architecture
1.1.1 Architecture diagram
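1.2 Environment
2. Deploying Prometheus
2.1 Create the Namespace
2.2 Create the ConfigMap
2.3 Create the ServiceAccount, ClusterRole, ClusterRoleBinding, Service, Deployment, Ingress, and PersistentVolumeClaim
3. Deploying the Node Exporter component
3.1 Create the DaemonSet
4. Deploying the kube-state-metrics component
4.1 Create the ServiceAccount, ClusterRole, ClusterRoleBinding, Deployment, and Service
5. Deploying the Grafana visualization platform
5.1 Create the PersistentVolumeClaim, Deployment, and Service
6. Deployment commands
7. Accessing the services
8. Grafana dashboards
8.1 Configure a data source for Grafana
8.2 Import dashboards
8.3 Dashboard showcase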
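1. Overview
Prometheus is an open-source monitoring and alerting system that is particularly well suited to cloud-native environments. This post walks through deploying a complete Prometheus monitoring stack on a Kubernetes cluster, including Prometheus Server, Node Exporter, kube-state-metrics, and Grafana.
1.1 System architecture
The monitoring stack consists of the following components: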
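- Prometheus Server: the core monitoring server, responsible for scraping and storing metrics
- Node Exporter: collects node-level (host) metrics
- Kube-state-metrics: collects metrics about the state of Kubernetes objects in the cluster
- Grafana: data visualization and dashboards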
1.1.1 Architecture diagram
1.2 Environment
IP | Hostname | Notes |
---|---|---|
192.168.48.11 | master1 | master node, Kubernetes 1.32.7 |
192.168.48.12 | master2 | master node, Kubernetes 1.32.7 |
192.168.48.13 | master3 | master node, Kubernetes 1.32.7 |
192.168.48.14 | node01 | worker node, Kubernetes 1.32.7 |
192.168.48.15 | node02 | worker node, Kubernetes 1.32.7 |
192.168.48.16 | node03 | worker node, Kubernetes 1.32.7 |
192.168.48.19 | database | Harbor registry, NFS server |
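This walkthrough uses a highly available Kubernetes cluster, and the manifests use China-hosted image mirrors, so everything deploys normally even without a Harbor registry. If an image pull times out, leave a comment and I will update it promptly. An NFS server is required; if you use a different storage backend such as Ceph or hostPath, adjust the YAML files accordingly.
For setting up NFS shared storage on Kubernetes, see my earlier post:
Setting up NFS shared storage on k8s
For building the cluster, see my earlier post:
Deploying a k8s 1.32.7 cluster on openEuler 24.03 (one master, two workers)
For building the highly available cluster, see my earlier post:
Deploying a highly available k8s 1.32.7 cluster on openEuler 24.03 (three masters, three workers)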
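Both PersistentVolumeClaims later in this post request the `nfs-client` StorageClass, so it can help to confirm that the class exists before deploying anything. The following is a minimal sketch, assuming `kubectl` is already configured for the cluster; the helper name `check_storage_class` is made up for this example.

```shell
# Hypothetical helper: succeeds only if the named StorageClass exists.
check_storage_class() {
  sc_name="${1:-nfs-client}"
  if kubectl get storageclass "$sc_name" >/dev/null 2>&1; then
    echo "StorageClass $sc_name found"
  else
    echo "StorageClass $sc_name not found; PVCs that reference it will stay Pending" >&2
    return 1
  fi
}

# Usage (run on a machine with cluster access):
#   check_storage_class nfs-client
```

If the class has a different name in your cluster, pass that name instead and change `storageClassName` in the PVC manifests to match.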
2. Deploying Prometheus
2.1 Create the Namespace
vim prometheus-namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: monitor
  labels:
    name: monitor
    purpose: monitoring
2.2 Create the ConfigMap
vim prometheus-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitor
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    scrape_configs:
      # Scrape Prometheus itself
      - job_name: 'prometheus'
        kubernetes_sd_configs:
          - role: endpoints
            namespaces:
              names: [monitor]
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_name]
            regex: prometheus-svc
            action: keep
          - source_labels: [__meta_kubernetes_endpoint_port_name]
            regex: web
            action: keep

      # Scrape CoreDNS
      - job_name: 'coredns'
        kubernetes_sd_configs:
          - role: endpoints
            namespaces:
              names: [kube-system]
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_name]
            regex: kube-dns
            action: keep
          - source_labels: [__meta_kubernetes_endpoint_port_name]
            regex: metrics
            action: keep

      # Scrape kube-apiserver
      - job_name: 'kube-apiserver'
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: false
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: endpoints
            namespaces:
              names: [default, kube-system]
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_name]
            regex: kubernetes
            action: keep
          - source_labels: [__meta_kubernetes_endpoint_port_name]
            regex: https
            action: keep

      # Scrape node-exporter
      - job_name: 'node-exporter'
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - source_labels: [__address__]
            regex: '(.*):10250'
            replacement: '${1}:9100'
            target_label: __address__
            action: replace

      # Scrape cAdvisor
      - job_name: 'cadvisor'
        kubernetes_sd_configs:
          - role: node
        scheme: https
        tls_config:
          insecure_skip_verify: true
          ca_file: '/var/run/secrets/kubernetes.io/serviceaccount/ca.crt'
        bearer_token_file: '/var/run/secrets/kubernetes.io/serviceaccount/token'
        relabel_configs:
          - target_label: __metrics_path__
            replacement: /metrics/cadvisor
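The `node-exporter` job discovers targets through `role: node`, which yields each node's kubelet address on port 10250; the relabel rule then rewrites the port to 9100, where node-exporter listens. The rewrite can be sketched with `sed` as a stand-in for Prometheus's relabel engine (Prometheus anchors the regex at both ends, which the explicit `^` and `$` mimic here). The function name `rewrite_address` is invented for illustration.

```shell
# Emulates: regex '(.*):10250', replacement '${1}:9100', action replace.
rewrite_address() {
  printf '%s\n' "$1" | sed -E 's/^(.*):10250$/\1:9100/'
}

rewrite_address "192.168.48.14:10250"   # prints 192.168.48.14:9100
rewrite_address "192.168.48.14:9090"    # no match, printed unchanged
```

An address that does not end in `:10250` passes through untouched, which mirrors how the `replace` action leaves a label alone when the regex does not match.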
2.3 Create the ServiceAccount, ClusterRole, ClusterRoleBinding, Service, Deployment, Ingress, and PersistentVolumeClaim
vim prometheus.yaml
# ServiceAccount
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitor
---
# ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  - nodes/proxy
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - networking.k8s.io   # API group that serves Ingress on current Kubernetes
  resources:
  - ingresses
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
---
# ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: monitor
---
# Service
apiVersion: v1
kind: Service
metadata:
  name: prometheus-svc
  namespace: monitor
  labels:
    app: prometheus
  annotations:
    prometheus_io_scrape: "true"   # scrape annotation; the config in 2.2 selects this Service by name and port name
spec:
  selector:
    app: prometheus
  type: NodePort
  ports:
  - name: web
    nodePort: 32224
    port: 9090
    targetPort: http
---
# Ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prometheus-ingress
  namespace: monitor
spec:
  ingressClassName: nginx
  rules:
  - host: www.myprometheus.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: prometheus-svc
            port:
              number: 9090
---
# PersistentVolumeClaim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-pvc   # PVC name
  namespace: monitor
spec:
  accessModes:
  - ReadWriteOnce   # access mode (options: ReadWriteOnce/ReadOnlyMany/ReadWriteMany)
  resources:
    requests:
      storage: 2Gi   # requested capacity
  storageClassName: nfs-client   # StorageClass (adjust to your cluster)
---
# Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitor
  labels:
    app: prometheus
spec:
  selector:
    matchLabels:
      app: prometheus
  replicas: 1
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      initContainers:
      - name: "change-permission-of-directory"
        image: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/quay.io/prometheus/busybox:latest
        command: ["/bin/sh"]
        args: ["-c", "chown -R 65534:65534 /prometheus"]
        securityContext:
          privileged: true
        volumeMounts:
        - mountPath: "/etc/prometheus"
          name: config-volume
        - mountPath: "/prometheus"
          name: data
      containers:
      - image: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/prom/prometheus:latest
        name: prometheus
        args:
        - "--config.file=/etc/prometheus/prometheus.yml"   # path to the Prometheus config file
        - "--storage.tsdb.path=/prometheus"   # TSDB storage path
        - "--web.enable-lifecycle"   # enables hot reload: curl -X POST localhost:9090/-/reload
        - "--web.console.libraries=/usr/share/prometheus/console_libraries"
        - "--web.console.templates=/usr/share/prometheus/consoles"
        ports:
        - containerPort: 9090
          name: http
        volumeMounts:
        - mountPath: "/etc/prometheus"
          name: config-volume
        - mountPath: "/prometheus"
          name: data
        resources:
          requests:
            cpu: 100m
            memory: 512Mi
          limits:
            cpu: 100m
            memory: 512Mi
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: prometheus-pvc
      - configMap:
          name: prometheus-config
        name: config-volume
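With `--web.enable-lifecycle` set, Prometheus reloads its configuration when it receives an HTTP POST (or PUT) on `/-/reload`; a plain GET is rejected. The sketch below assumes the NodePort 32224 from the Service above and a reachable node IP or VIP. The names `reload_url` and `reload_prometheus` are invented for this example.

```shell
# Builds the reload URL. $1 = node IP or VIP, $2 = port (defaults to NodePort 32224).
reload_url() {
  printf 'http://%s:%s/-/reload\n' "$1" "${2:-32224}"
}

# Sends the reload request; -f makes curl exit non-zero on an HTTP error status.
reload_prometheus() {
  curl -fsS -X POST "$(reload_url "$@")"
}

# Usage after editing and re-applying the ConfigMap:
#   kubectl apply -f prometheus-configmap.yaml
#   reload_prometheus 192.168.48.10
```

Note that an updated ConfigMap can take a short while to appear in the mounted volume, so a reload issued immediately after `kubectl apply` may still read the old file; waiting a minute or so before reloading is a reasonable precaution.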
3. Deploying the Node Exporter component
3.1 Create the DaemonSet
vim node-exportet-daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitor
  labels:
    app: node-exporter
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      hostPID: true
      hostIPC: true
      hostNetwork: true
      nodeSelector:
        kubernetes.io/os: linux
      containers:
      - name: node-exporter
        image: docker.io/prom/node-exporter:latest
        args:
        - --web.listen-address=$(HOSTIP):9100
        - --path.procfs=/host/proc
        - --path.sysfs=/host/sys
        - --path.rootfs=/host/root
        - --collector.filesystem.ignored-mount-points=^/(dev|proc|sys|var/lib/docker/.+)($|/)
        - --collector.filesystem.ignored-fs-types=^(autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$
        ports:
        - containerPort: 9100
        env:
        - name: HOSTIP
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        resources:
          requests:
            cpu: 150m
            memory: 180Mi
          limits:
            cpu: 150m
            memory: 180Mi
        securityContext:
          runAsNonRoot: true
          runAsUser: 65534
        volumeMounts:
        - name: proc
          mountPath: /host/proc
        - name: sys
          mountPath: /host/sys
        - name: root
          mountPath: /host/root
          mountPropagation: HostToContainer
          readOnly: true
      tolerations:
      - operator: "Exists"
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: dev
        hostPath:
          path: /dev
      - name: sys
        hostPath:
          path: /sys
      - name: root
        hostPath:
          path: /
Create the Service
vim node-exportet-svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: node-exporter
  namespace: monitor
  labels:
    app: node-exporter
spec:
  selector:
    app: node-exporter
  ports:
  - name: metrics
    port: 9100
    targetPort: 9100
  clusterIP: None   # headless Service (clients reach the Pod IPs directly)
4. Deploying the kube-state-metrics component
4.1 Create the ServiceAccount, ClusterRole, ClusterRoleBinding, Deployment, and Service
vim kube-state-metrics.yaml
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-state-metrics
  namespace: monitor
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics
rules:
- apiGroups: [""]
  resources: ["nodes", "pods", "services", "resourcequotas", "replicationcontrollers", "limitranges", "persistentvolumeclaims", "persistentvolumes", "namespaces", "endpoints"]
  verbs: ["list", "watch"]
- apiGroups: ["apps"]   # daemonsets, deployments, and replicasets are served from the apps group on current Kubernetes
  resources: ["daemonsets", "deployments", "replicasets", "statefulsets"]
  verbs: ["list", "watch"]
- apiGroups: ["batch"]
  resources: ["cronjobs", "jobs"]
  verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
  resources: ["horizontalpodautoscalers"]
  verbs: ["list", "watch"]
- apiGroups: ["networking.k8s.io"]
  resources: ["ingresses"]
  verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: monitor
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-state-metrics
  namespace: monitor
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kube-state-metrics
  template:
    metadata:
      labels:
        app: kube-state-metrics
    spec:
      serviceAccountName: kube-state-metrics
      containers:
      - name: kube-state-metrics
        image: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.9.2
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  name: kube-state-metrics
  namespace: monitor
  labels:
    app: kube-state-metrics
spec:
  ports:
  - name: kube-state-metrics
    port: 8080
    protocol: TCP
  selector:
    app: kube-state-metrics
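The ConfigMap in section 2.2 does not appear to define a scrape job that targets kube-state-metrics, so as written its metrics would likely not be collected (this is an observation from the config shown, not something the original post states). One possible addition, modeled on the existing `prometheus` job and assuming the Service name and port name defined above, is printed by the sketch below. The function name `ksm_scrape_job` is hypothetical.

```shell
# Prints a candidate scrape job; paste it under scrape_configs in
# prometheus-configmap.yaml, indented to match the other jobs.
ksm_scrape_job() {
  cat <<'EOF'
- job_name: 'kube-state-metrics'
  kubernetes_sd_configs:
    - role: endpoints
      namespaces:
        names: [monitor]
  relabel_configs:
    - source_labels: [__meta_kubernetes_service_name]
      regex: kube-state-metrics
      action: keep
    - source_labels: [__meta_kubernetes_endpoint_port_name]
      regex: kube-state-metrics
      action: keep
EOF
}

ksm_scrape_job
```

The two `keep` rules follow the same pattern as the other endpoint-based jobs: the first filters on the Service name, the second on the port name, which in this manifest is also `kube-state-metrics`.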
5. Deploying the Grafana visualization platform
5.1 Create the PersistentVolumeClaim, Deployment, and Service
vim grafana.yaml
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-pvc   # PVC name
  namespace: monitor
spec:
  accessModes:
  - ReadWriteOnce   # access mode (options: ReadWriteOnce/ReadOnlyMany/ReadWriteMany)
  resources:
    requests:
      storage: 2Gi   # requested capacity
  storageClassName: nfs-client   # StorageClass (adjust to your cluster)
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana-server
  namespace: monitor
spec:
  replicas: 1
  selector:
    matchLabels:
      task: monitoring
      k8s-app: grafana
  template:
    metadata:
      labels:
        task: monitoring
        k8s-app: grafana
    spec:
      containers:
      - name: grafana
        image: grafana/grafana:latest
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 3000
          protocol: TCP
        volumeMounts:
        - mountPath: /var/lib/grafana/
          name: grafana-data
        env:
        - name: INFLUXDB_HOST
          value: monitoring-influxdb
        - name: GF_SERVER_HTTP_PORT
          value: "3000"
        - name: GF_AUTH_BASIC_ENABLED
          value: "false"
        - name: GF_AUTH_ANONYMOUS_ENABLED
          value: "true"
        - name: GF_AUTH_ANONYMOUS_ORG_ROLE
          value: Admin   # anonymous visitors get Admin rights; suitable for a lab environment only
        - name: GF_SERVER_ROOT_URL
          value: /
      volumes:
      - name: grafana-data
        persistentVolumeClaim:
          claimName: grafana-pvc
      affinity:   # scheduling preference (optional)
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            preference:
              matchExpressions:
              - key: node-role.kubernetes.io/monitoring
                operator: Exists
---
apiVersion: v1
kind: Service
metadata:
  labels:
    kubernetes.io/cluster-service: 'true'
    kubernetes.io/name: monitoring-grafana
  name: grafana-svc
  namespace: monitor
spec:
  ports:
  - port: 80
    targetPort: 3000
    nodePort: 31091
  selector:
    k8s-app: grafana
  type: NodePort
6. Deployment commands
Deploy the components in the following order:
# 1. Create the namespace
kubectl apply -f prometheus-namespace.yaml
# 2. Deploy the Prometheus configuration
kubectl apply -f prometheus-configmap.yaml
# 3. Deploy the Prometheus server
kubectl apply -f prometheus.yaml
# 4. Deploy kube-state-metrics
kubectl apply -f kube-state-metrics.yaml
# 5. Deploy Node Exporter
kubectl apply -f node-exportet-daemonset.yaml
kubectl apply -f node-exportet-svc.yaml
# 6. Deploy Grafana
kubectl apply -f grafana.yaml
Check the pod status:
[root@master1 prometheus]# kubectl get pod -n monitor
NAME                                 READY   STATUS    RESTARTS   AGE
grafana-server-64c9777c7b-drgdd      1/1     Running   0          110m
kube-state-metrics-6db447664-6r2wp   1/1     Running   0          110m
node-exporter-ccwk8                  1/1     Running   0          110m
node-exporter-fbq22                  1/1     Running   0          110m
node-exporter-hbtm6                  1/1     Running   0          110m
node-exporter-ndbhh                  1/1     Running   0          110m
node-exporter-sbb4p                  1/1     Running   0          110m
node-exporter-xd467                  1/1     Running   0          110m
prometheus-7cd9944dc4-lbjwx          1/1     Running   0          110m
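Beyond listing pods, `kubectl rollout status` can wait for each workload to finish rolling out, which is handy in scripts. This is a small sketch, assuming the resource names used in the manifests above; `wait_for_monitor_stack` is a hypothetical helper name.

```shell
# Waits for every workload in the monitor namespace, then lists PVCs and Services.
wait_for_monitor_stack() {
  for workload in deployment/prometheus deployment/kube-state-metrics \
                  deployment/grafana-server daemonset/node-exporter; do
    kubectl -n monitor rollout status "$workload" --timeout=120s || return 1
  done
  # PVCs should report Bound once the NFS provisioner has created the volumes
  kubectl -n monitor get pvc,svc
}

# Usage:
#   wait_for_monitor_stack
```

The function stops at the first workload that fails to become ready within the timeout, so a non-zero exit status points at the component to inspect with `kubectl -n monitor describe`.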
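7. Accessing the services
Once the deployment is complete, the services can be reached as follows:
- Prometheus: http://<node-ip>:32224 or http://www.myprometheus.com (requires DNS or hosts-file resolution for the domain)
- Grafana: http://<node-ip>:31091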
Note: 192.168.48.10 is the VIP of my highly available cluster. Without an HA setup, use a node IP instead, for example the IP of the host where the Pod runs.
Prometheus: http://192.168.48.10:32224
Grafana: http://192.168.48.10:31091/
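Both services expose lightweight health endpoints that can be checked from the command line before opening a browser: Prometheus serves `/-/healthy` and Grafana serves `/api/health`. The sketch assumes the NodePorts 32224 and 31091 from the manifests above; the function names are invented for this example.

```shell
# URL builders. $1 = node IP or VIP.
prometheus_health_url() { printf 'http://%s:32224/-/healthy\n' "$1"; }
grafana_health_url()    { printf 'http://%s:31091/api/health\n' "$1"; }

# Queries both endpoints; returns non-zero as soon as one is unreachable
# or answers with an HTTP error status.
check_endpoints() {
  curl -fsS "$(prometheus_health_url "$1")" || return 1
  echo
  curl -fsS "$(grafana_health_url "$1")" || return 1
  echo
}

# Usage:
#   check_endpoints 192.168.48.10
```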
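8. Grafana dashboards
8.1 Configure a data source for Grafana
Click Save & test at the bottom of the page. If the message "Successfully queried the Prometheus API." appears, the data source is working.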
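Because Grafana and Prometheus run in the same cluster, the data source URL can be the in-cluster Service address `http://prometheus-svc.monitor.svc:9090` (Service name, namespace, and port taken from the manifests above). As an alternative to the UI, the data source can also be created through Grafana's HTTP API. The sketch below relies on the anonymous Admin access that this deployment enables, so it should work without credentials in this setup only; the function names are hypothetical.

```shell
# JSON body for a Prometheus data source that Grafana proxies server-side.
datasource_payload() {
  cat <<'EOF'
{"name":"Prometheus","type":"prometheus","access":"proxy","url":"http://prometheus-svc.monitor.svc:9090","isDefault":true}
EOF
}

# $1 = Grafana base URL, e.g. http://192.168.48.10:31091
create_datasource() {
  curl -fsS -X POST -H 'Content-Type: application/json' \
       -d "$(datasource_payload)" "$1/api/datasources"
}

# Usage:
#   create_datasource http://192.168.48.10:31091
```

With `access` set to `proxy`, the Grafana server itself contacts Prometheus, which is why a cluster-internal address is enough and the browser never needs to reach port 9090.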
8.2 Import dashboards
Dashboard IDs:
- Node monitoring: 16098
- Kubernetes cluster monitoring: 14249