
Deploying a Prometheus + Grafana Monitoring System on Kubernetes (NFS Storage)

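Contents

  • Deploying a Prometheus + Grafana Monitoring System on Kubernetes (NFS Storage)
    • I. NFS Server Configuration (192.168.104.50)
      • 1. Install the NFS service
      • 2. Create the shared directory
      • 3. Configure the NFS export
      • 4. Verify the NFS export
    • II. Configuration on All Kubernetes Nodes (Master and Workers)
      • 1. Install the NFS client
      • 2. Verify the NFS connection
    • III. Prepare the Container Images (on a Machine with Internet Access)
      • 1. Pull the required images
      • 2. Save the images to files
      • 3. Copy the image files to all nodes
      • 4. Load the images on each node
    • IV. Deploy the NFS Provisioner
      • 1. Create the namespace
      • 2. Create the NFS Provisioner manifest (nfs-provisioner.yaml)
      • 3. Create the StorageClass (nfs-storageclass.yaml)
      • 4. Apply the configuration
      • 5. Verify the deployment
    • V. Deploy Prometheus
      • 1. Create the monitoring namespace
      • 2. Create the Prometheus configuration (prometheus-configmap.yaml)
      • 3. Create the main Prometheus manifest (prometheus.yaml)
      • 4. Apply the Prometheus configuration
    • VI. Deploy Node Exporter
      • 1. Create the Node Exporter DaemonSet manifest (node-exportet-daemonset.yaml)
      • 2. Create the Node Exporter Service manifest (node-exportet-svc.yaml)
      • 3. Apply the Node Exporter configuration
    • VII. Deploy Grafana
      • 1. Create the Grafana manifest (grafana.yaml)
      • 2. Apply the Grafana configuration
    • VIII. Verify the Deployment
      • 1. Check the status of all components
      • 2. Access the monitoring UIs
      • 3. Configure Grafana
    • IX. Troubleshooting Common Problems
      • Problem 1: PVC stuck in Pending
      • Problem 2: Image pull fails
      • Problem 3: NFS connection problems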

Deploying a Prometheus + Grafana Monitoring System on Kubernetes (NFS Storage)

I. NFS Server Configuration (192.168.104.50)

1. Install the NFS service

yum install -y nfs-utils rpcbind
systemctl enable --now rpcbind
systemctl enable --now nfs-server

2. Create the shared directory

mkdir -p /data/k8s_data
chmod 777 /data/k8s_data

3. Configure the NFS export

# Note: '>' overwrites any existing /etc/exports; use '>>' to append instead
echo "/data/k8s_data *(rw,sync,no_root_squash,no_subtree_check)" > /etc/exports
exportfs -arv
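
The export above uses `*`, so any host that can reach the server may mount the share, and `no_root_squash` keeps root privileges for those clients. A minimal sketch of a narrower entry follows. It assumes the Kubernetes nodes all sit in 192.168.104.0/24, which is only inferred from the node IPs used later in this guide; `export_line` is a hypothetical helper name.

```shell
# export_line PATH CLIENT: print one /etc/exports entry restricted to CLIENT.
# CLIENT may be a single host, a CIDR block, or a wildcard (see exports(5)).
export_line() {
    printf '%s %s(rw,sync,no_root_squash,no_subtree_check)\n' "$1" "$2"
}

# Assumed subnet: 192.168.104.0/24 (not stated in the original text).
export_line /data/k8s_data 192.168.104.0/24
```

To use it, redirect the printed line into /etc/exports and run `exportfs -arv` again.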

4. Verify the NFS export

showmount -e localhost
# Expected output:
# Export list for localhost:
# /data/k8s_data *

II. Configuration on All Kubernetes Nodes (Master and Workers)

1. Install the NFS client

yum install -y nfs-utils

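2. Verify the NFS connection

mkdir -p /mnt/test
mount -t nfs 192.168.104.50:/data/k8s_data /mnt/test
touch /mnt/test/testfile
ls /mnt/test
rm -f /mnt/test/testfile   # clean up so the test file does not stay on the share
umount /mnt/test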

III. Prepare the Container Images (on a Machine with Internet Access)

1. Pull the required images

docker pull prom/prometheus:latest
docker pull grafana/grafana:latest
docker pull prom/node-exporter:latest
docker pull registry.k8s.io/sig-storage/nfs-subdir-external-provisioner:v4.0.2
docker pull busybox:latest
# kube-state-metrics v2.x images are published under registry.k8s.io, not quay.io/coreos
docker pull registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.9.2

2. Save the images to files

docker save -o prometheus.tar prom/prometheus:latest
docker save -o grafana.tar grafana/grafana:latest
docker save -o node-exporter.tar prom/node-exporter:latest
docker save -o nfs-provisioner.tar registry.k8s.io/sig-storage/nfs-subdir-external-provisioner:v4.0.2
docker save -o busybox.tar busybox:latest
docker save -o kube-state-metrics.tar registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.9.2

3. Copy the image files to all nodes

scp *.tar root@192.168.104.51:/root
scp *.tar root@192.168.104.52:/root
scp *.tar root@192.168.104.53:/root

4. Load the images on each node

docker load -i prometheus.tar
docker load -i grafana.tar
docker load -i node-exporter.tar
docker load -i nfs-provisioner.tar
docker load -i busybox.tar
docker load -i kube-state-metrics.tar
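
`docker load` only fills Docker's own image store. On nodes where the kubelet talks to containerd directly (the default since Kubernetes 1.24 removed dockershim), the archives have to be imported into containerd's `k8s.io` namespace instead. The sketch below prints the matching command for each archive in the current directory; `RUNTIME` and `load_cmd` are hypothetical names introduced for illustration.

```shell
# load_cmd RUNTIME ARCHIVE: print the import command for one image archive.
#   docker     -> docker load -i ARCHIVE
#   containerd -> ctr -n k8s.io images import ARCHIVE  (namespace used by the kubelet's CRI)
load_cmd() {
    case "$1" in
        docker)     echo "docker load -i $2" ;;
        containerd) echo "ctr -n k8s.io images import $2" ;;
        *)          echo "unknown runtime: $1" >&2; return 1 ;;
    esac
}

# RUNTIME is an assumption of this sketch; set it to match the node.
RUNTIME="${RUNTIME:-docker}"
for archive in *.tar; do
    [ -e "$archive" ] || continue      # nothing to do if no archives are present
    load_cmd "$RUNTIME" "$archive"     # only prints the command
done
```

The loop only prints commands, so the output can be reviewed first and then piped to `sh` on each node.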

IV. Deploy the NFS Provisioner

1. Create the namespace

kubectl create namespace nfs-storageclass

2. Create the NFS Provisioner manifest (nfs-provisioner.yaml)

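apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfs-client-provisioner
  namespace: nfs-storageclass
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nfs-client-provisioner
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: nfs-client-provisioner
    spec:
      serviceAccountName: nfs-client-provisioner
      nodeSelector:
        kubernetes.io/hostname: node1  # pin the Pod to node1; change to one of your node names
      containers:
        - name: nfs-client-provisioner
          image: registry.k8s.io/sig-storage/nfs-subdir-external-provisioner:v4.0.2
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: nfs-client-root
              mountPath: /persistentvolumes
          env:
            - name: PROVISIONER_NAME
              value: k8s-sigs.io/nfs-subdir-external-provisioner
            - name: NFS_SERVER
              value: 192.168.104.50
            - name: NFS_PATH
              value: /data/k8s_data
      volumes:
        - name: nfs-client-root
          nfs:
            server: 192.168.104.50
            path: /data/k8s_data
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: nfs-client-provisioner-runner
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "patch", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: run-nfs-client-provisioner
subjects:
  - kind: ServiceAccount
    name: nfs-client-provisioner
    namespace: nfs-storageclass
roleRef:
  kind: ClusterRole
  name: nfs-client-provisioner-runner
  apiGroup: rbac.authorization.k8s.io
---
# Not in the original manifest: the upstream project's rbac.yaml also grants this
# Role, which the provisioner's leader election uses. Without it the provisioner
# may fail to acquire its lock, leaving PVCs in Pending.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: leader-locking-nfs-client-provisioner
  namespace: nfs-storageclass
rules:
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: leader-locking-nfs-client-provisioner
  namespace: nfs-storageclass
subjects:
  - kind: ServiceAccount
    name: nfs-client-provisioner
    namespace: nfs-storageclass
roleRef:
  kind: Role
  name: leader-locking-nfs-client-provisioner
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nfs-client-provisioner
  namespace: nfs-storageclass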

3. Create the StorageClass (nfs-storageclass.yaml)

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-client
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: k8s-sigs.io/nfs-subdir-external-provisioner
parameters:
  archiveOnDelete: "false"

4. Apply the configuration

kubectl apply -f nfs-provisioner.yaml
kubectl apply -f nfs-storageclass.yaml

5. Verify the deployment

kubectl get pods -n nfs-storageclass
# nfs-client-provisioner should be in the Running state

kubectl get storageclass
# nfs-client should be marked as (default)
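
A Running Pod does not yet prove that dynamic provisioning works end to end. One way to check before deploying Prometheus is to create a throwaway claim against the `nfs-client` StorageClass. The sketch below only renders the manifest; the claim name `test-claim` and the helper `render_test_pvc` are hypothetical.

```shell
# render_test_pvc NAME: print a minimal PVC manifest that requests storage
# from the nfs-client StorageClass defined above. NAME is a hypothetical claim name.
render_test_pvc() {
    cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $1
  namespace: default
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: nfs-client
  resources:
    requests:
      storage: 1Mi
EOF
}

render_test_pvc test-claim
```

Piping the output into `kubectl apply -f -` creates the claim. If provisioning works, `kubectl get pvc test-claim` should report Bound shortly afterwards and a new subdirectory should appear under /data/k8s_data on the NFS server. Remove the claim afterwards with `kubectl delete pvc test-claim`.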

V. Deploy Prometheus

1. Create the monitoring namespace

kubectl create namespace monitor

2. Create the Prometheus configuration (prometheus-configmap.yaml)

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitor
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    scrape_configs:
      - job_name: 'prometheus'
        kubernetes_sd_configs:
          - role: endpoints
            namespaces:
              names: [monitor]
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_name]
            regex: prometheus-svc
            action: keep
          - source_labels: [__meta_kubernetes_endpoint_port_name]
            regex: web
            action: keep
      - job_name: 'coredns'
        kubernetes_sd_configs:
          - role: endpoints
            namespaces:
              names: [kube-system]
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_name]
            regex: kube-dns
            action: keep
          - source_labels: [__meta_kubernetes_endpoint_port_name]
            regex: metrics
            action: keep
      - job_name: 'kube-apiserver'
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: false
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: endpoints
            namespaces:
              names: [default, kube-system]
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_name]
            regex: kubernetes
            action: keep
          - source_labels: [__meta_kubernetes_endpoint_port_name]
            regex: https
            action: keep
      - job_name: 'node-exporter'
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - source_labels: [__address__]
            regex: '(.*):10250'
            replacement: '${1}:9100'
            target_label: __address__
            action: replace
      - job_name: 'cadvisor'
        kubernetes_sd_configs:
          - role: node
        scheme: https
        tls_config:
          insecure_skip_verify: true
          ca_file: '/var/run/secrets/kubernetes.io/serviceaccount/ca.crt'
        bearer_token_file: '/var/run/secrets/kubernetes.io/serviceaccount/token'
        relabel_configs:
          - target_label: __metrics_path__
            replacement: /metrics/cadvisor

3. Create the main Prometheus manifest (prometheus.yaml)

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitor
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups: [""]
    resources: ["nodes", "services", "endpoints", "pods", "nodes/proxy"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["extensions"]
    resources: ["ingresses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["configmaps", "nodes/metrics"]
    verbs: ["get"]
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: monitor
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus-svc
  namespace: monitor
  labels:
    app: prometheus
  annotations:
    prometheus_io_scrape: "true"
spec:
  selector:
    app: prometheus
  type: NodePort
  ports:
    - name: web
      nodePort: 32224
      port: 9090
      targetPort: http   # refers to the container port named "http" below
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-pvc
  namespace: monitor
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
  storageClassName: nfs-client
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitor
  labels:
    app: prometheus
spec:
  selector:
    matchLabels:
      app: prometheus
  replicas: 1
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      initContainers:
        # Hand the data directory to UID/GID 65534 before Prometheus starts
        - name: "change-permission-of-directory"
          image: busybox:latest
          # A ":latest" tag defaults to imagePullPolicy Always, which would
          # ignore the locally loaded image; set it explicitly
          imagePullPolicy: IfNotPresent
          command: ["/bin/sh", "-c"]
          args:
            - chown -R 65534:65534 /prometheus
          securityContext:
            privileged: true
          volumeMounts:
            - mountPath: "/etc/prometheus"
              name: config-volume
            - mountPath: "/prometheus"
              name: data
      containers:
        - name: prometheus
          image: prom/prometheus:latest
          imagePullPolicy: IfNotPresent
          args:
            - "--config.file=/etc/prometheus/prometheus.yml"
            - "--storage.tsdb.path=/prometheus"
            - "--web.enable-lifecycle"
            - "--web.console.libraries=/usr/share/prometheus/console_libraries"
            - "--web.console.templates=/usr/share/prometheus/consoles"
          ports:
            - name: http
              containerPort: 9090
          volumeMounts:
            - mountPath: "/etc/prometheus"
              name: config-volume
            - mountPath: "/prometheus"
              name: data
          resources:
            requests:
              cpu: 100m
              memory: 512Mi
            limits:
              cpu: 100m
              memory: 512Mi
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: prometheus-pvc
        - name: config-volume
          configMap:
            name: prometheus-config

4. Apply the Prometheus configuration

kubectl apply -f prometheus-configmap.yaml
kubectl apply -f prometheus.yaml
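
Because the Deployment starts Prometheus with `--web.enable-lifecycle`, later changes to the ConfigMap can be picked up without restarting the Pod by sending a POST to `/-/reload`. The sketch below builds the relevant URLs for the NodePort 32224 defined above and prints ready-to-run `curl` commands. `NODE_IP` is a placeholder and `prom_url` is a hypothetical helper.

```shell
# prom_url HOST PATH: build a URL for the Prometheus NodePort service (32224 above).
prom_url() {
    printf 'http://%s:32224%s\n' "$1" "$2"
}

# NODE_IP is a placeholder; replace it with the IP of any cluster node.
NODE_IP="${NODE_IP:-192.168.104.51}"

echo "reload:  curl -X POST $(prom_url "$NODE_IP" /-/reload)"
echo "health:  curl -s $(prom_url "$NODE_IP" /-/healthy)"
echo "targets: curl -s $(prom_url "$NODE_IP" /api/v1/targets)"
```

Note that the kubelet refreshes mounted ConfigMaps with some delay, so a reload sent immediately after `kubectl apply` may still see the old file.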

VI. Deploy Node Exporter

1. Create the Node Exporter DaemonSet manifest (node-exportet-daemonset.yaml)

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitor
  labels:
    app: node-exporter
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      hostPID: true
      hostIPC: true
      hostNetwork: true
      nodeSelector:
        kubernetes.io/os: linux
      containers:
        - name: node-exporter
          image: prom/node-exporter:latest
          imagePullPolicy: IfNotPresent
          args:
            - --web.listen-address=$(HOSTIP):9100
            - --path.procfs=/host/proc
            - --path.sysfs=/host/sys
            - --path.rootfs=/host/root
            # node_exporter 1.3.0 renamed the next two flags to
            # --collector.filesystem.mount-points-exclude and
            # --collector.filesystem.fs-types-exclude; switch to the new
            # names if a recent image rejects the old ones
            - --collector.filesystem.ignored-mount-points=^/(dev|proc|sys|var/lib/docker/.+)($|/)
            - --collector.filesystem.ignored-fs-types=^(autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$
          env:
            - name: HOSTIP
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP
          ports:
            - containerPort: 9100
          resources:
            requests:
              cpu: 150m
              memory: 180Mi
            limits:
              cpu: 150m
              memory: 180Mi
          securityContext:
            runAsNonRoot: true
            runAsUser: 65534
          volumeMounts:
            - name: proc
              mountPath: /host/proc
            - name: sys
              mountPath: /host/sys
            - name: root
              mountPath: /host/root
              mountPropagation: HostToContainer
              readOnly: true
      tolerations:
        - operator: "Exists"   # run on every node, including tainted masters
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: dev
          hostPath:
            path: /dev
        - name: sys
          hostPath:
            path: /sys
        - name: root
          hostPath:
            path: /

2. Create the Node Exporter Service manifest (node-exportet-svc.yaml)

apiVersion: v1
kind: Service
metadata:
  name: node-exporter
  namespace: monitor
  labels:
    app: node-exporter
spec:
  selector:
    app: node-exporter
  ports:
    - name: metrics
      port: 9100
      targetPort: 9100
  clusterIP: None

3. Apply the Node Exporter configuration

kubectl apply -f node-exportet-daemonset.yaml
kubectl apply -f node-exportet-svc.yaml

VII. Deploy Grafana

1. Create the Grafana manifest (grafana.yaml)

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-pvc
  namespace: monitor
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
  storageClassName: nfs-client
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana-server
  namespace: monitor
spec:
  replicas: 1
  selector:
    matchLabels:
      task: monitoring
      k8s-app: grafana
  template:
    metadata:
      labels:
        task: monitoring
        k8s-app: grafana
    spec:
      containers:
        - name: grafana
          image: grafana/grafana:latest
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 3000
              protocol: TCP
          volumeMounts:
            - mountPath: /var/lib/grafana/
              name: grafana-data
          env:
            - name: GF_SERVER_HTTP_PORT
              value: "3000"
            - name: GF_AUTH_BASIC_ENABLED
              value: "false"
            - name: GF_AUTH_ANONYMOUS_ENABLED
              value: "true"
            - name: GF_AUTH_ANONYMOUS_ORG_ROLE
              value: Admin
            - name: GF_SERVER_ROOT_URL
              value: /
      volumes:
        - name: grafana-data
          persistentVolumeClaim:
            claimName: grafana-pvc
---
apiVersion: v1
kind: Service
metadata:
  labels:
    kubernetes.io/cluster-service: 'true'
    kubernetes.io/name: monitoring-grafana
  name: grafana-svc
  namespace: monitor
spec:
  ports:
    - port: 80
      targetPort: 3000
      nodePort: 31091
  selector:
    k8s-app: grafana
  type: NodePort

2. Apply the Grafana configuration

kubectl apply -f grafana.yaml

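VIII. Verify the Deployment

1. Check the status of all components

kubectl get pods -n monitor
# Expected output, similar to the following
# (kube-state-metrics is listed only if it has been deployed separately;
#  this guide does not include a manifest for it):
# NAME                                  READY   STATUS    RESTARTS   AGE
# grafana-server-7868b7cc7c-k8lrd       1/1     Running   0          5m
# kube-state-metrics-74c47f9485-h8787   1/1     Running   0          6m
# node-exporter-6h79q                   1/1     Running   0          5m
# node-exporter-9qkbs                   1/1     Running   0          5m
# node-exporter-t64xb                   1/1     Running   0          5m
# prometheus-5696fb478b-4wf4j           1/1     Running   0          6m

kubectl get pvc -n monitor
# All PVCs should be in the Bound state

kubectl get svc -n monitor
# The Prometheus and Grafana Services should be listed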

2. Access the monitoring UIs

  • Prometheus: http://<any-node-IP>:32224

  • Grafana: http://<any-node-IP>:31091

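3. Configure Grafana

  1. Log in to Grafana (default username/password: admin/admin). Because grafana.yaml enables anonymous access with the Admin role, the UI may also open without a login prompt.
  2. Add a data source:
    • Type: Prometheus
    • URL: http://prometheus-svc.monitor.svc.cluster.local:9090
  3. Import dashboards:
    • Node monitoring: ID 16098
    • Kubernetes cluster monitoring: ID 14249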
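
The data source can also be created through Grafana's HTTP API (`POST /api/datasources`) instead of the UI. The sketch below only renders the JSON body; `datasource_json` is a hypothetical helper, and the data source name "Prometheus" is an arbitrary choice.

```shell
# datasource_json URL: print the JSON body for Grafana's POST /api/datasources,
# describing a Prometheus data source that Grafana queries server-side ("proxy").
datasource_json() {
    cat <<EOF
{"name":"Prometheus","type":"prometheus","access":"proxy","url":"$1","isDefault":true}
EOF
}

datasource_json "http://prometheus-svc.monitor.svc.cluster.local:9090"
```

One possible way to send it is `datasource_json <url> | curl -s -X POST -H 'Content-Type: application/json' -d @- http://<any-node-IP>:31091/api/datasources`. Whether an unauthenticated request is accepted depends on the Grafana version and its auth settings; if it is rejected, add the data source in the UI as described above.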


IX. Troubleshooting Common Problems

Problem 1: PVC stuck in Pending

Cause: the NFS Provisioner failed to create the PV automatically.
Solution: create the PVs manually.

# Create the directories on the NFS server
ssh 192.168.104.50
mkdir -p /data/k8s_data/{prometheus,grafana}
chmod 777 /data/k8s_data/{prometheus,grafana}
exit

# Create the PVs manually
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-pv
spec:
  capacity:
    storage: 2Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: nfs-client
  nfs:
    path: /data/k8s_data/prometheus
    server: 192.168.104.50
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: grafana-pv
spec:
  capacity:
    storage: 2Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: nfs-client
  nfs:
    path: /data/k8s_data/grafana
    server: 192.168.104.50
EOF

# Delete the old Pods to force them to be recreated
kubectl delete pod -n monitor --all
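
Before resorting to the manual PVs above, it usually helps to find out why provisioning failed. The sketch below prints a few read-only `kubectl` commands for that purpose. The helper name `pvc_debug_cmds` is hypothetical; the namespace and Deployment names are the ones used earlier in this guide.

```shell
# pvc_debug_cmds NAMESPACE PVC: print read-only kubectl commands that usually
# reveal why a claim is stuck in Pending. Nothing is executed here.
pvc_debug_cmds() {
    echo "kubectl describe pvc $2 -n $1"
    echo "kubectl get storageclass"
    echo "kubectl get pods -n nfs-storageclass -o wide"
    echo "kubectl logs -n nfs-storageclass deployment/nfs-client-provisioner --tail=50"
}

pvc_debug_cmds monitor prometheus-pvc
```

The Events section of `kubectl describe pvc` and the provisioner log are typically where permission errors, a wrong StorageClass name, or NFS mount failures show up.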

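Problem 2: Image pull fails

Solution

  1. Load the images locally on each node
  2. Set imagePullPolicy: IfNotPresent in the manifests
  3. Make sure the required images are already loaded on the nodes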

Problem 3: NFS connection problems

Verification steps

# Test from a Kubernetes node
mkdir -p /mnt/test
mount -t nfs 192.168.104.50:/data/k8s_data /mnt/test
touch /mnt/test/testfile
ls /mnt/test
umount /mnt/test

How to fix

  1. Make sure the firewall on the NFS server allows port 2049
  2. Check /etc/exports on the NFS server
  3. On the NFS server, run: exportfs -arv
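
For the firewall step, the sketch below prints the firewalld commands that open the NFS-related services. It assumes firewalld is the active firewall on the NFS server, which the original does not state. NFSv4 needs only port 2049, while NFSv3 mounts and `showmount` additionally rely on rpcbind and mountd. `nfs_firewall_cmds` is a hypothetical helper name.

```shell
# nfs_firewall_cmds: print firewalld commands that open the NFS-related services.
# Assumes firewalld is the active firewall on the NFS server.
nfs_firewall_cmds() {
    for svc in nfs rpc-bind mountd; do
        echo "firewall-cmd --permanent --add-service=$svc"
    done
    echo "firewall-cmd --reload"   # apply the permanent rules to the running firewall
}

nfs_firewall_cmds
```

Review the printed commands and run them as root on 192.168.104.50.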