
Comparison of K8s Metric Collection Solutions

Table of Contents

    • 1. Background
    • 2. Solutions
      • 2.1. MetricBeat
      • 2.2 Telegraf
      • 2.3 MetricServer
      • 2.4 Kubelet cAdvisor
    • 3. Comparison

1. Background

Megacloud Portal needs to add monitoring for K8S. The current requirement is:

Obtain the CPU/Memory metrics of Nodes and Pods in K8S, and display the TopN after processing.

To achieve this function, the server works as follows:

  • Collect resource metrics of K8S Nodes and Pods
  • Run ETL processing on the collected data and store the result
  • Implement an API for the front end to acquire the data

The key is the collection and processing of metric data. The following is a brief introduction to, and comparison of, the related collection schemes.
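
The TopN step of the requirement can be sketched as a sort followed by a truncation. The Go snippet below is a minimal sketch, not the Portal's implementation: the `Usage` type, its fields, and the `topN` helper are hypothetical names introduced for illustration, and the sample values are taken from the `kubectl top pods` output shown in section 2.3.

```go
package main

import (
	"fmt"
	"sort"
)

// Usage is a hypothetical record for one Node or Pod after ETL processing.
type Usage struct {
	Name         string
	CPUNanoCores uint64 // CPU usage in nanocores
	MemoryBytes  uint64 // memory usage in bytes
}

// topN returns the n records with the largest key value, in descending order.
// The input slice is copied first, so the caller's data is not reordered.
func topN(items []Usage, n int, key func(Usage) uint64) []Usage {
	sorted := make([]Usage, len(items))
	copy(sorted, items)
	sort.SliceStable(sorted, func(i, j int) bool {
		return key(sorted[i]) > key(sorted[j])
	})
	if n < 0 {
		n = 0
	}
	if n > len(sorted) {
		n = len(sorted)
	}
	return sorted[:n]
}

func main() {
	pods := []Usage{
		{"coredns-f9fd979d6-jzv8q", 4_000_000, 10 << 20},
		{"etcd-tk01", 14_000_000, 50 << 20},
		{"kube-apiserver-tk01", 31_000_000, 293 << 20},
	}
	byCPU := func(u Usage) uint64 { return u.CPUNanoCores }
	for _, u := range topN(pods, 2, byCPU) {
		fmt.Printf("%s %dm\n", u.Name, u.CPUNanoCores/1_000_000)
	}
}
```

Passing the sort key as a function lets the same helper rank by CPU or by memory without duplicating the sorting logic.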

2. Solutions

2.1. MetricBeat

MetricBeat is a metric collection tool provided by Elastic. It can collect metrics from many open source software projects, including Kubernetes, and can send the data to Elasticsearch, Kafka, Redis, or Logstash for processing or storage.

The following is the data format collected from K8S Node and Pod in MetricBeat.

  • Node Data Format (information other than CPU/Memory has been omitted)

{
  "@timestamp": "2017-04-06T15:29:27.150Z",
  "beat": {"hostname": "beathost", "name": "beathost", "version": "6.0.0-alpha1"},
  "kubernetes": {
    "node": {
      "cpu": {
        "usage": {"core": {"ns": 7247863769557035}, "nanocores": 1662117892}
      },
      "memory": {
        "available": {"bytes": 134202847232},
        "majorpagefaults": 1044,
        "pagefaults": 83482928,
        "rss": {"bytes": 178053120},
        "usage": {"bytes": 67062091776},
        "workingset": {"bytes": 51496206336}
      },
      "name": "localhost",
      "start_time": "2017-02-08T10:33:38Z"
    }
  },
  "metricset": {"host": "localhost:10255", "module": "kubernetes", "name": "node", "rtt": 650741},
  "type": "metricsets"
}
{
  "beat": {"hostname": "X1", "name": "X1", "version": "6.0.0-alpha1"},
  "kubernetes": {
    "node": {
      "cpu": {"allocatable": {"cores": 2}, "capacity": {"cores": 2}},
      "memory": {"allocatable": {"bytes": 2097786880}, "capacity": {"bytes": 2097786880}},
      "name": "minikube",
      "pod": {"allocatable": {"total": 110}, "capacity": {"total": 110}},
      "status": {"ready": "true", "unschedulable": false}
    }
  },
  "metricset": {"host": "192.168.99.100:18080"}
}

  • Pod Data Format

{
  "@timestamp": "2017-04-06T15:29:27.150Z",
  "beat": {"hostname": "beathost", "name": "beathost", "version": "6.0.0-alpha1"},
  "kubernetes": {
    "namespace": "ns",
    "node": {"name": "localhost"},
    "pod": {
      "name": "nginx-3137573019-pcfzh",
      "uid": "b89a812e-18cd-11e9-b333-080027190d51",
      "network": {
        "rx": {"bytes": 18999261, "errors": 0},
        "tx": {"bytes": 28580621, "errors": 0}
      },
      "start_time": "2017-04-06T12:09:05Z"
    }
  },
  "metricset": {"host": "localhost:10255", "module": "kubernetes", "name": "pod", "rtt": 636230},
  "type": "metricsets"
}

  • Container Data Format

{
  "@timestamp": "2017-04-06T15:29:27.150Z",
  "beat": {"hostname": "beathost", "name": "beathost", "version": "6.0.0-alpha1"},
  "kubernetes": {
    "container": {
      "cpu": {
        "usage": {"core": {"ns": 3305756719}, "nanocores": 5992}
      },
      "memory": {
        "available": {"bytes": 0},
        "majorpagefaults": 47,
        "pagefaults": 2298,
        "rss": {"bytes": 1441792},
        "usage": {"bytes": 7643136},
        "workingset": {"bytes": 1466368}
      },
      "name": "nginx"
    },
    "namespace": "ns",
    "node": {"name": "localhost"},
    "pod": {"name": "nginx-3137573019-pcfzh"}
  },
  "metricset": {"host": "localhost:10255", "module": "kubernetes", "name": "container", "rtt": 650739},
  "type": "metricsets"
}

MetricBeat only provides CPU/Memory metrics for Nodes and containers, not for Pods. If we use MetricBeat for collection, we need to do the following:

  • Deploy MetricBeat on K8S, which may require many manual operations
  • We ne

2.2 Telegraf

Telegraf is an open source software written in Go for metric collection. Like MetricBeat, it provides numerous plugins to collect data from multiple sources.

For Kubernetes, Telegraf provides a Kubernetes plugin to collect data. It gets the data through Kubelet's /stats/summary API. It can also be used together with the Prometheus plugin to collect more metric data.

The following are some of its metric data formats. Like MetricBeat, it does not provide Pod-level CPU/Memory statistics, so these have to be aggregated from the container data.

type NodeMetrics struct {
	NodeName         string             `json:"nodeName"`
	SystemContainers []ContainerMetrics `json:"systemContainers"`
	StartTime        time.Time          `json:"startTime"`
	CPU              CPUMetrics         `json:"cpu"`
	Memory           MemoryMetrics      `json:"memory"`
	Network          NetworkMetrics     `json:"network"`
	FileSystem       FileSystemMetrics  `json:"fs"`
	Runtime          RuntimeMetrics     `json:"runtime"`
}

// PodMetrics contains metric data on a given pod
type PodMetrics struct {
	PodRef     PodReference       `json:"podRef"`
	StartTime  *time.Time         `json:"startTime"`
	Containers []ContainerMetrics `json:"containers"`
	Network    NetworkMetrics     `json:"network"`
	Volumes    []VolumeMetrics    `json:"volume"`
}

// ContainerMetrics represents the metric data collect about a container from the kubelet
type ContainerMetrics struct {
	Name      string            `json:"name"`
	StartTime time.Time         `json:"startTime"`
	CPU       CPUMetrics        `json:"cpu"`
	Memory    MemoryMetrics     `json:"memory"`
	RootFS    FileSystemMetrics `json:"rootfs"`
	LogsFS    FileSystemMetrics `json:"logs"`
}
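
Because these structures carry CPU/Memory only at the container level, a Pod-level value has to be summed over the Pod's containers. The snippet below is a rough sketch of that aggregation under some assumptions. The structs are trimmed-down stand-ins that keep only the fields needed here. The fields of `CPUMetrics` and `MemoryMetrics` are assumed from the JSON keys in the Kubelet response in the Appendix. `PodUsage` and `aggregatePod` are hypothetical names, and the second container's values are made up.

```go
package main

import "fmt"

// Trimmed-down stand-ins for the plugin's types (assumed field sets).
type CPUMetrics struct {
	UsageNanoCores uint64 `json:"usageNanoCores"`
}

type MemoryMetrics struct {
	WorkingSetBytes uint64 `json:"workingSetBytes"`
}

type ContainerMetrics struct {
	Name   string        `json:"name"`
	CPU    CPUMetrics    `json:"cpu"`
	Memory MemoryMetrics `json:"memory"`
}

type PodReference struct {
	Name      string `json:"name"`
	Namespace string `json:"namespace"`
}

type PodMetrics struct {
	PodRef     PodReference       `json:"podRef"`
	Containers []ContainerMetrics `json:"containers"`
}

// PodUsage is a hypothetical Pod-level result record.
type PodUsage struct {
	Namespace       string
	Name            string
	CPUNanoCores    uint64
	WorkingSetBytes uint64
}

// aggregatePod sums CPU and memory over all containers of one Pod.
func aggregatePod(p PodMetrics) PodUsage {
	u := PodUsage{Namespace: p.PodRef.Namespace, Name: p.PodRef.Name}
	for _, c := range p.Containers {
		u.CPUNanoCores += c.CPU.UsageNanoCores
		u.WorkingSetBytes += c.Memory.WorkingSetBytes
	}
	return u
}

func main() {
	pod := PodMetrics{
		PodRef: PodReference{Name: "nginx-3137573019-pcfzh", Namespace: "ns"},
		Containers: []ContainerMetrics{
			{Name: "nginx", CPU: CPUMetrics{UsageNanoCores: 5992}, Memory: MemoryMetrics{WorkingSetBytes: 1466368}},
			// Made-up second container, for illustration only.
			{Name: "sidecar", CPU: CPUMetrics{UsageNanoCores: 1008}, Memory: MemoryMetrics{WorkingSetBytes: 1000000}},
		},
	}
	u := aggregatePod(pod)
	fmt.Printf("%s/%s cpu=%dn workingSet=%dB\n", u.Namespace, u.Name, u.CPUNanoCores, u.WorkingSetBytes)
}
```

Summing the working set rather than total usage is a choice made for this sketch; which memory figure to report depends on what the front end is expected to show.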

2.3 MetricServer

MetricServer also obtains metric data through the /stats/summary API provided by Kubelet. It stores the data in memory and exposes it for external access through an API built on the kube-aggregator mechanism.

The fixed URL prefix of the API provided by MetricServer is /apis/metrics/v1alpha1/ (recent Metrics Server releases serve this API under /apis/metrics.k8s.io/v1beta1/). The following paths are appended to the prefix to access metric data; all of them support only the GET method:

  • /nodes - Get the metric data of all Nodes.
  • /nodes/(node) - Get the metric data of the specified Node.
  • /namespaces/(namespace)/pods - Get the metric data of all Pods in a given namespace.
  • /namespaces/(namespace)/pods/(pod) - Get the metric data of the specified Pod.
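
As a small illustration of how the prefix and the four paths above combine, the following sketch builds the request paths. The helper names are hypothetical, and the prefix is passed in as a parameter because it depends on the API version the cluster serves.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// nodesPath returns the path listing the metrics of all Nodes.
func nodesPath(prefix string) string {
	return strings.TrimRight(prefix, "/") + "/nodes"
}

// nodePath returns the path for a single Node's metrics.
func nodePath(prefix, node string) string {
	return nodesPath(prefix) + "/" + url.PathEscape(node)
}

// podsPath returns the path listing the metrics of all Pods in a namespace.
func podsPath(prefix, namespace string) string {
	return strings.TrimRight(prefix, "/") + "/namespaces/" + url.PathEscape(namespace) + "/pods"
}

// podPath returns the path for a single Pod's metrics.
func podPath(prefix, namespace, pod string) string {
	return podsPath(prefix, namespace) + "/" + url.PathEscape(pod)
}

func main() {
	prefix := "/apis/metrics/v1alpha1/"
	fmt.Println(nodesPath(prefix))
	fmt.Println(nodePath(prefix, "tk01"))
	fmt.Println(podsPath(prefix, "kube-system"))
	fmt.Println(podPath(prefix, "kube-system", "etcd-tk01"))
}
```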

In addition, we can view the CPU/Memory metrics of Nodes and Pods with the kubectl top command in a terminal.

$ kubectl top nodes
NAME            CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
tk01            217m         10%    5296Mi          68%
vm-0-2-ubuntu   84m          4%     1189Mi          32%

$ kubectl top pods --all-namespaces
NAMESPACE     NAME                              CPU(cores)   MEMORY(bytes)
kube-system   coredns-f9fd979d6-jzv8q           4m           10Mi
kube-system   coredns-f9fd979d6-tx9m4           4m           10Mi
kube-system   etcd-tk01                         14m          50Mi
kube-system   kube-apiserver-tk01               31m          293Mi

K8S provides libraries to access the above APIs. We have implemented the first version of metric-server-collector, which uses the MetricServer API to obtain the CPU/Memory metrics of Nodes and Pods and converts them into the data format we need.
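
Part of that conversion is turning quantity strings such as 217m and 5296Mi into plain numbers. The snippet below is a simplified, standard-library-only sketch and not the collector's actual code: `parseCPUMilli` and `parseMemoryBytes` are hypothetical helpers that handle only the suffixes seen in the output above (m for CPU; Ki, Mi, Gi for memory). A real collector would more likely rely on the quantity type from the Kubernetes client libraries.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPUMilli converts a CPU quantity ("217m" or whole cores such as "2")
// into millicores.
func parseCPUMilli(s string) (int64, error) {
	if strings.HasSuffix(s, "m") {
		return strconv.ParseInt(strings.TrimSuffix(s, "m"), 10, 64)
	}
	cores, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return 0, err
	}
	return cores * 1000, nil
}

// parseMemoryBytes converts a memory quantity ("5296Mi") into bytes.
// Only Ki, Mi, Gi and plain byte counts are handled in this sketch.
func parseMemoryBytes(s string) (int64, error) {
	units := []struct {
		suffix string
		factor int64
	}{
		{"Ki", 1 << 10},
		{"Mi", 1 << 20},
		{"Gi", 1 << 30},
	}
	for _, u := range units {
		if strings.HasSuffix(s, u.suffix) {
			n, err := strconv.ParseInt(strings.TrimSuffix(s, u.suffix), 10, 64)
			if err != nil {
				return 0, err
			}
			return n * u.factor, nil
		}
	}
	return strconv.ParseInt(s, 10, 64)
}

func main() {
	cpu, err := parseCPUMilli("217m")
	if err != nil {
		fmt.Println("cpu:", err)
		return
	}
	mem, err := parseMemoryBytes("5296Mi")
	if err != nil {
		fmt.Println("memory:", err)
		return
	}
	fmt.Printf("cpu=%d millicores, memory=%d bytes\n", cpu, mem)
}
```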

2.4 Kubelet cAdvisor

Kubelet integrates cAdvisor to collect statistics on the CPU, memory, file system, and network usage of the containers on a node. It provides the /stats/summary API for obtaining this metric data externally. The Telegraf and MetricServer schemes described above both obtain their metric data through this API. Therefore, we can access Kubelet's API directly to obtain metric data instead of using those tools.

For this solution, the following changes need to be made based on the current metric-server-collector:

  • Rewrite the scraper code so that it obtains metric data from the Kubelet API instead of the MetricServer API.
  • Convert the data returned by Kubelet to the required data format.
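
The conversion step can be sketched as decoding the /stats/summary response and flattening it into one record per Node and Pod. This is only an illustration under some assumptions. The struct types cover just the fields used here. `record` and `parseSummary` are hypothetical names. The sample is a trimmed copy of the response in the Appendix. Fetching the response from Kubelet (HTTP client, authentication) is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type cpuStats struct {
	UsageNanoCores uint64 `json:"usageNanoCores"`
}

type memoryStats struct {
	WorkingSetBytes uint64 `json:"workingSetBytes"`
}

type nodeStats struct {
	NodeName string      `json:"nodeName"`
	CPU      cpuStats    `json:"cpu"`
	Memory   memoryStats `json:"memory"`
}

type podRef struct {
	Name      string `json:"name"`
	Namespace string `json:"namespace"`
}

type podStats struct {
	PodRef podRef      `json:"podRef"`
	CPU    cpuStats    `json:"cpu"`
	Memory memoryStats `json:"memory"`
}

type summary struct {
	Node nodeStats  `json:"node"`
	Pods []podStats `json:"pods"`
}

// record is a hypothetical flat output format for the collector.
type record struct {
	Kind            string // "Node" or "Pod"
	Namespace       string // empty for Nodes
	Name            string
	CPUNanoCores    uint64
	WorkingSetBytes uint64
}

// parseSummary decodes a /stats/summary response body into flat records,
// with the Node first and then one record per Pod.
func parseSummary(data []byte) ([]record, error) {
	var s summary
	if err := json.Unmarshal(data, &s); err != nil {
		return nil, err
	}
	out := []record{{"Node", "", s.Node.NodeName, s.Node.CPU.UsageNanoCores, s.Node.Memory.WorkingSetBytes}}
	for _, p := range s.Pods {
		out = append(out, record{"Pod", p.PodRef.Namespace, p.PodRef.Name, p.CPU.UsageNanoCores, p.Memory.WorkingSetBytes})
	}
	return out, nil
}

// sample is a trimmed copy of the Appendix response.
const sample = `{
  "node": {"nodeName": "tk01",
    "cpu": {"usageNanoCores": 1078216675},
    "memory": {"workingSetBytes": 5632106496}},
  "pods": [{"podRef": {"name": "etcd-tk01", "namespace": "kube-system"},
    "cpu": {"usageNanoCores": 161540656},
    "memory": {"workingSetBytes": 67014656}}]
}`

func main() {
	recs, err := parseSummary([]byte(sample))
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	for _, r := range recs {
		fmt.Printf("%-4s %s/%s cpu=%dn workingSet=%dB\n", r.Kind, r.Namespace, r.Name, r.CPUNanoCores, r.WorkingSetBytes)
	}
}
```

The Appendix response already carries cpu and memory blocks at the Pod level, so this sketch reads them directly instead of summing over containers.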

3. Comparison

Based on the above information, the development, operation, and maintenance work of each solution compares as follows:

| Solution | Development task | Development complexity | Deployment operation | Deployment complexity | Others |
|---|---|---|---|---|---|
| MetricBeat | Develop ETL tools to process data | ★ ★ ★ | MetricBeat + ETL tools + data transmission & storage components | ★ ★ ★ | |
| Telegraf | Develop Telegraf processor or related ETL tools to process data | ★ ★ ★ | Deploy Telegraf + ETL tools | ★ ★ ★ | |
| MetricServer | No additional development required | | MetricServer + collector | | The data is stored in memory, which consumes resources when the amount of data is large |
| Kubelet cAdvisor | Re-implement the collector, expected one week | ★ ★ | Only need to deploy collector | | |

Based on the above comparison, the preliminary conclusions are as follows:

  • The data format of MetricBeat does not meet the requirements and requires more additional processing, so it is not considered.

  • If we only collect CPU/Memory metrics for Nodes and Pods in K8S, Telegraf is overkill and requires additional development, operation, and maintenance work. It can be set aside for now, as long as we do not need to collect more K8S metrics.

  • Compared with the above two schemes, MetricServer is very simple to deploy on K8S, and combined with the metric-server-collector it meets the requirements.

  • The collector implementation based on the Kubelet API can be regarded as an optimization of the MetricServer-based implementation: it removes an unnecessary component and its resource consumption, and it gives us the raw data to convert on demand.

Reference

  • MetricBeat Kubernetes Module
  • Telegraf Kubernetes Input Plugin
  • Metrics API design
  • Metrics Server design

Appendix

  • Kubelet /stats/summary API response data

{
  "node": {
    "nodeName": "tk01",
    "systemContainers": [
      {
        "name": "kubelet",
        "startTime": "2021-01-26T05:10:01Z",
        "cpu": {"time": "2021-01-26T05:10:22Z", "usageNanoCores": 108726826, "usageCoreNanoSeconds": 2009799168},
        "memory": {"time": "2021-01-26T05:10:22Z", "usageBytes": 61022208, "workingSetBytes": 52633600, "rssBytes": 32174080, "pageFaults": 45899, "majorPageFaults": 253}
      }
    ],
    "startTime": "2020-09-12T12:43:58Z",
    "cpu": {"time": "2021-01-26T05:10:27Z", "usageNanoCores": 1078216675, "usageCoreNanoSeconds": 1758203828797878},
    "memory": {"time": "2021-01-26T05:10:27Z", "availableBytes": 2564227072, "usageBytes": 7543758848, "workingSetBytes": 5632106496, "rssBytes": 3926122496, "pageFaults": 2304667, "majorPageFaults": 859},
    "network": {
      "time": "2021-01-26T05:10:27Z",
      "name": "eth0",
      "rxBytes": 166587714846,
      "rxErrors": 0,
      "txBytes": 192097080030,
      "txErrors": 0,
      "interfaces": [
        {"name": "eth0", "rxBytes": 166587714846, "rxErrors": 0, "txBytes": 192097080030, "txErrors": 0}
      ]
    },
    "fs": {"time": "2021-01-26T05:10:27Z", "availableBytes": 23964233728, "capacityBytes": 52776349696, "usedBytes": 26562187264, "inodesFree": 2916648, "inodes": 3276800, "inodesUsed": 360152},
    "runtime": {
      "imageFs": {"time": "2021-01-26T05:10:27Z", "availableBytes": 23964233728, "capacityBytes": 52776349696, "usedBytes": 1377648552, "inodesFree": 2916648, "inodes": 3276800, "inodesUsed": 360152}
    },
    "rlimit": {"time": "2021-01-26T05:10:32Z", "maxpid": 32768, "curproc": 905}
  },
  "pods": [
    {
      "podRef": {"name": "etcd-tk01", "namespace": "kube-system", "uid": "2e8885329cb9c936db545fcd71666003"},
      "startTime": "2021-01-26T05:10:08Z",
      "containers": [
        {
          "name": "etcd",
          "startTime": "2021-01-26T05:10:09Z",
          "cpu": {"time": "2021-01-26T05:10:22Z", "usageNanoCores": 208950881, "usageCoreNanoSeconds": 2582358831},
          "memory": {"time": "2021-01-26T05:10:22Z", "usageBytes": 37478400, "workingSetBytes": 37081088, "rssBytes": 36110336, "pageFaults": 11531, "majorPageFaults": 10},
          "rootfs": {"time": "2021-01-26T05:10:22Z", "availableBytes": 23964233728, "capacityBytes": 52776349696, "usedBytes": 36864, "inodesFree": 2916648, "inodes": 3276800, "inodesUsed": 8},
          "logs": {"time": "2021-01-26T05:10:22Z", "availableBytes": 23964233728, "capacityBytes": 52776349696, "usedBytes": 28672, "inodesFree": 2916648, "inodes": 3276800, "inodesUsed": 360152}
        }
      ],
      "cpu": {"time": "2021-01-26T05:10:24Z", "usageNanoCores": 161540656, "usageCoreNanoSeconds": 34928899852771},
      "memory": {"time": "2021-01-26T05:10:24Z", "usageBytes": 71917568, "workingSetBytes": 67014656, "rssBytes": 36155392, "pageFaults": 0, "majorPageFaults": 0},
      "network": {},
      "ephemeral-storage": {"time": "2021-01-26T05:10:27Z", "availableBytes": 23964233728, "capacityBytes": 52776349696, "usedBytes": 65536, "inodesFree": 2916648, "inodes": 3276800, "inodesUsed": 8},
      "process_stats": {"process_count": 0}
    }
  ]
}
