
k8s node fails to join the master: cannot construct envvars

Problem description

Symptoms

root@master-node1:~# kubectl get pod -A -owide | grep Error
kube-flannel   kube-flannel-ds-cbblv                  0/1     Init:CreateContainerConfigError   0               114m    192.168.31.32   edge-node1     <none>           <none>
kube-system    kube-proxy-zvgz4                       0/1     CreateContainerConfigError        0               114m    192.168.31.32   edge-node1     <none>           <none>
kube-system    raven-agent-ds-xd8j5                   0/1     CreateContainerConfigError        0               114m    192.168.31.32   edge-node1     <none>           <none>

The kubelet keeps logging these errors:

Sep 26 16:56:49 edge-node1 kubelet[16009]: E0926 16:56:49.340163   16009 kuberuntime_manager.go:1449] "Unhandled Error" err="container raven-agent start failed in pod raven-agent-ds-xd8j5_kube-system(15a05897-dcef-47c5-8bbf-a2307c1ef6ef): CreateContainerConfigError: services have not yet been read at least once, cannot construct envvars" logger="UnhandledError"
Sep 26 16:56:49 edge-node1 kubelet[16009]: E0926 16:56:49.340200   16009 pod_workers.go:1324] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"raven-agent\" with CreateContainerConfigError: \"services have not yet been read at least once, cannot construct envvars\"" pod="kube-system/raven-agent-ds-xd8j5" podUID="15a05897-dcef-47c5-8bbf-a2307c1ef6ef"
Sep 26 16:56:52 edge-node1 kubelet[16009]: E0926 16:56:52.340762   16009 kuberuntime_manager.go:1449] "Unhandled Error" err="container kube-proxy start failed in pod kube-proxy-zvgz4_kube-system(1f939284-468a-4724-a8db-172260d836dc): CreateContainerConfigError: services have not yet been read at least once, cannot construct envvars" logger="UnhandledError"
Sep 26 16:56:52 edge-node1 kubelet[16009]: E0926 16:56:52.340798   16009 pod_workers.go:1324] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-proxy\" with CreateContainerConfigError: \"services have not yet been read at least once, cannot construct envvars\"" pod="kube-system/kube-proxy-zvgz4" podUID="1f939284-468a-4724-a8db-172260d836dc"
Sep 26 16:56:54 edge-node1 kubelet[16009]: E0926 16:56:54.341508   16009 kuberuntime_manager.go:1449] "Unhandled Error" err="init container install-cni-plugin start failed in pod kube-flannel-ds-cbblv_kube-flannel(38bad478-9d07-466c-8476-8f71ebf96313): CreateContainerConfigError: services have not yet been read at least once, cannot construct envvars" logger="UnhandledError"
Sep 26 16:56:54 edge-node1 kubelet[16009]: E0926 16:56:54.341537   16009 pod_workers.go:1324] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"install-cni-plugin\" with CreateContainerConfigError: \"services have not yet been read at least once, cannot construct envvars\"" pod="kube-flannel/kube-flannel-ds-cbblv" podUID="38bad478-9d07-466c-8476-8f71ebf96313"

Troubleshooting

Verify the apiserver

root@edge-node1:/etc/containerd# telnet 192.168.31.61 6443
Trying 192.168.31.61...
Connected to 192.168.31.61.
Escape character is '^]'.

The master's apiserver is listening on the port, so nothing wrong there.

root@edge-node1:/etc/containerd# curl -k https://192.168.31.61:6443/healthz
ok

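Requests to /healthz succeed as well. The point of these two checks is to rule out connectivity problems between the node and the apiserver first.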

Reading the kubelet source, the error comes from this check:

	// If the pod originates from the kube-api, when we know that the kube-apiserver is responding and the kubelet's credentials are valid.
	// Knowing this, it is reasonable to wait until the service lister has synchronized at least once before attempting to build
	// a service env var map.  This doesn't present the race below from happening entirely, but it does prevent the "obvious"
	// failure case of services simply not having completed a list operation that can reasonably be expected to succeed.
	// One common case this prevents is a kubelet restart reading pods before services and some pod not having the
	// KUBERNETES_SERVICE_HOST injected because we didn't wait a short time for services to sync before proceeding.
	// The KUBERNETES_SERVICE_HOST link is special because it is unconditionally injected into pods and is read by the
	// in-cluster-config for pod clients
	if !kubetypes.IsStaticPod(pod) && !kl.serviceHasSynced() {
		return nil, fmt.Errorf("services have not yet been read at least once, cannot construct envvars")
	}

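The source shows that this error is returned only when both of these conditions hold:

  • the pod is not a static pod
  • the Services have not yet been synced (at least once)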

These pods are certainly not static pods (they are created by DaemonSets), so the only way to avoid the error is to deal with condition 2.

ExecStartPre=/bin/sleep 30

I added this to the kubelet systemd unit and then restarted kubelet:

systemctl daemon-reload
systemctl restart kubelet

That had no effect. There was one more thing to try:

--sync-frequency=30s  # sync more often (default: 1m)

I added this to the kubelet configuration and restarted it, but the problem remained. At this point it seemed the problem might not lie in the kubelet at all.

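Solution

After being stuck on this for a whole afternoon, I finally found a similar report in the flannel issues. The issue describes hitting this error after upgrading k8s from v1.30.3 to v1.31.0, and attributes it to an incompatibility between that Kubernetes version and flannel.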

  • https://github.com/flannel-io/flannel/issues/2031

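Since my cluster runs v1.34.0 and also uses flannel, I tried this as a last resort: I reset the cluster and installed a lower Kubernetes version, and the problem was indeed gone: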
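The issue reports a single data point (the error appearing after moving from v1.30.3 to v1.31.0). To check quickly whether a given cluster version is at or past that boundary, here is a small Go sketch of a major.minor.patch comparison. Treating v1.31.0 as the start of the affected range is an assumption taken from the issue, not a confirmed compatibility matrix, and `atLeast` and `parseVersion` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion turns "v1.34.0" into [1 34 0]. Pre-release or build suffixes
// after "-" or "+" are dropped in this sketch.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	s := strings.TrimPrefix(v, "v")
	if i := strings.IndexAny(s, "-+"); i >= 0 {
		s = s[:i]
	}
	parts := strings.Split(s, ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("unexpected version %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("bad component %q in %q: %w", p, v, err)
		}
		out[i] = n
	}
	return out, nil
}

// atLeast reports whether v >= floor, comparing major, minor and patch in order.
func atLeast(v, floor string) (bool, error) {
	a, err := parseVersion(v)
	if err != nil {
		return false, err
	}
	b, err := parseVersion(floor)
	if err != nil {
		return false, err
	}
	for i := 0; i < 3; i++ {
		if a[i] != b[i] {
			return a[i] > b[i], nil
		}
	}
	return true, nil
}

func main() {
	// Boundary taken from the flannel issue (error reported after moving to v1.31.0).
	for _, v := range []string{"v1.30.3", "v1.31.0", "v1.34.0"} {
		ok, err := atLeast(v, "v1.31.0")
		fmt.Println(v, ok, err)
	}
}
```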

root@master-node1:~# kubectl get pod -A -owide
NAMESPACE      NAME                                   READY   STATUS    RESTARTS   AGE    IP              NODE           NOMINATED NODE   READINESS GATES
kube-flannel   kube-flannel-ds-qg7pp                  1/1     Running   0          103m   192.168.31.32   edge-node1     <none>           <none>
kube-flannel   kube-flannel-ds-t95vl                  1/1     Running   0          15h    192.168.31.61   master-node1   <none>           <none>
kube-system    coredns-7db6d8ff4d-dppqf               1/1     Running   0          15h    10.244.0.5      master-node1   <none>           <none>
kube-system    coredns-7db6d8ff4d-gslxm               1/1     Running   0          15h    10.244.0.6      master-node1   <none>           <none>
kube-system    etcd-master-node1                      1/1     Running   0          15h    192.168.31.61   master-node1   <none>           <none>
kube-system    kube-apiserver-master-node1            1/1     Running   0          15h    192.168.31.61   master-node1   <none>           <none>
kube-system    kube-controller-manager-master-node1   1/1     Running   0          15h    192.168.31.61   master-node1   <none>           <none>
kube-system    kube-proxy-6fltr                       1/1     Running   0          103m   192.168.31.32   edge-node1     <none>           <none>
kube-system    kube-proxy-kkc2j                       1/1     Running   0          15h    192.168.31.61   master-node1   <none>           <none>
kube-system    kube-scheduler-master-node1            1/1     Running   0          15h    192.168.31.61   master-node1   <none>           <none>
kube-system    raven-agent-ds-5df4d                   1/1     Running   0          103m   192.168.31.32   edge-node1     <none>           <none>
kube-system    raven-agent-ds-6xv5h                   1/1     Running   0          15h    192.168.31.61   master-node1   <none>           <none>
kube-system    yurt-hub-edge-node1                    1/1     Running   1          102m   192.168.31.32   edge-node1     <none>           <none>
kube-system    yurt-manager-96c7d9484-5t5g5           1/1     Running   0          15h    10.244.0.7      master-node1   <none>           <none>

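Conclusion

  • k8s v1.31.0 API changes: the new version changed how service selectors and environment variables are constructed
  • The ghcr.io/flannel-io/flannel-cni-plugin:v1.7.1-flannel1 plugin has compatibility problems with the new version

Fix

  • Downgrade k8s

or

  • Use a different flannel version, one that at least supports the new features

Pick one of the two.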

