Upgrading the Kubernetes CNI Plugin Calico to 3.30.2
Preface
The host machine could not reliably ping the Kubernetes Pods; there was constant packet loss. I checked the firewall, the logs, and the detailed resource information without finding anything wrong. A search finally revealed the cause: after upgrading the Kubernetes version, the old CNI plugin was no longer compatible, so I upgraded Calico as well and wrote the process down. This post uses Kubernetes 1.32.6 and Calico 3.30.2.
1. Delete the Old CNI Namespaces
Deleting the namespaces removes all of the resources inside them in one go.
Still, I recommend sticking to kubectl delete -f xxx.yaml, or deleting the resources directly.
Here I removed them with the etcdctl command-line tool instead. (Do not imitate this lightly.)
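If the old manifests are still on disk, the recommended path would look something like this (the file names are assumptions based on the standard operator install):
[root@k8s-master ~]# kubectl delete -f custom-resources.yaml
[root@k8s-master ~]# kubectl delete -f tigera-operator.yaml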
# Install the command-line tool
[root@k8s-master ~]# apt install -y etcd

# Filter out the keys for our namespaces
[root@k8s-master ~]# ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key get / --prefix --keys-only | grep namespace

# Find the namespaces belonging to Calico and delete them
[root@k8s-master ~]# ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key del /registry/namespaces/calico-apiserver
[root@k8s-master ~]# ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key del /registry/namespaces/calico-system
Of course, deleting Pods works exactly the same way: change the pattern after grep to pod, then del the matching key.
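As a sketch, a leftover Pod could be removed like this; the key layout is /registry/pods/<namespace>/<pod-name>, and the Pod name below is purely illustrative:
# List the Pod keys (same endpoint and certificate flags as above)
[root@k8s-master ~]# ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key get / --prefix --keys-only | grep pod
# Delete one specific Pod key
[root@k8s-master ~]# ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key del /registry/pods/calico-system/calico-node-example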
2. Restart the CoreDNS Component
Delete the CoreDNS Pods so the Deployment recreates them; the replacements will pick up addresses from the new network once it is in place:
[root@k8s-master ~]# kubectl delete po -n kube-system -l k8s-app=kube-dns
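To watch the replacement Pods come up (a quick extra check, not part of the original capture):
[root@k8s-master ~]# kubectl get po -n kube-system -l k8s-app=kube-dns -w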
3. Download the New CNI Manifests
wget https://raw.githubusercontent.com/projectcalico/calico/v3.30.2/manifests/operator-crds.yaml
wget https://raw.githubusercontent.com/projectcalico/calico/v3.30.2/manifests/tigera-operator.yaml
4. Clear the Finalizers
After the forced deletion above, the old Installation resource can get stuck mid-deletion, held back by its finalizers.
1. Check the current state (optional)
Confirm that it really is being deleted, and which finalizers it carries:
kubectl get installation default -n calico-system -o yaml | grep -E 'deletionTimestamp|finalizers' -C2
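A grep-free alternative, if you prefer jsonpath (my own variant of the same check):
kubectl get installation default -n calico-system -o jsonpath='{.metadata.deletionTimestamp}{"\n"}{.metadata.finalizers}{"\n"}'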
2. Remove all finalizers
Run the following patch command to empty the metadata.finalizers array:
kubectl patch installation default -n calico-system \
  --type=merge \
  -p '{"metadata":{"finalizers":[]}}'
3. Confirm the deletion marker is cleared
Inspect the resource again; it should no longer show a deletionTimestamp or any finalizers:
kubectl get installation default -n calico-system -o yaml | grep -E 'deletionTimestamp|finalizers' -C2
With that, Installation/default moves from the "deleting" state back to normal, and the Operator resumes managing it.
5. Deploy the New CNI Plugin
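Calico 3.30 ships the operator CRDs in a separate manifest, and the upstream install order creates them before the operator itself, so apply the operator-crds.yaml we downloaded in step 3 first:
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.30.2/manifests/operator-crds.yaml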
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.30.2/manifests/tigera-operator.yaml
wget https://raw.githubusercontent.com/projectcalico/calico/v3.30.2/manifests/custom-resources.yaml
Note: remember to change the Pod CIDR here to the one set when the Kubernetes cluster was initialized!
[root@k8s-master ~]# grep -C 2 'cidr' custom-resources.yaml
    - name: default-ipv4-ippool
      blockSize: 26
      cidr: 10.100.0.0/16
      encapsulation: VXLANCrossSubnet
      natOutgoing: Enabled
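If the file still carries the upstream default of 192.168.0.0/16, a one-line sed edit is enough (the target CIDR below is this cluster's 10.100.0.0/16; substitute your own):
[root@k8s-master ~]# sed -i 's#cidr: 192.168.0.0/16#cidr: 10.100.0.0/16#' custom-resources.yaml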
Once edited, apply the manifest to create the resources:
[root@k8s-master ~]# kubectl apply -f custom-resources.yaml
Wait for the resources to come up successfully:
[root@k8s-master ~]# kubectl get pod -A
NAMESPACE NAME READY STATUS RESTARTS AGE
calico-apiserver calico-apiserver-67fd59c4cb-p8qm8 1/1 Running 0 6m54s
calico-apiserver calico-apiserver-67fd59c4cb-v6qst 1/1 Running 0 2m47s
calico-system calico-kube-controllers-598d796659-5pnz9 1/1 Running 0 6m51s
calico-system calico-node-4nfmc 1/1 Running 0 6m52s
calico-system calico-node-n8l9r 1/1 Running 0 6m52s
calico-system calico-node-rqcsp 1/1 Running 0 6m52s
calico-system calico-typha-76846cfc98-fzsq4 1/1 Running 0 6m50s
calico-system calico-typha-76846cfc98-s9v7k 1/1 Running 0 6m52s
calico-system csi-node-driver-l9l6m 2/2 Running 0 6m52s
calico-system csi-node-driver-v6f7m 2/2 Running 0 6m52s
calico-system csi-node-driver-xrd4m 2/2 Running 0 6m52s
calico-system goldmane-5f56496f4c-69p7x 1/1 Running 0 6m52s
calico-system whisker-85957d9c7b-ckxw7 2/2 Running 0 6m52s
kube-system coredns-6766b7b6bb-lqmgs 1/1 Running 0 8s
kube-system coredns-6766b7b6bb-qqhg8 1/1 Running 0 72s
kube-system etcd-k8s-master 1/1 Running 3 (137m ago) 173m
kube-system kube-apiserver-k8s-master 1/1 Running 4 (135m ago) 173m
kube-system kube-controller-manager-k8s-master 1/1 Running 3 (137m ago) 173m
kube-system kube-proxy-7rg9h 1/1 Running 1 (137m ago) 138m
kube-system kube-proxy-tkpgx 1/1 Running 1 (137m ago) 138m
kube-system kube-proxy-vpjcw 1/1 Running 1 (137m ago) 138m
kube-system kube-scheduler-k8s-master 1/1 Running 3 (137m ago) 173m
tigera-operator tigera-operator-747864d56d-9bdfv 1/1 Running 0 7m3s
Check the version information:
[root@k8s-master ~]# kubectl -n calico-system get daemonset calico-node -o jsonpath="{.spec.template.spec.containers[0].image}";echo
docker.io/calico/node:v3.30.2
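As a cross-check, and assuming the Calico API server is healthy, the ClusterInformation resource should report the same version:
[root@k8s-master ~]# kubectl get clusterinformation default -o jsonpath='{.spec.calicoVersion}';echo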
6. Verify the Cluster and CNI Availability
1. Create a Pod
[root@k8s-master ~]# cat test-cni.yaml
apiVersion: v1
kind: Pod
metadata:
  name: xiuxian-v1
spec:
  containers:
  - image: registry.cn-hangzhou.aliyuncs.com/yinzhengjie-k8s/apps:v1
    name: xiuxian
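Create the Pod (a plain apply; this step was not shown in the original capture):
[root@k8s-master ~]# kubectl apply -f test-cni.yaml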
2. Verify
[root@k8s-master ~]# kubectl get po -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
xiuxian-v1 1/1 Running 0 9s 10.100.169.135 k8s-node2 <none> <none>
[root@k8s-master ~]# curl 10.100.169.135
<!DOCTYPE html>
<html><head><meta charset="utf-8"/><title>yinzhengjie apps v1</title><style>div img {width: 900px;height: 600px;margin: 0;}</style></head><body><h1 style="color: green">凡人修仙传 v1 </h1><div><img src="1.jpg"><div></body></html>
3. Ping test
[root@k8s-master ~]# ping 10.100.169.135
PING 10.100.169.135 (10.100.169.135) 56(84) bytes of data.
64 bytes from 10.100.169.135: icmp_seq=1 ttl=63 time=0.509 ms
64 bytes from 10.100.169.135: icmp_seq=2 ttl=63 time=0.374 ms
^C
--- 10.100.169.135 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1007ms
rtt min/avg/max/mdev = 0.374/0.441/0.509/0.067 ms
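Since CoreDNS was restarted in step 2, an in-cluster DNS lookup is a worthwhile final check (the busybox image and Pod name are my assumptions):
[root@k8s-master ~]# kubectl run -it --rm dns-test --image=busybox:1.36 --restart=Never -- nslookup kubernetes.default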
With the troubleshooting and adjustments above, most Calico CNI-related networking problems can be resolved. I hope this post offers a thorough troubleshooting guide for you, and for anyone else wrestling with flaky networking in production.