
Kubernetes Basics: Deploying and Installing the Master Node

I ran into quite a few problems while installing the k8s master node; this is a brief record of the whole process.

1. Operating system

Server OS: Kylin-Server-V10-SP3-General-Release-2303-ARM64.iso

2. Preparation

2.1 Disable the firewall

  systemctl stop firewalld
  systemctl disable firewalld

2.2 Disable the swap partition

swapoff -a       # turn swap off temporarily
sed -i 's/.*swap.*/#&/' /etc/fstab    # disable permanently (comment out swap entries in fstab)
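The fstab edit comments out every line containing "swap" (the `&` in the replacement re-inserts the matched line). Before touching the real /etc/fstab, the sed expression can be sanity-checked against a scratch copy; the sample fstab lines below are made up for illustration:

```shell
# Dry-run the swap-disabling sed on a scratch file instead of /etc/fstab
tmp=$(mktemp)
printf '%s\n' \
  'UUID=abcd-1234 /     xfs  defaults 0 0' \
  '/dev/mapper/klas-swap swap swap defaults 0 0' > "$tmp"

sed -i 's/.*swap.*/#&/' "$tmp"   # '&' re-inserts the matched line after the '#'

cat "$tmp"   # only the swap line is now commented out
rm -f "$tmp"
```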

2.3 Disable SELinux

setenforce 0    # disable temporarily
sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' /etc/selinux/config  # disable permanently
sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config  # permissive also effectively disables enforcement
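Note that the two sed lines are alternatives: both match only while the value is still `enforcing`, so run one or the other. The effect can be dry-run on a scratch copy (the sample config content below is assumed, not read from a real system):

```shell
# Dry-run the SELinux sed on a scratch copy instead of /etc/selinux/config
tmp=$(mktemp)
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$tmp"

sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' "$tmp"
grep '^SELINUX=' "$tmp"   # prints: SELINUX=disabled

rm -f "$tmp"
```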

2.4 Configure a China-mirror k8s yum repo

tee /etc/yum.repos.d/kubernetes.repo  << EOF
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-aarch64/
enabled=1
gpgcheck=0
exclude=kube*
EOF

yum makecache    # refresh the repo metadata cache

2.5 Enable IPv4 packet forwarding and iptables bridge filtering

# Set the required sysctl parameters; these persist across reboots
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF

# Apply the sysctl parameters without rebooting
sudo sysctl --system

Verify that net.ipv4.ip_forward is set to 1:
sysctl net.ipv4.ip_forward

3. Installation

3.1 Install a recent Go toolchain

Building containerd requires a recent version of Go.

wget https://go.dev/dl/go1.24.1.linux-arm64.tar.gz
tar -zxvf go1.24.1.linux-arm64.tar.gz -C /opt/
echo 'export PATH=$PATH:/opt/go/bin' >> ~/.bashrc   # single quotes, so $PATH expands at login rather than at write time
source ~/.bashrc

3.2 Install a CRI runtime

k8s requires a CRI (Container Runtime Interface) implementation. The official documentation lists four options; containerd is used here:

containerd
CRI-O
Docker Engine
Mirantis Container Runtime

Source tree: https://github.com/containerd/containerd/tree/release/2.0

Download (note: the /tree/ URL above is a browser URL and cannot be cloned directly; clone the repository and check out the branch instead):

git clone -b release/2.0 https://github.com/containerd/containerd.git

Build:

cd containerd
make -j4

The default install prefix in the Makefile is /usr/local.

Install:

make install 

Run it as a systemd service:

cp containerd.service /etc/systemd/system/
systemctl daemon-reload
systemctl enable containerd.service --now

Check the service status:

systemctl status containerd

Generate the default configuration file:

mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml

Three changes are needed:

1. Change the image download address (the reason will be explained later):

vim /etc/containerd/config.toml

Change:

 [plugins.'io.containerd.cri.v1.images'.pinned_images]
      sandbox = 'registry.k8s.io/pause:3.10'

to:

 [plugins.'io.containerd.cri.v1.images'.pinned_images]
      sandbox = 'registry.aliyuncs.com/google_containers/pause:3.10'

2. kubeadm sets the cgroup driver (cgroupDriver) to "systemd" by default, so it is recommended to set containerd's cgroup driver to "systemd" as well, keeping it consistent with kubernetes. Add SystemdCgroup = true:

  [plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.runc.options]
     ..............
    SystemdCgroup = true

3. Change the grpc containerd.sock path. Change:

[grpc]
  address = '/run/containerd/containerd.sock'
  ....

to:

[grpc]
  address = '/var/run/containerd/containerd.sock'
  ....
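The three config.toml edits can also be applied non-interactively with sed. A minimal sketch, run here against a scratch copy of just the relevant lines (the section names are assumed to match containerd 2.0's default output; check your own config.toml before pointing the script at /etc/containerd/config.toml):

```shell
cfg=$(mktemp)   # stand-in for /etc/containerd/config.toml
cat > "$cfg" <<'EOF'
[grpc]
  address = '/run/containerd/containerd.sock'
[plugins.'io.containerd.cri.v1.images'.pinned_images]
  sandbox = 'registry.k8s.io/pause:3.10'
[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.runc.options]
  SystemdCgroup = false
EOF

# Apply the three edits: sandbox image mirror, systemd cgroup driver, socket path
sed -i \
  -e "s|registry.k8s.io/pause|registry.aliyuncs.com/google_containers/pause|" \
  -e "s|SystemdCgroup = false|SystemdCgroup = true|" \
  -e "s|address = '/run/containerd|address = '/var/run/containerd|" \
  "$cfg"

grep -E 'sandbox|SystemdCgroup|address' "$cfg"
rm -f "$cfg"
```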

Check the containerd version:

[root@localhost ~]# ctr version
Client:
  Version:  v2.0.0.m
  Revision: 207ad711eabd375a01713109a8a197d197ff6542.m
  Go version: go1.24.1

Server:
  Version:  v2.0.0.m
  Revision: 207ad711eabd375a01713109a8a197d197ff6542.m
  UUID: dfeb6a64-2353-4fa3-af42-6482a77285e7

Check the CRI plugin status in the containerd config:

sudo containerd config dump | grep "disable ="
disable = false  # disable = false means the CRI plugin is enabled

3.3 Install runc

containerd uses runc as its low-level container runtime. Without it, deployment fails with the error shown in 3.12. Install the libseccomp-devel dependency first, or the build will fail later.

yum install -y libseccomp-devel
git clone https://github.com/opencontainers/runc.git
cd runc
make BUILDTAGS="selinux seccomp"
sudo cp runc /usr/bin/runc

3.4 Install kubeadm, kubelet, and kubectl

yum install -y kubeadm kubelet kubectl --disableexcludes=kubernetes

Start the kubelet service:
systemctl enable --now kubelet

Check the kubelet status:
systemctl status kubelet   # at this point the kubelet service fails to start

Error 1:

"command failed" err="failed to load kubelet config file, path: /var/lib/kubelet/config.yaml

Fix: regenerate the configuration files:

sudo kubeadm init phase certs all        # generate the certificates
sudo kubeadm init phase kubeconfig all   # generate the kubeconfig files
sudo kubeadm init phase kubelet-start --config k8s.yaml

Error 2:

kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://10.0.0.241:6443/api/v1/nodes\": dial tcp 10.0.0.241:6443:

This can be ignored for now. kube-apiserver is not installed yet; port 6443 belongs to kube-apiserver, which only comes up after running kubeadm init. The kubelet itself listens on port 10250.

3.5 Run preflight checks before pulling the kubernetes images

kubeadm init phase preflight

Error 1:

[ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables does not exist  

This happens because the kernel has not loaded the br_netfilter module; run modprobe br_netfilter to load it.
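modprobe only loads the module for the current boot. To make it survive a reboot, a modules-load.d entry can be added; a common sketch (the filename k8s.conf is arbitrary, and overlay is included because containerd commonly needs it as well):

```shell
# Load now
sudo modprobe overlay
sudo modprobe br_netfilter

# Persist across reboots
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF

lsmod | grep br_netfilter   # verify the module is loaded
```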

3.6 Generate the default configuration file

kubeadm config print init-defaults  > k8s.yaml

3.7 Edit the default configuration file k8s.yaml

1. Change the apiserver advertise address to an externally reachable IP, such as the host IP:
	advertiseAddress: 1.2.3.4, e.g. 192.168.30.3

2. Change the image repository address, otherwise the image pull in step 3.9 will fail:
	imageRepository: registry.k8s.io
	change to:
	imageRepository: registry.aliyuncs.com/google_containers

3. Set the kubernetes version:
	kubernetesVersion: 1.29.0

3.8 List the required images

kubeadm config images list   --config=./k8s.yaml

3.9 Pull the images locally

kubeadm config images pull --config=./k8s.yaml

Error 2:

failed to pull image "registry.aliyuncs.com/google_containers/kube-apiserver:v1.29.0": output: E0331 19:39:59.204202   24003 remote_image.go:171] "PullImage from image service failed" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /var/run/containerd/containerd.sock: connect: no such file or directory\"" image="registry.aliyuncs.com/google_containers/kube-apiserver:v1.29.0"
time="2025-03-31T19:39:59+08:00" level=fatal msg="pulling image: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /var/run/containerd/containerd.sock: connect: no such file or directory\

The error complains that /var/run/containerd/containerd.sock does not exist;
ls /var/run/containerd/containerd.sock confirms the file is indeed missing.

Troubleshooting:

1. Run crictl info
It reports: WARN[0000] runtime connect using default endpoints: [unix:///var/run/dockershim.sock unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock unix:///var/run/cri-dockerd.sock]. As the default settings are now deprecated, you should set the endpoint instead.
E0331 19:43:12.195880   25816 remote_runtime.go:616] "Status from runtime service failed" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /var/run/dockershim.sock: connect: no such file or directory\""

Since kubernetes 1.24, dockershim.sock has been dropped in favor of containerd.sock, yet crictl is still trying dockershim.sock here.

Fix:
The error message shows that crictl is defaulting to Docker's deprecated dockershim.sock, while the runtime actually in use is containerd. crictl has no explicitly configured runtime endpoint, so it falls back to a stale path.

Create or edit /etc/crictl.yaml to point crictl at the containerd socket:

sudo tee /etc/crictl.yaml <<EOF
runtime-endpoint: unix:///var/run/containerd/containerd.sock
image-endpoint: unix:///var/run/containerd/containerd.sock
timeout: 10
debug: false
EOF

Running crictl info again produces no errors. NetworkReady is false because the CNI network plugin has not been installed yet; that comes later.

[root@localhost ~]# crictl info
{
  "status": {
    "conditions": [
      {
        "type": "RuntimeReady",
        "status": true,
        "reason": "",
        "message": ""
      },
      {
        "type": "NetworkReady",
        "status": false,
        "reason": "NetworkPluginNotReady",
        "message": "Network plugin returns error: cni plugin not initialized"
      },

Pulling the images now succeeds:

[root@localhost ~]# kubeadm config images pull  --config=./k8s/k8s.yaml
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-apiserver:v1.29.0
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-controller-manager:v1.29.0
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-scheduler:v1.29.0
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-proxy:v1.29.0
[config/images] Pulled registry.aliyuncs.com/google_containers/pause:3.9
[config/images] Pulled registry.aliyuncs.com/google_containers/etcd:3.5.9-0
[config/images] Pulled registry.aliyuncs.com/google_containers/coredns:v1.10.1

3.10 Deploy

kubeadm init --config=./k8s/k8s.yaml

Error 3:

wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.

Unfortunately, an error has occurred:
	timed out waiting for the condition

This error is likely caused by:
	- The kubelet is not running
	- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
	- 'systemctl status kubelet'
	- 'journalctl -xeu kubelet'

Check the kubelet service status with systemctl status kubelet; it reports:

Apr 01 22:21:05 localhost.localdomain kubelet[868547]: E0401 22:21:05.129417  868547 pod_workers.go:1300] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"kube-controller-manager-node_kube-system(0e97ccaf4fad68a1e1b53>
Apr 01 22:21:05 localhost.localdomain kubelet[868547]: E0401 22:21:05.674951  868547 eviction_manager.go:258] "Eviction manager: failed to get summary stats" err="failed to get node info: node \"node\" not found"
Apr 01 22:21:11 localhost.localdomain kubelet[868547]: E0401 22:21:11.180384  868547 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://10.0.0.241:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/le>
Apr 01 22:21:11 localhost.localdomain kubelet[868547]: I0401 22:21:11.649620  868547 kubelet_node_status.go:70] "Attempting to register node" node="node"
Apr 01 22:21:11 localhost.localdomain kubelet[868547]: E0401 22:21:11.650110  868547 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://10.0.0.241:6443/api/v1/nodes\": dial tcp 10.0.0.241:6443: connect:>
Apr 01 22:21:12 localhost.localdomain kubelet[868547]: E0401 22:21:12.099033  868547 remote_runtime.go:193] "RunPodSandbox from runtime service failed" err="rpc error: code = DeadlineExceeded desc = failed to start sandbox \"cc322881f1f29b3>
Apr 01 22:21:12 localhost.localdomain kubelet[868547]: E0401 22:21:12.099126  868547 kuberuntime_sandbox.go:72] "Failed to create sandbox for pod" err="rpc error: code = DeadlineExceeded desc = failed to start sandbox \"cc322881f1f29b351e53>
Apr 01 22:21:12 localhost.localdomain kubelet[868547]: E0401 22:21:12.099172  868547 kuberuntime_manager.go:1166] "CreatePodSandbox for pod failed" err="rpc error: code = DeadlineExceeded desc = failed to start sandbox \"cc322881f1f29b351e5>
Apr 01 22:21:12 localhost.localdomain kubelet[868547]: E0401 22:21:12.099288  868547 pod_workers.go:1300] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"etcd-node_kube-system(19a346d941e8454735afc5705981ecc1)\" with>
Apr 01 22:21:12 localhost.localdomain kubelet[868547]: E0401 22:21:12.637856  868547 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"node.18323734c87d6946", Genera>

This is exactly the failure caused by not changing containerd's sandbox image source (the config.toml edit in 3.2).

3.11 Reset after a failed deployment and deploy again

kubeadm reset
Then run again:
kubeadm init --config=./k8s/k8s.yaml   # this time the deployment succeeds

...............
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 10.0.0.241:6443 --token abcdef.0123456789abcdef \
	--discovery-token-ca-cert-hash sha256:318ab9558b98ebad6ef231117618558782915bef105494281ab8639054067a11 

3.12 Other errors

  1. Caused by runc not being installed:
RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to start sandbox \"38a548e79ff8db5d9cafaeacf6b0c0e4d3a00be7cc29a6116f09ef91239a6081\": failed to create containerd task: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/k8s.io/38a548e79ff8db5d9cafaeacf6b0c0e4d3a00be7cc29a6116f09ef91239a6081/log.json: no such file or directory): exec: \"runc\": executable file not found in $PATH"
