
Kubernetes Basics: Deploying and Installing the Master Node

I ran into quite a few problems while installing the k8s master node; this is a brief record of the whole process.

1. Operating system

Server OS: Kylin-Server-V10-SP3-General-Release-2303-ARM64.iso

2. Preparation

2.1 Disable the firewall

  systemctl stop firewalld
  systemctl disable firewalld

2.2 Disable the swap partition

swapoff -a       # turn swap off temporarily
sed -i 's/.*swap.*/#&/' /etc/fstab    # disable permanently (comment out swap entries in fstab)
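The fstab edit comments out every line containing "swap" (the `&` in the replacement re-inserts the matched line). Before touching the real /etc/fstab, the sed expression can be sanity-checked against a scratch copy; the sample fstab lines below are made up for illustration:

```shell
# Dry-run the swap-disabling sed on a scratch file instead of /etc/fstab
tmp=$(mktemp)
printf '%s\n' \
  'UUID=abcd-1234 /     xfs  defaults 0 0' \
  '/dev/mapper/klas-swap swap swap defaults 0 0' > "$tmp"

sed -i 's/.*swap.*/#&/' "$tmp"   # '&' re-inserts the matched line after the '#'

cat "$tmp"   # only the swap line is now commented out
rm -f "$tmp"
```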

2.3 Disable SELinux

setenforce 0    # disable temporarily
sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' /etc/selinux/config  # disable permanently
sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config  # permissive also effectively disables enforcement
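Note that the two sed lines are alternatives: both match only while the value is still `enforcing`, so run one or the other. The effect can be dry-run on a scratch copy (the sample config content below is assumed, not read from a real system):

```shell
# Dry-run the SELinux sed on a scratch copy instead of /etc/selinux/config
tmp=$(mktemp)
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$tmp"

sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' "$tmp"
grep '^SELINUX=' "$tmp"   # prints: SELINUX=disabled

rm -f "$tmp"
```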

2.4 Configure a China-mirror k8s yum repo

tee /etc/yum.repos.d/kubernetes.repo  << EOF
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-aarch64/
enabled=1
gpgcheck=0
exclude=kube*
EOF

yum makecache    # refresh the repo metadata cache

2.5 Enable IPv4 packet forwarding and iptables bridge filtering

# Set the required sysctl parameters; these persist across reboots
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF

# Apply the sysctl parameters without rebooting
sudo sysctl --system

Verify that net.ipv4.ip_forward is set to 1:
sysctl net.ipv4.ip_forward

3. Installation

3.1 Install a recent Go toolchain

Building containerd requires a recent version of Go.

wget https://go.dev/dl/go1.24.1.linux-arm64.tar.gz
tar -zxvf go1.24.1.linux-arm64.tar.gz -C /opt/
echo 'export PATH=$PATH:/opt/go/bin' >> ~/.bashrc   # single quotes, so $PATH expands at login rather than at write time
source ~/.bashrc

3.2 Install a CRI runtime

k8s requires a CRI (Container Runtime Interface) implementation. The official documentation lists four options; containerd is used here:

containerd
CRI-O
Docker Engine
Mirantis Container Runtime

Source tree: https://github.com/containerd/containerd/tree/release/2.0

Download (note: the /tree/ URL above is a browser URL and cannot be cloned directly; clone the repository and check out the branch instead):

git clone -b release/2.0 https://github.com/containerd/containerd.git

Build:

cd containerd
make -j4

The default install prefix in the Makefile is /usr/local.

Install:

make install 

Run it as a systemd service:

cp containerd.service /etc/systemd/system/
systemctl daemon-reload
systemctl enable containerd.service --now

Check the service status:

systemctl status containerd

Generate the default configuration file:

mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml

Three changes are needed:

1. Change the image download address (the reason will be explained later):

vim /etc/containerd/config.toml

Change:

 [plugins.'io.containerd.cri.v1.images'.pinned_images]
      sandbox = 'registry.k8s.io/pause:3.10'

to:

 [plugins.'io.containerd.cri.v1.images'.pinned_images]
      sandbox = 'registry.aliyuncs.com/google_containers/pause:3.10'

2. kubeadm sets the cgroup driver (cgroupDriver) to "systemd" by default, so it is recommended to set containerd's cgroup driver to "systemd" as well, keeping it consistent with kubernetes. Add SystemdCgroup = true:

  [plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.runc.options]
     ..............
    SystemdCgroup = true

3. Change the grpc containerd.sock path. Change:

[grpc]
  address = '/run/containerd/containerd.sock'
  ....

to:

[grpc]
  address = '/var/run/containerd/containerd.sock'
  ....
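The three config.toml edits can also be applied non-interactively with sed. A minimal sketch, run here against a scratch copy of just the relevant lines (the section names are assumed to match containerd 2.0's default output; check your own config.toml before pointing the script at /etc/containerd/config.toml):

```shell
cfg=$(mktemp)   # stand-in for /etc/containerd/config.toml
cat > "$cfg" <<'EOF'
[grpc]
  address = '/run/containerd/containerd.sock'
[plugins.'io.containerd.cri.v1.images'.pinned_images]
  sandbox = 'registry.k8s.io/pause:3.10'
[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.runc.options]
  SystemdCgroup = false
EOF

# Apply the three edits: sandbox image mirror, systemd cgroup driver, socket path
sed -i \
  -e "s|registry.k8s.io/pause|registry.aliyuncs.com/google_containers/pause|" \
  -e "s|SystemdCgroup = false|SystemdCgroup = true|" \
  -e "s|address = '/run/containerd|address = '/var/run/containerd|" \
  "$cfg"

grep -E 'sandbox|SystemdCgroup|address' "$cfg"
rm -f "$cfg"
```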

Check the containerd version:

[root@localhost ~]# ctr version
Client:
  Version:  v2.0.0.m
  Revision: 207ad711eabd375a01713109a8a197d197ff6542.m
  Go version: go1.24.1

Server:
  Version:  v2.0.0.m
  Revision: 207ad711eabd375a01713109a8a197d197ff6542.m
  UUID: dfeb6a64-2353-4fa3-af42-6482a77285e7

Check the CRI plugin status in the containerd config:

sudo containerd config dump | grep "disable ="
disable = false  # disable = false means the CRI plugin is enabled

3.3 Install runc

containerd uses runc as its low-level container runtime. Without it, deployment fails with the error shown in 3.12. Install the libseccomp-devel dependency first, or the build will fail later.

yum install -y libseccomp-devel
git clone https://github.com/opencontainers/runc.git
cd runc
make BUILDTAGS="selinux seccomp"
sudo cp runc /usr/bin/runc

3.4 Install kubeadm, kubelet, and kubectl

yum install -y kubeadm kubelet kubectl --disableexcludes=kubernetes

Start the kubelet service:
systemctl enable --now kubelet

Check the kubelet status:
systemctl status kubelet   # at this point the kubelet service fails to start

Error 1:

"command failed" err="failed to load kubelet config file, path: /var/lib/kubelet/config.yaml

Fix: regenerate the configuration files:

sudo kubeadm init phase certs all        # generate the certificates
sudo kubeadm init phase kubeconfig all   # generate the kubeconfig files
sudo kubeadm init phase kubelet-start --config k8s.yaml

Error 2:

kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://10.0.0.241:6443/api/v1/nodes\": dial tcp 10.0.0.241:6443:

This can be ignored for now. kube-apiserver is not installed yet; port 6443 belongs to kube-apiserver, which only comes up after running kubeadm init. The kubelet itself listens on port 10250.

3.5 Run preflight checks before pulling the kubernetes images

kubeadm init phase preflight

Error 1:

[ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables does not exist  

This happens because the kernel has not loaded the br_netfilter module; run modprobe br_netfilter to load it.
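modprobe only loads the module for the current boot. To make it survive a reboot, a modules-load.d entry can be added; a common sketch (the filename k8s.conf is arbitrary, and overlay is included because containerd commonly needs it as well):

```shell
# Load now
sudo modprobe overlay
sudo modprobe br_netfilter

# Persist across reboots
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF

lsmod | grep br_netfilter   # verify the module is loaded
```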

3.6 Generate the default configuration file

kubeadm config print init-defaults  > k8s.yaml

3.7 Edit the default configuration file k8s.yaml

1. Change the apiserver advertise address to an externally reachable IP, such as the host IP:
	advertiseAddress: 1.2.3.4, e.g. 192.168.30.3

2. Change the image repository address, otherwise the image pull in step 3.9 will fail:
	imageRepository: registry.k8s.io
	change to:
	imageRepository: registry.aliyuncs.com/google_containers

3. Set the kubernetes version:
	kubernetesVersion: 1.29.0

3.8 List the required images

kubeadm config images list   --config=./k8s.yaml

3.9 Pull the images locally

kubeadm config images pull --config=./k8s.yaml

Error 2:

failed to pull image "registry.aliyuncs.com/google_containers/kube-apiserver:v1.29.0": output: E0331 19:39:59.204202   24003 remote_image.go:171] "PullImage from image service failed" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /var/run/containerd/containerd.sock: connect: no such file or directory\"" image="registry.aliyuncs.com/google_containers/kube-apiserver:v1.29.0"
time="2025-03-31T19:39:59+08:00" level=fatal msg="pulling image: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /var/run/containerd/containerd.sock: connect: no such file or directory\

The error complains that /var/run/containerd/containerd.sock does not exist;
ls /var/run/containerd/containerd.sock confirms the file is indeed missing.

Troubleshooting:

1. Run crictl info
It reports: WARN[0000] runtime connect using default endpoints: [unix:///var/run/dockershim.sock unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock unix:///var/run/cri-dockerd.sock]. As the default settings are now deprecated, you should set the endpoint instead.
E0331 19:43:12.195880   25816 remote_runtime.go:616] "Status from runtime service failed" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /var/run/dockershim.sock: connect: no such file or directory\""

Since kubernetes 1.24, dockershim.sock has been dropped in favor of containerd.sock, yet crictl is still trying dockershim.sock here.

Fix:
The error message shows that crictl is defaulting to Docker's deprecated dockershim.sock, while the runtime actually in use is containerd. crictl has no explicitly configured runtime endpoint, so it falls back to a stale path.

Create or edit /etc/crictl.yaml to point crictl at the containerd socket:

sudo tee /etc/crictl.yaml <<EOF
runtime-endpoint: unix:///var/run/containerd/containerd.sock
image-endpoint: unix:///var/run/containerd/containerd.sock
timeout: 10
debug: false
EOF

Running crictl info again produces no errors. NetworkReady is false because the CNI network plugin has not been installed yet; that comes later.

[root@localhost ~]# crictl info
{
  "status": {
    "conditions": [
      {
        "type": "RuntimeReady",
        "status": true,
        "reason": "",
        "message": ""
      },
      {
        "type": "NetworkReady",
        "status": false,
        "reason": "NetworkPluginNotReady",
        "message": "Network plugin returns error: cni plugin not initialized"
      },

Pulling the images now succeeds:

[root@localhost ~]# kubeadm config images pull  --config=./k8s/k8s.yaml
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-apiserver:v1.29.0
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-controller-manager:v1.29.0
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-scheduler:v1.29.0
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-proxy:v1.29.0
[config/images] Pulled registry.aliyuncs.com/google_containers/pause:3.9
[config/images] Pulled registry.aliyuncs.com/google_containers/etcd:3.5.9-0
[config/images] Pulled registry.aliyuncs.com/google_containers/coredns:v1.10.1

3.10 Deploy

kubeadm init --config=./k8s/k8s.yaml

Error 3:

wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.

Unfortunately, an error has occurred:
	timed out waiting for the condition

This error is likely caused by:
	- The kubelet is not running
	- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
	- 'systemctl status kubelet'
	- 'journalctl -xeu kubelet'

Check the kubelet service status with systemctl status kubelet; it reports:

Apr 01 22:21:05 localhost.localdomain kubelet[868547]: E0401 22:21:05.129417  868547 pod_workers.go:1300] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"kube-controller-manager-node_kube-system(0e97ccaf4fad68a1e1b53>
Apr 01 22:21:05 localhost.localdomain kubelet[868547]: E0401 22:21:05.674951  868547 eviction_manager.go:258] "Eviction manager: failed to get summary stats" err="failed to get node info: node \"node\" not found"
Apr 01 22:21:11 localhost.localdomain kubelet[868547]: E0401 22:21:11.180384  868547 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://10.0.0.241:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/le>
Apr 01 22:21:11 localhost.localdomain kubelet[868547]: I0401 22:21:11.649620  868547 kubelet_node_status.go:70] "Attempting to register node" node="node"
Apr 01 22:21:11 localhost.localdomain kubelet[868547]: E0401 22:21:11.650110  868547 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://10.0.0.241:6443/api/v1/nodes\": dial tcp 10.0.0.241:6443: connect:>
Apr 01 22:21:12 localhost.localdomain kubelet[868547]: E0401 22:21:12.099033  868547 remote_runtime.go:193] "RunPodSandbox from runtime service failed" err="rpc error: code = DeadlineExceeded desc = failed to start sandbox \"cc322881f1f29b3>
Apr 01 22:21:12 localhost.localdomain kubelet[868547]: E0401 22:21:12.099126  868547 kuberuntime_sandbox.go:72] "Failed to create sandbox for pod" err="rpc error: code = DeadlineExceeded desc = failed to start sandbox \"cc322881f1f29b351e53>
Apr 01 22:21:12 localhost.localdomain kubelet[868547]: E0401 22:21:12.099172  868547 kuberuntime_manager.go:1166] "CreatePodSandbox for pod failed" err="rpc error: code = DeadlineExceeded desc = failed to start sandbox \"cc322881f1f29b351e5>
Apr 01 22:21:12 localhost.localdomain kubelet[868547]: E0401 22:21:12.099288  868547 pod_workers.go:1300] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"etcd-node_kube-system(19a346d941e8454735afc5705981ecc1)\" with>
Apr 01 22:21:12 localhost.localdomain kubelet[868547]: E0401 22:21:12.637856  868547 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"node.18323734c87d6946", Genera>

This is exactly the failure caused by not changing containerd's sandbox image source (the config.toml edit in 3.2).

3.11 Reset after a failed deployment and deploy again

kubeadm reset
Then run again:
kubeadm init --config=./k8s/k8s.yaml   # this time the deployment succeeds

...............
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 10.0.0.241:6443 --token abcdef.0123456789abcdef \
	--discovery-token-ca-cert-hash sha256:318ab9558b98ebad6ef231117618558782915bef105494281ab8639054067a11 

3.12 Other errors

  1. Caused by runc not being installed:
RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to start sandbox \"38a548e79ff8db5d9cafaeacf6b0c0e4d3a00be7cc29a6116f09ef91239a6081\": failed to create containerd task: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/k8s.io/38a548e79ff8db5d9cafaeacf6b0c0e4d3a00be7cc29a6116f09ef91239a6081/log.json: no such file or directory): exec: \"runc\": executable file not found in $PATH"
