Kubernetes Cluster Deployment in Practice: From Zero to Hero
1. Environment Preparation and Planning
1.1 Hardware and Software Requirements
Minimum configuration:
- Control-plane node: 2 CPU cores / 4 GB RAM / 50 GB disk
- Worker node: 2 CPU cores / 4 GB RAM / 100 GB disk
- Operating system: Ubuntu 20.04 or CentOS 7.9+

Network planning:
```bash
# Example node IP assignment
192.168.1.100 master1
192.168.1.101 worker1
192.168.1.102 worker2
# Pod network CIDR: 10.244.0.0/16
# Service network CIDR: 10.96.0.0/12
```
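The Pod and Service ranges must not overlap, or kube-proxy and the CNI plugin will route the same addresses differently. A quick sanity check (a sketch using python3, which ships with both Ubuntu 20.04 and CentOS 7.9):

```shell
# Check that the planned Pod and Service CIDRs are disjoint
POD_CIDR="10.244.0.0/16"
SVC_CIDR="10.96.0.0/12"
overlap=$(python3 -c "import ipaddress; print(ipaddress.ip_network('$POD_CIDR').overlaps(ipaddress.ip_network('$SVC_CIDR')))")
echo "overlap=$overlap"   # → overlap=False
```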
1.2 System Initialization
```bash
# Run on all nodes
sudo swapoff -a && sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
# firewalld and SELinux only exist on CentOS; on Ubuntu use "sudo ufw disable" instead
sudo systemctl stop firewalld && sudo systemctl disable firewalld
sudo setenforce 0 && sudo sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
# Set the hostname and name resolution
sudo hostnamectl set-hostname master1
echo "192.168.1.100 master1" | sudo tee -a /etc/hosts
echo "192.168.1.101 worker1" | sudo tee -a /etc/hosts
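With more than a couple of nodes, appending entries one by one gets error-prone. The loop below adds each mapping only if it is not already present, so it is safe to re-run (it writes to a demo file here; point it at /etc/hosts with sudo on real nodes):

```shell
# Idempotently append node entries (demo file; use /etc/hosts on real nodes)
HOSTS=./hosts.demo
: > "$HOSTS"
while read -r ip name; do
  grep -q "[[:space:]]$name\$" "$HOSTS" || printf '%s %s\n' "$ip" "$name" >> "$HOSTS"
done <<'EOF'
192.168.1.100 master1
192.168.1.101 worker1
192.168.1.102 worker2
EOF
cat "$HOSTS"
```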
2. Installing the Kubernetes Components
2.1 Installing a Container Runtime (Docker)
Note: Kubernetes 1.24 removed the in-tree dockershim, so with v1.25 the Docker Engine needs the cri-dockerd shim to serve as the runtime; containerd is the simpler choice on new clusters. The steps below keep Docker as in the original setup.
```bash
# Run on all nodes
sudo apt-get update && sudo apt-get install -y docker.io
sudo systemctl enable docker && sudo systemctl start docker
# Configure a registry mirror
sudo tee /etc/docker/daemon.json <<-'EOF'
{
  "registry-mirrors": ["https://registry.cn-hangzhou.aliyuncs.com"]
}
EOF
sudo systemctl restart docker
```
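A malformed daemon.json keeps the Docker daemon from starting at all, so it pays to validate the JSON before the restart. A minimal sketch against a demo copy of the file:

```shell
# Validate the JSON before restarting Docker (demo copy of daemon.json)
cat > ./daemon.demo.json <<'EOF'
{
  "registry-mirrors": ["https://registry.cn-hangzhou.aliyuncs.com"]
}
EOF
result=$(python3 -m json.tool ./daemon.demo.json > /dev/null && echo valid || echo invalid)
echo "$result"   # → valid
```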
2.2 Installing kubeadm, kubelet, and kubectl
```bash
# Add the Kubernetes apt repository
sudo apt-get install -y apt-transport-https ca-certificates curl
# apt-key is deprecated on newer Ubuntu releases but still works on 20.04
curl -s https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | sudo apt-key add -
sudo tee /etc/apt/sources.list.d/kubernetes.list <<-'EOF'
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
EOF
# Install the components and pin their versions
sudo apt-get update
sudo apt-get install -y kubelet=1.25.0-00 kubeadm=1.25.0-00 kubectl=1.25.0-00
sudo apt-mark hold kubelet kubeadm kubectl
```
3. Cluster Initialization and Control-Plane Deployment
3.1 Initializing the Master Node
```bash
sudo kubeadm init \
  --apiserver-advertise-address=192.168.1.100 \
  --image-repository registry.aliyuncs.com/google_containers \
  --kubernetes-version v1.25.0 \
  --service-cidr=10.96.0.0/12 \
  --pod-network-cidr=10.244.0.0/16
# Configure kubectl access for the current user
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
```
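The flag list above can equally be expressed as a kubeadm configuration file, which is easier to review and version-control. A sketch mirroring the same values (field names follow the kubeadm v1beta3 config API):

```shell
# Write a kubeadm config equivalent to the flags above
cat > kubeadm-config.yaml <<'EOF'
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.1.100
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.25.0
imageRepository: registry.aliyuncs.com/google_containers
networking:
  serviceSubnet: 10.96.0.0/12
  podSubnet: 10.244.0.0/16
EOF
# Then initialize with:
#   sudo kubeadm init --config kubeadm-config.yaml
grep -c 'Subnet' kubeadm-config.yaml   # → 2
```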
3.2 Joining Worker Nodes
```bash
# On each worker node, run the join command printed by kubeadm init.
# Tokens expire after 24 hours; regenerate one on the master with:
#   kubeadm token create --print-join-command
sudo kubeadm join 192.168.1.100:6443 \
  --token 3zgbqz.5g7v4l8m3d9w6s2o \
  --discovery-token-ca-cert-hash sha256:4f2c6d3...(actual hash value)
```
4. Network Plugin and Storage Configuration
4.1 Installing the Calico Network Plugin
```bash
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
# Verify that the Calico pods are running
kubectl get pods -n kube-system -l k8s-app=calico-node
```
4.2 Configuring NFS Persistent Storage
```bash
# On the storage server
sudo apt install -y nfs-kernel-server
sudo mkdir /nfsdata && sudo chmod 777 /nfsdata
echo "/nfsdata *(rw,sync,no_root_squash)" | sudo tee /etc/exports
sudo systemctl restart nfs-server
# On every Kubernetes node
sudo apt install -y nfs-common
# Note: the provisioner name below must match an external NFS provisioner
# deployed in the cluster (e.g. nfs-subdir-external-provisioner);
# Kubernetes ships no built-in NFS dynamic provisioner.
kubectl apply -f - <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-storage
provisioner: example.com/nfs
parameters:
  server: 192.168.1.200
  path: /nfsdata
EOF
```
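Once a matching provisioner is running, workloads request storage from the class through a PersistentVolumeClaim. A sketch (the claim name and size are illustrative):

```shell
# A PVC bound to the nfs-storage class defined above
cat > nfs-pvc.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-claim
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: nfs-storage
  resources:
    requests:
      storage: 1Gi
EOF
# Apply with: kubectl apply -f nfs-pvc.yaml
grep 'storageClassName' nfs-pvc.yaml
```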
5. Monitoring and Logging Integration
5.1 Deploying Prometheus and Grafana
```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --set grafana.adminPassword=admin123
```
5.2 Setting Up EFK Log Collection
```bash
# Deploy the ECK operator (an Elasticsearch cluster resource still needs
# to be created afterwards; see the ECK quickstart)
kubectl apply -f https://download.elastic.co/downloads/eck/2.6.1/crds.yaml
kubectl apply -f https://download.elastic.co/downloads/eck/2.6.1/operator.yaml
```
```yaml
# Deploy Fluentd as a DaemonSet
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      containers:
      - name: fluentd
        image: fluent/fluentd-kubernetes-daemonset:v1.16-debian-elasticsearch7
        env:
        - name: FLUENT_ELASTICSEARCH_HOST
          value: "elasticsearch-logging"
        - name: FLUENT_ELASTICSEARCH_PORT
          value: "9200"
```
6. Cluster Security Hardening
6.1 RBAC Access Control
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "watch", "list"]
```
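A Role grants nothing on its own; it must be bound to a subject with a RoleBinding. A sketch binding the pod-reader Role to a hypothetical user jane:

```shell
# Bind the pod-reader Role to a user (user name "jane" is illustrative)
cat > pod-reader-binding.yaml <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: default
subjects:
- kind: User
  name: jane
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
EOF
# Apply, then check: kubectl auth can-i list pods --as=jane -n default
grep -c 'apiGroup' pod-reader-binding.yaml   # → 2
```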
6.2 Network Policies
Network policies are only enforced when the CNI plugin supports them (Calico, installed above, does). The policy below denies all ingress and egress traffic for every pod in its namespace:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
```
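A blanket default-deny also blocks DNS, so pods can no longer resolve service names; a companion policy re-allowing UDP/53 egress is the usual next step. A sketch:

```shell
# Allow DNS egress to any namespace despite the default-deny policy
cat > allow-dns.yaml <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector: {}
    ports:
    - protocol: UDP
      port: 53
EOF
# kubectl apply -f allow-dns.yaml
grep -q 'port: 53' allow-dns.yaml && echo ok
```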
7. Cluster Maintenance and Optimization
7.1 Upgrading the Cluster
```bash
# Upgrade kubeadm on the control-plane node (the packages were held
# earlier, so unhold them before installing the new version)
sudo apt-mark unhold kubeadm
sudo apt-get update && sudo apt-get install -y kubeadm=1.26.0-00
sudo kubeadm upgrade plan
sudo kubeadm upgrade apply v1.26.0
# Upgrade each worker node in turn: drain it first
kubectl drain worker1 --ignore-daemonsets
# Then, on worker1 itself:
sudo apt-mark unhold kubelet kubectl
sudo apt-get install -y kubelet=1.26.0-00 kubectl=1.26.0-00
sudo systemctl restart kubelet
# Finally, allow pods back onto the node
kubectl uncordon worker1
```
7.2 Resource Requests and Scheduling
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"
  # Matches only nodes labeled beforehand with:
  #   kubectl label nodes <node-name> disktype=ssd
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values: ["ssd"]
```
8. High Availability
8.1 Control-Plane High Availability
```bash
# Initialize the first master behind a load balancer
sudo kubeadm init --control-plane-endpoint "LOAD_BALANCER_IP:6443" \
  --upload-certs
# Join the remaining masters (token and certificate key come from the init output)
sudo kubeadm join LOAD_BALANCER_IP:6443 --token ... \
  --control-plane --certificate-key ...
```
8.2 Workload Autoscaling
The HorizontalPodAutoscaler below scales pod replicas, not worker nodes (node-level scaling needs the Cluster Autoscaler); it also requires metrics-server and CPU requests on the target pods:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```
9. Troubleshooting Guide
9.1 Common Issues

| Symptom | Diagnostic command | Resolution |
| --- | --- | --- |
| Pod stuck in Pending | kubectl describe pod <pod-name> | Check resource quotas and node scheduling constraints |
| Service unreachable | kubectl get endpoints | Verify the Service selector matches the Pod labels |
| Node NotReady | journalctl -u kubelet | Check kubelet logs and network connectivity |
9.2 Diagnostic Toolbox
```bash
# List cluster events in chronological order
kubectl get events --sort-by='.metadata.creationTimestamp'
# Test network connectivity from a throwaway debug pod
kubectl run -it --rm debug --image=nicolaka/netshoot -- /bin/bash
# Resource usage (requires metrics-server)
kubectl top nodes
kubectl top pods
```
Summary and Further Study
This guide has walked through building a production-grade Kubernetes cluster from scratch. Recommended directions to explore next:
- Service mesh: integrate Istio for microservice traffic management
- GitOps: use Argo CD for continuous deployment
- Hybrid cloud: explore Cluster API for managing clusters across clouds