Deploying the NVIDIA KAI Scheduler on a Kubernetes Cluster, Step by Step
1. Check and Satisfy the Prerequisites
1.1 Check the Kubernetes version
kubectl version
Make sure the reported Server Version is at least 1.24 (e.g. v1.24.0 or later). Note that the --short flag has been removed from recent kubectl releases; plain kubectl version prints the client and server versions directly.
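The version gate can also be checked programmatically. The sketch below parses the JSON that `kubectl version -o json` emits and compares the server's major/minor against 1.24 (the sample input is illustrative, not real cluster output):

```python
import json

def server_at_least(version_json: str, major: int, minor: int) -> bool:
    """Check the serverVersion in `kubectl version -o json` output against a minimum."""
    v = json.loads(version_json)["serverVersion"]
    # The minor field can carry a '+' suffix on some managed clusters, e.g. "27+"
    got = (int(v["major"]), int(v["minor"].rstrip("+")))
    return got >= (major, minor)

# Illustrative sample of `kubectl version -o json` output (fields trimmed)
sample = json.dumps({"serverVersion": {"major": "1", "minor": "27+"}})
print(server_at_least(sample, 1, 24))  # True
```

Tuple comparison handles the major/minor ordering correctly, so a 2.x server would also pass a 1.24 minimum.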
1.2 Verify the GPU nodes
kubectl get nodes -o wide
Confirm that at least one node in the cluster is equipped with an NVIDIA GPU.
1.3 Verify GPU resource discovery
kubectl describe nodes | grep -A 5 "Allocatable" | grep "nvidia.com/gpu"
If the output contains something like nvidia.com/gpu: 2, the GPU resources have been recognized correctly. If there is no output, install the NVIDIA GPU Operator first:
# Install the NVIDIA GPU Operator (if not already installed)
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm install --wait --generate-name -n gpu-operator --create-namespace nvidia/gpu-operator
1.4 Install Helm 3
# On Ubuntu/Debian
curl https://baltocdn.com/helm/signing.asc | gpg --dearmor | sudo tee /usr/share/keyrings/helm.gpg > /dev/null
sudo apt-get install apt-transport-https --yes
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/helm.gpg] https://baltocdn.com/helm/stable/debian/ all main" | sudo tee /etc/apt/sources.list.d/helm-stable-debian.list
sudo apt-get update
sudo apt-get install helm

# On RHEL/CentOS
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh

# Verify the Helm installation
helm version
2. Deploy the NVIDIA KAI Scheduler
2.1 Add the KAI Scheduler Helm repository
# Use a distinct alias here: reusing the name "nvidia" would overwrite the
# GPU Operator repository added in step 1.3
helm repo add kai https://nvidia.github.io/kai-scheduler
helm repo update
2.2 Create a dedicated namespace
kubectl create namespace kai-scheduler
2.3 List the available Helm chart versions
helm search repo kai/kai-scheduler --versions
2.4 Deploy a specific version of the KAI Scheduler
# Pick a version compatible with your Kubernetes version, e.g. v0.4.0
helm install kai-scheduler kai/kai-scheduler \
  --namespace kai-scheduler \
  --version v0.4.0 \
  --set scheduler.name=kai-scheduler \
  --set scheduler.priority=10 \
  --set features.gpuSharing.enabled=true \
  --set features.fairness.enabled=true \
  --set features.gangScheduling.enabled=true
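The same options can be kept in a values file instead of repeated --set flags. The keys below simply mirror the flags above, so verify them against the chart's actual values.yaml before relying on them:

```yaml
# values.yaml — mirrors the --set flags used in step 2.4 (keys assumed, not verified)
scheduler:
  name: kai-scheduler
  priority: 10
features:
  gpuSharing:
    enabled: true
  fairness:
    enabled: true
  gangScheduling:
    enabled: true
```

Pass it to the same install command with -f values.yaml; a values file is easier to review and version-control than a long flag list.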
2.5 Check the deployment status
# Check the Deployment
kubectl get deployments -n kai-scheduler

# Check the Pod status
kubectl get pods -n kai-scheduler

# Describe a Pod (if something looks wrong)
kubectl describe pod -n kai-scheduler <pod-name>

# Follow the logs
kubectl logs -n kai-scheduler deployment/kai-scheduler -f
Wait a few minutes until the Pods reach the Running state.
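The "wait until Running" check can be scripted. The sketch below parses the JSON that `kubectl get pods -n kai-scheduler -o json` would produce and reports whether every Pod is up (the sample input is hypothetical):

```python
import json

def all_pods_running(pods_json: str) -> bool:
    """Return True when every Pod in a `kubectl get pods -o json` dump is Running."""
    items = json.loads(pods_json).get("items", [])
    if not items:
        return False  # nothing deployed yet
    return all(p["status"]["phase"] == "Running" for p in items)

# Hypothetical sample of what `kubectl get pods -n kai-scheduler -o json` might return
sample = json.dumps({
    "items": [
        {"metadata": {"name": "kai-scheduler-abc"}, "status": {"phase": "Running"}},
        {"metadata": {"name": "kai-scheduler-def"}, "status": {"phase": "Pending"}},
    ]
})
print(all_pods_running(sample))  # False: one Pod is still Pending
```

In practice, kubectl wait --for=condition=Ready pod --all -n kai-scheduler --timeout=300s achieves the same without scripting.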
3. Configure Kubernetes to Use the KAI Scheduler
3.1 Create a test GPU Pod manifest
cat << EOF > gpu-test-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test-pod
spec:
  schedulerName: kai-scheduler
  containers:
  - name: gpu-container
    image: nvidia/cuda:12.1.1-runtime-ubuntu22.04
    command: ["sleep", "3600"]
    resources:
      limits:
        nvidia.com/gpu: 1
EOF
3.2 Deploy the test Pod
kubectl apply -f gpu-test-pod.yaml
3.3 Verify the scheduling
# Check the Pod status
kubectl get pods gpu-test-pod

# Check which scheduler the Pod requested
kubectl get pod gpu-test-pod -o jsonpath='{.spec.schedulerName}'

# See which node the Pod is running on
kubectl get pod gpu-test-pod -o wide
Confirm that the scheduler shown in the output is kai-scheduler.
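This check can be automated from `kubectl get pod gpu-test-pod -o json` output as well; a sketch with an illustrative, trimmed sample:

```python
import json

def scheduled_by(pod_json: str) -> str:
    """Return the scheduler a Pod requested (spec.schedulerName), defaulting like Kubernetes does."""
    return json.loads(pod_json)["spec"].get("schedulerName", "default-scheduler")

# Illustrative trimmed sample of `kubectl get pod gpu-test-pod -o json`
sample = json.dumps({"spec": {"schedulerName": "kai-scheduler"}})
print(scheduled_by(sample))  # kai-scheduler
```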
4. Test GPU Sharing
4.1 Create two Pods that share one GPU
Note: Kubernetes only accepts whole integers for extended resources such as nvidia.com/gpu, so a limit of 0.5 would be rejected by the API server. KAI Scheduler expresses fractional requests through a Pod annotation instead; the gpu-fraction key below follows the KAI Scheduler documentation, but verify the exact annotation for your version:
cat << EOF > gpu-sharing-test.yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-share-pod-1
  annotations:
    gpu-fraction: "0.5"
spec:
  schedulerName: kai-scheduler
  containers:
  - name: gpu-container
    image: nvidia/cuda:12.1.1-runtime-ubuntu22.04
    command: ["sleep", "3600"]
---
apiVersion: v1
kind: Pod
metadata:
  name: gpu-share-pod-2
  annotations:
    gpu-fraction: "0.5"
spec:
  schedulerName: kai-scheduler
  containers:
  - name: gpu-container
    image: nvidia/cuda:12.1.1-runtime-ubuntu22.04
    command: ["sleep", "3600"]
EOF
4.2 Deploy the GPU-sharing Pods
kubectl apply -f gpu-sharing-test.yaml
4.3 Verify shared scheduling
# Check that both Pods are running
kubectl get pods gpu-share-pod-1 gpu-share-pod-2

# Check whether they were scheduled onto the same node
kubectl get pods gpu-share-pod-1 gpu-share-pod-2 -o wide
Both Pods should run on the same node, sharing a single GPU on that node.
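A quick scripted version of this check: given each Pod's node assignment and requested GPU fraction (the values below are hypothetical, as you might collect them from `kubectl get pods -o wide` and the Pods' annotations), confirm the Pods are co-located and their fractions fit on one GPU:

```python
# Hypothetical pod -> (node, gpu_fraction) data
pods = {
    "gpu-share-pod-1": ("gpu-node-1", 0.5),
    "gpu-share-pod-2": ("gpu-node-1", 0.5),
}

def share_one_gpu(pods: dict) -> bool:
    """True when all pods sit on one node and their fractions sum to at most 1.0."""
    nodes = {node for node, _ in pods.values()}
    total = sum(frac for _, frac in pods.values())
    return len(nodes) == 1 and total <= 1.0

print(share_one_gpu(pods))  # True: same node, 0.5 + 0.5 fits on one GPU
```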
5. Test Gang Scheduling
5.1 Create the gang-scheduling test Pods
cat << EOF > gang-scheduling-test.yaml
apiVersion: v1
kind: Pod
metadata:
  name: gang-pod-1
  annotations:
    kai-scheduler.nvidia.com/gang-name: my-gang
    kai-scheduler.nvidia.com/gang-size: "2"
spec:
  schedulerName: kai-scheduler
  containers:
  - name: gpu-container
    image: nvidia/cuda:12.1.1-runtime-ubuntu22.04
    command: ["sleep", "3600"]
    resources:
      limits:
        nvidia.com/gpu: 1
---
apiVersion: v1
kind: Pod
metadata:
  name: gang-pod-2
  annotations:
    kai-scheduler.nvidia.com/gang-name: my-gang
    kai-scheduler.nvidia.com/gang-size: "2"
spec:
  schedulerName: kai-scheduler
  containers:
  - name: gpu-container
    image: nvidia/cuda:12.1.1-runtime-ubuntu22.04
    command: ["sleep", "3600"]
    resources:
      limits:
        nvidia.com/gpu: 1
EOF
5.2 Deploy the gang-scheduling test Pods
kubectl apply -f gang-scheduling-test.yaml
5.3 Verify gang scheduling
# Check the Pod status (the manifests above carry no app=gang-test label, so list the Pods by name)
kubectl get pods gang-pod-1 gang-pod-2

# Compare the start times (they should be close)
kubectl describe pod gang-pod-1 | grep "Start Time"
kubectl describe pod gang-pod-2 | grep "Start Time"
Both Pods should be scheduled and started almost simultaneously.
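"Almost simultaneously" can be made precise by comparing the two Pods' status.startTime fields; a sketch with hypothetical timestamps:

```python
from datetime import datetime

def start_delta_seconds(t1: str, t2: str) -> float:
    """Absolute gap between two RFC 3339 Pod start times, in seconds."""
    parse = lambda t: datetime.fromisoformat(t.replace("Z", "+00:00"))
    return abs((parse(t1) - parse(t2)).total_seconds())

# Hypothetical start times, as read from the two Pods' status.startTime fields
delta = start_delta_seconds("2024-05-01T10:00:01Z", "2024-05-01T10:00:03Z")
print(delta)  # 2.0
assert delta < 10, "gang members should start nearly together"
```

A gap of a few seconds is expected (image pulls, container startup); what gang scheduling guarantees is that neither Pod is bound long before the other can run.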
6. Make the KAI Scheduler the Default Scheduler (Optional)
6.1 Back up the current scheduler configuration
# Only applicable if your cluster stores the scheduler configuration in a ConfigMap;
# kubeadm clusters keep it in the static Pod manifest under /etc/kubernetes instead
kubectl get configmap -n kube-system kube-scheduler-config -o yaml > kube-scheduler-config-backup.yaml
6.2 Create the new scheduler configuration
cat << EOF > kube-scheduler-config.yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: default-scheduler
  pluginConfig:
  - name: DefaultBinder
    args:
      schedulerName: kai-scheduler
leaderElection:
  leaderElect: true
  resourceLock: leases
EOF
Two corrections to note: kubescheduler.config.k8s.io/v1 is the current API version (v1beta3 is deprecated), and leases is the only resource lock still supported (endpointsleases was removed). Also be aware that the upstream DefaultBinder plugin does not document a schedulerName argument, so check whether your scheduler build actually honors this profile before relying on it.
6.3 Apply the new configuration
A KubeSchedulerConfiguration file is not a cluster API object, so it cannot be created with kubectl apply. Place it on the control-plane node and point kube-scheduler at it with the --config flag (for kubeadm, mount the file and reference it in /etc/kubernetes/manifests/kube-scheduler.yaml).
6.4 Restart kube-scheduler
Restart kube-scheduler according to how your cluster was deployed. For kubeadm:
# kube-scheduler runs as a static Pod; any change to its manifest makes the kubelet recreate it
sudo vi /etc/kubernetes/manifests/kube-scheduler.yaml
# Make a trivial edit (such as adding a blank line) and save; the kubelet restarts the scheduler automatically
6.5 Verify the default scheduler configuration
# Create a Pod without specifying a scheduler
cat << EOF > test-default-scheduler.yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-default-scheduler
spec:
  containers:
  - name: test-container
    image: nginx
    command: ["sleep", "3600"]
EOF
kubectl apply -f test-default-scheduler.yaml

# Check which scheduler handled the Pod
kubectl describe pod test-default-scheduler | grep -i "scheduler"
The output should show that kai-scheduler was used.
7. Monitor the KAI Scheduler
7.1 Deploy Prometheus (if not already deployed)
# Install the kube-prometheus-stack with Helm
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false
7.2 Create a ServiceMonitor for the KAI Scheduler
cat << EOF > kai-scheduler-servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kai-scheduler-monitor
  namespace: kai-scheduler
  labels:
    app: kai-scheduler
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: kai-scheduler
  endpoints:
  - port: metrics
    interval: 15s
EOF

kubectl apply -f kai-scheduler-servicemonitor.yaml
7.3 Query the metrics in Prometheus
# Port-forward to the Prometheus Service (with kube-prometheus-stack and a release
# named "prometheus", the Service is prometheus-kube-prometheus-prometheus on port 9090)
kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090:9090

# Open http://localhost:9090 in a browser and search for metrics containing "kai_scheduler"
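The same lookup can be scripted against Prometheus's HTTP API: GET /api/v1/label/__name__/values returns every metric name known to the server. The sketch below filters a sample response for KAI-related names (the sample metric names are hypothetical; real names depend on the KAI Scheduler version):

```python
import json

def kai_metrics(api_response: str, prefix: str = "kai_scheduler") -> list:
    """Filter a Prometheus /api/v1/label/__name__/values response body by metric prefix."""
    body = json.loads(api_response)
    return [name for name in body.get("data", []) if name.startswith(prefix)]

# Hypothetical response body; real metric names will differ
sample = json.dumps({
    "status": "success",
    "data": ["go_goroutines", "kai_scheduler_pods_scheduled_total", "up"],
})
print(kai_metrics(sample))  # ['kai_scheduler_pods_scheduled_total']
```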
8. Uninstall the KAI Scheduler
8.1 Delete the test Pods
kubectl delete pod gpu-test-pod
kubectl delete pod gpu-share-pod-1
kubectl delete pod gpu-share-pod-2
kubectl delete pod gang-pod-1
kubectl delete pod gang-pod-2
kubectl delete pod test-default-scheduler
8.2 Restore the default scheduler (if you changed it earlier)
Restore the backed-up configuration the same way it was applied in step 6.3, then restart kube-scheduler as in step 6.4. If your cluster stores the configuration in a ConfigMap, reapply the backup:
kubectl apply -f kube-scheduler-config-backup.yaml -n kube-system
8.3 Uninstall the KAI Scheduler
helm uninstall kai-scheduler -n kai-scheduler
kubectl delete namespace kai-scheduler
8.4 Delete the ServiceMonitor (if you created one)
# This is a no-op if the kai-scheduler namespace was already deleted in step 8.3,
# since deleting a namespace removes everything in it
kubectl delete servicemonitor -n kai-scheduler kai-scheduler-monitor
With the steps above you can deploy, configure, test, and cleanly uninstall the NVIDIA KAI Scheduler on a Kubernetes cluster. If you run into problems, check the KAI Scheduler logs or consult the official documentation for further help.