
Series index: 《OpenShift / RHEL / DevSecOps 汇总目录》
Note: the steps in this post have been verified on OpenShift 4.18.

Contents

  • Environment
  • Deploying the test application
  • Time-slicing
  • MPS
  • MIG
  • References

Environment

The OpenShift cluster used in this post has 2 worker nodes, each with one NVIDIA L4 card, so the cluster has 2 physical GPUs in total.
Note that the NVIDIA L4 does not support MIG; if you have a MIG-capable GPU, refer to the official NVIDIA documentation to configure it.

Deploying the test application

  1. Review https://github.com/liuxiaoyu-git/gpu-partitioning-guide/blob/main/gpu-sharing-tests/base/nvidia-plugin-test-deployment.yaml and confirm that each pod of the Deployment requests 1 GPU to run a process named dcgmproftester12 (a sketch of such a request follows the commands below).
  2. Run the following commands to deploy the GPU test application nvidia-plugin-test.
git clone https://github.com/liuxiaoyu-git/gpu-partitioning-guide && cd gpu-partitioning-guide/
oc new-project demo
oc apply -k gpu-sharing-tests/overlays/default/
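For reference, the essential part of such a Deployment is the nvidia.com/gpu resource request. The following is a minimal, hypothetical sketch; the actual image, tag and dcgmproftester12 arguments used in the repository may differ.

# Minimal sketch of a GPU test Deployment (image tag and dcgmproftester12 arguments
# are assumptions for illustration, not copied from the repository).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nvidia-plugin-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nvidia-plugin-test
  template:
    metadata:
      labels:
        app: nvidia-plugin-test
    spec:
      containers:
      - name: nvidia-plugin-test
        image: nvcr.io/nvidia/cloud-native/dcgm:3.3.5-1-ubuntu22.04   # hypothetical image tag
        command: ["/usr/bin/dcgmproftester12"]
        args: ["--no-dcgm-validation", "-t", "1004", "-d", "3600"]     # keep the GPU busy with a compute workload
        resources:
          limits:
            nvidia.com/gpu: 1      # each pod claims one nvidia.com/gpu device

With time-slicing or MPS enabled later in this post, the same request is satisfied by one of the advertised GPU replicas rather than by a whole physical GPU.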

Time-slicing

  1. Scale the nvidia-plugin-test Deployment to 2 pods. Since the cluster has 2 physical GPUs, both pods run normally.
$ oc scale deployment/nvidia-plugin-test --replicas=2 -n demo
$ oc get pod -n demo
NAME                                 READY   STATUS    RESTARTS   AGE
nvidia-plugin-test-bc49df6b8-5ctlb   1/1     Running   0          10m
nvidia-plugin-test-bc49df6b8-d5hf7   1/1     Running   0          10m
  2. Scale the nvidia-plugin-test Deployment to 4 pods. Since the cluster only has 2 physical GPUs, only 2 pods can run; the other 2 remain in the Pending state.
$ oc scale deployment/nvidia-plugin-test --replicas=4 -n demo
$ oc get pod -n demo
NAME                                 READY   STATUS    RESTARTS   AGE
nvidia-plugin-test-bc49df6b8-5ctlb   1/1     Running   0          10m
nvidia-plugin-test-bc49df6b8-d5hf7   1/1     Running   0          10m
nvidia-plugin-test-bc49df6b8-k8kbc   0/1     Pending   0          109s
nvidia-plugin-test-bc49df6b8-mqnwz   0/1     Pending   0          109s
  3. The pod events show that scheduling failed (FailedScheduling) because of "Insufficient nvidia.com/gpu".
$ oc get event -n demo | grep pod/$(oc get pod --field-selector=status.phase=Pending -ojsonpath={.items[0].metadata.name})
5s          Warning   FailedScheduling           pod/nvidia-plugin-test-bc49df6b8-grjzj    0/5 nodes are available: 2 Insufficient nvidia.com/gpu, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/5 nodes are available: 2 No preemption victims found for incoming pod, 3 Preemption is not helpful for scheduling.
  4. Enable the time-slicing strategy to split each GPU in two, then confirm that the ClusterPolicy now uses the time-sliced configuration.
$ oc apply -k gpu-sharing-instance/instance/overlays/time-sliced-2
$ oc get ClusterPolicy gpu-cluster-policy -ojsonpath={.spec.devicePlugin} | jq
{
  "config": {
    "default": "time-sliced",
    "name": "device-plugin-config"
  },
  "enabled": true
}
$ oc get cm device-plugin-config -n nvidia-gpu-operator -oyaml
...
data:
  no-time-sliced: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
        - name: nvidia.com/gpu
          replicas: 0
  time-sliced: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
        - name: nvidia.com/gpu
          replicas: 2
...
  5. Confirm that all 4 pods are now running.
$ oc get pod -n demo
NAME                                 READY   STATUS    RESTARTS   AGE
nvidia-plugin-test-bc49df6b8-5ctlb   1/1     Running   0          10m
nvidia-plugin-test-bc49df6b8-d5hf7   1/1     Running   0          10m
nvidia-plugin-test-bc49df6b8-k8kbc   1/1     Running   0          109s
nvidia-plugin-test-bc49df6b8-mqnwz   1/1     Running   0          109s
  6. Inspect the GPU-related configuration and status of the worker nodes. Confirm that nvidia.com/gpu.product is NVIDIA-L4-SHARED, nvidia.com/gpu.replicas is 2, nvidia.com/gpu.sharing-strategy is time-slicing, and the GPU Capacity, Allocatable and Allocated resources are all 2.
$ oc describe node -l node-role.kubernetes.io/worker | egrep 'Name:|Capacity|nvidia.com/|Allocatable:|Allocated resources'
Name:               ip-10-0-36-231.us-east-2.compute.internal
                    nvidia.com/cuda.driver-version.full=550.144.03
                    nvidia.com/cuda.driver-version.major=550
                    nvidia.com/cuda.driver-version.minor=144
                    nvidia.com/cuda.driver-version.revision=03
                    nvidia.com/cuda.driver.major=550
                    nvidia.com/cuda.driver.minor=144
                    nvidia.com/cuda.driver.rev=03
                    nvidia.com/cuda.runtime-version.full=12.4
                    nvidia.com/cuda.runtime-version.major=12
                    nvidia.com/cuda.runtime-version.minor=4
                    nvidia.com/cuda.runtime.major=12
                    nvidia.com/cuda.runtime.minor=4
                    nvidia.com/gfd.timestamp=1746009841
                    nvidia.com/gpu-driver-upgrade-state=upgrade-done
                    nvidia.com/gpu.compute.major=8
                    nvidia.com/gpu.compute.minor=9
                    nvidia.com/gpu.count=1
                    nvidia.com/gpu.deploy.container-toolkit=true
                    nvidia.com/gpu.deploy.dcgm=true
                    nvidia.com/gpu.deploy.dcgm-exporter=true
                    nvidia.com/gpu.deploy.device-plugin=true
                    nvidia.com/gpu.deploy.driver=true
                    nvidia.com/gpu.deploy.gpu-feature-discovery=true
                    nvidia.com/gpu.deploy.node-status-exporter=true
                    nvidia.com/gpu.deploy.nvsm=
                    nvidia.com/gpu.deploy.operator-validator=true
                    nvidia.com/gpu.family=ampere
                    nvidia.com/gpu.machine=g6.4xlarge
                    nvidia.com/gpu.memory=23034
                    nvidia.com/gpu.mode=compute
                    nvidia.com/gpu.present=true
                    nvidia.com/gpu.product=NVIDIA-L4-SHARED
                    nvidia.com/gpu.replicas=2
                    nvidia.com/gpu.sharing-strategy=time-slicing
                    nvidia.com/mig.capable=false
                    nvidia.com/mig.strategy=single
                    nvidia.com/mps.capable=false
                    nvidia.com/vgpu.present=false
                    nvidia.com/gpu-driver-upgrade-enabled: true
Capacity:
  nvidia.com/gpu:     2
Allocatable:
  nvidia.com/gpu:     2
Allocated resources:
  nvidia.com/gpu     2              2
Name:               ip-10-0-9-1.us-east-2.compute.internal
                    nvidia.com/cuda.driver-version.full=550.144.03
                    nvidia.com/cuda.driver-version.major=550
                    nvidia.com/cuda.driver-version.minor=144
                    nvidia.com/cuda.driver-version.revision=03
                    nvidia.com/cuda.driver.major=550
                    nvidia.com/cuda.driver.minor=144
                    nvidia.com/cuda.driver.rev=03
                    nvidia.com/cuda.runtime-version.full=12.4
                    nvidia.com/cuda.runtime-version.major=12
                    nvidia.com/cuda.runtime-version.minor=4
                    nvidia.com/cuda.runtime.major=12
                    nvidia.com/cuda.runtime.minor=4
                    nvidia.com/gfd.timestamp=1746009846
                    nvidia.com/gpu-driver-upgrade-state=upgrade-done
                    nvidia.com/gpu.compute.major=8
                    nvidia.com/gpu.compute.minor=9
                    nvidia.com/gpu.count=1
                    nvidia.com/gpu.deploy.container-toolkit=true
                    nvidia.com/gpu.deploy.dcgm=true
                    nvidia.com/gpu.deploy.dcgm-exporter=true
                    nvidia.com/gpu.deploy.device-plugin=true
                    nvidia.com/gpu.deploy.driver=true
                    nvidia.com/gpu.deploy.gpu-feature-discovery=true
                    nvidia.com/gpu.deploy.node-status-exporter=true
                    nvidia.com/gpu.deploy.nvsm=
                    nvidia.com/gpu.deploy.operator-validator=true
                    nvidia.com/gpu.family=ampere
                    nvidia.com/gpu.machine=g6.4xlarge
                    nvidia.com/gpu.memory=23034
                    nvidia.com/gpu.mode=compute
                    nvidia.com/gpu.present=true
                    nvidia.com/gpu.product=NVIDIA-L4-SHARED
                    nvidia.com/gpu.replicas=2
                    nvidia.com/gpu.sharing-strategy=time-slicing
                    nvidia.com/mig.capable=false
                    nvidia.com/mig.strategy=single
                    nvidia.com/mps.capable=false
                    nvidia.com/vgpu.present=false
                    nvidia.com/gpu-driver-upgrade-enabled: true
Capacity:
  nvidia.com/gpu:     2
Allocatable:
  nvidia.com/gpu:     2
Allocated resources:
  nvidia.com/gpu     2              2
  7. Check the GPU status from inside the nvidia-driver pod on each of the 2 worker nodes, and confirm that each GPU is running 2 dcgmproftester12 processes.
$ oc exec -n nvidia-gpu-operator $(oc get pod -n nvidia-gpu-operator -l app.kubernetes.io/component=nvidia-driver -o jsonpath="{.items[0].metadata.name}") -- nvidia-smi
Wed Apr 30 10:47:33 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.144.03             Driver Version: 550.144.03     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L4                      On  |   00000000:35:00.0 Off |                    0 |
| N/A   79C    P0             71W /   72W |     591MiB /  23034MiB |     99%   E. Process |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A     39955      C   /usr/bin/dcgmproftester12                     582MiB |
|    0   N/A  N/A     40746      C   /usr/bin/dcgmproftester12                     582MiB |
+-----------------------------------------------------------------------------------------+

$ oc exec -n nvidia-gpu-operator $(oc get pod -n nvidia-gpu-operator -l app.kubernetes.io/component=nvidia-driver -o jsonpath="{.items[1].metadata.name}") -- nvidia-smi
Wed Apr 30 10:47:47 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.144.03             Driver Version: 550.144.03     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L4                      On  |   00000000:35:00.0 Off |                    0 |
| N/A   81C    P0             71W /   72W |     591MiB /  23034MiB |     99%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A     73271      C   /usr/bin/dcgmproftester12                     582MiB |
|    0   N/A  N/A     74911      C   /usr/bin/dcgmproftester12                     582MiB |
+-----------------------------------------------------------------------------------------+
  8. Scale the nvidia-plugin-test Deployment to 6 pods and confirm that 2 of them stay Pending because no GPU resources are available.
$ oc scale deployment/nvidia-plugin-test --replicas=6 -n demo
$ oc get pod -n demo
NAME                                 READY   STATUS    RESTARTS   AGE
nvidia-plugin-test-bc49df6b8-2gm4j   0/1     Pending   0          4m31s
nvidia-plugin-test-bc49df6b8-7gv2w   0/1     Pending   0          4m31s
nvidia-plugin-test-bc49df6b8-5ctlb   1/1     Running   0          24m
nvidia-plugin-test-bc49df6b8-d5hf7   1/1     Running   0          24m
nvidia-plugin-test-bc49df6b8-k8kbc   1/1     Running   0          15m
nvidia-plugin-test-bc49df6b8-mqnwz   1/1     Running   0          15m
  9. Change the time-slicing strategy to split each GPU into four (a sketch of the resulting configuration follows the command below).
$ oc apply -k gpu-sharing-instance/instance/overlays/time-sliced-4
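The contents of the time-sliced-4 overlay are not shown here; assuming it simply raises the replica count of the time-sliced entry above, the device-plugin configuration it applies presumably looks like the following sketch.

# Sketch of the assumed time-sliced-4 device-plugin configuration,
# extrapolated from the time-sliced entry (replicas: 2) shown earlier.
version: v1
sharing:
  timeSlicing:
    resources:
    - name: nvidia.com/gpu
      replicas: 4     # each physical GPU is advertised as 4 schedulable nvidia.com/gpu devices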
  10. Toggle /spec/devicePlugin/enabled in the ClusterPolicy off and back on; this restarts the device-plugin pods so the new replica count takes effect. Then confirm that all 6 pods are Running.
$ oc patch ClusterPolicy gpu-cluster-policy --type='json' -p='[{"op": "replace", "path": "/spec/devicePlugin/enabled", "value": false}]'
$ oc patch ClusterPolicy gpu-cluster-policy --type='json' -p='[{"op": "replace", "path": "/spec/devicePlugin/enabled", "value": true}]'
$ oc get pod -n demo
NAME                                 READY   STATUS    RESTARTS   AGE
nvidia-plugin-test-bc49df6b8-2gm4j   1/1     Running   0          6m21s
nvidia-plugin-test-bc49df6b8-7gv2w   1/1     Running   0          6m21s
nvidia-plugin-test-bc49df6b8-5ctlb   1/1     Running   0          26m
nvidia-plugin-test-bc49df6b8-d5hf7   1/1     Running   0          26m
nvidia-plugin-test-bc49df6b8-k8kbc   1/1     Running   0          17m
nvidia-plugin-test-bc49df6b8-mqnwz   1/1     Running   0          17m

MPS

  1. Scale the nvidia-plugin-test Deployment back down to 2 pods.
$ oc scale deployment/nvidia-plugin-test --replicas=2 -n demo
$ oc get pod -n demo
NAME                                 READY   STATUS        RESTARTS   AGE
nvidia-plugin-test-bc49df6b8-sl2kh   1/1     Running       0          30m
nvidia-plugin-test-bc49df6b8-zppf8   1/1     Running       0          37m
  2. Enable the MPS strategy to split each GPU in two. Note: MPS relies on a control DaemonSet, so after enabling MPS you can observe the nvidia-device-plugin-mps-control-daemon DaemonSet come up on both worker nodes.
$ oc get daemonset nvidia-device-plugin-mps-control-daemon -n nvidia-gpu-operator
NAME                                            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                                                                                                         AGE
nvidia-device-plugin-mps-control-daemon         0         0         0       0            0           nvidia.com/gpu.deploy.device-plugin=true,nvidia.com/mps.capable=true                                                  73m

$ oc apply -k gpu-sharing-instance/instance/overlays/mps-2
configmap/device-plugin-config configured
configmap/nvidia-dcgm-exporter-dashboard-c7bf99fb7g unchanged
clusterpolicy.nvidia.com/gpu-cluster-policy configured

$ oc get daemonset nvidia-device-plugin-mps-control-daemon -n nvidia-gpu-operator
NAME                                            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                                                                                                         AGE
nvidia-device-plugin-mps-control-daemon         2         2         2       2            2           nvidia.com/gpu.deploy.device-plugin=true,nvidia.com/mps.capable=true                                                  7h54m
  3. Confirm that the ClusterPolicy now uses the mps-sliced configuration.
$ oc get ClusterPolicy gpu-cluster-policy -ojsonpath={.spec.devicePlugin} | jq
{"config": {"default": "mps-sliced","name": "device-plugin-config"},"enabled": true
}$ oc get cm device-plugin-config -n nvidia-gpu-operator -oyaml
...
data:mps-sliced: |-version: v1sharing:mps:resources:- name: nvidia.com/gpureplicas: 2no-mps-sliced: |-version: v1sharing:mps:resources:- name: nvidia.com/gpureplicas: 0
...
  4. Inspect the GPU-related configuration and status of the worker nodes. Confirm that nvidia.com/gpu.product is NVIDIA-L4-SHARED, nvidia.com/gpu.replicas is 2, nvidia.com/gpu.sharing-strategy is mps, and the GPU Capacity and Allocatable are both 2 (at this point each node shows 1 GPU allocated, since only 2 test pods are running).
$ oc describe node -l node-role.kubernetes.io/worker | egrep 'Name:|Capacity|nvidia.com/|Allocatable:|Allocated resources'
Name:               ip-10-0-36-231.us-east-2.compute.internal
                    nvidia.com/cuda.driver-version.full=550.144.03
                    nvidia.com/cuda.driver-version.major=550
                    nvidia.com/cuda.driver-version.minor=144
                    nvidia.com/cuda.driver-version.revision=03
                    nvidia.com/cuda.driver.major=550
                    nvidia.com/cuda.driver.minor=144
                    nvidia.com/cuda.driver.rev=03
                    nvidia.com/cuda.runtime-version.full=12.4
                    nvidia.com/cuda.runtime-version.major=12
                    nvidia.com/cuda.runtime-version.minor=4
                    nvidia.com/cuda.runtime.major=12
                    nvidia.com/cuda.runtime.minor=4
                    nvidia.com/gfd.timestamp=1746005265
                    nvidia.com/gpu-driver-upgrade-state=upgrade-done
                    nvidia.com/gpu.compute.major=8
                    nvidia.com/gpu.compute.minor=9
                    nvidia.com/gpu.count=1
                    nvidia.com/gpu.deploy.container-toolkit=true
                    nvidia.com/gpu.deploy.dcgm=true
                    nvidia.com/gpu.deploy.dcgm-exporter=true
                    nvidia.com/gpu.deploy.device-plugin=true
                    nvidia.com/gpu.deploy.driver=true
                    nvidia.com/gpu.deploy.gpu-feature-discovery=true
                    nvidia.com/gpu.deploy.node-status-exporter=true
                    nvidia.com/gpu.deploy.nvsm=
                    nvidia.com/gpu.deploy.operator-validator=true
                    nvidia.com/gpu.family=ampere
                    nvidia.com/gpu.machine=g6.4xlarge
                    nvidia.com/gpu.memory=23034
                    nvidia.com/gpu.mode=compute
                    nvidia.com/gpu.present=true
                    nvidia.com/gpu.product=NVIDIA-L4-SHARED
                    nvidia.com/gpu.replicas=2
                    nvidia.com/gpu.sharing-strategy=mps
                    nvidia.com/mig.capable=false
                    nvidia.com/mig.strategy=single
                    nvidia.com/mps.capable=true
                    nvidia.com/vgpu.present=false
                    nvidia.com/gpu-driver-upgrade-enabled: true
Capacity:
  nvidia.com/gpu:     2
Allocatable:
  nvidia.com/gpu:     2
Allocated resources:
  nvidia.com/gpu     1              1
Name:               ip-10-0-9-1.us-east-2.compute.internal
                    nvidia.com/cuda.driver-version.full=550.144.03
                    nvidia.com/cuda.driver-version.major=550
                    nvidia.com/cuda.driver-version.minor=144
                    nvidia.com/cuda.driver-version.revision=03
                    nvidia.com/cuda.driver.major=550
                    nvidia.com/cuda.driver.minor=144
                    nvidia.com/cuda.driver.rev=03
                    nvidia.com/cuda.runtime-version.full=12.4
                    nvidia.com/cuda.runtime-version.major=12
                    nvidia.com/cuda.runtime-version.minor=4
                    nvidia.com/cuda.runtime.major=12
                    nvidia.com/cuda.runtime.minor=4
                    nvidia.com/gfd.timestamp=1746005260
                    nvidia.com/gpu-driver-upgrade-state=upgrade-done
                    nvidia.com/gpu.compute.major=8
                    nvidia.com/gpu.compute.minor=9
                    nvidia.com/gpu.count=1
                    nvidia.com/gpu.deploy.container-toolkit=true
                    nvidia.com/gpu.deploy.dcgm=true
                    nvidia.com/gpu.deploy.dcgm-exporter=true
                    nvidia.com/gpu.deploy.device-plugin=true
                    nvidia.com/gpu.deploy.driver=true
                    nvidia.com/gpu.deploy.gpu-feature-discovery=true
                    nvidia.com/gpu.deploy.node-status-exporter=true
                    nvidia.com/gpu.deploy.nvsm=
                    nvidia.com/gpu.deploy.operator-validator=true
                    nvidia.com/gpu.family=ampere
                    nvidia.com/gpu.machine=g6.4xlarge
                    nvidia.com/gpu.memory=23034
                    nvidia.com/gpu.mode=compute
                    nvidia.com/gpu.present=true
                    nvidia.com/gpu.product=NVIDIA-L4-SHARED
                    nvidia.com/gpu.replicas=2
                    nvidia.com/gpu.sharing-strategy=mps
                    nvidia.com/mig.capable=false
                    nvidia.com/mig.strategy=single
                    nvidia.com/mps.capable=true
                    nvidia.com/vgpu.present=false
                    nvidia.com/gpu-driver-upgrade-enabled: true
Capacity:
  nvidia.com/gpu:     2
Allocatable:
  nvidia.com/gpu:     2
Allocated resources:
  nvidia.com/gpu     1              1
  5. Scale the nvidia-plugin-test Deployment to 4 pods and confirm that they all run normally.
$ oc scale deployment/nvidia-plugin-test --replicas=4 -n demo
$ oc get pod -n demo
NAME                                 READY   STATUS    RESTARTS   AGE
nvidia-plugin-test-bc49df6b8-7876r   1/1     Running   0          6s
nvidia-plugin-test-bc49df6b8-sl2kh   1/1     Running   0          34m
nvidia-plugin-test-bc49df6b8-w2z9g   1/1     Running   0          6s
nvidia-plugin-test-bc49df6b8-zppf8   1/1     Running   0          41m
  6. Check the GPU status from inside the nvidia-driver pod on each worker node, and confirm that in addition to the dcgmproftester12 processes there is also an nvidia-cuda-mps-server process.
$ oc exec -n nvidia-gpu-operator $(oc get pod -n nvidia-gpu-operator -l app.kubernetes.io/component=nvidia-driver -o jsonpath="{.items[0].metadata.name}") -- nvidia-smi
Wed Apr 30 10:08:12 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.144.03             Driver Version: 550.144.03     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L4                      On  |   00000000:35:00.0 Off |                    0 |
| N/A   80C    P0             72W /   72W |    1033MiB /  23034MiB |    100%   E. Process |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A     85231      C   nvidia-cuda-mps-server                         28MiB |
|    0   N/A  N/A     99955    M+C   /usr/bin/dcgmproftester12                     498MiB |
|    0   N/A  N/A     99958    M+C   /usr/bin/dcgmproftester12                     498MiB |
+-----------------------------------------------------------------------------------------+
  7. Scale the nvidia-plugin-test Deployment to 6 pods and confirm that 2 pods are in the Pending state.
$ oc scale deployment/nvidia-plugin-test --replicas=6 -n demo
$ oc get pod -n demo
NAME                                 READY   STATUS    RESTARTS   AGE
nvidia-plugin-test-bc49df6b8-7876r   1/1     Running   0          6s
nvidia-plugin-test-bc49df6b8-sl2kh   1/1     Running   0          34m
nvidia-plugin-test-bc49df6b8-w2z9g   1/1     Running   0          6s
nvidia-plugin-test-bc49df6b8-zppf8   1/1     Running   0          41m
nvidia-plugin-test-bc49df6b8-h6jj5   0/1     Pending   0          1m55s
nvidia-plugin-test-bc49df6b8-plm5n   0/1     Pending   0          1m55s
  8. The pod events show that scheduling failed (FailedScheduling) because of "Insufficient nvidia.com/gpu".
$ oc get event -n demo | grep pod/$(oc get pod --field-selector=status.phase=Pending -ojsonpath={.items[0].metadata.name})
3m24s       Warning   FailedScheduling    pod/nvidia-plugin-test-bc49df6b8-cps8h     0/5 nodes are available: 2 Insufficient nvidia.com/gpu, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/5 nodes are available: 2 No preemption victims found for incoming pod, 3 Preemption is not helpful for scheduling.
  9. Change the MPS strategy to split each GPU into four (a sketch of the resulting configuration follows the command below).
$ oc apply -k gpu-sharing-instance/instance/overlays/mps-4
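As with time-sliced-4, the mps-4 configuration is not shown in this post; assuming it mirrors the mps-sliced entry above with a higher replica count, it presumably looks like this sketch.

# Sketch of the assumed mps-4 device-plugin configuration,
# extrapolated from the mps-sliced entry (replicas: 2) shown earlier.
version: v1
sharing:
  mps:
    resources:
    - name: nvidia.com/gpu
      replicas: 4     # MPS advertises each physical GPU as 4 shareable nvidia.com/gpu devices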
  10. Toggle /spec/devicePlugin/enabled in the ClusterPolicy off and back on to restart the device plugin, then confirm that all 6 pods are Running.
$ oc patch ClusterPolicy gpu-cluster-policy --type='json' -p='[{"op": "replace", "path": "/spec/devicePlugin/enabled", "value": false}]'
$ oc patch ClusterPolicy gpu-cluster-policy --type='json' -p='[{"op": "replace", "path": "/spec/devicePlugin/enabled", "value": true}]'
$ oc get pod -n demo
NAME                                 READY   STATUS    RESTARTS   AGE
nvidia-plugin-test-bc49df6b8-7876r   1/1     Running   0          7m26s
nvidia-plugin-test-bc49df6b8-h6jj5   1/1     Running   0          4m55s
nvidia-plugin-test-bc49df6b8-plm5n   1/1     Running   0          4m55s
nvidia-plugin-test-bc49df6b8-sl2kh   1/1     Running   0          42m
nvidia-plugin-test-bc49df6b8-w2z9g   1/1     Running   0          7m26s
nvidia-plugin-test-bc49df6b8-zppf8   1/1     Running   0          49m

MIG

See:
《MIG Support in OpenShift Container Platform》
《The benefits of dynamic GPU slicing in OpenShift》
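Since the L4 used here has no MIG support, MIG is not demonstrated in this post. As a rough sketch based on the NVIDIA GPU Operator documentation (not verified in this environment), MIG is typically enabled through the ClusterPolicy and then requested per node with the nvidia.com/mig.config label:

# Sketch only; requires a MIG-capable GPU such as an A100, not the L4 used in this post.
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: gpu-cluster-policy
spec:
  mig:
    strategy: single       # expose each MIG slice as a regular nvidia.com/gpu resource
  migManager:
    enabled: true          # mig-manager applies the layout requested by the node label nvidia.com/mig.config

A MIG-capable node is then labeled, for example with nvidia.com/mig.config=all-1g.5gb, and the mig-manager reconfigures the GPU into the requested slices.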

References

https://github.com/liuxiaoyu-git/gpu-partitioning-guide
https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/time-slicing-gpus-in-openshift.html

