
OpenShift AI - Running Models with the NVIDIA Triton Runtime

news 2025/7/5 10:11:01

OpenShift / RHEL / DevSecOps Series Index
Note: This article has been verified in an OpenShift 4.18 + OpenShift AI 2.19 environment.

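Contents

Preparing the Triton Runtime Environment
- Adding the Triton Serving Runtime
- Running a Model Server Based on the Triton Runtime
Running Models in the Triton Runtime
- Preparing the Model Environment
- Running a PyTorch Model
- Running an ONNX Model
- Running a TensorFlow Model
References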

Preparing the Triton Runtime Environment

Adding the Triton Serving Runtime

Go to Settings -> Serving runtimes in the RHOAI dashboard.
Click the Add serving runtime button.
On the Add serving runtime page, select Multi-model serving platform and REST.
In the YAML area, click 'Start from scratch', paste the following content, and then click Create.

apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: triton-23.05
  labels:
    name: triton-23.05
  annotations:
    maxLoadingConcurrency: "2"
    openshift.io/display-name: Triton runtime - 25.05-py3
spec:
  supportedModelFormats:
    - name: keras
      version: "2" # 2.6.0
      autoSelect: true
    - name: onnx
      version: "1" # 1.5.3
      autoSelect: true
    - name: pytorch
      version: "1" # 1.8.0a0+17f8c32
      autoSelect: true
    - name: tensorflow
      version: "1" # 1.15.4
      autoSelect: true
    - name: tensorflow
      version: "2" # 2.3.1
      autoSelect: true
    - name: tensorrt
      version: "7" # 7.2.1
      autoSelect: true
    - name: sklearn
      version: "0" # v0.23.1
      autoSelect: false
    - name: xgboost
      version: "1" # v1.1.1
      autoSelect: false
    - name: lightgbm
      version: "3" # v3.2.1
      autoSelect: false
  protocolVersions:
    - grpc-v2
  multiModel: true
  grpcEndpoint: "port:8085"
  grpcDataEndpoint: "port:8001"
  volumes:
    - name: shm
      emptyDir:
        medium: Memory
        sizeLimit: 2Gi
  containers:
    - name: triton
      image: nvcr.io/nvidia/tritonserver:25.05-py3
      command:
        - /bin/sh
      args:
        - -c
        - 'mkdir -p /models/_triton_models;
          chmod 777 /models/_triton_models;
          exec tritonserver
          "--model-repository=/models/_triton_models"
          "--model-control-mode=explicit"
          "--strict-model-config=false"
          "--strict-readiness=false"
          "--allow-http=true"
          "--allow-sagemaker=false"
          '
      volumeMounts:
        - name: shm
          mountPath: /dev/shm
      resources:
        requests:
          cpu: 500m
          memory: 1Gi
        limits:
          cpu: "5"
          memory: 1Gi
      livenessProbe:
        exec:
          command:
            - curl
            - --fail
            - --silent
            - --show-error
            - --max-time
            - "9"
            - http://localhost:8000/v2/health/live
        initialDelaySeconds: 5
        periodSeconds: 30
        timeoutSeconds: 10
  builtInAdapter:
    serverType: triton
    runtimeManagementPort: 8001
    memBufferBytes: 134217728
    modelLoadingTimeoutMillis: 90000
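In KServe/ModelMesh, `supportedModelFormats` lists what a runtime can load. `autoSelect: true` marks the formats for which the runtime may be picked automatically when a deployed model specifies only its format. The Python sketch below mirrors that rule against the list above. It is a simplified illustration under that assumption and is not ModelMesh's actual selection code. `Format` and `auto_selectable` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Format:
    name: str
    version: str
    auto_select: bool


# Mirrors spec.supportedModelFormats of the ServingRuntime above.
SUPPORTED = [
    Format("keras", "2", True),
    Format("onnx", "1", True),
    Format("pytorch", "1", True),
    Format("tensorflow", "1", True),
    Format("tensorflow", "2", True),
    Format("tensorrt", "7", True),
    Format("sklearn", "0", False),
    Format("xgboost", "1", False),
    Format("lightgbm", "3", False),
]


def auto_selectable(name: str, version: Optional[str] = None) -> bool:
    """Simplified rule (assumption): an entry must match the format name,
    match the version when one is requested, and be marked autoSelect."""
    for fmt in SUPPORTED:
        if fmt.name != name:
            continue
        if version is not None and fmt.version != version:
            continue
        if fmt.auto_select:
            return True
    return False


if __name__ == "__main__":
    for query in [("pytorch", None), ("onnx", "1"), ("tensorflow", "2"),
                  ("sklearn", None), ("onnx", "2")]:
        print(query, auto_selectable(*query))
```

With this list, the sketch treats sklearn, xgboost and lightgbm as not auto-selectable. That matches their `autoSelect: false` entries in the YAML.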

Running a Model Server Based on the Triton Runtime

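In an RHOAI project, set the model serving platform under Models to Multi-model serving platform.
Under Models, create a Model Server that uses the Triton runtime, as shown in the figure below.
Once it is created, check that the Triton Model Server is running: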

$ oc get deploy
NAME                                    READY   UP-TO-DATE   AVAILABLE   AGE
modelmesh-serving-triton-model-server   1/1     1            1           24h
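To check readiness from a script instead of reading the READY column, you could parse the Deployment JSON. This is a minimal sketch that assumes `oc` is already logged in to the project. `fetch_deployment` and `deployment_ready` are hypothetical helper names. The demo uses an offline sample shaped like the output above, so it runs without a cluster.

```python
import json
import subprocess


def fetch_deployment(name: str) -> dict:
    """Run `oc get deploy <name> -o json` and parse it (requires an oc login)."""
    out = subprocess.run(
        ["oc", "get", "deploy", name, "-o", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)


def deployment_ready(deploy: dict) -> bool:
    """True when all desired replicas report ready, like READY 1/1 above."""
    # Kubernetes defaults spec.replicas to 1 when it is not set.
    desired = deploy.get("spec", {}).get("replicas", 1)
    # status.readyReplicas is omitted from the object while it is zero.
    ready = deploy.get("status", {}).get("readyReplicas", 0)
    return ready >= desired


if __name__ == "__main__":
    # Offline sample; on a cluster use
    # fetch_deployment("modelmesh-serving-triton-model-server") instead.
    sample = {
        "metadata": {"name": "modelmesh-serving-triton-model-server"},
        "spec": {"replicas": 1},
        "status": {"readyReplicas": 1},
    }
    print(deployment_ready(sample))
```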


Running Models in the Triton Runtime

Preparing the Model Environment

Create a project in RHOAI, then under Models choose 'Select multi-model'.
Make sure the object storage has a bucket named ai-models.
Create a Connection named ai-models that points to the ai-models bucket in the object storage.

Running a PyTorch Model

Clone modelmesh-minio-examples locally and look at the files in the modelmesh-minio-examples/pytorch/cifar directory.

$ git clone https://github.com/kserve/modelmesh-minio-examples && cd modelmesh-minio-examples/pytorch
$ tree cifar/
cifar/
├── 1
│   └── model.pt
└── config.pbtxt
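The layout matters because Triton reads each model from its own directory, with numbered version subdirectories (here 1/) next to config.pbtxt. The standard-library sketch below checks for a version directory. It also computes the object keys that uploading cifar to the bucket root would produce. `object_keys`, `is_versioned_model_dir` and `example` are hypothetical helpers. The commented boto3 call is one possible upload method and is not what the article uses.

```python
import tempfile
from pathlib import Path


def is_versioned_model_dir(model_dir: Path) -> bool:
    """Triton loads model versions from numeric subdirectories (1/, 2/, ...)."""
    return any(p.is_dir() and p.name.isdigit() for p in model_dir.iterdir())


def object_keys(model_dir: Path, prefix: str = "") -> dict:
    """Map each local file to the object key it would get in the bucket.

    The model directory name becomes the top-level key component, so
    uploading ./cifar yields keys such as "cifar/1/model.pt".
    """
    keys = {}
    for path in sorted(model_dir.rglob("*")):
        if path.is_file():
            keys[path] = prefix + path.relative_to(model_dir.parent).as_posix()
    return keys


def example() -> tuple:
    """Recreate the cifar layout in a temp dir and inspect it."""
    with tempfile.TemporaryDirectory() as tmp:
        model_dir = Path(tmp) / "cifar"
        (model_dir / "1").mkdir(parents=True)
        (model_dir / "1" / "model.pt").write_bytes(b"")
        (model_dir / "config.pbtxt").write_text("")
        return is_versioned_model_dir(model_dir), sorted(object_keys(model_dir).values())


if __name__ == "__main__":
    # Possible upload (assumption, requires boto3):
    # boto3.client("s3", endpoint_url=...).upload_file(str(path), "ai-models", key)
    print(example())  # (True, ['cifar/1/model.pt', 'cifar/config.pbtxt'])
```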

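Upload the modelmesh-minio-examples/pytorch/cifar directory to the ai-models bucket in the object storage.
On the Models page in RHOAI, click the Deploy model button on the right of the Triton Model Server row, then deploy the cifar model from the object storage as shown in the figure below.
Once it finishes, the deployment status of the cifar-triton-torch model is displayed.
Query the model's input and output formats.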

$ MODEL_NAME=cifar-triton-torch
$ MODEL_URL=$(oc get route $MODEL_NAME -ojsonpath=https://{.spec.host})/v2/models/$MODEL_NAME
$ curl -s ${MODEL_URL} | jq
{
  "name": "cifar-triton-torch__isvc-9f77f26bf2",
  "versions": [
    "1"
  ],
  "platform": "pytorch_libtorch",
  "inputs": [
    {
      "name": "INPUT__0",
      "datatype": "FP32",
      "shape": [
        "-1",
        "3",
        "32",
        "32"
      ]
    }
  ],
  "outputs": [
    {
      "name": "OUTPUT__0",
      "datatype": "FP32",
      "shape": [
        "-1",
        "10"
      ]
    }
  ]
}
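The metadata above is enough to build a correctly shaped request body for the KServe v2 `/infer` endpoint. The following is a minimal Python sketch. It assumes a batch size of 1 and uses zero placeholders instead of real image data. `request_skeleton` is a hypothetical helper.

```python
import json
import math

# Metadata returned by GET ${MODEL_URL} for cifar-triton-torch (copied from above).
METADATA = json.loads("""
{"name": "cifar-triton-torch__isvc-9f77f26bf2", "versions": ["1"],
 "platform": "pytorch_libtorch",
 "inputs": [{"name": "INPUT__0", "datatype": "FP32", "shape": ["-1", "3", "32", "32"]}],
 "outputs": [{"name": "OUTPUT__0", "datatype": "FP32", "shape": ["-1", "10"]}]}
""")


def request_skeleton(metadata: dict, batch: int = 1) -> dict:
    """Build a v2 /infer body whose tensor shapes come from the metadata."""
    inputs = []
    for tensor in metadata["inputs"]:
        # Dims may arrive as strings (as above) or ints; -1 is the dynamic batch dim.
        shape = [batch if int(d) == -1 else int(d) for d in tensor["shape"]]
        inputs.append({
            "name": tensor["name"],
            "shape": shape,
            "datatype": tensor["datatype"],
            # Flat, row-major placeholder data; zeros only make sense for numeric types.
            "data": [0.0] * math.prod(shape),
        })
    return {"inputs": inputs}


if __name__ == "__main__":
    body = request_skeleton(METADATA)
    tensor = body["inputs"][0]
    print(tensor["name"], tensor["shape"], len(tensor["data"]))  # INPUT__0 [1, 3, 32, 32] 3072
```

Replacing the zeros with real pixel values (for example, from input.json) produces a body that `curl -d` can send.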

Download the test data file and submit it to the cifar-triton-torch model to get the inference result.

$ wget https://raw.githubusercontent.com/kserve/kserve/master/docs/samples/v1beta1/triton/torchscript/input.json
$ curl -s -X POST -k "${MODEL_URL}/infer" -H "Content-Type: application/json" -d @./input.json | jq
{
  "model_name": "cifar-triton-torch__isvc-9f77f26bf2",
  "model_version": "1",
  "outputs": [
    {
      "name": "OUTPUT__0",
      "datatype": "FP32",
      "shape": [
        1,
        10
      ],
      "data": [
        -0.55252016,
        -1.7675304,
        0.6265609,
        1.4070208,
        0.38794953,
        1.3849527,
        -0.16314837,
        0.85409915,
        -0.6349715,
        -0.6840154
      ]
    }
  ]
}
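`OUTPUT__0` holds ten scores for the single image, so the predicted class is the index of the largest value. The sketch below picks that index and adds a softmax for a rough probability. It makes three assumptions that the model metadata does not state: the scores are unnormalized logits, the batch size is 1, and the labels follow the usual CIFAR-10 order. `top_class` and `CIFAR10_CLASSES` are hypothetical names.

```python
import math

# Usual CIFAR-10 label order (assumption: the model was trained with this order).
CIFAR10_CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
                   "dog", "frog", "horse", "ship", "truck"]

# Response body copied from the curl output above.
RESPONSE = {
    "model_name": "cifar-triton-torch__isvc-9f77f26bf2",
    "model_version": "1",
    "outputs": [{
        "name": "OUTPUT__0", "datatype": "FP32", "shape": [1, 10],
        "data": [-0.55252016, -1.7675304, 0.6265609, 1.4070208, 0.38794953,
                 1.3849527, -0.16314837, 0.85409915, -0.6349715, -0.6840154],
    }],
}


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_class(response, labels):
    """Return (index, label, probability) for a batch-of-1 response."""
    scores = response["outputs"][0]["data"]
    idx = max(range(len(scores)), key=scores.__getitem__)
    return idx, labels[idx], softmax(scores)[idx]


if __name__ == "__main__":
    idx, label, prob = top_class(RESPONSE, CIFAR10_CLASSES)
    print(idx, label, round(prob, 3))
```

Here index 3 (1.4070208) narrowly beats index 5 (1.3849527). Under the assumed label order, that is "cat" over "dog".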

Running an ONNX Model

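Download the model file https://ai-on-openshift.io/odh-rhoai/img-triton/card.fraud.detection.onnx locally.
Upload the model file to the card-fraud-detection folder in the ai-models bucket of the object storage.
Deploy the model in the card-fraud-detection folder to the Triton Model Server as shown in the figure below.
Check the deployment status.
Query the model's input and output formats.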

$ MODEL_NAME=card-fraud-detection
$ MODEL_URL=$(oc get route $MODEL_NAME -ojsonpath=https://{.spec.host})/v2/models/$MODEL_NAME
$ curl -s ${MODEL_URL} | jq
{
  "name": "card-fraud-detection-1__isvc-c0a9fa30b8",
  "versions": [
    "1"
  ],
  "platform": "onnxruntime_onnx",
  "inputs": [
    {
      "name": "dense_input",
      "datatype": "FP32",
      "shape": [
        "-1",
        "7"
      ]
    }
  ],
  "outputs": [
    {
      "name": "dense_3",
      "datatype": "FP32",
      "shape": [
        "-1",
        "1"
      ]
    }
  ]
}
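Before sending a request, it can help to compare it with this metadata: one FP32 input named `dense_input` with 7 values per row. Below is a sketch of a hypothetical `check_request` helper. It assumes `data` is sent flat (row-major), as in the curl example that follows.

```python
import math

# Input/output signature copied from the metadata above.
ONNX_METADATA = {
    "name": "card-fraud-detection-1__isvc-c0a9fa30b8",
    "inputs": [{"name": "dense_input", "datatype": "FP32", "shape": ["-1", "7"]}],
    "outputs": [{"name": "dense_3", "datatype": "FP32", "shape": ["-1", "1"]}],
}


def check_request(metadata: dict, body: dict) -> list:
    """Return a list of mismatches between an /infer body and the metadata.

    An empty list means the body looks consistent. Expects flat `data` lists.
    """
    problems = []
    specs = {t["name"]: t for t in metadata["inputs"]}
    sent = {t["name"] for t in body["inputs"]}
    for name in sorted(set(specs) - sent):
        problems.append(f"missing input {name}")
    for tensor in body["inputs"]:
        spec = specs.get(tensor["name"])
        if spec is None:
            problems.append(f"unexpected input {tensor['name']}")
            continue
        if tensor["datatype"] != spec["datatype"]:
            problems.append(f"{tensor['name']}: datatype {tensor['datatype']} != {spec['datatype']}")
        want = [int(d) for d in spec["shape"]]  # -1 means "any size"
        got = tensor["shape"]
        if len(want) != len(got) or any(w != -1 and w != g for w, g in zip(want, got)):
            problems.append(f"{tensor['name']}: shape {got} does not fit {want}")
        elif math.prod(got) != len(tensor["data"]):
            problems.append(f"{tensor['name']}: {len(tensor['data'])} values for shape {got}")
    return problems


GOOD = {"inputs": [{"name": "dense_input", "shape": [1, 7], "datatype": "FP32",
                    "data": [57.87785658389723, 0.3111400080477545, 1.9459399775518593,
                             1.0, 1.0, 0.0, 0.0]}]}

if __name__ == "__main__":
    print(check_request(ONNX_METADATA, GOOD))  # []
```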

Send an inference request to the card-fraud-detection model.

$ curl -s -X POST -k "${MODEL_URL}/infer" -d '{"inputs": [{ "name": "dense_input", "shape": [1, 7], "datatype": "FP32", "data": [57.87785658389723,0.3111400080477545,1.9459399775518593,1.0,1.0,0.0,0.0]}]}' | jq
{
  "model_name": "card-fraud-detection__isvc-7bda50d09c",
  "model_version": "1",
  "outputs": [
    {
      "name": "dense_3",
      "datatype": "FP32",
      "shape": [
        1,
        1
      ],
      "data": [
        0.86280495
      ]
    }
  ]
}
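You can send the same request from Python using only the standard library. This sketch mirrors the curl call above, including skipping TLS verification like `curl -k`. It reads `MODEL_URL` from the environment. Treating `dense_3` as a fraud probability with a 0.5 threshold is an assumption for illustration. `make_body`, `infer` and `fraud_flag` are hypothetical names.

```python
import json
import os
import ssl
import urllib.request

# Feature vector copied from the curl example above.
FEATURES = [57.87785658389723, 0.3111400080477545, 1.9459399775518593, 1.0, 1.0, 0.0, 0.0]


def make_body(features):
    """Wrap one row of 7 features as a v2 /infer request body."""
    return {"inputs": [{"name": "dense_input", "shape": [1, len(features)],
                        "datatype": "FP32", "data": list(features)}]}


def infer(model_url, body, verify_tls=True, timeout=30):
    """POST the body to ${model_url}/infer and return the parsed JSON reply."""
    ctx = ssl.create_default_context()
    if not verify_tls:
        # Comparable to curl -k: accept self-signed route certificates.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    req = urllib.request.Request(
        f"{model_url}/infer",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout, context=ctx) as resp:
        return json.load(resp)


def fraud_flag(response, threshold=0.5):
    """Assumes dense_3 is a sigmoid fraud probability (hypothetical threshold)."""
    score = response["outputs"][0]["data"][0]
    return score, score >= threshold


if __name__ == "__main__":
    url = os.environ.get("MODEL_URL")
    if url:  # only call the network when MODEL_URL has been exported
        print(fraud_flag(infer(url, make_body(FEATURES), verify_tls=False)))
```

Run `export MODEL_URL` first in the shell session above; without the variable the script does nothing. Disabling certificate checks is only reasonable on test clusters.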

Running a TensorFlow Model

Clone modelmesh-minio-examples locally and look at the files in the modelmesh-minio-examples/tensorflow directory.

$ git clone https://github.com/kserve/modelmesh-minio-examples && cd modelmesh-minio-examples
$ tree tensorflow
tensorflow
+--- mnist
|   +--- saved_model.pb
|   +--- variables
|   |   +--- variables.data-00000-of-00001
|   |   +--- variables.index
+--- simple_string
|   +--- 1
|   |   +--- model.graphdef
|   +--- config.pbtxt

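Upload the tensorflow directory to the ai-models bucket in the object storage.
Deploy the models in RHOAI as shown in the figure below, using tensorflow/mnist and tensorflow/simple_string respectively as the model Path.
Once they are deployed, the mnist and simple_string models are listed.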

References

https://github.com/kserve/modelmesh-serving/blob/main/config/runtimes/triton-2.x.yaml
https://github.com/kserve/modelmesh-minio-examples/tree/main/pytorch/cifar

https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/user_guide/model_repository.html
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/tutorials/Quick_Deploy/PyTorch/README.html
https://docs.nvidia.com/deeplearning/triton-inference-server/archives/triton_inference_server_1150/user-guide/docs/model_repository.html#pytorch-models
https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tritonserver/tags

https://github.com/triton-inference-server/tutorials/tree/main/Conceptual_Guide/Part_1-model_deployment
https://github.com/triton-inference-server/server/tree/main/docs/examples

https://ai-on-openshift.io/odh-rhoai/custom-runtime-triton
https://ai-on-openshift.io/tools-and-applications/ensemble-serving/ensemble-serving/
https://ai-on-openshift.io/odh-rhoai/custom-runtime-triton/#deploying-a-model-into-it
https://github.com/rh-aiservices-bu/kserve-triton-ensemble-testing
https://github.com/rh-aiservices-bu/kserve-triton-ensemble-testing/blob/main/runtime/runtime-rest.yaml

https://kserve.github.io/website/latest/modelserving/v1beta1/triton/torchscript/
