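Fixing "operator torchvision::nms does not exist" in the ktransformers v0.3 Docker image
Background
After updating the ktransformers Docker image to v0.3 (previously v0.2.4post1), the service no longer starts with the launch command that worked before the update. It fails with the following error: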
Traceback (most recent call last):
  File "/workspace/ktransformers/ktransformers/server/main.py", line 12, in <module>
    from ktransformers.server.utils.create_interface import create_interface, GlobalInterface
  File "/opt/conda/lib/python3.11/site-packages/ktransformers/server/utils/create_interface.py", line 14, in <module>
    from ktransformers.server.backend.context_manager import ThreadContextManager
  File "/opt/conda/lib/python3.11/site-packages/ktransformers/server/backend/context_manager.py", line 8, in <module>
    from ktransformers.server.backend.interfaces.transformers import TransformersThreadContext
  File "/opt/conda/lib/python3.11/site-packages/ktransformers/server/backend/interfaces/transformers.py", line 5, in <module>
    from transformers import (
  File "<frozen importlib._bootstrap>", line 1229, in _handle_fromlist
  File "/opt/conda/lib/python3.11/site-packages/transformers/utils/import_utils.py", line 1956, in __getattr__
    value = getattr(module, name)
            ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/transformers/utils/import_utils.py", line 1955, in __getattr__
    module = self._get_module(self._class_to_module[name])
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/transformers/utils/import_utils.py", line 1969, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.models.llama.modeling_llama because of the following error (look up to see its traceback):
operator torchvision::nms does not exist
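Root cause
A search shows that this exception is raised when the installed torchvision version does not match the installed torch version. Each torchvision release is built against one specific torch release, so with a mismatched pair its compiled operators (including nms) are not registered and importing torchvision fails.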
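To confirm the mismatch before changing anything, the installed versions can be compared against the known torch/torchvision pairings. The following is a minimal sketch, not part of ktransformers: the helper names (`minor_release`, `expected_torchvision`, `versions_match`, `installed_version`) are hypothetical, and the table only covers the torch 2.x pairs listed in it. It reads package metadata only, so it works even when `import torchvision` itself fails.

```python
from importlib import metadata

# torch minor release -> matching torchvision minor release.
# 2.6 -> 0.21 is the pair used in this post; the other rows follow the
# compatibility table published by the torchvision project.
TORCH_TO_TORCHVISION = {
    "2.0": "0.15", "2.1": "0.16", "2.2": "0.17", "2.3": "0.18",
    "2.4": "0.19", "2.5": "0.20", "2.6": "0.21", "2.7": "0.22",
}


def minor_release(version):
    """Reduce a version such as '2.6.0+cu126' to its 'major.minor' part ('2.6')."""
    public = version.split("+", 1)[0]  # drop a local build tag like '+cu126'
    major, minor = public.split(".")[:2]
    return f"{major}.{minor}"


def expected_torchvision(torch_version):
    """Return the torchvision minor release for a torch version, or None if unknown."""
    return TORCH_TO_TORCHVISION.get(minor_release(torch_version))


def versions_match(torch_version, torchvision_version):
    """True only if the pair appears in the table above."""
    expected = expected_torchvision(torch_version)
    return expected is not None and minor_release(torchvision_version) == expected


def installed_version(package):
    """Read the installed version from package metadata without importing the package."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    torch_v = installed_version("torch")
    vision_v = installed_version("torchvision")
    print(f"torch={torch_v} torchvision={vision_v}")
    if torch_v and vision_v:
        print("match" if versions_match(torch_v, vision_v) else
              f"mismatch: expected torchvision {expected_torchvision(torch_v)}.x")
```

Run inside the container, this prints the two installed versions and, on a mismatch, the torchvision minor release to install.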
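Solution
Uninstall torchvision and reinstall a version that matches the installed torch.
Note, however, that you must not simply install the latest version, for example with pip install --upgrade torchvision. That upgrades both torchvision and torch to their latest releases, and the precompiled ktransformers extension, which was built against the older torch, then fails with the following error: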
Traceback (most recent call last):
  File "/workspace/ktransformers/ktransformers/server/main.py", line 10, in <module>
    from ktransformers.server.args import ArgumentParser
  File "/opt/conda/lib/python3.11/site-packages/ktransformers/server/args.py", line 3, in <module>
    from ktransformers.util.utils import get_free_ports
  File "/opt/conda/lib/python3.11/site-packages/ktransformers/util/utils.py", line 14, in <module>
    from ktransformers.util.custom_gguf import translate_name_to_gguf
  File "/opt/conda/lib/python3.11/site-packages/ktransformers/util/custom_gguf.py", line 27, in <module>
    import KTransformersOps
ImportError: /opt/conda/lib/python3.11/site-packages/KTransformersOps.cpython-311-x86_64-linux-gnu.so: undefined symbol: _ZN3c106detail23torchInternalAssertFailEPKcS2_jS2_RKSs
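The correct approach is to uninstall torchvision and then install the torchvision release that corresponds to the installed torch. The command below is for torch 2.6.0: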
pip install torchvision==0.21.0
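The uninstall-then-pin sequence can also be scripted, which helps when the fix has to be repeated across containers. This is a rough sketch under stated assumptions: `reinstall_commands` and `reinstall` are hypothetical helper names, the default version 0.21.0 is the one given above for torch 2.6.0, and the script defaults to a dry run that only prints the commands.

```python
import subprocess
import sys


def reinstall_commands(torchvision_version="0.21.0"):
    """Build the two pip invocations: uninstall, then install the pinned release.

    Deliberately no --upgrade: the pinned torchvision release depends on its
    matching torch release, so an already matching torch is left in place.
    """
    pip = [sys.executable, "-m", "pip"]  # use the pip of the running interpreter
    return [
        pip + ["uninstall", "-y", "torchvision"],
        pip + ["install", f"torchvision=={torchvision_version}"],
    ]


def reinstall(torchvision_version="0.21.0", dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in reinstall_commands(torchvision_version):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


if __name__ == "__main__":
    reinstall()  # dry run: prints the commands without changing the environment
```

Calling `reinstall(dry_run=False)` performs the actual reinstall. For a torch version other than 2.6.0, pass the corresponding torchvision version explicitly.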
Once the matching version is installed, the error no longer occurs.
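One way to check the result without starting the whole service is to import torchvision and run the nms operator on a few boxes. The sketch below is an illustration only: `torchvision_nms_works` is a hypothetical helper, and the sample boxes are made up for the check. It returns None when torch or torchvision is not installed at all.

```python
def torchvision_nms_works():
    """Return True if torchvision imports and nms runs, False if it is broken,
    and None if torch or torchvision is not installed."""
    try:
        import torch
        import torchvision  # the "operator ... does not exist" error surfaces here
        from torchvision.ops import nms
    except ModuleNotFoundError:
        return None
    except (RuntimeError, ImportError, OSError) as exc:
        print(f"torchvision failed to load: {exc}")
        return False

    # Boxes are (x1, y1, x2, y2). The first two overlap heavily (IoU about 0.68),
    # the third is far away from both.
    boxes = torch.tensor([
        [0.0, 0.0, 10.0, 10.0],
        [1.0, 1.0, 11.0, 11.0],
        [50.0, 50.0, 60.0, 60.0],
    ])
    scores = torch.tensor([0.9, 0.8, 0.7])
    try:
        keep = nms(boxes, scores, 0.5)  # third argument is the IoU threshold
    except RuntimeError as exc:
        print(f"nms failed: {exc}")
        return False
    # With a 0.5 threshold the lower-scored overlapping box (index 1) is dropped.
    return keep.tolist() == [0, 2]


if __name__ == "__main__":
    print(torchvision_nms_works())
```

A result of True indicates that the operator is registered and callable again. False means torchvision still cannot load its compiled operators.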