当前位置: 首页 > news >正文

GPUStack昇腾Atlas300I duo部署模型DeepSeek-R1【GPUStack实战篇2】

在这里插入图片描述
在这里插入图片描述
在这里插入图片描述

2025年4月25日GPUStack发布了v0.6版本,为昇腾芯片910B(1-4)和310P3内置了MinIE推理,新增了310P芯片的支持,很感兴趣,所以我马上来捣鼓玩玩看哈
官方文档:https://docs.gpustack.ai/latest/installation/ascend-cann/online-installation/
目前GPUStack的Ascend MindIE推理引擎支持的模型列表:https://www.hiascend.com/document/detail/zh/mindie/100/whatismindie/mindie_what_0003.html


部署GPUStack

可以参考我之前写的:鲲鹏+昇腾部署集群管理软件GPUStack,两台服务器搭建双节点集群【实战详细踩坑篇】

启动并创建容器:

docker run -d --name gpustack \--restart=unless-stopped \--device /dev/davinci0 \--device /dev/davinci1 \--device /dev/davinci_manager \--device /dev/devmm_svm \--device /dev/hisi_hdc \-v /usr/local/dcmi:/usr/local/dcmi \-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \-v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \-v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \-v /etc/ascend_install.info:/etc/ascend_install.info \--network=host \--ipc=host \-v gpustack-data:/var/lib/gpustack \gpustack/gpustack:latest-npu-310p

部署DeepSeek-R1

在这里插入图片描述
(1)登录后选择模型,搜索:deepseek,选择deepseek-ai/DeepSeek-R1-Distill-Qwen-7B模型,后端选择:Ascend MindIE,然后点保存。

在这里插入图片描述

下载完成后运行报错,查了一下,目前适配的Ascend MindIE是1.0.0版本,还没适配DeepSeek-R1!

在这里插入图片描述

以下为报错日志:

2025-04-27 06:42:05,038 [ERROR] model.py:39 - [Model]	>>> Exception:call aclnnInplaceZero failed, detail:EZ9999: Inner Error!
EZ9999: [PID: 43453] 2025-04-27-06:42:05.032.406 Parse dynamic kernel config fail.TraceBack (most recent call last):AclOpKernelInit failed opTypeZerosLike ADD_TO_LAUNCHER_LIST_AICORE failed.[ERROR] 2025-04-27-06:42:05 (PID:43453, Device:0, RankID:-1) ERR01100 OPS call acl api failed
Traceback (most recent call last):File "/usr/local/lib/python3.11/dist-packages/model_wrapper/model.py", line 37, in initializereturn self.python_model.initialize(config)^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^File "/usr/local/lib/python3.11/dist-packages/model_wrapper/standard_model.py", line 146, in initializeself.generator = Generator(^^^^^^^^^^File "/usr/local/lib/python3.11/dist-packages/mindie_llm/text_generator/generator.py", line 119, in __init__self.warm_up(max_prefill_tokens, max_seq_len, max_input_len, max_iter_times, inference_mode)File "/usr/local/lib/python3.11/dist-packages/mindie_llm/text_generator/generator.py", line 303, in warm_upraise eFile "/usr/local/lib/python3.11/dist-packages/mindie_llm/text_generator/generator.py", line 296, in warm_upself._generate_inputs_warm_up_backend(input_metadata, inference_mode, dummy=True)File "/usr/local/lib/python3.11/dist-packages/mindie_llm/text_generator/generator.py", line 378, in _generate_inputs_warm_up_backendself.generator_backend.warm_up(model_inputs, inference_mode=inference_mode)File "/usr/local/lib/python3.11/dist-packages/mindie_llm/text_generator/adapter/generator_torch.py", line 198, in warm_upsuper().warm_up(model_inputs)File "/usr/local/lib/python3.11/dist-packages/mindie_llm/text_generator/adapter/generator_backend.py", line 170, in warm_up_ = self.forward(model_inputs, **kwargs)^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^File "/usr/local/lib/python3.11/dist-packages/mindie_llm/utils/decorators/time_decorator.py", line 38, in wrapperreturn func(*args, **kwargs)^^^^^^^^^^^^^^^^^^^^^File "/usr/local/lib/python3.11/dist-packages/mindie_llm/text_generator/adapter/generator_torch.py", line 153, in forwardlogits = self.model_wrapper.forward(model_inputs, self.cache_pool.npu_cache, **kwargs)^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^File "/usr/local/lib/python3.11/dist-packages/mindie_llm/modeling/model_wrapper/atb/atb_model_wrapper.py", line 89, in forwardlogits = self.forward_tensor(^^^^^^^^^^^^^^^^^^^^File "/usr/local/lib/python3.11/dist-packages/mindie_llm/modeling/model_wrapper/atb/atb_model_wrapper.py", line 116, in forward_tensorlogits = self.model_runner.forward(^^^^^^^^^^^^^^^^^^^^^^^^^^File "/usr/local/Ascend/atb-models/atb_llm/runner/model_runner.py", line 193, in forwardreturn self.model.forward(**kwargs)^^^^^^^^^^^^^^^^^^^^^^^^^^^^File "/usr/local/Ascend/atb-models/atb_llm/models/base/flash_causal_lm.py", line 452, in forwardself.init_ascend_weight()File "/usr/local/Ascend/atb-models/atb_llm/models/qwen2/flash_causal_qwen2.py", line 150, in init_ascend_weightweight_wrapper = self.get_weights()^^^^^^^^^^^^^^^^^^File "/usr/local/Ascend/atb-models/atb_llm/models/qwen2/flash_causal_qwen2.py", line 132, in get_weightsweight_wrapper = WeightWrapper(self.soc_info, self.tp_rank, attn_wrapper, mlp_wrapper)^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^File "/usr/local/Ascend/atb-models/atb_llm/utils/data/weight_wrapper.py", line 49, in __init__self.placeholder = torch.zeros(1, dtype=torch.float16, device="npu")^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: call aclnnInplaceZero failed, detail:EZ9999: Inner Error!
EZ9999: [PID: 43453] 2025-04-27-06:42:05.032.406 Parse dynamic kernel config fail.TraceBack (most recent call last):AclOpKernelInit failed opTypeZerosLike ADD_TO_LAUNCHER_LIST_AICORE failed.[ERROR] 2025-04-27-06:42:05 (PID:43453, Device:0, RankID:-1) ERR01100 OPS call acl api failed
2025-04-27 06:42:05,042 [ERROR] model.py:42 - [Model]	>>> return initialize error result: {'status': 'error', 'npuBlockNum': '0', 'cpuBlockNum': '0'}
[2025-04-27 06:42:05.146668+00:00] [43225] [43226] [server] [WARN] [llm_daemon.cpp:64] : [Daemon] received exit signal[17]
[2025-04-27 06:42:05.146771+00:00] [43225] [43226] [server] [INFO] [llm_daemon.cpp:69] : Daemon wait pid with 43453, status 9
[2025-04-27 06:42:05.146776+00:00] [43225] [43226] [server] [ERROR] [llm_daemon.cpp:74] : ERR: Daemon wait pid with 43453 exit, Please check the service log or python log.
[ERROR] TBE(43892,python3):2025-04-27-06:42:05.253.179 [../../../../../../latest/python/site-packages/tbe/common/repository_manager/utils/repository_manager_log.py:30][log] [../../../../../../latest/python/site-packages/tbe/common/repository_manager/route.py:65][repository_manager] Subprocess[task_distribute] raise error[]
[ERROR] TBE(43893,python3):2025-04-27-06:42:05.253.179 [../../../../../../latest/python/site-packages/tbe/common/repository_manager/utils/repository_manager_log.py:30][log] [../../../../../../latest/python/site-packages/tbe/common/repository_manager/route.py:65][repository_manager] Subprocess[task_distribute] raise error[]
[ERROR] TBE(43891,python3):2025-04-27-06:42:05.253.179 [../../../../../../latest/python/site-packages/tbe/common/repository_manager/utils/repository_manager_log.py:30][log] [../../../../../../latest/python/site-packages/tbe/common/repository_manager/route.py:65][repository_manager] Subprocess[task_distribute] raise error[]
[ERROR] TBE(43890,python3):2025-04-27-06:42:05.253.179 [../../../../../../latest/python/site-packages/tbe/common/repository_manager/utils/repository_manager_log.py:30][log] [../../../../../../latest/python/site-packages/tbe/common/repository_manager/route.py:65][repository_manager] Subprocess[task_distribute] raise error[]
[ERROR] TBE(43888,python3):2025-04-27-06:42:05.253.207 [../../../../../../latest/python/site-packages/tbe/common/repository_manager/utils/repository_manager_log.py:30][log] [../../../../../../latest/python/site-packages/tbe/common/repository_manager/route.py:65][repository_manager] Subprocess[task_distribute] raise error[]
[ERROR] TBE(43887,python3):2025-04-27-06:42:05.253.222 [../../../../../../latest/python/site-packages/tbe/common/repository_manager/utils/repository_manager_log.py:30][log] [../../../../../../latest/python/site-packages/tbe/common/repository_manager/route.py:65][repository_manager] Subprocess[task_distribute] raise error[]
[ERROR] TBE(43889,python3):2025-04-27-06:42:05.253.262 [../../../../../../latest/python/site-packages/tbe/common/repository_manager/utils/repository_manager_log.py:30][log] [../../../../../../latest/python/site-packages/tbe/common/repository_manager/route.py:65][repository_manager] Subprocess[task_distribute] raise error[]
[ERROR] TBE(43886,python3):2025-04-27-06:42:05.253.290 [../../../../../../latest/python/site-packages/tbe/common/repository_manager/utils/repository_manager_log.py:30][log] [../../../../../../latest/python/site-packages/tbe/common/repository_manager/route.py:65][repository_manager] Subprocess[task_distribute] raise error[]
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
/usr/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 30 leaked semaphore objects to clean up at shutdownwarnings.warn('resource_tracker: There appear to be %d '
Daemon is killing...
[2025-04-27 06:42:10.147021][43225][localhost.localdomain][system][stop][endpoint][success]
[2025-04-27 06:42:10.147044][43225][localhost.localdomain][system][stop][mindie server][success]

部署Qwen2.5模型

目前看qwen2.5系列是支持的,所以尝试一下
在这里插入图片描述

在这里插入图片描述
在这里插入图片描述
也不行,我是真服了,报错日志和上面一模一样


发现问题了,原来也是要改模型权重的精度,不支持BF16,需要改成FP16
在这里插入图片描述
运行成功了!

在这里插入图片描述

测试对话

在这里插入图片描述
低参数基本都会有这个问题,我改回测高参数的模型,测试Qwen2.5-7B-Instruct正常

在这里插入图片描述

相关文章:

  • Java安全之cc链学习集合
  • 【MySQL 】MySQL 安装自记录全程-详细 (mysql-installer-community-8.0.42.0.msi)
  • XLSX.utils.sheet_to_json设置了blankrows:true,但无法获取到开头的空白行
  • 毫米波振荡器设计知识笔记
  • 快速排序及其在Unity游戏开发中的应用
  • 在旧版本中打开Anylogic模型
  • 纯净无噪,智见未来——MAGI-1本地部署教程,自回归重塑数据本质
  • GAMES202-高质量实时渲染(homework1)
  • Web前端开发:CSS Float(浮动)与 Positioning(定位)
  • Pydantic :基于 Python 类型注解(type hints)的数据验证和数据解析库
  • 《电商业务分析终极框架:从数据到决策的标准化路径》
  • cuda学习2:cuda编程基本概念
  • LeetCode12_整数转罗马数字
  • 人机鉴权和机机鉴权
  • 【算法应用】基于灰狼算法求解DV-Hop定位问题
  • 面试:结构体默认是对齐的嘛?如何禁止对齐?
  • 【每日随笔】文化属性 ① ( 天机 | 强势文化与弱势文化 | 文化属性的形成与改变 | 强势文化 具备的特点 )
  • 利用脚本搭建私有云平台,部署云平台,发布云主机并实现互连和远程连接
  • AI发展史
  • MySQL索引优化与实战 - Java架构师面试解析
  • 《探秘海昏侯国》数字沉浸特展亮相首届江西文化旅游产业博览交易会
  • 湖北鄂城:相继4所小学有学生腹泻呕吐,供餐企业负责人已被采取强制措施
  • 强制性国家标准《危险化学品企业安全生产标准化通用规范》发布
  • “乐购浦东”消费券明起发放,多个商家同期推出折扣促销活动
  • 中方发布《不跪!》视频传递何种信息?外交部回应
  • 法治日报调查直播间“杀熟”乱象:熟客越买越贵,举证难维权不易