当前位置: 首页 > news >正文

mindie1.0新特性及调试问题总结

说明

最近在ascend 310P3上使用mindie 1.0部署模型,跟我以前使用的mindie 1.0_rc2比,有很多新的特性和变化,导致部署出现了不少问题。这里罗列下我的发现,希望对其他人有用。

特性1:需要显式配置share_memory

报错信息:

[2025-04-02 14:04:30.801+08:00] [63217] [63219] [server] [INFO] [share_memory.cpp:168] : [ShareMemory::SharedMemorySizeCheck] shared memory size check success.
[2025-04-02 14:04:30.826+08:00] [63217] [63219] [server] [ERROR] [share_memory.cpp:158] : [ShareMemory::SharedMemorySizeCheck]shared memory available is not enough on the filesystem.
[2025-04-02 14:04:30.826+08:00] [63217] [63219] [server] [ERROR] [share_memory.cpp:26] : [ShareMemory::Create]available shared memory size is not enough.
[2025-04-02 14:04:30.826+08:00] [63217] [63219] [server] [ERROR] [master_IPC_communicator.cpp:137] : Failed to create response shared memory
[2025-04-02 14:04:30.826+08:00] [63217] [63219] [server] [ERROR] [connector_launcher.cpp:143] : [ConnectorLauncher::Launch] Failed to setup message channel.
[2025-04-02 14:04:30.833+08:00] [63217] [63219] [server] [ERROR] [model_backend.cpp:230] : [ModelBackend::InstanceInit] Failed to launcher connector agent.
[2025-04-02 14:04:30.833276+08:00] [63217] [63219] [server] [ERROR] [llm_infer_model_instance.cpp:234] : [LLMInferModelInstance::Init] llmManager_ init fail!

解决办法:启动docker时显式配置share_memory

 docker run ...... --shm-size 10g ......

正确启动的相关打印:

[2025-04-02 14:53:47.319+08:00] [119] [121] [server] [INFO] [share_memory.cpp:167] : total shared memory size:10240MB, and available shared memory size:10240MB.
[2025-04-02 14:53:47.319+08:00] [119] [121] [server] [INFO] [share_memory.cpp:168] : [ShareMemory::SharedMemorySizeCheck] shared memory size check success.

特性2:模型目录下的config.json的权限必须为0750

报错信息:

Check path: config.json failed, by: Check Other group permission failed: Current permission is 4, but required no greater than 0. Required permission: 750, but got 644
Failed to check config.json under model weight path.
ERR: Failed to init endpoint! Please check the service log or console output.
Killed

解决办法:将config.json权限设置为0750

chmod 0750 /in/Qwen2.5-3B-Instruct/config.json

特性3:模型目录权限other group权限必须为0

报错信息:

[ERROR] [model_deploy_config.cpp:159] Failed to get vocab size from tokenizer wrapper with exception: PermissionError: The file should not be writable by others who are neither the owner nor in the group. Please check the input path:/invoker-deploy/Qwen_local_57_c7c2d6e7/in/Qwen2.5-3B-Instruct/generation_config.json, and change mode to 33277.
  
  The file should not be writable by others who are neither the owner nor in the group. Please check the input path:/invoker-deploy/Qwen_local_57_c7c2d6e7/in/Qwen2.5-3B-Instruct/model-00002-of-00002.safetensors, and change mode to 33277
  
  
  Check path: config.json failed, by: Check Other group permission failed: Current permission is 4, but required no greater than 0. Required permission: 750, but got 444
Failed to check config.json under model weight path.

解决办法:将other group权限清零,跟上面那样将整个目录权限设置为0750就可以

chmod 0750 /path/to/model/ -R

特性4: model_name不能包含特殊字符,只能包含[a-zA-Z0-9_.-]

报错信息:

The value of modelName must meet the following rules: The string length is [1, 256] and consists of a match of the type [a-zA-Z0-9_.-]. The first and last characters must be characters or digits.
ERR: Failed to init endpoint! Please check the service log or console output.

解决办法:config.json中的model_name不能包含特殊字符

特性5: mindie同一个模型可以配置多实例

可以通过mindie-server的配置文件config.json中的modelInstanceNumber来配置mindie启动多个实例
比如:

在这里插入图片描述
这时worldsize任然是1,但是会在2张卡上分别启动一个实例,对外的服务端口还是一个。2个实例都指定到一张卡上也是可以的,只要现存足够。

        "modelInstanceNumber": 2,
        "npuDeviceIds": [
            [0], [0]  #2个实例在同一张卡上
        ],

特性6:使用openai接口访问,参数的model参数必须和config.json中的model_name一致。以前的版本不检查这个字段

相关文章:

  • 鸿蒙app开发中Emitter 订阅器
  • Reactive编程框架与工具
  • Java 集合介绍
  • Linux--文件系统
  • 第四章 结构化程序设计
  • 【数据挖掘】岭回归(Ridge Regression)和线性回归(Linear Regression)对比实验
  • TBE(TVM的扩展)
  • 滑动窗口滤波
  • OpenIPC开源FPV之Adaptive-Link日志分析
  • 【Linux操作系统】:信号
  • 【Java设计模式】第10章 外观模式讲解
  • C++进阶笔记第一篇:程序的内存模型
  • 简单回溯(组合力扣77)
  • OpenCV 图形API(22)矩阵操作
  • SAP Overview
  • 淘宝 API 高并发优化:突破 QPS 限制的分布式爬虫架构设计
  • java导入excel更新设备经纬度度数或者度分秒
  • UTF-8和GBK编码的区别和详细解释
  • Unity Input 2023 Release-Notes
  • 数据结构第六章(一) -图
  • 深圳市政协原副主席王幼鹏被“双开”
  • 上海市委常委会会议暨市生态文明建设领导小组会议研究基层减负、生态环保等事项
  • 保证断电、碰撞等事故中车门系统能够开启!汽车车门把手将迎来强制性国家标准
  • 菲护卫艇企图侵闯中国黄岩岛领海,南部战区:依法依规跟踪监视、警告驱离
  • 咖啡戏剧节举办第五年,上生新所“无店不咖啡,空间皆可戏”
  • “五一”假期国内出游3.14亿人次,同比增长6.4%