Trying MiniMax-M2 on the SCNet supercomputing platform (4 cards are not enough; unfinished)
MiniMax-M2 redefines efficiency for agents. It is a compact, fast, and cost-effective MoE model (230B total parameters, 10B active) built for elite performance on coding and agentic tasks while retaining strong general intelligence. With only 10B activated parameters, MiniMax-M2 delivers the sophisticated end-to-end tool-use performance expected of today's leading models, yet its streamlined footprint makes deployment and scaling easier than ever.
vLLM deployment guide: MiniMax-M2 Usage Guide - vLLM Recipes
I underestimated this one: a 230B model will not run on four 64 GB AI accelerator cards, so for now I can only dream about it.
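Some back-of-the-envelope arithmetic (my own rough numbers, not official figures) shows why four 64 GB cards cannot hold this model:

```python
# Rough VRAM estimate. A MoE model must keep ALL expert weights resident,
# so the total parameter count matters for memory even though only ~10B
# parameters are active per token.
total_params = 230e9          # MiniMax-M2 total parameter count
bytes_per_param = 2           # bf16/fp16 weights
weights_gb = total_params * bytes_per_param / 1e9

available_gb = 4 * 64         # four 64 GB accelerator cards
print(f"weights alone: ~{weights_gb:.0f} GB vs {available_gb} GB available")
# Weights alone (~460 GB) already exceed 256 GB, before any KV cache
# or activation memory is counted.
```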
Hands-on
First, find the model on SCNet: https://www.scnet.cn/ui/aihub/models/MiniMax/MiniMax-M2
Click "Clone to console".
This gives the model's path on the SCNet platform:
Path: /public/home/ac7sc1ejvp/SothisAI/model/Aihub/MiniMax-M2/main/MiniMax-M2
Serve it with vLLM:
vllm serve /public/home/ac7sc1ejvp/SothisAI/model/Aihub/MiniMax-M2/main/MiniMax-M2
I cannot currently put together the 8 cards of compute this needs, so that is as far as it goes for now!
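For reference, tensor parallelism (vLLM's `--tensor-parallel-size` flag) is the usual way to split the weights evenly across cards. A rough sketch of why 8 cards is the plausible minimum here (my own arithmetic, ignoring KV cache and activations):

```python
# Why 8 cards might be the minimum: tensor parallelism splits the weights
# evenly across GPUs (rough sketch; overheads are ignored).
weights_gb = 230e9 * 2 / 1e9      # ~460 GB of bf16 weights
for n_gpus in (4, 8):
    per_gpu = weights_gb / n_gpus
    fits = per_gpu < 64           # 64 GB per card
    print(f"{n_gpus} cards: {per_gpu:.1f} GB/card -> "
          f"{'fits' if fits else 'does not fit'}")
# 8 cards give ~57.5 GB/card: under 64 GB, but very tight once the KV cache
# is added, so --gpu-memory-utilization would need careful tuning.
```

The corresponding launch would presumably add `--tensor-parallel-size 8` to the `vllm serve` command above.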
Debugging
Error: Cannot find model module. 'MiniMaxM2ForCausalLM' is not a registered model in the Transformers library
File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/model_loader/utils.py", line 60, in resolve_transformers_arch
raise ValueError(
ValueError: Cannot find model module. 'MiniMaxM2ForCausalLM' is not a registered model in the Transformers library (only relevant if the model is meant to be in Transformers) and 'AutoModel' is not present in the model config's 'auto_map' (relevant if the model is custom).
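The error means the installed Transformers build does not know the model class that the checkpoint declares. A small sketch of how that declaration can be inspected (the helper and the sample dict are my own; the field values mirror what the errors in this log mention):

```python
import json

def declared_architectures(config_path: str) -> list:
    """Read the 'architectures' field a checkpoint's config.json declares.

    vLLM/Transformers use this field to pick a model class; the error above
    means the declared class is unknown to the installed library version.
    """
    with open(config_path) as f:
        return json.load(f).get("architectures", [])

# Inline sample mirroring the fields the errors in this log mention (assumed):
sample = {"architectures": ["MiniMaxM2ForCausalLM"], "model_type": "minimax"}
print(sample["architectures"])
```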
Check the vLLM version:
pip show vllm
Name: vllm
Version: 0.8.5.post1+das.opt1.dtk25041
Summary: A high-throughput and memory-efficient inference and serving engine for LLMs
Home-page: https://github.com/vllm-project/vllm
Author: vLLM Team
Author-email:
License-Expression: Apache-2.0
Location: /opt/conda/lib/python3.10/site-packages
Requires: aiohttp, awscli, blake3, boto3, botocore, cachetools, cloudpickle, cmake, compressed-tensors, datasets, depyf, einops, fastapi, filelock, flash_attn, flash_mla, gguf, huggingface-hub, importlib_metadata, lark, llguidance, lm-format-enforcer, lmslim, mistral_common, msgspec, ninja, numa, numba, numpy, openai, opencv-python-headless, opentelemetry-api, opentelemetry-exporter-otlp, opentelemetry-sdk, opentelemetry-semantic-conventions-ai, outlines, partial-json-parser, peft, pillow, prometheus-fastapi-instrumentator, prometheus_client, protobuf, psutil, py-cpuinfo, pydantic, pytest-asyncio, python-json-logger, python-multipart, pytrie, pyyaml, pyzmq, ray, requests, runai-model-streamer, runai-model-streamer-s3, scipy, sentencepiece, setuptools_scm, tensorizer, tiktoken, tokenizers, torch, tqdm, transformers, triton, typing_extensions, watchfiles, xgrammar
Required-by:
Version 0.8.5, quite recent.
Let's try switching to version 0.6.2 instead, which means using SCNet's DCU 24.04 image.
Version 0.6.2 resolved that error, but a new one appeared.
Error: The checkpoint you are trying to load has model type `minimax` but Transformers does not recognize this architecture
ValueError: The checkpoint you are trying to load has model type `minimax` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
^CException ignored in atexit callback: <function _exit_function at 0x7f2ed42f79a0>
Upgrade transformers:
pip install transformers -U
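To confirm the upgrade actually took effect, a quick stdlib-only check sketch (support for the `minimax` model type requires a sufficiently recent Transformers release; I have not verified the exact version threshold):

```python
from importlib.metadata import version, PackageNotFoundError

def pkg_version(name: str):
    """Return the installed version string of a package, or None if missing."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None

# e.g. pkg_version("transformers") after the upgrade; compare against the
# release notes to see whether the 'minimax' model type is registered.
```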
Next error: ValueError: fp8 quantization is currently not supported in ROCm.
So FP8 needs to be disabled, which can be done by adding --quantization None on the command line (I later learned it should have been --speculative-model-quantization None).
vllm serve /public/home/ac7sc1ejvp/SothisAI/model/Aihub/MiniMax-M2/main/MiniMax-M2 --quantization None
That did not help; the error persisted.
Some research:
Based on experience from the ArcticInference project, the following configuration combination was suggested:
- --quantization None: disable quantization entirely
- --gpu-memory-utilization 0.7: optimize memory usage
- --max-num-seqs: lower the concurrency to match the hardware
These adjustments should resolve the FP8 quantization compatibility issue on the ROCm platform. If the problem persists, check the detailed error logs to locate the root cause.
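The quoted suggestions can be collected into a single launch line; a sketch that assembles it (the flag values come from the advice above and are not validated against this vLLM build, and the --max-num-seqs value is a hypothetical placeholder):

```python
# Assemble the suggested vLLM launch command from the advice above (sketch).
model_path = "/public/home/ac7sc1ejvp/SothisAI/model/Aihub/MiniMax-M2/main/MiniMax-M2"
cmd = [
    "vllm", "serve", model_path,
    "--gpu-memory-utilization", "0.7",  # leave VRAM headroom
    "--max-num-seqs", "16",             # hypothetical lower concurrency cap
]
print(" ".join(cmd))
```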
Trying int8 instead also fails:
vllm serve /public/home/ac7sc1ejvp/SothisAI/model/Aihub/MiniMax-M2/main/MiniMax-M2 --quantization int8
usage: vllm serve <model_tag> [options]
vllm serve: error: argument --quantization/-q: invalid choice: 'int8' (choose from 'aqlm', 'awq', 'deepspeedfp', 'tpu_int8', 'fp8', 'fbgemm_fp8', 'modelopt', 'marlin', 'gguf', 'gptq_marlin_24', 'gptq_marlin', 'awq_marlin', 'gptq', 'compressed-tensors', 'bitsandbytes', 'qqq', 'experts_int8', 'neuron_quant', None)
Then I noticed this hint in the logs:
INFO: Please install lmslim if you want to infer gptq or awq or w8a8 model.
I also saw a suggestion to use this option:
--speculative-model-quantization
vllm serve /public/home/ac7sc1ejvp/SothisAI/model/Aihub/MiniMax-M2/main/MiniMax-M2 --speculative-model-quantization None
The error changed to:
OSError: It looks like the config file at '/public/home/ac7sc1ejvp/SothisAI/model/Aihub/MiniMax-M2/main/MiniMax-M2/config.json' is not a valid JSON file.
It turns out config.json had become an empty file. How did that happen?
Worse, I do not even have write permission on this file:
ls -la config.json
-rw-r--r--. 1 20458 17294 0 Oct 29 00:14 config.json
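The OSError above conflates several failure modes. A small diagnostic sketch (my own helper, not part of vLLM) that distinguishes a missing file, a 0-byte file like the one shown by ls, and genuinely broken JSON:

```python
import json
import os

def diagnose_json(path: str) -> str:
    """Classify why a config.json fails to load: missing, empty, invalid, or ok."""
    if not os.path.exists(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"           # the case hit above: a 0-byte config.json
    try:
        with open(path) as f:
            json.load(f)
        return "ok"
    except json.JSONDecodeError:
        return "invalid"
```

Since the file is 0 bytes and owned by another uid with no write permission for my account, the likely way out is re-cloning the model or asking the platform to restore the file, rather than editing it in place.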
