Trying MiniMax-M2 on the SCNet supercomputing platform (4 cards are not enough; unfinished)
MiniMax-M2 redefines efficiency for agents. It is a compact, fast, and cost-effective MoE model (230B total parameters, 10B active) built for elite performance on coding and agentic tasks while retaining strong general intelligence. With only 10B activated parameters, MiniMax-M2 delivers the sophisticated end-to-end tool-use performance expected of today's leading models, yet its streamlined footprint makes deployment and scaling easier than ever.
vLLM deployment guide: MiniMax-M2 Usage Guide - vLLM Recipes
I underestimated this one: a 230B model will not run on four 64 GB AI accelerator cards, so for now I can only dream about it.
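Some back-of-the-envelope arithmetic (my own rough numbers, not official figures) shows why four 64 GB cards cannot hold this model:

```python
# Rough VRAM estimate. A MoE model must keep ALL expert weights resident,
# so the total parameter count matters for memory even though only ~10B
# parameters are active per token.
total_params = 230e9          # MiniMax-M2 total parameter count
bytes_per_param = 2           # bf16/fp16 weights
weights_gb = total_params * bytes_per_param / 1e9

available_gb = 4 * 64         # four 64 GB accelerator cards
print(f"weights alone: ~{weights_gb:.0f} GB vs {available_gb} GB available")
# Weights alone (~460 GB) already exceed 256 GB, before any KV cache
# or activation memory is counted.
```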
Hands-on
First, find the model on SCNet: https://www.scnet.cn/ui/aihub/models/MiniMax/MiniMax-M2
Click "Clone to console".
This gives the model's path on the SCNet platform:
Path: /public/home/ac7sc1ejvp/SothisAI/model/Aihub/MiniMax-M2/main/MiniMax-M2
Serve it with vLLM:
vllm serve /public/home/ac7sc1ejvp/SothisAI/model/Aihub/MiniMax-M2/main/MiniMax-M2
I cannot currently put together the 8 cards of compute this needs, so that is as far as it goes for now!
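For reference, tensor parallelism (vLLM's `--tensor-parallel-size` flag) is the usual way to split the weights evenly across cards. A rough sketch of why 8 cards is the plausible minimum here (my own arithmetic, ignoring KV cache and activations):

```python
# Why 8 cards might be the minimum: tensor parallelism splits the weights
# evenly across GPUs (rough sketch; overheads are ignored).
weights_gb = 230e9 * 2 / 1e9      # ~460 GB of bf16 weights
for n_gpus in (4, 8):
    per_gpu = weights_gb / n_gpus
    fits = per_gpu < 64           # 64 GB per card
    print(f"{n_gpus} cards: {per_gpu:.1f} GB/card -> "
          f"{'fits' if fits else 'does not fit'}")
# 8 cards give ~57.5 GB/card: under 64 GB, but very tight once the KV cache
# is added, so --gpu-memory-utilization would need careful tuning.
```

The corresponding launch would presumably add `--tensor-parallel-size 8` to the `vllm serve` command above.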
Debugging
Error: Cannot find model module. 'MiniMaxM2ForCausalLM' is not a registered model in the Transformers library
File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/model_loader/utils.py", line 60, in resolve_transformers_arch
raise ValueError(
ValueError: Cannot find model module. 'MiniMaxM2ForCausalLM' is not a registered model in the Transformers library (only relevant if the model is meant to be in Transformers) and 'AutoModel' is not present in the model config's 'auto_map' (relevant if the model is custom).
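The error means the installed Transformers build does not know the model class that the checkpoint declares. A small sketch of how that declaration can be inspected (the helper and the sample dict are my own; the field values mirror what the errors in this log mention):

```python
import json

def declared_architectures(config_path: str) -> list:
    """Read the 'architectures' field a checkpoint's config.json declares.

    vLLM/Transformers use this field to pick a model class; the error above
    means the declared class is unknown to the installed library version.
    """
    with open(config_path) as f:
        return json.load(f).get("architectures", [])

# Inline sample mirroring the fields the errors in this log mention (assumed):
sample = {"architectures": ["MiniMaxM2ForCausalLM"], "model_type": "minimax"}
print(sample["architectures"])
```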
Check the vLLM version:
pip show vllm
Name: vllm
Version: 0.8.5.post1+das.opt1.dtk25041
Summary: A high-throughput and memory-efficient inference and serving engine for LLMs
Home-page: https://github.com/vllm-project/vllm
Author: vLLM Team
Author-email:
License-Expression: Apache-2.0
Location: /opt/conda/lib/python3.10/site-packages
Requires: aiohttp, awscli, blake3, boto3, botocore, cachetools, cloudpickle, cmake, compressed-tensors, datasets, depyf, einops, fastapi, filelock, flash_attn, flash_mla, gguf, huggingface-hub, importlib_metadata, lark, llguidance, lm-format-enforcer, lmslim, mistral_common, msgspec, ninja, numa, numba, numpy, openai, opencv-python-headless, opentelemetry-api, opentelemetry-exporter-otlp, opentelemetry-sdk, opentelemetry-semantic-conventions-ai, outlines, partial-json-parser, peft, pillow, prometheus-fastapi-instrumentator, prometheus_client, protobuf, psutil, py-cpuinfo, pydantic, pytest-asyncio, python-json-logger, python-multipart, pytrie, pyyaml, pyzmq, ray, requests, runai-model-streamer, runai-model-streamer-s3, scipy, sentencepiece, setuptools_scm, tensorizer, tiktoken, tokenizers, torch, tqdm, transformers, triton, typing_extensions, watchfiles, xgrammar
Required-by:
Version 0.8.5, quite recent.
Let's try switching to version 0.6.2 instead, which means using SCNet's DCU 24.04 image.
Version 0.6.2 resolved that error, but a new one appeared.
Error: The checkpoint you are trying to load has model type `minimax` but Transformers does not recognize this architecture
ValueError: The checkpoint you are trying to load has model type `minimax` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
^CException ignored in atexit callback: <function _exit_function at 0x7f2ed42f79a0>
Upgrade transformers:
pip install transformers -U
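To confirm the upgrade actually took effect, a quick stdlib-only check sketch (support for the `minimax` model type requires a sufficiently recent Transformers release; I have not verified the exact version threshold):

```python
from importlib.metadata import version, PackageNotFoundError

def pkg_version(name: str):
    """Return the installed version string of a package, or None if missing."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None

# e.g. pkg_version("transformers") after the upgrade; compare against the
# release notes to see whether the 'minimax' model type is registered.
```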
Next error: ValueError: fp8 quantization is currently not supported in ROCm.
So FP8 needs to be disabled, which can be done by adding --quantization None on the command line (I later learned it should have been --speculative-model-quantization None).
vllm serve /public/home/ac7sc1ejvp/SothisAI/model/Aihub/MiniMax-M2/main/MiniMax-M2 --quantization None
That did not help; the error persisted.
Some research:
Based on experience from the ArcticInference project, the following configuration combination was suggested:
- --quantization None: disable quantization entirely
- --gpu-memory-utilization 0.7: optimize memory usage
- --max-num-seqs: lower the concurrency to match the hardware
These adjustments should resolve the FP8 quantization compatibility issue on the ROCm platform. If the problem persists, check the detailed error logs to locate the root cause.
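The quoted suggestions can be collected into a single launch line; a sketch that assembles it (the flag values come from the advice above and are not validated against this vLLM build, and the --max-num-seqs value is a hypothetical placeholder):

```python
# Assemble the suggested vLLM launch command from the advice above (sketch).
model_path = "/public/home/ac7sc1ejvp/SothisAI/model/Aihub/MiniMax-M2/main/MiniMax-M2"
cmd = [
    "vllm", "serve", model_path,
    "--gpu-memory-utilization", "0.7",  # leave VRAM headroom
    "--max-num-seqs", "16",             # hypothetical lower concurrency cap
]
print(" ".join(cmd))
```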
Trying int8 instead also fails:
vllm serve /public/home/ac7sc1ejvp/SothisAI/model/Aihub/MiniMax-M2/main/MiniMax-M2 --quantization int8
usage: vllm serve <model_tag> [options]
vllm serve: error: argument --quantization/-q: invalid choice: 'int8' (choose from 'aqlm', 'awq', 'deepspeedfp', 'tpu_int8', 'fp8', 'fbgemm_fp8', 'modelopt', 'marlin', 'gguf', 'gptq_marlin_24', 'gptq_marlin', 'awq_marlin', 'gptq', 'compressed-tensors', 'bitsandbytes', 'qqq', 'experts_int8', 'neuron_quant', None)
Then I noticed this hint in the logs:
INFO: Please install lmslim if you want to infer gptq or awq or w8a8 model.
I also saw a suggestion to use this option:
--speculative-model-quantization
vllm serve /public/home/ac7sc1ejvp/SothisAI/model/Aihub/MiniMax-M2/main/MiniMax-M2 --speculative-model-quantization None
The error changed to:
OSError: It looks like the config file at '/public/home/ac7sc1ejvp/SothisAI/model/Aihub/MiniMax-M2/main/MiniMax-M2/config.json' is not a valid JSON file.
It turns out config.json had become an empty file. How did that happen?
Worse, I do not even have write permission on this file:
ls -la config.json
-rw-r--r--. 1 20458 17294 0 Oct 29 00:14 config.json
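The OSError above conflates several failure modes. A small diagnostic sketch (my own helper, not part of vLLM) that distinguishes a missing file, a 0-byte file like the one shown by ls, and genuinely broken JSON:

```python
import json
import os

def diagnose_json(path: str) -> str:
    """Classify why a config.json fails to load: missing, empty, invalid, or ok."""
    if not os.path.exists(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"           # the case hit above: a 0-byte config.json
    try:
        with open(path) as f:
            json.load(f)
        return "ok"
    except json.JSONDecodeError:
        return "invalid"
```

Since the file is 0 bytes and owned by another uid with no write permission for my account, the likely way out is re-cloning the model or asking the platform to restore the file, rather than editing it in place.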
