[vLLM] Source Code Walkthrough: How a Model Finds Its Initialization Class
The complete flow of model class discovery and instantiation
🔍 Flow Diagram
```
User specifies a model path
        ↓
Read config.json
        ↓
Extract the "architectures" field
        ↓
ModelRegistry.resolve_model_cls()
        ↓
Look up the mapping table (_VLLM_MODELS)
        ↓
Return (model class, architecture name)
        ↓
initialize_model()
        ↓
Instantiate the model class
```
1️⃣ Get the architecture name from the HuggingFace config.json
When loading a model, vLLM first reads the model's config.json:

```json
// e.g. meta-llama/Llama-2-7b-hf/config.json
{
  "architectures": ["LlamaForCausalLM"],  // ← this is the architecture name
  "model_type": "llama",
  ...
}
```
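To see exactly what vLLM starts from, you can read this field yourself. A minimal sketch, assuming a locally downloaded checkpoint directory (the path here is hypothetical):

```python
import json

# Hypothetical local path; any downloaded HF checkpoint directory works
with open("Llama-2-7b-hf/config.json") as f:
    config = json.load(f)

# This list is what vLLM feeds into its registry lookup
print(config["architectures"])  # ['LlamaForCausalLM']
```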
2️⃣ The mapping table (registry.py, lines 45-329)

vLLM maintains a large mapping table that maps HuggingFace architecture names to vLLM implementations:

```python
# registry.py:45-156
_TEXT_GENERATION_MODELS = {
    # HuggingFace architecture name: (vLLM module name, vLLM class name)
    "LlamaForCausalLM": ("llama", "LlamaForCausalLM"),
    "Qwen2ForCausalLM": ("qwen2", "Qwen2ForCausalLM"),
    "MistralForCausalLM": ("llama", "LlamaForCausalLM"),  # reuses Llama
    "GPT2LMHeadModel": ("gpt2", "GPT2LMHeadModel"),
    "DeepseekV3ForCausalLM": ("deepseek_v2", "DeepseekV3ForCausalLM"),
    # ...
}

_MULTIMODAL_MODELS = {
    "Qwen2VLForConditionalGeneration": ("qwen2_vl", "Qwen2VLForConditionalGeneration"),
    "LlavaNextForConditionalGeneration": ("llava_next", "LlavaNextForConditionalGeneration"),
    # ...
}

# lines 321-329: merge all the mappings
_VLLM_MODELS = {
    **_TEXT_GENERATION_MODELS,
    **_EMBEDDING_MODELS,
    **_CROSS_ENCODER_MODELS,
    **_MULTIMODAL_MODELS,
    **_SPECULATIVE_DECODING_MODELS,
    **_TRANSFORMERS_SUPPORTED_MODELS,
    **_TRANSFORMERS_BACKEND_MODELS,
}
```
3️⃣ Building the global registry (registry.py, lines 943-950)

```python
# registry.py:943-950
ModelRegistry = _ModelRegistry({
    model_arch: _LazyRegisteredModel(
        module_name=f"vllm.model_executor.models.{mod_relname}",
        class_name=cls_name,
    )
    for model_arch, (mod_relname, cls_name) in _VLLM_MODELS.items()
})
```
For example:

- Architecture name: `"LlamaForCausalLM"`
- Mapping: `("llama", "LlamaForCausalLM")`
- Result: `_LazyRegisteredModel(module_name="vllm.model_executor.models.llama", class_name="LlamaForCausalLM")`
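The lazy-registration pattern above is easy to replicate outside vLLM. A minimal self-contained sketch, with names simplified from the originals:

```python
import importlib
from dataclasses import dataclass


@dataclass(frozen=True)
class LazyModel:
    """Simplified stand-in for vLLM's _LazyRegisteredModel."""
    module_name: str
    class_name: str

    def load(self) -> type:
        # The import runs only when load() is called,
        # not when the registry dict is built
        mod = importlib.import_module(self.module_name)
        return getattr(mod, self.class_name)


# Building the registry imports nothing yet
registry = {"OrderedDict": LazyModel("collections", "OrderedDict")}
print(registry["OrderedDict"].load())  # <class 'collections.OrderedDict'>
```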
4️⃣ Resolving the model class (registry.py, lines 785-836)
```python
# registry.py:785-836
def resolve_model_cls(
    self,
    architectures: Union[str, list[str]],
    model_config: ModelConfig,
) -> tuple[type[nn.Module], str]:
    # ① Iterate over all architecture names
    for arch in architectures:
        # ② Normalize the architecture name (handles runner_type, etc.)
        normalized_arch = self._normalize_arch(arch, model_config)
        # ③ Try to load the model class
        model_cls = self._try_load_model_cls(normalized_arch)
        if model_cls is not None:
            return (model_cls, arch)
    # ④ Nothing matched: raise an informative error
    return self._raise_for_unsupported(architectures)
```
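A usage sketch, assuming a `ModelConfig` has already been built elsewhere (following the signature quoted above):

```python
# Sketch: resolve the class for a Llama checkpoint
# (model_config is assumed to have been constructed elsewhere)
model_cls, arch = ModelRegistry.resolve_model_cls(
    ["LlamaForCausalLM"], model_config=model_config
)
# arch      → "LlamaForCausalLM"
# model_cls → <class 'vllm.model_executor.models.llama.LlamaForCausalLM'>
```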
5️⃣ Lazily loading the model class (registry.py, lines 527-529)

```python
def load_model_cls(self) -> type[nn.Module]:
    # Dynamically import the module
    mod = importlib.import_module(self.module_name)
    # Fetch the class from it
    return getattr(mod, self.class_name)
```
For example, with:

- `module_name = "vllm.model_executor.models.llama"`
- `class_name = "LlamaForCausalLM"`

the result is equivalent to:

```python
from vllm.model_executor.models.llama import LlamaForCausalLM
```
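The same two calls work for any importable module; a self-contained demo on the standard library:

```python
import importlib

# Same mechanism as load_model_cls, demonstrated on the standard library
mod = importlib.import_module("json")
cls = getattr(mod, "JSONDecoder")
print(cls)  # <class 'json.decoder.JSONDecoder'>
```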
6️⃣ Instantiating the model (utils.py, lines 40-93)

```python
def initialize_model(
    vllm_config: VllmConfig,
    *,
    prefix: str = "",
    model_class: Optional[type[nn.Module]] = None,
    model_config: Optional[ModelConfig] = None,
) -> nn.Module:
    if model_class is None:
        # Resolve the model class from the registry
        model_class, _ = get_model_architecture(model_config)
    # Instantiate it
    return model_class(vllm_config=vllm_config, prefix=prefix)
```
Complete Example

Scenario: loading a Llama-2-7b model

```python
# ① Read config.json
config = {
    "architectures": ["LlamaForCausalLM"],
    "model_type": "llama",
    # ...
}

# ② Look up the mapping table
architecture = "LlamaForCausalLM"
mapping = _VLLM_MODELS[architecture]
# → ("llama", "LlamaForCausalLM")

# ③ Lazily load the model class
module = importlib.import_module("vllm.model_executor.models.llama")
model_cls = getattr(module, "LlamaForCausalLM")
# → <class 'vllm.model_executor.models.llama.LlamaForCausalLM'>

# ④ Instantiate the model
model = model_cls(vllm_config=vllm_config, prefix="")
# → a LlamaForCausalLM instance
```
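In normal use none of these steps are called by hand; they all run during engine construction. A sketch of the user-facing entry point that triggers the whole chain (downloads weights on first run):

```python
from vllm import LLM

# Engine construction reads config.json, resolves LlamaForCausalLM via
# ModelRegistry, lazily imports the module, and instantiates the class
llm = LLM(model="meta-llama/Llama-2-7b-hf")
```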
Handling Special Cases

1. Architecture reuse

```python
# Mistral reuses the Llama implementation
"MistralForCausalLM": ("llama", "LlamaForCausalLM"),
# DeepseekV32 likewise reuses the deepseek_v2 module
"DeepseekV32ForCausalLM": ("deepseek_v2", "DeepseekV3ForCausalLM"),
```
2. Custom model registration

```python
# Users can register their own models
ModelRegistry.register_model(
    model_arch="MyCustomModel",
    model_cls="my_module:MyModelClass",
)
```
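The registry also accepts the class object itself, which is the usual route for out-of-tree models; a sketch with hypothetical package and class names:

```python
from vllm import ModelRegistry

from my_package.modeling import MyModelClass  # hypothetical import

# Must run before engine construction so resolve_model_cls can find it;
# "MyCustomModel" must match the checkpoint's "architectures" entry
ModelRegistry.register_model("MyCustomModel", MyModelClass)
```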
3. Transformers fallback

If vLLM has no implementation for an architecture, it tries the Transformers backend:

```python
# registry.py:826-834
# Fall back to the transformers implementation
if all(arch not in self.models for arch in architectures):
    arch = self._try_resolve_transformers(architectures[0], model_config)
    if arch is not None:
        model_cls = self._try_load_model_cls(arch)
```
Architecture Comparison Table

| Aspect | HuggingFace | vLLM |
|---|---|---|
| Configuration | config.json | built-in mapping table |
| Architecture name | "LlamaForCausalLM" | "LlamaForCausalLM" |
| Module path | transformers.models.llama | vllm.model_executor.models.llama |
| Class name | LlamaForCausalLM | LlamaForCausalLM |
| Loading entry point | AutoModelForCausalLM.from_pretrained() | ModelRegistry.resolve_model_cls() |
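For reference, the HuggingFace column in code (a real transformers API; the vLLM column is the resolve_model_cls flow traced above):

```python
# HuggingFace path: AutoModel consults config.json and its own registry
from transformers import AutoModelForCausalLM

hf_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
```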
Key Design Advantages

✅ Lazy loading: modules are imported only when needed, avoiding premature CUDA initialization
✅ Flexible mapping: one architecture can map to a different implementation
✅ Extensible: users can register custom models
✅ Fallback: unimplemented models can fall back to Transformers
✅ Caching: @lru_cache memoizes already-loaded classes (see the sketch below)
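A minimal sketch of that caching pattern (not vLLM's exact code):

```python
import importlib
from functools import lru_cache


@lru_cache(maxsize=128)
def load_cls(module_name: str, class_name: str) -> type:
    # First call imports the module; later calls return the cached class
    return getattr(importlib.import_module(module_name), class_name)


load_cls("json", "JSONDecoder")  # import happens here
load_cls("json", "JSONDecoder")  # served from the cache
```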
Summary

The model class discovery flow:

1. Read the `architectures` field from `config.json`
2. Look it up in the `_VLLM_MODELS` mapping table
3. Get the module name and class name
4. Lazily import the module
5. Return the model class
6. Instantiate it to create the model object
This design lets vLLM flexibly support hundreds of model architectures while keeping the codebase maintainable and extensible! 🎯