Shanghai AI Laboratory open-sources Intern-S1-mini, a lightweight multimodal reasoning model built on the same technology as Intern-S1
Introduction
We introduce Intern-S1-mini, a lightweight open-source multimodal reasoning model built on the same technology as Intern-S1. It couples an 8B-parameter dense language model (Qwen3) with a 0.3B-parameter vision encoder (InternViT), and has been continually pretrained on 5 trillion tokens of multimodal data, of which more than 2.5 trillion tokens come from scientific domains. As a result, the model retains strong general capabilities while excelling at specialized scientific tasks such as interpreting chemical structures, understanding protein sequences, and planning compound synthesis routes, making it a capable assistant for real-world research applications.
Features
- Strong performance on language and vision reasoning benchmarks, especially scientific tasks.
- Continually pretrained on a 5-trillion-token dataset, over 50% of which is specialized scientific data, giving the model deep domain knowledge.
- A dynamic tokenizer enables native parsing of molecular formulas and protein sequences.
Performance
We evaluated Intern-S1-mini on a range of benchmarks covering both general and scientific datasets. The table below compares its performance with recent vision-language models and large language models.
| | Benchmark | Intern-S1-mini | Qwen3-8B | GLM-4.1V | MiMo-VL-7B-RL-2508 |
|---|---|---|---|---|---|
| General | MMLU-Pro | 74.78 | 73.7 | 57.1 | 73.93 |
| | MMMU | 72.33 | N/A | 69.9 | 70.4 |
| | MMStar | 65.2 | N/A | 71.5 | 72.9 |
| | GPQA | 65.15 | 62 | 50.32 | 60.35 |
| | AIME2024 | 84.58 | 76 | 36.2 | 72.6 |
| | AIME2025 | 80 | 67.3 | 32 | 64.4 |
| | MathVision | 51.41 | N/A | 53.9 | 54.5 |
| | MathVista | 70.3 | N/A | 80.7 | 79.4 |
| | IFEval | 81.15 | 85 | 71.53 | 71.4 |
| Scientific | SFE | 35.84 | N/A | 43.2 | 43.9 |
| | Physics | 28.76 | N/A | 28.3 | 28.2 |
| | SmolInstruct | 32.2 | 17.6 | 18.1 | 16.11 |
| | ChemBench | 76.47 | 61.1 | 56.2 | 66.78 |
| | MatBench | 61.55 | 45.24 | 54.3 | 46.9 |
| | MicroVQA | 56.62 | N/A | 50.2 | 50.96 |
| | ProteinLMBench | 58.47 | 59.1 | 58.3 | 59.8 |
| | MSEarthMCQ | 58.12 | N/A | 50.3 | 47.3 |
| | XLRS-Bench | 51.63 | N/A | 49.8 | 12.29 |
We use OpenCompass and VLMEvalKit to evaluate all models.
Quick Start
Sampling Parameters
We recommend the following hyperparameters for better results:
```
top_p = 1.0
top_k = 50
min_p = 0.0
temperature = 0.8
```
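As a minimal, non-authoritative sketch, these values can be passed directly to transformers generation; the `model` and `inputs` variables below are assumed to be prepared as in the Transformers examples that follow.

```python
# Sketch only: apply the recommended sampling parameters when generating.
# Assumes `model` and `inputs` are prepared as in the Transformers examples below.
generate_ids = model.generate(
    **inputs,
    max_new_tokens=32768,
    do_sample=True,      # sampling must be enabled for top_p/top_k/min_p/temperature to take effect
    top_p=1.0,
    top_k=50,
    min_p=0.0,
    temperature=0.8,
)
```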
Transformers
Below are demo snippets showing how to generate content from text and multimodal inputs.
Please use `transformers>=4.55.2` to ensure the model works properly.
Text Input
```python
from transformers import AutoProcessor, AutoModelForCausalLM
import torch

model_name = "internlm/Intern-S1-mini"
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype="auto", trust_remote_code=True)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "tell me about an interesting physical phenomenon."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device, dtype=torch.bfloat16)

generate_ids = model.generate(**inputs, max_new_tokens=32768)
decoded_output = processor.decode(generate_ids[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(decoded_output)
```
Image Input
```python
from transformers import AutoProcessor, AutoModelForCausalLM
import torch

model_name = "internlm/Intern-S1-mini"
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype="auto", trust_remote_code=True)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "http://images.cocodataset.org/val2017/000000039769.jpg"},
            {"type": "text", "text": "Please describe the image explicitly."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device, dtype=torch.bfloat16)

generate_ids = model.generate(**inputs, max_new_tokens=32768)
decoded_output = processor.decode(generate_ids[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(decoded_output)
```
Video Input
Please make sure the decord video decoding library is installed via `pip install decord`. To avoid running out of memory, please install flash_attention and use at least 2 GPUs.
```python
from transformers import AutoProcessor, AutoModelForCausalLM
import torch

model_name = "internlm/Intern-S1-mini"
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype="auto", trust_remote_code=True)

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "video",
                "url": "https://huggingface.co/datasets/hf-internal-testing/fixtures_videos/resolve/main/tennis.mp4",
            },
            {"type": "text", "text": "What type of shot is the man performing?"},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    return_tensors="pt",
    add_generation_prompt=True,
    video_load_backend="decord",
    tokenize=True,
    return_dict=True,
).to(model.device, dtype=torch.float16)

generate_ids = model.generate(**inputs, max_new_tokens=32768)
decoded_output = processor.decode(generate_ids[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(decoded_output)
```
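The note above recommends installing flash_attention to avoid out-of-memory errors. As an unverified sketch (whether this model's remote code honors the flag has not been confirmed here), FlashAttention-2 can usually be requested at load time through the standard attn_implementation argument, assuming the flash-attn package is installed:

```python
# Sketch only: request FlashAttention-2 when loading the model (requires the flash-attn package).
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
    attn_implementation="flash_attention_2",
)
```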
Serving
The minimum hardware requirements for deploying the Intern-S1 series models are:

| Model | A100 (GPUs) | H800 (GPUs) | H100 (GPUs) | H200 (GPUs) |
|---|---|---|---|---|
| internlm/Intern-S1-mini | 1 | 1 | 1 | 1 |
| internlm/Intern-S1-mini-FP8 | - | 1 | 1 | 1 |
You can create an OpenAI-compatible server with one of the following LLM inference frameworks:
lmdeploy (>=0.9.2)

```bash
lmdeploy serve api_server internlm/Intern-S1-mini --reasoning-parser intern-s1 --tool-call-parser intern-s1
```
vllm

```bash
vllm serve internlm/Intern-S1-mini --trust-remote-code
```
sglang

```bash
python3 -m sglang.launch_server \
    --model-path internlm/Intern-S1-mini \
    --trust-remote-code \
    --grammar-backend none
```
ollama for local deployment:
```bash
# install ollama
curl -fsSL https://ollama.com/install.sh | sh
# fetch model
ollama pull internlm/interns1-mini
# run model
ollama run internlm/interns1-mini
# then use openai client to call on http://localhost:11434/v1
```
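The last comment above points at ollama's OpenAI-compatible endpoint; the following minimal sketch calls it with the openai Python client (the prompt is illustrative, and depending on your ollama version the model may need to be referenced with a :latest tag):

```python
from openai import OpenAI

# Sketch only: query the local ollama server through its OpenAI-compatible API.
client = OpenAI(api_key="EMPTY", base_url="http://localhost:11434/v1")
response = client.chat.completions.create(
    model="internlm/interns1-mini",  # the model pulled above; ":latest" may be required
    messages=[{"role": "user", "content": "tell me about an interesting physical phenomenon."}],
)
print(response.choices[0].message.content)
```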
Advanced Usage
Tool Calling
Many large language models (LLMs) today support tool calling, a powerful feature that lets them extend their capabilities by invoking external tools or APIs. With it, a model can perform tasks such as fetching real-time information, running code, or calling functions in other applications.

A notable advantage for developers is that a growing number of open-source LLMs are compatible with the OpenAI API standard, so tool calling can be implemented on these models with the same syntax used for the OpenAI library. The code in this tutorial is therefore general-purpose: it works not only with OpenAI models but with any model that follows the same interface standard.

Below, we use a concrete code example (based on an lmdeploy api server) to demonstrate how to use tool calling to obtain the latest weather forecast.
```python
from openai import OpenAI
import json


def get_current_temperature(location: str, unit: str = "celsius"):
    """Get current temperature at a location.

    Args:
        location: The location to get the temperature for, in the format "City, State, Country".
        unit: The unit to return the temperature in. Defaults to "celsius". (choices: ["celsius", "fahrenheit"])

    Returns:
        the temperature, the location, and the unit in a dict
    """
    return {
        "temperature": 26.1,
        "location": location,
        "unit": unit,
    }


def get_temperature_date(location: str, date: str, unit: str = "celsius"):
    """Get temperature at a location and date.

    Args:
        location: The location to get the temperature for, in the format "City, State, Country".
        date: The date to get the temperature for, in the format "Year-Month-Day".
        unit: The unit to return the temperature in. Defaults to "celsius". (choices: ["celsius", "fahrenheit"])

    Returns:
        the temperature, the location, the date and the unit in a dict
    """
    return {
        "temperature": 25.9,
        "location": location,
        "date": date,
        "unit": unit,
    }


def get_function_by_name(name):
    if name == "get_current_temperature":
        return get_current_temperature
    if name == "get_temperature_date":
        return get_temperature_date


# JSON schema descriptions of the two tools exposed to the model
tools = [{
    'type': 'function',
    'function': {
        'name': 'get_current_temperature',
        'description': 'Get current temperature at a location.',
        'parameters': {
            'type': 'object',
            'properties': {
                'location': {
                    'type': 'string',
                    'description': 'The location to get the temperature for, in the format \'City, State, Country\'.'
                },
                'unit': {
                    'type': 'string',
                    'enum': ['celsius', 'fahrenheit'],
                    'description': 'The unit to return the temperature in. Defaults to \'celsius\'.'
                }
            },
            'required': ['location']
        }
    }
}, {
    'type': 'function',
    'function': {
        'name': 'get_temperature_date',
        'description': 'Get temperature at a location and date.',
        'parameters': {
            'type': 'object',
            'properties': {
                'location': {
                    'type': 'string',
                    'description': 'The location to get the temperature for, in the format \'City, State, Country\'.'
                },
                'date': {
                    'type': 'string',
                    'description': 'The date to get the temperature for, in the format \'Year-Month-Day\'.'
                },
                'unit': {
                    'type': 'string',
                    'enum': ['celsius', 'fahrenheit'],
                    'description': 'The unit to return the temperature in. Defaults to \'celsius\'.'
                }
            },
            'required': ['location', 'date']
        }
    }
}]

messages = [
    {'role': 'user', 'content': 'Today is 2024-11-14, What\'s the temperature in San Francisco now? How about tomorrow?'}
]

openai_api_key = "EMPTY"
openai_api_base = "http://0.0.0.0:23333/v1"
client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)
model_name = client.models.list().data[0].id

response = client.chat.completions.create(
    model=model_name,
    messages=messages,
    max_tokens=32768,
    temperature=0.8,
    top_p=0.8,
    stream=False,
    extra_body=dict(spaces_between_special_tokens=False, enable_thinking=False),
    tools=tools)
print(response.choices[0].message)
messages.append(response.choices[0].message)

# Execute each tool call requested by the model and append the results to the conversation
for tool_call in response.choices[0].message.tool_calls:
    tool_call_args = json.loads(tool_call.function.arguments)
    tool_call_result = get_function_by_name(tool_call.function.name)(**tool_call_args)
    tool_call_result = json.dumps(tool_call_result, ensure_ascii=False)
    messages.append({
        'role': 'tool',
        'name': tool_call.function.name,
        'content': tool_call_result,
        'tool_call_id': tool_call.id
    })

# Ask the model again, now with the tool results in context
response = client.chat.completions.create(
    model=model_name,
    messages=messages,
    temperature=0.8,
    top_p=0.8,
    stream=False,
    extra_body=dict(spaces_between_special_tokens=False, enable_thinking=False),
    tools=tools)
print(response.choices[0].message.content)
```
Switching Between Thinking and Non-Thinking Modes
Intern-S1-mini enables thinking mode by default to strengthen its reasoning and produce higher-quality responses. To disable it, set `enable_thinking=False` in `tokenizer.apply_chat_template`.
```python
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False  # think mode indicator
)
```
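For completeness, here is a minimal sketch of how the rendered text could then be fed to the model; the tokenizer is assumed to come from AutoTokenizer.from_pretrained(model_name, trust_remote_code=True) and is not defined in the earlier snippets:

```python
# Sketch only: tokenize the rendered chat text and generate a reply.
# Assumes `tokenizer` and `model` are loaded as described in the lead-in above.
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generate_ids = model.generate(**model_inputs, max_new_tokens=2048)
print(tokenizer.decode(generate_ids[0, model_inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```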
When serving the Intern-S1-mini model with LMDeploy, you can control thinking mode dynamically by setting the `enable_thinking` parameter in the request.
```python
from openai import OpenAI
import json

messages = [
    {
        'role': 'user',
        'content': 'who are you'
    }, {
        'role': 'assistant',
        'content': 'I am an AI'
    }, {
        'role': 'user',
        'content': 'AGI is?'
    }]

openai_api_key = "EMPTY"
openai_api_base = "http://0.0.0.0:23333/v1"
client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)
model_name = client.models.list().data[0].id

response = client.chat.completions.create(
    model=model_name,
    messages=messages,
    temperature=0.8,
    top_p=0.8,
    max_tokens=2048,
    extra_body={
        "enable_thinking": False,  # disable thinking mode for this request
    }
)
print(json.dumps(response.model_dump(), indent=2, ensure_ascii=False))
```
For vllm and sglang users, configure it as follows:

```python
extra_body={
    "chat_template_kwargs": {"enable_thinking": False}
}
```
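As an illustration, this fragment slots into the same chat.completions.create call shown above; only extra_body changes:

```python
# Sketch: the vllm/sglang variant of the earlier request, differing only in extra_body.
response = client.chat.completions.create(
    model=model_name,
    messages=messages,
    temperature=0.8,
    top_p=0.8,
    max_tokens=2048,
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)
print(json.dumps(response.model_dump(), indent=2, ensure_ascii=False))
```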