
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective

news Source: original, 2025/6/13 12:06:49

1. Software Overview

Program and source code downloads are provided at the end of the article.

2. Extreme Speed and Scale for DL Training and Inference

DeepSpeed enabled the world's most powerful language models (at the time of this writing), such as MT-530B and BLOOM. It is an easy-to-use deep learning optimization software suite that delivers unprecedented scale and speed for both training and inference. With DeepSpeed you can:

- Train or run inference on dense or sparse models with billions or trillions of parameters
- Achieve excellent system throughput and scale efficiently to thousands of GPUs
- Train or run inference on resource-constrained GPU systems
- Achieve unprecedented low latency and high throughput for inference
- Achieve extreme compression for unparalleled reductions in inference latency and model size at low cost

3. DeepSpeed's Four Innovation Pillars

DeepSpeed-Training

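DeepSpeed offers a confluence of system innovations that has made large-scale DL training effective and efficient, greatly improved ease of use, and redefined the DL training landscape in terms of the scale that is possible. Innovations such as ZeRO, 3D-Parallelism, DeepSpeed-MoE, and ZeRO-Infinity fall under the training pillar. Learn more: DeepSpeed-Training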
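As a rough illustration of how training features such as ZeRO are switched on, the sketch below builds a minimal DeepSpeed-style config in Python. The keys `train_batch_size`, `train_micro_batch_size_per_gpu`, `gradient_accumulation_steps`, `zero_optimization.stage`, and `fp16.enabled` follow DeepSpeed's documented config schema. The helper name `build_zero_config` and the values chosen are assumptions made for illustration, not recommended settings.

```python
import json


def build_zero_config(micro_batch_per_gpu, grad_accum_steps, num_gpus, zero_stage=2):
    """Return a minimal DeepSpeed-style config dict (illustrative values only)."""
    if zero_stage not in (0, 1, 2, 3):
        raise ValueError("ZeRO stage must be 0, 1, 2, or 3")
    return {
        # The global batch size is expected to equal
        # micro batch per GPU * gradient accumulation steps * number of GPUs.
        "train_batch_size": micro_batch_per_gpu * grad_accum_steps * num_gpus,
        "train_micro_batch_size_per_gpu": micro_batch_per_gpu,
        "gradient_accumulation_steps": grad_accum_steps,
        "zero_optimization": {"stage": zero_stage},
        "fp16": {"enabled": True},
    }


config = build_zero_config(micro_batch_per_gpu=4, grad_accum_steps=8, num_gpus=16)
print(json.dumps(config, indent=2))
```

A dict like this can be saved as a JSON file or, in an environment with DeepSpeed installed, handed to `deepspeed.initialize` through its `config` argument. Which ZeRO stage suits a given model is a tuning decision this sketch does not address.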

DeepSpeed-Inference

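DeepSpeed brings together innovations in parallelism technology, such as tensor, pipeline, expert, and ZeRO parallelism, and combines them with high-performance custom inference kernels, communication optimizations, and heterogeneous memory technologies to enable inference at unprecedented scale while achieving unparalleled latency, throughput, and cost reduction. This systematic composition of system technologies for inference falls under the inference pillar. Learn more: DeepSpeed-Inference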

DeepSpeed-Compression

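To further increase inference efficiency, DeepSpeed offers researchers and practitioners easy-to-use, flexibly composable compression techniques for compressing their models while delivering faster speed, smaller model size, and significantly reduced compression cost. SoTA compression innovations such as ZeroQuant and XTC are also included under the compression pillar. Learn more: DeepSpeed-Compression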

DeepSpeed4Science

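In line with Microsoft's mission to solve humanity's most pressing challenges, the DeepSpeed team at Microsoft is responding to this opportunity by launching a new initiative called DeepSpeed4Science, which aims to build unique capabilities through AI system technology innovations to help domain experts unlock today's biggest science mysteries.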

4. DeepSpeed Software Suite

DeepSpeed Library

The DeepSpeed library (this repository) implements and packages the innovations and technologies in the DeepSpeed Training, Inference, and Compression pillars into a single easy-to-use, open-source repository. It allows a multitude of features to be composed easily within a single training, inference, or compression pipeline. The DeepSpeed Library is heavily adopted by the DL community and has been used to enable some of the most powerful models (see DeepSpeed Adoption).

Model Implementations for Inference (MII)

Model Implementations for Inference (MII) is an open-source repository that makes low-latency, high-throughput inference accessible to all data scientists by removing the need to apply complex system optimization techniques themselves. Out of the box, MII supports thousands of widely used DL models, optimized using DeepSpeed-Inference, that can be deployed with a few lines of code while achieving significant latency reduction compared to their vanilla open-source versions.
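To give a sense of what "a few lines of code" can look like, here is a minimal sketch that follows the `mii.pipeline` pattern shown in the MII project's README at the time of writing. It assumes the `deepspeed-mii` package and a supported GPU are available. The wrapper name `generate_with_mii`, the default model name, and the token limit are illustrative assumptions, and the MII API may differ between releases.

```python
def generate_with_mii(prompts, model_name="mistralai/Mistral-7B-v0.1", max_new_tokens=128):
    """Generate text for a list of prompts with a DeepSpeed-MII pipeline.

    Assumes `deepspeed-mii` is installed and a supported GPU is present.
    The model name is only an example; substitute one that MII supports.
    """
    import mii  # imported lazily so this file loads even without MII installed

    pipe = mii.pipeline(model_name)
    return pipe(prompts, max_new_tokens=max_new_tokens)
```

Calling `generate_with_mii(["DeepSpeed is"])` would download the model on first use, so it is best tried on a machine set up for inference.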

DeepSpeed on Azure

DeepSpeed users are diverse and have access to different environments. We recommend trying DeepSpeed on Azure, as it is the simplest method. The recommended way to try DeepSpeed on Azure is through AzureML recipes. The job submission and data preparation scripts have been made available here. For more details on how to use DeepSpeed on Azure, please follow the Azure tutorial.

DeepSpeed Adoption

DeepSpeed was an important part of Microsoft's AI at Scale initiative to enable next-generation AI capabilities at scale. You can find more information here.

DeepSpeed has been used to train many different large-scale models. Below is a list of several examples that we are aware of (if you'd like to include your model, please submit a PR):

- Megatron-Turing NLG (530B)
- Jurassic-1 (178B)
- BLOOM (176B)
- GLM (130B)
- xTrimoPGLM (100B)
- YaLM (100B)
- GPT-NeoX (20B)
- AlexaTM (20B)
- Turing NLG (17B)
- METRO-LM (5.4B)

5. Installation

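The quickest way to get started with DeepSpeed is via pip, which installs the latest release of DeepSpeed. That release is not tied to specific PyTorch or CUDA versions. DeepSpeed includes several C++/CUDA extensions that we commonly refer to as our "ops". By default, all of these extensions/ops are built just-in-time (JIT) using torch's JIT C++ extension loader, which relies on ninja to build and dynamically link them at runtime.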

Requirements

- PyTorch must be installed before installing DeepSpeed.
- For full feature support, we recommend PyTorch >= 1.9, ideally the latest stable PyTorch release.
- A CUDA or ROCm compiler, such as nvcc or hipcc, is used to compile the C++/CUDA/HIP extensions.
- The specific GPUs we develop and test against are listed below. A GPU outside this list may still work; these are simply the ones DeepSpeed is most thoroughly tested on:
  - NVIDIA: Pascal, Volta, Ampere, and Hopper architectures
  - AMD: MI100 and MI200
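The requirements above can be checked before installing with a short script. The following is a minimal sketch using only the Python standard library, with an optional import of `torch`. The helper names `parse_major_minor` and `check_prerequisites` are hypothetical, and the check only looks for `ninja`, `nvcc`, and `hipcc` on `PATH`. It does not verify that the tools actually work.

```python
import re
import shutil


def parse_major_minor(version):
    """Extract (major, minor) from strings such as '2.3.0+cu121' or '1.9'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def check_prerequisites(min_torch=(1, 9)):
    """Report the installed PyTorch version and which build tools are on PATH."""
    report = {"torch_version": None, "torch_ok": False}
    try:
        import torch  # PyTorch must be installed before DeepSpeed

        report["torch_version"] = str(torch.__version__)
        report["torch_ok"] = parse_major_minor(report["torch_version"]) >= min_torch
    except ImportError:
        pass  # leave torch_version as None when PyTorch is missing
    # ninja drives the JIT build; nvcc / hipcc compile the CUDA / HIP extensions.
    for tool in ("ninja", "nvcc", "hipcc"):
        report[tool] = shutil.which(tool)
    return report


if __name__ == "__main__":
    for key, value in check_prerequisites().items():
        print(f"{key}: {value}")
```

A value of `None` for a tool means it was not found on `PATH`. Only one of `nvcc` or `hipcc` is expected on a given machine, depending on whether it targets CUDA or ROCm.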

Contributed HW support

DeepSpeed now supports various HW accelerators.

Contributor	Hardware	Accelerator Name	Contributor validated	Upstream validated
Huawei	Huawei Ascend NPU	npu	Yes	No
Intel	Intel(R) Gaudi(R) 2 AI accelerator	hpu	Yes	Yes
Intel	Intel(R) Xeon(R) Processors	cpu	Yes	Yes
Intel	Intel(R) Data Center GPU Max series	xpu	Yes	Yes
Tecorigin	Scalable Data Analytics Accelerator	sdaa	Yes	No
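For scripts that need to reason about these accelerator names, the table can be kept as a small lookup structure. The sketch below simply transcribes the rows above. The type `AcceleratorInfo` and the helper `upstream_validated_names` are hypothetical names introduced for illustration and are not part of DeepSpeed.

```python
from typing import NamedTuple


class AcceleratorInfo(NamedTuple):
    contributor: str
    hardware: str
    contributor_validated: bool
    upstream_validated: bool


# Rows transcribed from the contributed hardware support table.
CONTRIBUTED_ACCELERATORS = {
    "npu": AcceleratorInfo("Huawei", "Huawei Ascend NPU", True, False),
    "hpu": AcceleratorInfo("Intel", "Intel(R) Gaudi(R) 2 AI accelerator", True, True),
    "cpu": AcceleratorInfo("Intel", "Intel(R) Xeon(R) Processors", True, True),
    "xpu": AcceleratorInfo("Intel", "Intel(R) Data Center GPU Max series", True, True),
    "sdaa": AcceleratorInfo("Tecorigin", "Scalable Data Analytics Accelerator", True, False),
}


def upstream_validated_names():
    """Return the accelerator names marked as validated upstream, sorted."""
    return sorted(
        name for name, info in CONTRIBUTED_ACCELERATORS.items() if info.upstream_validated
    )


print(upstream_validated_names())
```

Because the table is a snapshot, a lookup like this goes stale as support changes. The upstream project remains the authoritative source.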

PyPI

We regularly push releases to PyPI and encourage users to install from there in most cases.

pip install deepspeed

After installation, you can validate your install and see which extensions/ops your machine is compatible with via the DeepSpeed environment report.

ds_report
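When the environment report needs to be captured from a script, for example to attach it to a bug report, the command can be wrapped with the standard library. This is a minimal sketch. The function name `run_ds_report` and the timeout value are assumptions, and the format of the report itself is whatever the installed DeepSpeed version prints.

```python
import shutil
import subprocess


def run_ds_report(timeout_seconds=120):
    """Run `ds_report` if it is on PATH and return its output, else None."""
    executable = shutil.which("ds_report")
    if executable is None:
        return None  # DeepSpeed does not appear to be installed here
    completed = subprocess.run(
        [executable],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so warnings are kept in order
        text=True,
        timeout=timeout_seconds,
        check=False,
    )
    return completed.stdout


output = run_ds_report()
print(output if output is not None else "ds_report not found; is DeepSpeed installed?")
```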

If you would like to pre-install any of the DeepSpeed extensions/ops (instead of JIT compiling them) or install pre-compiled ops via PyPI, please see our advanced installation instructions.

Windows

Many DeepSpeed features are supported on Windows for both training and inference. You can read more about this in the original blog post here. Features that are not currently supported include async I/O (AIO) and GDS, which does not support Windows.

- Install PyTorch, such as pytorch 2.3+cu121.
- Install Visual C++ build tools, such as VS2022 C++ x64/x86 build tools.
- Launch a Cmd console with Administrator permissions (needed to create the required symlink folders) and ensure the MSVC tools are added to your PATH, or launch the Developer Command Prompt for Visual Studio 2022 with Administrator permissions.
- Run build_win.bat to build the wheel in the dist folder.
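Before running build_win.bat, the preconditions in the steps above can be sanity-checked from Python. The sketch below is an assumption-laden helper, not part of DeepSpeed. The name `windows_build_preflight` is hypothetical, it treats finding `cl.exe` on `PATH` as a proxy for the MSVC tools being available, and it uses the Windows shell call `IsUserAnAdmin` to detect an elevated console.

```python
import ctypes
import platform
import shutil


def windows_build_preflight():
    """Return a list of problems likely to block the Windows wheel build."""
    if platform.system() != "Windows":
        return ["not running on Windows"]
    problems = []
    # Administrator rights are needed to create the required symlink folders.
    if not ctypes.windll.shell32.IsUserAnAdmin():
        problems.append("console is not running with Administrator permissions")
    # cl.exe is the MSVC compiler; finding it suggests the tools are on PATH.
    if shutil.which("cl") is None:
        problems.append("MSVC tools (cl.exe) not found on PATH")
    try:
        import torch  # noqa: F401  (PyTorch must be installed first)
    except ImportError:
        problems.append("PyTorch is not installed")
    return problems


for problem in windows_build_preflight():
    print("preflight:", problem)
```

An empty list means none of the checked conditions failed. It does not guarantee that the build itself will succeed.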

Features

Please check out the DeepSpeed-Training, DeepSpeed-Inference, and DeepSpeed-Compression pages for the full set of features offered under each of these three pillars.