Gradient Accumulation (梯度累积) in PyTorch

news 2025/11/17 6:30:06

1. Gradient accumulation improves memory efficiency
2. Gradient accumulation with PyTorch
3. Gradient accumulation with Accelerator
4. Gradient accumulation with Trainer
References

Gradient accumulation, Gradient checkpointing and local SGD, Mixed precision training
https://projector-video-pdf-converter.datacamp.com/37998/chapter3.pdf

Improving training efficiency

1. Gradient accumulation improves memory efficiency

The problem with large batch sizes

How does gradient accumulation work?

Gradient accumulation: sum the gradients over several smaller batches
Update the model parameters once, after the gradients have been summed
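
The two steps above can be illustrated without any framework. The sketch below is a minimal toy example, not taken from the slides: the model `y = w * x`, the data, and the helper `mse_grad` are all hypothetical. It sums the gradients of equally sized micro-batches, each scaled by the number of micro-batches, and then applies a single parameter update.

```python
# Hypothetical toy problem: fit y = w * x with mean squared error.
def mse_grad(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5

# Gradient computed on the full batch of 4 samples.
full_grad = mse_grad(w, xs, ys)

# Same gradient accumulated over 2 micro-batches of 2 samples each.
micro_batch_size = 2
num_micro_batches = len(xs) // micro_batch_size
accumulated_grad = 0.0
for i in range(0, len(xs), micro_batch_size):
    micro_grad = mse_grad(w, xs[i:i + micro_batch_size], ys[i:i + micro_batch_size])
    # Scale each micro-batch gradient so the sum equals the full-batch mean.
    accumulated_grad += micro_grad / num_micro_batches

# One parameter update after all micro-batches have been processed.
learning_rate = 0.01
w_new = w - learning_rate * accumulated_grad
```

With equally sized micro-batches and a mean-reduced loss, `accumulated_grad` matches `full_grad`, so the update is the same as the one a single large batch would produce, while only one micro-batch has to be held in memory at a time.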

From PyTorch to Accelerator

2. Gradient accumulation with PyTorch

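The original code for this section was an image and is not recoverable, so the following is only a minimal sketch of how a manual accumulation loop is commonly written in PyTorch. The toy model, the random data, and the names `accumulation_steps` and `optimizer_steps` are assumptions made for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical toy setup: a linear model trained on random data.
model = nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()
dataset = torch.utils.data.TensorDataset(torch.randn(16, 8), torch.randn(16, 1))
dataloader = torch.utils.data.DataLoader(dataset, batch_size=2)

accumulation_steps = 4  # effective batch size = 2 * 4 = 8
optimizer_steps = 0

optimizer.zero_grad()
for index, (inputs, targets) in enumerate(dataloader):
    loss = loss_fn(model(inputs), targets)
    # Scale the loss so the summed gradients average over the effective batch.
    (loss / accumulation_steps).backward()  # gradients add up in .grad
    if (index + 1) % accumulation_steps == 0:
        optimizer.step()       # update once per accumulation_steps micro-batches
        optimizer.zero_grad()  # reset the accumulated gradients
        optimizer_steps += 1
```

Because `backward()` adds to the existing `.grad` tensors rather than overwriting them, the accumulation itself needs no extra code; the loop only has to delay `optimizer.step()` and `optimizer.zero_grad()`. In this sketch the number of batches is a multiple of `accumulation_steps`; otherwise the leftover batches at the end of an epoch would need a final update of their own.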

3. Gradient accumulation with Accelerator


4. Gradient accumulation with Trainer


References

[1] Yongqiang Cheng (程永强), https://yongqiang.blog.csdn.net/
[2] Gradient accumulation, Gradient checkpointing and local SGD, Mixed precision training, https://projector-video-pdf-converter.datacamp.com/37998/chapter3.pdf
