
Per-Tensor Quantization and Per-Channel Quantization

news 2025/9/23 12:33:36

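1️⃣ Basic Concepts

The Goal of Quantization

Quantization maps floating-point weights or activations to an integer representation, which reduces a model's storage and compute cost.

The quantization formula is usually written as: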

$q = \text{round}\left(\frac{r}{s}\right)$

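where:

$r$ is the floating-point weight or activation
$q$ is the integer representation
$s$ is the scale (scaling factor)

Asymmetric quantization additionally uses a zero-point $z$:

$q = \text{round}\left(\frac{r}{s}\right) + z$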
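
The two formulas above can be sketched in a few lines of plain Python. This is a minimal illustration, not a library implementation: the function names `quantize` and `dequantize` are hypothetical, and clamping to the integer range is an assumption that the formulas themselves do not state.

```python
def quantize(r, scale, zero_point=0, bits=8, signed=True):
    """Map a float r to an integer q = round(r / scale) + zero_point."""
    # Integer range for the chosen bit width. Clamping is an assumption
    # added here; the formulas in the text do not mention it.
    if signed:
        qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    else:
        qmin, qmax = 0, 2 ** bits - 1
    # Python's round() rounds exact halves to the nearest even integer.
    q = round(r / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point=0):
    """Approximate inverse: r is roughly scale * (q - zero_point)."""
    return scale * (q - zero_point)


# Symmetric case: zero_point = 0, signed 8-bit range.
q_sym = quantize(0.5, scale=0.01)
# Asymmetric case: non-zero zero_point, unsigned 8-bit range.
q_asym = quantize(0.5, scale=0.01, zero_point=128, signed=False)

print(q_sym, dequantize(q_sym, 0.01))
print(q_asym, dequantize(q_asym, 0.01, zero_point=128))
```

Dequantizing either result recovers a value close to 0.5; the zero-point only shifts which integer represents the real value 0.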

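Per-Tensor Quantization

Definition: the entire tensor is quantized with a single shared scale (and zero-point).
Formula (taking symmetric quantization as an example):

$s = \frac{\max(|W|)}{2^{b-1}-1}$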

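where:

$W$ is the weight tensor
$b$ is the quantization bit width (for example, 8 bits)

Characteristics:

Uses a single scale, so it is simple and efficient
Applies the same scale to all channels
When weight magnitudes differ widely between channels, the channels with small weights lose precision, because the scale is set by the largest value in the tensor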
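
To make the precision loss concrete, here is a small dependency-free sketch of symmetric per-tensor quantization. The helper names and the 2×2 weight `W` are hypothetical and chosen only for illustration; the weight is stored as a list of rows.

```python
def per_tensor_scale(weights, bits=8):
    """Symmetric per-tensor scale: max(|W|) / (2^(b-1) - 1)."""
    max_abs = max(abs(w) for row in weights for w in row)
    return max_abs / (2 ** (bits - 1) - 1)


def quantize_per_tensor(weights, bits=8):
    """Quantize every element with the single shared scale."""
    s = per_tensor_scale(weights, bits)
    q = [[round(w / s) for w in row] for row in weights]
    return q, s


# Hypothetical weight: the first row is large, the second row is small.
W = [[2.0, -1.2],
     [0.005, 0.02]]

q, s = quantize_per_tensor(W)
print("scale:", s)
print("quantized:", q)
```

Because the scale is driven by the 2.0 in the first row, the second row only gets the integers 0 and 1 to work with, and the weight 0.005 is rounded to zero.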

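Per-Channel Quantization

Definition: each output channel of the tensor (typically a convolution's out_channels or a Linear layer's out_features) is quantized with its own scale (and zero-point).
Formula (symmetric quantization), for each channel $c$: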

$s_c = \frac{\max(|W_c|)}{2^{b-1}-1}$

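Characteristics:

Each channel has its own scale, which is finer-grained and preserves the dynamic range of each channel's weights
Higher precision, especially in convolutional networks where channels differ widely
Slightly more complex to implement, since each channel must be dequantized with its own scale at inference time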
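
The per-channel formula and the per-channel dequantization step can be sketched as follows. This is a plain-Python illustration under a few assumptions: each row of the list is treated as one output channel, the function names and the weight `W` are hypothetical, and all-zero channels (which would give a zero scale) are not handled.

```python
def quantize_per_channel(weights, bits=8):
    """Symmetric per-channel quantization; each row is one output channel."""
    qmax = 2 ** (bits - 1) - 1
    scales, q = [], []
    for row in weights:
        # s_c = max(|W_c|) / (2^(b-1) - 1); assumes the row is not all zeros.
        s_c = max(abs(w) for w in row) / qmax
        scales.append(s_c)
        q.append([round(w / s_c) for w in row])
    return q, scales


def dequantize_per_channel(q, scales):
    """Rescale each row with that row's own scale."""
    return [[s_c * v for v in row] for row, s_c in zip(q, scales)]


# Hypothetical weight: the first row is large, the second row is small.
W = [[2.0, -1.2],
     [0.005, 0.02]]

q, scales = quantize_per_channel(W)
W_hat = dequantize_per_channel(q, scales)
print("scales:", scales)
print("quantized:", q)
print("reconstructed:", W_hat)
```

Each row now uses the full integer range on its own, so the small second row keeps its relative resolution. The cost is that one scale per row has to be stored and applied during dequantization.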

2️⃣ Worked Comparison

Suppose a convolution weight is laid out as a 4×3 matrix, one output channel per row:

$\begin{bmatrix} 0.1 & 0.2 & -0.1 \\ 3.0 & -2.5 & 1.2 \\ 0.05 & 0.1 & -0.05 \\ -1.0 & 0.5 & 0.2 \end{bmatrix}$

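Per-Tensor quantization:
- Find the largest absolute value in the whole tensor: 3.0
- scale = 3.0 / 127 ≈ 0.0236
- Every element is quantized with this scale → small values lose a lot of precision
Per-Channel quantization (per row, i.e. per output channel):
- Channel 1: max=0.2 → s1≈0.00157
- Channel 2: max=3.0 → s2≈0.0236
- Channel 3: max=0.1 → s3≈0.000787
- Channel 4: max=1.0 → s4≈0.00787
- Each channel is quantized separately → higher precision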
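
As a worked check of the difference, take the element 0.2 in the last row of the matrix, whose largest absolute value is 1.0. Here $\hat{r} = s \cdot q$ denotes the dequantized value (notation introduced for this example), and the results are rounded to four decimal places.

With the single per-tensor scale $s = 3.0/127$:

$q = \text{round}\left(\frac{0.2}{3.0/127}\right) = \text{round}(8.467) = 8, \quad \hat{r} = 8 \cdot \frac{3.0}{127} \approx 0.1890$

With a scale computed from the last row alone, $s = 1.0/127$:

$q = \text{round}\left(\frac{0.2}{1.0/127}\right) = \text{round}(25.4) = 25, \quad \hat{r} = 25 \cdot \frac{1.0}{127} \approx 0.1969$

The absolute error drops from about 0.0110 to about 0.0031, roughly a 3.5× improvement for this element.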

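3️⃣ Pros and Cons

Property	Per-Tensor	Per-Channel
Precision	Lower (large loss when channels have different dynamic ranges)	Higher
Implementation complexity	Simple	More complex (independent scale per channel)
Storage overhead	Low (one scale)	Higher (one scale per channel)
Typical use	Activation quantization (A8), small models	Weight quantization (W8), large models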
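
To put the storage row in perspective, the sketch below estimates how many bytes the scales take relative to the weights. All of the numbers are assumptions made for illustration: a hypothetical 4096×4096 Linear layer, 1-byte (int8) weights, 4-byte (float32) scales, and symmetric quantization with no stored zero-points. The function name `scale_overhead` is also hypothetical.

```python
def scale_overhead(out_features, in_features, weight_bytes=1, scale_bytes=4):
    """Return (per_tensor_bytes, per_channel_bytes, per_channel_ratio)."""
    weight_total = out_features * in_features * weight_bytes
    per_tensor = 1 * scale_bytes               # one scale for the whole tensor
    per_channel = out_features * scale_bytes   # one scale per output channel
    return per_tensor, per_channel, per_channel / weight_total


pt, pc, ratio = scale_overhead(4096, 4096)
print(f"per-tensor scales:  {pt} bytes")
print(f"per-channel scales: {pc} bytes ({ratio:.4%} of the weight storage)")
```

Under these assumptions the per-channel scales are thousands of times larger than the single per-tensor scale, yet they remain a small fraction of the weight storage itself.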

4️⃣ PyTorch Example

import torch

# Example weight of a linear layer (4 output channels, 3 input features)
weight = torch.tensor([[0.1, 0.2, -0.1],
                       [3.0, -2.5, 1.2],
                       [0.05, 0.1, -0.05],
                       [-1.0, 0.5, 0.2]])

# Per-tensor quantization: one scale for the whole tensor
scale = torch.max(weight.abs()) / 127
q_weight_tensor = torch.round(weight / scale)
print("Per-Tensor Quantized:\n", q_weight_tensor)

# Per-channel quantization: one scale per row (output channel)
scales = torch.max(weight.abs(), dim=1)[0] / 127
q_weight_channel = torch.round(weight / scales[:, None])
print("Per-Channel Quantized:\n", q_weight_channel)