A Four-Layer Neural Network with Backpropagation: Complete Numerical Walkthrough + Flow Diagram
Nice work, Xiaofei 👍
You've now got the computational logic of **forward propagation** down, so let's level up:
this time we'll walk through a **four-layer neural network (with backpropagation)**, with the complete numerical calculation and a flow diagram.
🧠 Four-Layer Neural Network Example (with Backpropagation)
📘 Network Structure
We define the following structure:
| Layer | Name | Number of neurons |
|---|---|---|
| L₁ | Input layer | 2 |
| L₂ | Hidden layer 1 | 2 |
| L₃ | Hidden layer 2 | 2 |
| L₄ | Output layer | 1 |
Activation function: Sigmoid
Loss function: Mean Squared Error (MSE)
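Before plugging in numbers, here is a minimal sketch of these two choices in code (assuming NumPy; the helper names are mine, not part of the example):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: maps any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    """Derivative of sigmoid, sigma(z) * (1 - sigma(z)); used in backprop."""
    s = sigmoid(z)
    return s * (1.0 - s)

def mse_loss(y_true, y_hat):
    """Halved squared error, matching the 1/2 (y_true - y_hat)^2 used below."""
    return 0.5 * (y_true - y_hat) ** 2
```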
⚙️ Part 1: Forward Propagation
Suppose the inputs are:
$$
x_1 = 1, \quad x_2 = 2
$$
and the target output is:
$$
y_{true} = 1
$$
1️⃣ Weights and Biases
| Layer | Weight matrix | Bias |
|---|---|---|
| L1→L2 | W₁ = [[0.1, 0.2], [0.3, 0.4]] | b₁ = [0.1, 0.2] |
| L2→L3 | W₂ = [[0.5, 0.6], [0.7, 0.8]] | b₂ = [0.1, 0.2] |
| L3→L4 | W₃ = [[0.9], [1.0]] | b₃ = [0.3] |
2️⃣ Layer-by-Layer Computation
Hidden layer 1 (L₂)
$$
z^{(2)} = X \cdot W_1 + b_1
$$
$$
z^{(2)} = [1, 2] \times [[0.1, 0.2], [0.3, 0.4]] + [0.1, 0.2] = [0.1 + 0.6 + 0.1,\; 0.2 + 0.8 + 0.2] = [0.8, 1.2]
$$
Activation:
$$
a^{(2)} = \text{sigmoid}(z^{(2)}) = [0.68997, 0.76852]
$$
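The same step as a self-contained NumPy sketch (`@` is the matrix product; the variable names are mine):

```python
import numpy as np

X  = np.array([1.0, 2.0])
W1 = np.array([[0.1, 0.2], [0.3, 0.4]])
b1 = np.array([0.1, 0.2])

z2 = X @ W1 + b1                # -> [0.8, 1.2]
a2 = 1.0 / (1.0 + np.exp(-z2))  # -> [0.68997, 0.76852]
```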
Hidden layer 2 (L₃)
$$
z^{(3)} = a^{(2)} \cdot W_2 + b_2
$$
$$
z^{(3)} = [0.68997, 0.76852] \times [[0.5, 0.6], [0.7, 0.8]] + [0.1, 0.2]
$$
$$
z^{(3)} = [0.68997 \times 0.5 + 0.76852 \times 0.7 + 0.1, \quad 0.68997 \times 0.6 + 0.76852 \times 0.8 + 0.2]
$$
$$
z^{(3)} = [0.9829, 1.2288]
$$
Activation:
$$
a^{(3)} = \text{sigmoid}(z^{(3)}) = [0.7277, 0.7736]
$$
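Hidden layer 2 in the same style, restarting from the a² values just computed:

```python
import numpy as np

a2 = np.array([0.68997, 0.76852])        # activation of hidden layer 1
W2 = np.array([[0.5, 0.6], [0.7, 0.8]])
b2 = np.array([0.1, 0.2])

z3 = a2 @ W2 + b2               # -> [0.9829, 1.2288]
a3 = 1.0 / (1.0 + np.exp(-z3))  # -> [0.7277, 0.7736]
```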
Output layer (L₄)
$$
z^{(4)} = a^{(3)} \cdot W_3 + b_3
$$
$$
z^{(4)} = [0.7277, 0.7736] \times [[0.9], [1.0]] + 0.3 = 0.7277 \times 0.9 + 0.7736 \times 1.0 + 0.3 = 1.7285
$$
Activation:
$$
\hat{y} = \text{sigmoid}(1.7285) = 0.8492
$$
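And the output layer:

```python
import numpy as np

a3 = np.array([0.7277, 0.7736])   # activation of hidden layer 2
W3 = np.array([[0.9], [1.0]])
b3 = np.array([0.3])

z4    = a3 @ W3 + b3               # -> [1.7285]
y_hat = 1.0 / (1.0 + np.exp(-z4))  # -> [0.8492]
```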
✅ Forward Propagation Results
| Layer | Output (after activation) |
|---|---|
| a² | [0.68997, 0.76852] |
| a³ | [0.7277, 0.7736] |
| a⁴ (ŷ) | 0.8492 |
Loss:
$$
L = \frac{1}{2}(y_{true} - \hat{y})^2 = \frac{1}{2}(1 - 0.8492)^2 = 0.0114
$$
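A quick numeric check of the loss (plain Python is enough here):

```python
y_true, y_hat = 1.0, 0.8492
loss = 0.5 * (y_true - y_hat) ** 2
print(loss)  # ~0.0114
```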
🔁 Part 2: Backpropagation
We now work backward from the output layer, computing gradients layer by layer.
1️⃣ Output-Layer Gradient
The output activation is sigmoid, so:
$$
\frac{dL}{d\hat{y}} = \hat{y} - y_{true} = 0.8492 - 1 = -0.1508
$$
$$
\frac{d\hat{y}}{dz^{(4)}} = \hat{y}(1 - \hat{y}) = 0.8492 \times 0.1508 = 0.1281
$$
$$
\frac{dL}{dz^{(4)}} = \frac{dL}{d\hat{y}} \times \frac{d\hat{y}}{dz^{(4)}} = -0.1508 \times 0.1281 = -0.0193
$$
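The same three quantities in code (a sketch; the names are mine):

```python
y_true, y_hat = 1.0, 0.8492

dL_dyhat  = y_hat - y_true        # -> -0.1508
dyhat_dz4 = y_hat * (1 - y_hat)   # -> 0.1281 (sigmoid derivative at the output)
dL_dz4    = dL_dyhat * dyhat_dz4  # -> -0.0193
```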
2️⃣ Output-Layer Weight Gradients
$$
\frac{dL}{dW_3} = {a^{(3)}}^T \times \frac{dL}{dz^{(4)}} = [0.7277, 0.7736]^T \times (-0.0193)
$$
$$
dW_3 = [[-0.0140], [-0.0149]]
$$
$$
db_3 = -0.0193
$$
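The weight gradient is the activation column scaled by the scalar error signal, sketched as:

```python
import numpy as np

a3 = np.array([0.7277, 0.7736])
dL_dz4 = -0.0193

dW3 = a3.reshape(-1, 1) * dL_dz4  # -> [[-0.0140], [-0.0149]]
db3 = dL_dz4                      # -> -0.0193
```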
3️⃣ Backpropagating to Hidden Layer 2 (L₃)
$$
\frac{dL}{da^{(3)}} = W_3 \times \frac{dL}{dz^{(4)}} = [[0.9], [1.0]] \times (-0.0193) = [-0.0174, -0.0193]
$$
$$
\frac{da^{(3)}}{dz^{(3)}} = a^{(3)}(1 - a^{(3)}) = [0.7277 \times 0.2723,\; 0.7736 \times 0.2264] = [0.1982, 0.1751]
$$
$$
\frac{dL}{dz^{(3)}} = \frac{dL}{da^{(3)}} \odot \frac{da^{(3)}}{dz^{(3)}} = [-0.0174 \times 0.1982,\; -0.0193 \times 0.1751] = [-0.00345, -0.00338]
$$
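This chain-rule step as a sketch (note the element-wise product with the sigmoid derivative):

```python
import numpy as np

W3 = np.array([[0.9], [1.0]])
a3 = np.array([0.7277, 0.7736])
dL_dz4 = -0.0193

dL_da3  = (W3 * dL_dz4).ravel()  # -> [-0.0174, -0.0193]
da3_dz3 = a3 * (1 - a3)          # -> [0.1982, 0.1751]
dL_dz3  = dL_da3 * da3_dz3       # -> [-0.00345, -0.00338]
```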
4️⃣ Hidden-Layer-2 Weight Gradients
$$
dW_2 = {a^{(2)}}^T \times \frac{dL}{dz^{(3)}}
$$
$$
dW_2 = [[0.68997, 0.76852]]^T \times [-0.00345, -0.00338]
$$
$$
dW_2 = [[-0.00238, -0.00233], [-0.00265, -0.00260]]
$$
$$
db_2 = [-0.00345, -0.00338]
$$
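`np.outer` builds exactly this rank-one gradient matrix:

```python
import numpy as np

a2 = np.array([0.68997, 0.76852])
dL_dz3 = np.array([-0.00345, -0.00338])

dW2 = np.outer(a2, dL_dz3)  # -> [[-0.00238, -0.00233], [-0.00265, -0.00260]]
db2 = dL_dz3                # -> [-0.00345, -0.00338]
```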
5️⃣ Backpropagating to Hidden Layer 1 (L₂)
By the same pattern:
$$
\frac{dL}{da^{(2)}} = \frac{dL}{dz^{(3)}} \times W_2^T
$$
$$
\frac{dL}{da^{(2)}} = [-0.00345, -0.00338] \times [[0.5, 0.6], [0.7, 0.8]]^T = [-0.00375, -0.00512]
$$
Then multiply element-wise by the sigmoid derivative:
$$
\frac{da^{(2)}}{dz^{(2)}} = a^{(2)}(1 - a^{(2)}) = [0.68997 \times 0.31003,\; 0.76852 \times 0.23148] = [0.2139, 0.1779]
$$
$$
\frac{dL}{dz^{(2)}} = [-0.00375 \times 0.2139,\; -0.00512 \times 0.1779] = [-0.00080, -0.00091]
$$
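In code, the `@ W2.T` is what routes the downstream gradient back through the weights:

```python
import numpy as np

W2 = np.array([[0.5, 0.6], [0.7, 0.8]])
a2 = np.array([0.68997, 0.76852])
dL_dz3 = np.array([-0.00345, -0.00338])

dL_da2  = dL_dz3 @ W2.T     # -> [-0.00375, -0.00512]
da2_dz2 = a2 * (1 - a2)     # -> [0.2139, 0.1779]
dL_dz2  = dL_da2 * da2_dz2  # -> [-0.00080, -0.00091]
```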
6️⃣ Final Weight Gradients
$$
dW_1 = X^T \times \frac{dL}{dz^{(2)}} = [[1, 2]]^T \times [-0.00080, -0.00091]
$$
$$
dW_1 = [[-0.00080, -0.00091], [-0.00160, -0.00182]]
$$
$$
db_1 = [-0.00080, -0.00091]
$$
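And the first-layer gradients, completing the backward pass:

```python
import numpy as np

X = np.array([1.0, 2.0])
dL_dz2 = np.array([-0.00080, -0.00091])

dW1 = np.outer(X, dL_dz2)  # -> [[-0.00080, -0.00091], [-0.00160, -0.00182]]
db1 = dL_dz2               # -> [-0.00080, -0.00091]
```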
🧩 Part 3: Backpropagation Computation Graph
```
(x1, x2)
   │
   ▼
[Layer1] -----------→ dL/dz2 → dW1
   │
   ▼
[Layer2] -----------→ dL/dz3 → dW2
   │
   ▼
[Output] -----------→ dL/dz4 → dW3
```
Or, as a fuller set of flow arrows 👇
```
Forward:  X → Z2 → A2 → Z3 → A3 → Z4 → A4 (ŷ)
Backward: ← dZ2 ← dA2 ← dZ3 ← dA3 ← dZ4 ← dA4
```
✅ Part 4: Summary
| Step | What happens | Notes |
|---|---|---|
| Forward propagation | X → Ŷ | Compute the predicted output |
| Compute the loss | L(Ŷ, Y) | Measure the error |
| Backpropagation | dL/dW, dL/db | Compute the gradients |
| Parameter update | W ← W - η·dW | Update the parameters with learning rate η |
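As a sketch of the last row, applied to W₃ (the learning rate η = 0.1 is a hypothetical choice; the example does not fix one):

```python
import numpy as np

eta = 0.1  # hypothetical learning rate, not specified in the example

W3  = np.array([[0.9], [1.0]])
dW3 = np.array([[-0.0140], [-0.0149]])

W3 = W3 - eta * dW3  # W <- W - eta * dW; apply the same rule to every W and b
print(W3)            # [[0.9014], [1.00149]]
```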
