On Backpropagation
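My previous post on backpropagation is here: 机器学习–神经网络 (Machine Learning: Neural Networks). Coming back to review it now, I can barely follow it anymore (((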
So today I want to see whether I can describe the essence of the back propagation algorithm in a more down-to-earth way.
Getting to the point
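Let's start with the very, very, very simplest neural network: a chain of four nodes with one neuron per layer. It looks like this: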
So our cost function can be written directly in this form:
$$
C(w_1, b_1, w_2, b_2, w_3, b_3)
$$
Next, let the four nodes be $a^{(L-3)}, a^{(L-2)}, a^{(L-1)}, a^{(L)}$, and let the value we expect be $y$. Then clearly:
$$
C(w_1, b_1, w_2, b_2, w_3, b_3) = \frac{1}{2}\left(a^{(L)} - y\right)^2
$$
Remember how $a^{(L)}$ is computed? Right: the activation of layer $L$ is a linear combination of the previous layer's activation plus the bias $b^{(L)}$, passed through ReLU or sigmoid (written $\sigma$ below):
$$
\begin{aligned}
& z^{(L)} = w^{(L)} a^{(L - 1)} + b^{(L)} \\
& a^{(L)} = \sigma(z^{(L)})
\end{aligned}
$$
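The forward computation above can be sketched in a few lines of Python. This is only a minimal illustration, assuming sigmoid as the activation; the function names `sigmoid` and `forward` and the sample weights are made up for this example and are not from the original post.

```python
import math

def sigmoid(z):
    """Logistic activation: sigma(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, weights, biases):
    """Run the one-neuron-per-layer chain forward.

    Returns (zs, activations). activations[0] is the input x and
    activations[-1] is the output a^(L); zs[n] is the pre-activation
    of the layer that uses weights[n] and biases[n].
    """
    activations = [x]
    zs = []
    for w, b in zip(weights, biases):
        z = w * activations[-1] + b      # z = w * a_prev + b
        zs.append(z)
        activations.append(sigmoid(z))   # a = sigma(z)
    return zs, activations

# Illustrative values only: three weights and three biases, four nodes.
zs, activations = forward(0.5, [0.1, -0.2, 0.3], [0.0, 0.1, -0.1])
```

Keeping every `z` and `a` around is a deliberate choice: the derivatives worked out in the rest of the post reuse exactly these values.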
First layer (counting back from the output)
Our goal now is to understand how much a change in $w^{(L)}$ affects $C(\dots)$. Mathematically, we want $\frac{\partial C}{\partial w^{(L)}}$, so we need to relate $C$ to $w^{(L)}$:
$$
\begin{aligned}
& z^{(L)} = w^{(L)} a^{(L - 1)} + b^{(L)} \\
& a^{(L)} = \sigma(z^{(L)}) \\
& C = \frac{1}{2}\left(a^{(L)} - y\right)^2
\end{aligned}
$$
As we can see, these three equations are what connect $w^{(L)}$ to $C(\dots)$, so let's go one step at a time:
- First, determine how much a change in $w^{(L)}$ affects $z^{(L)}$ (that is, $\frac{\partial z^{(L)}}{\partial w^{(L)}}$)
- Then, determine how much a change in $z^{(L)}$ affects $a^{(L)}$ (that is, $\frac{\partial a^{(L)}}{\partial z^{(L)}}$)
- Finally, determine how much a change in $a^{(L)}$ affects $C$ (that is, $\frac{\partial C}{\partial a^{(L)}}$)
Taking these three steps gives us the effect of a change in $w^{(L)}$ on $C(\dots)$. Mathematically it looks like this (it's just the chain rule):
$$
\frac{\partial C}{\partial w^{(L)}} = \frac{\partial z^{(L)}}{\partial w^{(L)}} \cdot \frac{\partial a^{(L)}}{\partial z^{(L)}} \cdot \frac{\partial C}{\partial a^{(L)}}
$$
Expanding this directly, we get:
$$
\begin{aligned}
\frac{\partial C}{\partial w^{(L)}} = & \frac{\partial z^{(L)}}{\partial w^{(L)}} \cdot \frac{\partial a^{(L)}}{\partial z^{(L)}} \cdot \frac{\partial C}{\partial a^{(L)}} \\
= & a^{(L - 1)} \cdot \sigma'(z^{(L)}) \cdot (a^{(L)} - y)
\end{aligned}
$$
Similarly, we can find $\frac{\partial C}{\partial b^{(L)}}$:
$$
\begin{aligned}
\frac{\partial C}{\partial b^{(L)}} = & \frac{\partial z^{(L)}}{\partial b^{(L)}} \cdot \frac{\partial a^{(L)}}{\partial z^{(L)}} \cdot \frac{\partial C}{\partial a^{(L)}} \\
= & 1 \cdot \sigma'(z^{(L)}) \cdot (a^{(L)} - y)
\end{aligned}
$$
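A quick way to convince yourself that these two formulas are right is to compare them with a finite-difference estimate. The sketch below assumes sigmoid as $\sigma$; the helper names (`cost`, `last_layer_grads`) and the numbers are invented for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    """Derivative of the logistic function: sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

def cost(w, b, a_prev, y):
    """C = 1/2 * (a - y)^2 with a = sigma(w * a_prev + b)."""
    a = sigmoid(w * a_prev + b)
    return 0.5 * (a - y) ** 2

def last_layer_grads(w, b, a_prev, y):
    """Chain-rule gradients of C with respect to w and b of the last layer."""
    z = w * a_prev + b
    a = sigmoid(z)
    common = sigmoid_prime(z) * (a - y)   # sigma'(z) * (a - y)
    return a_prev * common, 1.0 * common  # (dC/dw, dC/db)

# Illustrative values only.
w, b, a_prev, y = 0.3, -0.1, 0.6, 1.0
dC_dw, dC_db = last_layer_grads(w, b, a_prev, y)

# Central finite differences as an independent check.
eps = 1e-6
num_dw = (cost(w + eps, b, a_prev, y) - cost(w - eps, b, a_prev, y)) / (2 * eps)
num_db = (cost(w, b + eps, a_prev, y) - cost(w, b - eps, a_prev, y)) / (2 * eps)

print(dC_dw, num_dw)
print(dC_db, num_db)
```

The analytic and numeric values should agree to many decimal places; a large gap usually means a term of the chain rule was dropped.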
Second layer
We handle the layer before it in exactly the same way. We have:
$$
\begin{aligned}
& z^{(L - 1)} = w^{(L - 1)} a^{(L - 2)} + b^{(L - 1)} \\
& a^{(L - 1)} = \sigma(z^{(L - 1)}) \\
& z^{(L)} = w^{(L)} a^{(L - 1)} + b^{(L)} \\
& a^{(L)} = \sigma(z^{(L)})
\end{aligned}
$$
So we get:
$$
\begin{aligned}
\frac{\partial C}{\partial w^{(L - 1)}} = & \frac{\partial z^{(L - 1)}}{\partial w^{(L - 1)}} \cdot \frac{\partial a^{(L - 1)}}{\partial z^{(L - 1)}} \cdot \frac{\partial C}{\partial a^{(L - 1)}} \\
= & \frac{\partial z^{(L - 1)}}{\partial w^{(L - 1)}} \cdot \frac{\partial a^{(L - 1)}}{\partial z^{(L - 1)}} \cdot \left[\frac{\partial z^{(L)}}{\partial a^{(L - 1)}} \cdot \frac{\partial a^{(L)}}{\partial z^{(L)}} \cdot \frac{\partial C}{\partial a^{(L)}}\right] \\
= & a^{(L - 2)} \cdot \sigma'(z^{(L - 1)}) \cdot \left[w^{(L)} \cdot \sigma'(z^{(L)}) \cdot (a^{(L)} - y)\right]
\end{aligned}
$$
$$
\begin{aligned}
\frac{\partial C}{\partial b^{(L - 1)}} = & \frac{\partial z^{(L - 1)}}{\partial b^{(L - 1)}} \cdot \frac{\partial a^{(L - 1)}}{\partial z^{(L - 1)}} \cdot \frac{\partial C}{\partial a^{(L - 1)}} \\
= & \frac{\partial z^{(L - 1)}}{\partial b^{(L - 1)}} \cdot \frac{\partial a^{(L - 1)}}{\partial z^{(L - 1)}} \cdot \left[\frac{\partial z^{(L)}}{\partial a^{(L - 1)}} \cdot \frac{\partial a^{(L)}}{\partial z^{(L)}} \cdot \frac{\partial C}{\partial a^{(L)}}\right] \\
= & 1 \cdot \sigma'(z^{(L - 1)}) \cdot \left[w^{(L)} \cdot \sigma'(z^{(L)}) \cdot (a^{(L)} - y)\right]
\end{aligned}
$$
Let's compare?
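We've now worked this out for two layers. Can we spot a pattern in the expressions we have so far? Here are the four results again, side by side for comparison: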
$$
\begin{aligned}
&\frac{\partial C}{\partial w^{(L)}} = a^{(L - 1)} \cdot \sigma'(z^{(L)}) \cdot \frac{\partial C}{\partial a^{(L)}}, \qquad \frac{\partial C}{\partial b^{(L)}} = 1 \cdot \sigma'(z^{(L)}) \cdot \frac{\partial C}{\partial a^{(L)}} \\
&\frac{\partial C}{\partial w^{(L - 1)}} = a^{(L - 2)} \cdot \sigma'(z^{(L - 1)}) \cdot \frac{\partial C}{\partial a^{(L - 1)}}, \qquad \frac{\partial C}{\partial b^{(L - 1)}} = 1 \cdot \sigma'(z^{(L - 1)}) \cdot \frac{\partial C}{\partial a^{(L - 1)}}
\end{aligned}
$$
Notice that each of these expressions ends with a $\frac{\partial C}{\partial a}$ term, and that this term can be computed recursively. So let $\delta^{(n)} = \frac{\partial C}{\partial a^{(n)}}$; the four expressions above then become:
$$
\begin{aligned}
&\frac{\partial C}{\partial w^{(L)}} = a^{(L - 1)} \cdot \sigma'(z^{(L)}) \cdot \delta^{(L)}, \qquad \frac{\partial C}{\partial b^{(L)}} = 1 \cdot \sigma'(z^{(L)}) \cdot \delta^{(L)} \\
&\frac{\partial C}{\partial w^{(L - 1)}} = a^{(L - 2)} \cdot \sigma'(z^{(L - 1)}) \cdot \delta^{(L - 1)}, \qquad \frac{\partial C}{\partial b^{(L - 1)}} = 1 \cdot \sigma'(z^{(L - 1)}) \cdot \delta^{(L - 1)}
\end{aligned}
$$
where the recurrence for $\delta$ can be written as:
$$
\delta^{(L - 1)} = w^{(L)} \cdot \sigma'(z^{(L)}) \cdot \delta^{(L)}
$$
This recurrence is also easy to prove:
$$
\begin{aligned}
\delta^{(L - 1)} = & \frac{\partial C}{\partial a^{(L - 1)}} \\
= & \frac{\partial C}{\partial a^{(L)}} \cdot \frac{\partial a^{(L)}}{\partial z^{(L)}} \cdot \frac{\partial z^{(L)}}{\partial a^{(L - 1)}} \\
= & \delta^{(L)} \cdot \sigma'(z^{(L)}) \cdot w^{(L)}
\end{aligned}
$$
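The recurrence can also be checked numerically: treat $C$ as a function of $a^{(L-1)}$ alone, nudge $a^{(L-1)}$ a little, and compare the measured slope with $w^{(L)} \cdot \sigma'(z^{(L)}) \cdot \delta^{(L)}$. The snippet below is a small sketch under the assumption that $\sigma$ is sigmoid; the variable names and values are illustrative, not from the original post.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def cost_from_prev_activation(a_prev, w, b, y):
    """C as a function of a^(L-1), holding w^(L), b^(L) and y fixed."""
    a = sigmoid(w * a_prev + b)
    return 0.5 * (a - y) ** 2

# Illustrative values only.
w_L, b_L, y = 0.8, 0.2, 1.0
a_prev = 0.4

z_L = w_L * a_prev + b_L
a_L = sigmoid(z_L)
delta_L = a_L - y                                # delta^(L) = dC/da^(L)
delta_prev = w_L * sigmoid_prime(z_L) * delta_L  # delta^(L-1) from the recurrence

# Central finite difference of C with respect to a^(L-1).
eps = 1e-6
numeric = (cost_from_prev_activation(a_prev + eps, w_L, b_L, y)
           - cost_from_prev_activation(a_prev - eps, w_L, b_L, y)) / (2 * eps)

print(delta_prev, numeric)
```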
Third layer
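Finally, let's check that the expressions for the layer two steps back from the output also satisfy the recurrence we just found. As before, we have: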
$$
\begin{aligned}
& z^{(L - 2)} = w^{(L - 2)} a^{(L - 3)} + b^{(L - 2)} \\
& a^{(L - 2)} = \sigma(z^{(L - 2)}) \\
& z^{(L - 1)} = w^{(L - 1)} a^{(L - 2)} + b^{(L - 1)} \\
& a^{(L - 1)} = \sigma(z^{(L - 1)})
\end{aligned}
$$
So:
$$
\begin{aligned}
\frac{\partial C}{\partial w^{(L - 2)}} = & \frac{\partial z^{(L - 2)}}{\partial w^{(L - 2)}} \cdot \frac{\partial a^{(L - 2)}}{\partial z^{(L - 2)}} \cdot \frac{\partial C}{\partial a^{(L - 2)}} \\
= & \frac{\partial z^{(L - 2)}}{\partial w^{(L - 2)}} \cdot \frac{\partial a^{(L - 2)}}{\partial z^{(L - 2)}} \cdot \delta^{(L - 2)} \\
= & a^{(L - 3)} \cdot \sigma'(z^{(L - 2)}) \cdot \delta^{(L - 2)}
\end{aligned}
$$
$$
\begin{aligned}
\frac{\partial C}{\partial b^{(L - 2)}} = & \frac{\partial z^{(L - 2)}}{\partial b^{(L - 2)}} \cdot \frac{\partial a^{(L - 2)}}{\partial z^{(L - 2)}} \cdot \frac{\partial C}{\partial a^{(L - 2)}} \\
= & \frac{\partial z^{(L - 2)}}{\partial b^{(L - 2)}} \cdot \frac{\partial a^{(L - 2)}}{\partial z^{(L - 2)}} \cdot \delta^{(L - 2)} \\
= & 1 \cdot \sigma'(z^{(L - 2)}) \cdot \delta^{(L - 2)}
\end{aligned}
$$
where:
$$
\begin{aligned}
\delta^{(L - 2)} = & \frac{\partial C}{\partial a^{(L - 2)}} \\
= & \frac{\partial C}{\partial a^{(L - 1)}} \cdot \frac{\partial a^{(L - 1)}}{\partial z^{(L - 1)}} \cdot \frac{\partial z^{(L - 1)}}{\partial a^{(L - 2)}} \\
= & \delta^{(L - 1)} \cdot \sigma'(z^{(L - 1)}) \cdot w^{(L - 1)}
\end{aligned}
$$
Now let's collect the three sets of relations we have obtained:
$$
\begin{aligned}
&\frac{\partial C}{\partial w^{(L)}} = a^{(L - 1)} \cdot \sigma'(z^{(L)}) \cdot \delta^{(L)}, \qquad \frac{\partial C}{\partial b^{(L)}} = 1 \cdot \sigma'(z^{(L)}) \cdot \delta^{(L)} \\
&\frac{\partial C}{\partial w^{(L - 1)}} = a^{(L - 2)} \cdot \sigma'(z^{(L - 1)}) \cdot \delta^{(L - 1)}, \qquad \frac{\partial C}{\partial b^{(L - 1)}} = 1 \cdot \sigma'(z^{(L - 1)}) \cdot \delta^{(L - 1)} \\
&\frac{\partial C}{\partial w^{(L - 2)}} = a^{(L - 3)} \cdot \sigma'(z^{(L - 2)}) \cdot \delta^{(L - 2)}, \qquad \frac{\partial C}{\partial b^{(L - 2)}} = 1 \cdot \sigma'(z^{(L - 2)}) \cdot \delta^{(L - 2)}
\end{aligned}
$$
$$
\begin{aligned}
& \delta^{(L)} = (a^{(L)} - y) \\
& \delta^{(L - 1)} = w^{(L)} \cdot \sigma'(z^{(L)}) \cdot \delta^{(L)} \\
& \delta^{(L - 2)} = w^{(L - 1)} \cdot \sigma'(z^{(L - 1)}) \cdot \delta^{(L - 1)}
\end{aligned}
$$
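Reading the pattern off these three cases, the same relations can be written once for a generic layer index $n$ in this one-neuron-per-layer chain. This is just the three cases above restated with $n$ in place of $L$, $L - 1$, $L - 2$, not a new result:

$$
\begin{aligned}
& \frac{\partial C}{\partial w^{(n)}} = a^{(n - 1)} \cdot \sigma'(z^{(n)}) \cdot \delta^{(n)}, \qquad \frac{\partial C}{\partial b^{(n)}} = 1 \cdot \sigma'(z^{(n)}) \cdot \delta^{(n)} \\
& \delta^{(n - 1)} = w^{(n)} \cdot \sigma'(z^{(n)}) \cdot \delta^{(n)}, \qquad \delta^{(L)} = a^{(L)} - y
\end{aligned}
$$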
So we have found a recurrence that lets us compute the partial derivatives of $C$ with respect to every $w$ and $b$, one layer at a time, and with those partial derivatives we can run SGD.
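To make this concrete, here is a rough end-to-end sketch for the four-node chain: a forward pass, a backward pass that walks the $\delta$ recurrence from the output toward the input, and a plain gradient step. It assumes sigmoid as $\sigma$; all names (`forward`, `backward`, `sgd_step`, `cost`), the learning rate, and the starting values are made up for illustration. With a single training pair, the "SGD" here is really just gradient descent on that one sample.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def forward(x, weights, biases):
    """Return (zs, activations) for the chain; activations[0] is the input."""
    activations, zs = [x], []
    for w, b in zip(weights, biases):
        z = w * activations[-1] + b
        zs.append(z)
        activations.append(sigmoid(z))
    return zs, activations

def cost(x, y, weights, biases):
    _, activations = forward(x, weights, biases)
    return 0.5 * (activations[-1] - y) ** 2

def backward(x, y, weights, biases):
    """Gradients of C for every w and b, using the delta recurrence."""
    zs, activations = forward(x, weights, biases)
    delta = activations[-1] - y               # delta^(L) = a^(L) - y
    grad_w = [0.0] * len(weights)
    grad_b = [0.0] * len(biases)
    for n in reversed(range(len(weights))):
        common = sigmoid_prime(zs[n]) * delta  # sigma'(z) * delta for this layer
        grad_w[n] = activations[n] * common    # previous activation * sigma'(z) * delta
        grad_b[n] = 1.0 * common
        delta = weights[n] * common            # delta for the previous layer
    return grad_w, grad_b

def sgd_step(x, y, weights, biases, lr):
    """One gradient step; returns new parameter lists."""
    grad_w, grad_b = backward(x, y, weights, biases)
    new_w = [w - lr * g for w, g in zip(weights, grad_w)]
    new_b = [b - lr * g for b, g in zip(biases, grad_b)]
    return new_w, new_b

# Illustrative starting point and a single training pair (x, y).
weights, biases = [0.5, -0.4, 0.3], [0.1, 0.0, -0.2]
x, y = 0.7, 1.0
before = cost(x, y, weights, biases)
for _ in range(200):
    weights, biases = sgd_step(x, y, weights, biases, lr=0.5)
after = cost(x, y, weights, biases)
print(before, after)
```

Note that the backward loop never recomputes anything from scratch: each layer reuses the `delta` handed back by the layer after it, which is the whole point of the recurrence.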