Machine Learning: Deriving the Backpropagation Formulas for Neural Networks
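First, recall the forward propagation formula, where $q_{k+1,i}$ is the pre-activation input of neuron $i$ in layer $k+1$, $r_{k,j}=f_k(q_{k,j})$ is the activated output of neuron $j$ in layer $k$, and $n_k$ is the number of neurons in layer $k$:

$$
q_{k+1,i}=\sum_{j=1}^{n_{k}} w_{k+1,i,j}\cdot r_{k,j}+b_{k+1,i}
$$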
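As a rough illustration of the forward formula, the sketch below computes $q_{k+1,i}$ and $r_{k+1,i}$ for one layer in plain Python. The function name `forward_layer`, the choice of a sigmoid for $f$, and the toy numbers are assumptions made for this example only; the derivation itself does not fix an activation function.

```python
import math

def sigmoid(x):
    # Assumed activation for illustration; the derivation works for any differentiable f.
    return 1.0 / (1.0 + math.exp(-x))

def forward_layer(w, b, r_prev, f=sigmoid):
    """Return (q, r) with q[i] = sum_j w[i][j] * r_prev[j] + b[i] and r[i] = f(q[i])."""
    q = [sum(w_ij * r_j for w_ij, r_j in zip(w_i, r_prev)) + b_i
         for w_i, b_i in zip(w, b)]
    r = [f(q_i) for q_i in q]
    return q, r

# Hypothetical toy layer: 3 inputs feeding 2 neurons.
w = [[0.5, -1.0, 2.0],
     [1.0, 0.0, -0.5]]
b = [0.1, -0.2]
r_prev = [1.0, 2.0, 3.0]

q, r = forward_layer(w, b, r_prev)
print(q)  # pre-activations q_{k+1,i}
print(r)  # activations r_{k+1,i}
```

Here `w[i][j]` plays the role of $w_{k+1,i,j}$, so each row of `w` holds the incoming weights of one neuron.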
Applying the chain rule to the loss $l$ gives the gradients with respect to a weight and a bias:

$$
\begin{aligned}
\frac{\partial l}{\partial w_{k,i,j}} &= \frac{\partial l}{\partial q_{k,i}}\cdot\frac{\partial q_{k,i}}{\partial w_{k,i,j}} \\
&= \frac{\partial l}{\partial q_{k,i}}\cdot r_{k-1,j}
\end{aligned}
$$

$$
\begin{aligned}
\frac{\partial l}{\partial b_{k,i}} &= \frac{\partial l}{\partial q_{k,i}}\cdot\frac{\partial q_{k,i}}{\partial b_{k,i}} \\
&= \frac{\partial l}{\partial q_{k,i}}
\end{aligned}
$$
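The two gradient formulas above can be sketched in a few lines of Python. The function name `layer_gradients` and the sample values are hypothetical; `delta[i]` stands for $\partial l/\partial q_{k,i}$ (the quantity the derivation later calls $\delta_{k,i}$) and is assumed to be already known.

```python
def layer_gradients(delta, r_prev):
    """Given delta[i] = dl/dq_{k,i} and r_prev[j] = r_{k-1,j}, return (grad_w, grad_b).

    grad_w[i][j] = delta[i] * r_prev[j]   (dl/dw_{k,i,j})
    grad_b[i]    = delta[i]               (dl/db_{k,i})
    """
    grad_w = [[d_i * r_j for r_j in r_prev] for d_i in delta]
    grad_b = list(delta)
    return grad_w, grad_b

# Hypothetical values: 2 neurons in layer k, 3 neurons in layer k-1.
delta = [0.5, -2.0]
r_prev = [1.0, 3.0, 4.0]

grad_w, grad_b = layer_gradients(delta, r_prev)
print(grad_w)
print(grad_b)
```

The weight gradient is simply the outer product of the layer's deltas with the previous layer's activations, which is why only $\partial l/\partial q_{k,i}$ remains to be derived.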
Consider the forward formula again:

$$
q_{k+1,i}=\sum_{j=1}^{n_{k}} w_{k+1,i,j}\cdot r_{k,j}+b_{k+1,i}
$$
Examining how $r_{k,j}$ affects $q_{k+1,i}$, we find:

$$
\frac{\partial q_{k+1,i}}{\partial r_{k,j}} = w_{k+1,i,j}
$$
It follows that:

$$
\begin{aligned}
\frac{\partial q_{k+1,i}}{\partial q_{k,j}} &= \frac{\partial q_{k+1,i}}{\partial r_{k,j}} \cdot \frac{\partial r_{k,j}}{\partial q_{k,j}} \\
&= w_{k+1,i,j} \cdot f_k'(q_{k,j})
\end{aligned}
$$
Therefore, along the single path through neuron $i$ of layer $k+1$:

$$
\begin{aligned}
\delta_{k,j} = \frac{\partial l}{\partial q_{k,j}} &= \frac{\partial l}{\partial q_{k+1,i}} \cdot \frac{\partial q_{k+1,i}}{\partial q_{k,j}} \\
&= \delta_{k+1,i} \cdot \frac{\partial q_{k+1,i}}{\partial q_{k,j}}
\end{aligned}
$$
Finally, each neuron influences the next layer along multiple paths, so we sum over all of them and substitute $\frac{\partial q_{k+1,i}}{\partial q_{k,j}}$:

$$
\begin{aligned}
\delta_{k,j} = \frac{\partial l}{\partial q_{k,j}} &= \sum_{i=1}^{n_{k+1}} \frac{\partial l}{\partial q_{k+1,i}} \cdot \frac{\partial q_{k+1,i}}{\partial q_{k,j}} \\
&= f_k'(q_{k,j}) \cdot \sum_{i=1}^{n_{k+1}} \delta_{k+1,i} \cdot w_{k+1,i,j}
\end{aligned}
$$
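The recurrence for $\delta_{k,j}$ might be implemented roughly as follows. The function name `backprop_delta`, the use of a ReLU derivative for $f_k'$, and the numbers are assumptions chosen to keep the arithmetic easy to follow; any differentiable activation could be passed in instead.

```python
def relu_prime(x):
    # Assumed activation derivative for illustration (ReLU): 1 for x > 0, else 0.
    return 1.0 if x > 0 else 0.0

def backprop_delta(delta_next, w_next, q, f_prime):
    """Return delta_k, where
    delta_k[j] = f_prime(q[j]) * sum_i delta_next[i] * w_next[i][j].

    delta_next[i] is delta_{k+1,i}, w_next[i][j] is w_{k+1,i,j}, q[j] is q_{k,j}.
    """
    return [
        f_prime(q[j]) * sum(d_i * w_i[j] for d_i, w_i in zip(delta_next, w_next))
        for j in range(len(q))
    ]

# Hypothetical values: layer k has 3 neurons, layer k+1 has 2 neurons.
delta_next = [1.0, -1.0]
w_next = [[2.0, 3.0, 4.0],
          [1.0, 2.0, -1.0]]
q_k = [0.5, -0.3, 2.0]

delta_k = backprop_delta(delta_next, w_next, q_k, relu_prime)
print(delta_k)
```

Note that the sum runs over the rows of `w_next` (the neurons of layer $k+1$) for a fixed column `j`, which is the transpose of the access pattern used in the forward pass.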
For the output layer $T$, the loss is a function of the network outputs and the targets $y_1, \dots, y_{n_T}$:

$$
l = L(r_{T,1}, r_{T,2}, \dots, r_{T,n_T}, y_1, y_2, \dots, y_{n_T})
$$

so the recursion starts from:

$$
\begin{aligned}
\frac{\partial l}{\partial q_{T,i}} &= \frac{\partial l}{\partial r_{T,i}}\cdot\frac{\partial r_{T,i}}{\partial q_{T,i}} \\
&= \frac{\partial l}{\partial r_{T,i}}\cdot f_T'(q_{T,i})
\end{aligned}
$$
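To make the output-layer formula concrete, the sketch below evaluates it for one assumed setup and compares the result with a finite-difference estimate. The squared-error loss $L=\tfrac12\sum_i (r_{T,i}-y_i)^2$, the sigmoid activation, and all names and numbers are assumptions for this example; the derivation leaves both $L$ and $f_T$ unspecified.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def loss(r, y):
    # Assumed loss: 0.5 * sum of squared errors, so dl/dr_i = r_i - y_i.
    return 0.5 * sum((r_i - y_i) ** 2 for r_i, y_i in zip(r, y))

def output_delta(q, y):
    """dl/dq_{T,i} = dl/dr_{T,i} * f_T'(q_{T,i}) for the assumed loss and activation."""
    return [(sigmoid(q_i) - y_i) * sigmoid_prime(q_i) for q_i, y_i in zip(q, y)]

def numeric_delta(q, y, eps=1e-6):
    """Central finite-difference estimate of dl/dq_{T,i}, used only as a sanity check."""
    grads = []
    for i in range(len(q)):
        q_plus, q_minus = list(q), list(q)
        q_plus[i] += eps
        q_minus[i] -= eps
        l_plus = loss([sigmoid(v) for v in q_plus], y)
        l_minus = loss([sigmoid(v) for v in q_minus], y)
        grads.append((l_plus - l_minus) / (2.0 * eps))
    return grads

# Hypothetical output-layer pre-activations and targets.
q_T = [0.3, -1.2]
y = [1.0, 0.0]

analytic = output_delta(q_T, y)
numeric = numeric_delta(q_T, y)
print(analytic)
print(numeric)
```

The two printed lists should agree to several decimal places; a mismatch would point to an error in either the loss derivative or the activation derivative.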