Week 4, "Optional Reading: Feedforward Neural Networks in Depth-Part 1"


Hello, may I ask how (13) and (14) are derived? It is hard to understand where the index i comes from and how a^{[l - 1]}_{k, i} corresponds to \partial z^{[l]}_{j, i} / \partial w^{[l]}_{j, k} in (13). Thank you for your patience.


We can rewrite (10)–(12) in the following form:
{\bf Z}^{[l]} = \begin{bmatrix} z^{[l]}_{1, 1} & \dots & z^{[l]}_{1, i} & \dots & z^{[l]}_{1, m}\\ \vdots & & z^{[l]}_{j, i} & & \vdots \\ z^{[l]}_{n^{[l]}, 1} & \dots & z^{[l]}_{n^{[l]}, i} & \dots & z^{[l]}_{n^{[l]}, m} \end{bmatrix} = g\left({\bf W}^{[l]}\right) =
g\left( \begin{bmatrix} w^{[l]}_{1, 1} & \dots & w^{[l]}_{1, k} & \dots & w^{[l]}_{1, n^{[l - 1]}}\\ \vdots & & w^{[l]}_{j, k} & & \vdots \\ w^{[l]}_{n^{[l]}, 1} & \dots & w^{[l]}_{n^{[l]}, k} & \dots & w^{[l]}_{n^{[l]}, n^{[l - 1]}} \end{bmatrix} \right),

J = f({\bf Z}^{[l]});

\displaystyle \frac{\partial J}{\partial w^{[l]}_{j, k}} = \sum_{q, i} \frac{\partial J}{\partial z^{[l]}_{q,i}} \frac{\partial z^{[l]}_{q, i}}{\partial w^{[l]}_{j, k}}.
It is easy to verify the last equation by flattening {\bf Z}^{[l]} and {\bf W}^{[l]} into vectors and applying the ordinary multivariable chain rule. Taking (1) into account, for all q \ne j we have \displaystyle \frac{\partial z^{[l]}_{q, i}}{\partial w^{[l]}_{j, k}} = 0, because row q of {\bf Z}^{[l]} involves only row q of {\bf W}^{[l]}. Therefore only the q = j terms survive, and we obtain the second term of (13).
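Written out, the collapse of the double sum reads:

\displaystyle \frac{\partial J}{\partial w^{[l]}_{j, k}} = \sum_{i} \frac{\partial J}{\partial z^{[l]}_{j, i}} \frac{\partial z^{[l]}_{j, i}}{\partial w^{[l]}_{j, k}},

which is exactly where the sum over the index i in (13) comes from: w^{[l]}_{j, k} affects z^{[l]}_{j, i} for every example i in the batch.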
From (1) it follows that \displaystyle \frac{\partial z^{[l]}_{j, i}}{\partial w^{[l]}_{j, k}} = \sum_p \frac{\partial}{\partial w^{[l]}_{j, k}} w^{[l]}_{j, p} a^{[l - 1]}_{p, i} + \frac{\partial}{\partial w^{[l]}_{j, k}} b^{[l]}_j = a^{[l - 1]}_{k, i}, since only the p = k term of the sum depends on w^{[l]}_{j, k} and the bias term is constant. Perhaps the author @jonaslalin could provide a better answer.
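If it helps, here is a minimal NumPy sketch that checks both facts numerically with a finite difference, assuming {\bf Z}^{[l]} = {\bf W}^{[l]} {\bf A}^{[l - 1]} + {\bf b}^{[l]} as in (1); the layer sizes, seed, and the indices j, k below are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_prev, n_cur, m = 4, 3, 5                 # n^{[l-1]}, n^{[l]}, batch size m (arbitrary)
A_prev = rng.normal(size=(n_prev, m))      # A^{[l-1]}
W = rng.normal(size=(n_cur, n_prev))       # W^{[l]}
b = rng.normal(size=(n_cur, 1))            # b^{[l]}

def z(W):
    return W @ A_prev + b                  # Z^{[l]} as in (1)

j, k, eps = 1, 2, 1e-6
W_pert = W.copy()
W_pert[j, k] += eps                        # perturb the single weight w_{j,k}
dZ = (z(W_pert) - z(W)) / eps              # finite-difference dZ / dw_{j,k}

# Rows q != j are unchanged: dz_{q,i}/dw_{j,k} = 0 for q != j.
print(np.allclose(np.delete(dZ, j, axis=0), 0.0))   # True
# Row j equals a^{[l-1]}_{k,:}, i.e. dz_{j,i}/dw_{j,k} = a^{[l-1]}_{k,i}.
print(np.allclose(dZ[j], A_prev[k]))                # True
```

Because z^{[l]}_{j, i} is linear in w^{[l]}_{j, k}, the finite difference here is exact up to floating-point rounding.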