W3_A1_Ex-6_What's the link between dz[1] and w[2] equation?

As with everything here, it’s just an application of the Chain Rule. But the first thing to be clear about is the meaning of Prof Ng’s notation:

dz^{[1]} = \displaystyle \frac {\partial L}{\partial z^{[1]}}

So by the Chain Rule, we can write:

\displaystyle \frac {\partial L}{\partial z^{[1]}} = \frac {\partial L}{\partial a^{[1]}} \frac {\partial a^{[1]}}{\partial z^{[1]}}

Since of course we have:

a^{[1]} = g^{[1]}(z^{[1]})

That makes the second factor in the formula you highlight obvious:

\displaystyle \frac {\partial a^{[1]}}{\partial z^{[1]}} = g^{[1]'}(z^{[1]})

Then do the Chain Rule one more time on the first factor:

\displaystyle \frac {\partial L}{\partial a^{[1]}} = \frac {\partial L}{\partial z^{[2]}} \frac {\partial z^{[2]}}{\partial a^{[1]}}

Recall the linear step of layer 2 from forward propagation:

z^{[2]} = W^{[2]} \cdot a^{[1]} + b^{[2]}

So that gives us:

\displaystyle \frac {\partial z^{[2]}}{\partial a^{[1]}} = W^{[2]T}
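If the transpose looks like it came out of nowhere, it may help to write that step out one component at a time. This is only a sketch for a single training example; the index names j (a unit in layer 2) and k (a unit in layer 1) are my own, not Prof Ng's notation:

```latex
z^{[2]}_j = \sum_k W^{[2]}_{jk} \, a^{[1]}_k + b^{[2]}_j
\quad\Longrightarrow\quad
\frac{\partial z^{[2]}_j}{\partial a^{[1]}_k} = W^{[2]}_{jk}

\frac{\partial L}{\partial a^{[1]}_k}
= \sum_j \frac{\partial L}{\partial z^{[2]}_j} \, W^{[2]}_{jk}
= \left( W^{[2]T} \, dz^{[2]} \right)_k
```

The sum runs over the first index of W^{[2]}, which is exactly what multiplying by the transpose does. It also makes the shapes work out: the result has one entry per unit in layer 1, the same shape as a^{[1]}.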

If you reassemble that with all the previous formulas and with a little hand-waving about dot products and transposes, you get the original formula as shown.
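If you'd rather check the result numerically than take the hand-waving on faith, here is a minimal sketch. It assumes the formula in question is the usual one from the lectures, dz1 = W2.T · dz2 * g'(z1), and it assumes a tanh hidden layer, a sigmoid output, and the cross-entropy loss on a single example, for which dz2 = a2 - y. The sizes, random values, and the helper `loss_from_z1` are made up for illustration and are not from the assignment:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_from_z1(z1, W2, b2, y):
    """Cross-entropy loss for one example, viewed as a function of z1 only."""
    a1 = np.tanh(z1)              # assumed hidden activation g1 = tanh
    z2 = W2 @ a1 + b2             # linear step of layer 2
    a2 = sigmoid(z2)              # assumed output activation
    return (-(y * np.log(a2) + (1 - y) * np.log(1 - a2))).item()

# Hypothetical sizes and values, chosen only for this check.
rng = np.random.default_rng(0)
n_h = 4
z1 = rng.normal(size=(n_h, 1))
W2 = rng.normal(size=(1, n_h))
b2 = rng.normal(size=(1, 1))
y = 1.0

# Analytic gradient from the Chain Rule derivation.
a1 = np.tanh(z1)
a2 = sigmoid(W2 @ a1 + b2)
dz2 = a2 - y                          # dL/dz2 for sigmoid + cross-entropy
dz1 = (W2.T @ dz2) * (1 - a1 ** 2)    # W2^T dz2, elementwise times tanh'(z1)

# Numerical gradient by central finite differences, one component at a time.
eps = 1e-6
dz1_numeric = np.zeros_like(z1)
for i in range(n_h):
    step = np.zeros_like(z1)
    step[i, 0] = eps
    plus = loss_from_z1(z1 + step, W2, b2, y)
    minus = loss_from_z1(z1 - step, W2, b2, y)
    dz1_numeric[i, 0] = (plus - minus) / (2 * eps)

max_abs_diff = float(np.max(np.abs(dz1 - dz1_numeric)))
print("max abs difference:", max_abs_diff)
```

The two gradients should agree to within finite-difference error. Note that the product with W2.T is a matrix product, while the product with the derivative of the activation is elementwise.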

In the bigger picture, please note that this course is designed not to require knowledge of even univariate calculus, let alone matrix calculus, so Prof Ng does not owe us derivations of any of these formulas. The good news is you don’t need to know calculus, but that means the bad news is you just have to take his word for it. If you have the calculus background to understand, here’s a thread with links to more detailed derivations and background information about matrix calculus.
