The formulas start out the same, but you get some simplification in the output layer case, because the derivative of sigmoid and the loss function work very nicely together. Mubsi and Eddy showed that special case for the output layer on this thread.
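For reference, here's a sketch of that simplification, assuming the binary cross-entropy loss $\mathcal{L} = -\left(y \log a + (1-y)\log(1-a)\right)$ with $a = \sigma(z)$ at the output layer:

$$\frac{\partial \mathcal{L}}{\partial a} = -\left(\frac{y}{a} - \frac{1-y}{1-a}\right), \qquad \frac{da}{dz} = a(1-a)$$

$$\frac{\partial \mathcal{L}}{\partial z} = \frac{\partial \mathcal{L}}{\partial a}\cdot\frac{da}{dz} = a(1-a)\left(\frac{1-y}{1-a} - \frac{y}{a}\right) = a(1-y) - y(1-a) = a - y$$

All the messy fractions cancel, which is why the output layer formula ends up as simply $dZ = A - Y$.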

All of this is basically just the Chain Rule, applied to vectors and matrices. Prof Ng has designed these courses not to require calculus, so we just have to take his word for the formulas. If you have the math background and want to see where they come from, here’s a thread with links to the derivations of all this.
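If you'd rather not take the formulas entirely on faith, you can at least sanity-check them numerically. Here's a quick sketch (my own example, not from the course materials) that compares the simplified output-layer gradient $a - y$ against a finite-difference estimate of the full chain rule, assuming sigmoid activation and binary cross-entropy as above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(a, y):
    # Binary cross-entropy for a single example
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

# Pick an arbitrary pre-activation z and label y
z, y, eps = 0.7, 1.0, 1e-6
a = sigmoid(z)

# Central finite difference: approximate dL/dz through the full chain
numeric = (loss(sigmoid(z + eps), y) - loss(sigmoid(z - eps), y)) / (2 * eps)

# The simplified formula from the derivation
analytic = a - y

print(abs(numeric - analytic) < 1e-8)  # True
```

The two values agree to within floating-point noise, which is a nice confirmation that the cancellation in the derivation really does work out.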