The formulas start out the same, but you get some simplification in the output layer case, because the derivative of sigmoid and the loss function work very nicely together. Mubsi and Eddy showed that special case for the output layer on this thread.
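For reference, here's a sketch of that simplification, assuming the binary cross-entropy loss $\mathcal{L} = -\left(y \log a + (1-y)\log(1-a)\right)$ with $a = \sigma(z)$ at the output layer:

$$\frac{\partial \mathcal{L}}{\partial a} = -\left(\frac{y}{a} - \frac{1-y}{1-a}\right), \qquad \frac{da}{dz} = a(1-a)$$

$$\frac{\partial \mathcal{L}}{\partial z} = \frac{\partial \mathcal{L}}{\partial a}\cdot\frac{da}{dz} = a(1-a)\left(\frac{1-y}{1-a} - \frac{y}{a}\right) = a(1-y) - y(1-a) = a - y$$

All the messy fractions cancel, which is why the output layer formula ends up as simply $dZ = A - Y$.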

All of this is basically just the Chain Rule, applied to vectors and matrices. Prof Ng has designed these courses not to require calculus, so we just have to take his word for the formulas. If you have the math background and want to see where they come from, here’s a thread with links to the derivations of all this.
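If you'd rather not take the formulas entirely on faith, you can at least sanity-check them numerically. Here's a quick sketch (my own example, not from the course materials) that compares the simplified output-layer gradient $a - y$ against a finite-difference estimate of the full chain rule, assuming sigmoid activation and binary cross-entropy as above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(a, y):
    # Binary cross-entropy for a single example
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

# Pick an arbitrary pre-activation z and label y
z, y, eps = 0.7, 1.0, 1e-6
a = sigmoid(z)

# Central finite difference: approximate dL/dz through the full chain
numeric = (loss(sigmoid(z + eps), y) - loss(sigmoid(z - eps), y)) / (2 * eps)

# The simplified formula from the derivation
analytic = a - y

print(abs(numeric - analytic) < 1e-8)  # True
```

The two values agree to within floating-point noise, which is a nice confirmation that the cancellation in the derivation really does work out.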