Gradient Descent Backpropagation Calculation

Does anyone have a link or note on how gradient values such as dZ1 are calculated for the vectorized implementation?

I may be answering a different question than you are asking, but Prof Ng did give the formulas in the lectures for all the elements of the gradient calculations. Here’s the expression he gives for dZ^{[1]} in the Week 3 lectures for a specific 2-layer network:

dZ^{[1]} = ( W^{[2]T} \cdot dZ^{[2]} ) * g^{[1]'}(Z^{[1]})

where \cdot is matrix multiplication, * is elementwise multiplication, and g^{[1]} is the activation function for layer 1, so you need the derivative of that function.
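For concreteness, here is a minimal numpy sketch of that one step, assuming tanh as the layer-1 activation (so g^{[1]'}(Z^{[1]}) = 1 - A^{[1]2} when A^{[1]} = tanh(Z^{[1]})). The function name and the shapes are illustrative, not taken from the course notebooks:

```python
import numpy as np

def backprop_dZ1(W2, dZ2, A1):
    # Matrix product W2.T @ dZ2 propagates the error back through layer 2's
    # weights; the elementwise product then applies the local activation
    # gradient. With tanh, g'(Z1) = 1 - tanh(Z1)**2 = 1 - A1**2.
    return np.dot(W2.T, dZ2) * (1 - A1 ** 2)

# Illustrative shapes: n1 = 4 hidden units, n2 = 1 output unit, m = 3 examples.
rng = np.random.default_rng(0)
W2 = rng.standard_normal((1, 4))          # (n2, n1)
dZ2 = rng.standard_normal((1, 3))         # (n2, m)
A1 = np.tanh(rng.standard_normal((4, 3))) # (n1, m)

dZ1 = backprop_dZ1(W2, dZ2, A1)
print(dZ1.shape)  # (4, 3), matching Z1
```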

If the question is why that is the formula, the derivation is beyond the scope of this course. Here’s a thread with links to the derivation of backpropagation in general and references to the matrix calculus needed for the derivation.
