Back propagation: why do we start from dZ2 and why transpose?

Harshawardhan_Deshpa · May 29, 2024, 3:46pm

In the programming assignment, the solution starts with dZ2, which is given as A2 - Y. Should we not first calculate the derivative of A2?

Further, dW2 = dZ2.dot(A1.T). Why are we doing a transpose here? I did not find an explanation of this in any of the recommended videos either.

Thanks in advance.

paulinpaloalto · May 29, 2024, 4:05pm

We did calculate the derivative of A^{[2]}: it is folded into dZ^{[2]}. This is all just the Chain Rule in action. Remember what Prof Ng’s shorthand notation means:

dZ2 = \displaystyle \frac {\partial L}{\partial Z^{[2]}}
dA2 = \displaystyle \frac {\partial L}{\partial A^{[2]}}

where L is the vector of per-sample loss values, not the scalar cost J, which is the average of the L values across the samples.

Of course we have:

A^{[2]} = \sigma(Z^{[2]})

Then by the chain rule:

dZ^{[2]} = \displaystyle \frac {\partial L}{\partial Z^{[2]}} = \frac {\partial L}{\partial A^{[2]}}\frac {\partial A^{[2]}}{\partial Z^{[2]}}

Here’s a thread by Mubsi and Eddy showing how to get from that formula to the simplified result:

dZ^{[2]} = A^{[2]} - Y
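
For a rough idea of where that simplification comes from, here is a sketch for a single sample. It assumes the loss is the binary cross-entropy (an assumption on my part, since the loss is not written out above) and uses a, z, y as shorthand for one entry of A^{[2]}, Z^{[2]} and Y:

```latex
L = -\left( y \log a + (1 - y) \log (1 - a) \right)

\frac{\partial L}{\partial a} = -\frac{y}{a} + \frac{1 - y}{1 - a} = \frac{a - y}{a (1 - a)}

\frac{\partial a}{\partial z} = \sigma(z) \left( 1 - \sigma(z) \right) = a (1 - a)

\frac{\partial L}{\partial z} = \frac{\partial L}{\partial a} \frac{\partial a}{\partial z} = \frac{a - y}{a (1 - a)} \cdot a (1 - a) = a - y
```

The a(1 - a) factors cancel, which is why the code can go straight to dZ2 = A2 - Y without ever forming dA2 as a separate variable.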

As for the question about the formula for dW^{[2]}, note that deriving it requires matrix calculus, and Prof Ng has specifically designed these courses not to require knowledge of calculus. So there’s good news and bad news: the good news is that you don’t need to know calculus, but the bad news is that you just have to accept the formulas as he gives them to you. If you want to dig deeper, here’s a thread that links to the derivations and other information you need in order to understand this.
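
Short of the full derivation, one way to convince yourself about the transpose is to look at the shapes. The sketch below is only an illustration: the layer sizes and random values are made up, the layout assumes one column per sample, and the 1/m factor is my assumption, reflecting that J is an average over the samples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 hidden units, 1 output unit, 5 samples.
n1, n2, m = 4, 1, 5

A1 = rng.standard_normal((n1, m))    # hidden activations, one column per sample
W2 = rng.standard_normal((n2, n1))   # output-layer weights
dZ2 = rng.standard_normal((n2, m))   # random stand-in for A2 - Y

# dZ2 is (n2, m) and A1 is (n1, m), so dZ2.dot(A1) has mismatched inner
# dimensions. Transposing A1 to (m, n1) lines them up, and the product
# sums over the m samples.
dW2 = dZ2.dot(A1.T) / m              # 1/m assumed: J averages over samples

# The same result, one sample at a time: average the outer products of
# each sample's dZ2 column with its A1 column.
dW2_loop = np.zeros_like(W2)
for i in range(m):
    dW2_loop += np.outer(dZ2[:, i], A1[:, i])
dW2_loop /= m

# The gradient has the same shape as the weights it will update.
assert dW2.shape == W2.shape
assert np.allclose(dW2, dW2_loop)
```

So the transpose is what makes the gradient come out with the same shape as W2, while the matrix product does the sum over samples for you.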

Harshawardhan_Deshpa · May 30, 2024, 8:10pm

Thanks for the detailed reply. I will try to digest this.
