Confusion about Calculating dZ^[l]

c_q · October 26, 2022, 3:49am

Hello, I'm in the fourth week of the first course, and I think I'm getting the hang of it, but I'm a little thrown by the different equations for calculating the initial value of dZ^L when beginning backward propagation.

In my notes I have dZ^L calculated as:

dZ^L = A^L - Y

or alternatively

dZ^L = dA^L * g^L'(Z^L)

Are these both correct? If so, I'm having trouble understanding how they relate. Doesn't this imply that:

A^L - Y = dA^L * g^L'(Z^L)?

I find it confusing that the difference between our predictions and the true labels would be equal to the derivative of the predictions multiplied elementwise by the derivative of our activation function evaluated at Z^L.

That all strikes me as very unintuitive. Can someone point me to a justification of this, or have I misunderstood something?

Thanks for any help you can offer.

Phuc_Kien_Bui · October 26, 2022, 4:17am

Hello,
dZ^L = A^L - Y is just for the last layer L. For the other layers, you use dZ^l = dA^l * g^l'(Z^l).
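As a rough illustration of that general rule, here is a minimal NumPy sketch for a hidden layer, assuming a ReLU activation as in the course's L-layer model. The function name `relu_backward_sketch` is hypothetical and is not the notebook's own helper.

```python
import numpy as np

def relu_backward_sketch(dA, Z):
    """General hidden-layer rule dZ^[l] = dA^[l] * g^[l]'(Z^[l]), assuming g = ReLU.

    Hypothetical helper for illustration only.
    """
    # ReLU'(z) is 1 where z > 0 and 0 elsewhere (we take 0 at z == 0).
    g_prime = (Z > 0).astype(float)
    # '*' here is an elementwise product, not a matrix product.
    return dA * g_prime

Z = np.array([[1.5, -0.3], [0.0, 2.0]])
dA = np.array([[0.2, 0.4], [0.1, 0.7]])
dZ = relu_backward_sketch(dA, Z)
print(dZ)
```

Entries where Z^[l] is not positive have their gradient zeroed out; the rest pass dA^[l] through unchanged.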

paulinpaloalto · October 26, 2022, 4:19am

Well, it may not be intuitive, but you just have to work out the math, remembering that we’re dealing with the output layer here and the activation function is sigmoid.

Prof Ng shows in the lectures and it is given in the notebook that:

dA^{[L]} = - \left (\displaystyle \frac {Y}{A^{[L]}} - \frac {(1 - Y)}{(1 - A^{[L]})} \right )

Now substitute that in your second formula and remember that because of the aforementioned sigmoid, we have:

g^{[L]'}(Z^{[L]}) = A^{[L]} (1 - A^{[L]})
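Working the substitution through explicitly (assuming a sigmoid output layer and the cross-entropy dA^{[L]} above, with every product taken elementwise), first the sigmoid derivative:

\sigma(z) = \frac{1}{1 + e^{-z}} \implies \sigma'(z) = \frac{e^{-z}}{(1 + e^{-z})^2} = \sigma(z) \, (1 - \sigma(z))

and then the general formula:

dZ^{[L]} = dA^{[L]} * g^{[L]'}(Z^{[L]}) = - \left( \frac{Y}{A^{[L]}} - \frac{1 - Y}{1 - A^{[L]}} \right) A^{[L]} (1 - A^{[L]})

= - \left( Y (1 - A^{[L]}) - (1 - Y) A^{[L]} \right) = - \left( Y - Y A^{[L]} - A^{[L]} + Y A^{[L]} \right) = A^{[L]} - Y

The denominators of dA^{[L]} cancel against the factor A^{[L]} (1 - A^{[L]}), so the shortcut depends on this particular pairing of a sigmoid output with the cross-entropy cost.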

So you can start from the fully general formula that we use in the hidden layers (as Phuc has shown) or you can use the special simplifications that you get because of the specifics of the output layer.
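If it helps to see the two routes agree, here is a small numerical sketch. The random inputs are made up for illustration, and the loss is the per-example cross-entropy (no 1/m factor, which in the course is applied later when computing dW and db). It compares the general formula, the simplified formula, and a central finite-difference estimate of dL/dZ^[L].

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(z, y):
    """Per-example cross-entropy loss as a function of the output pre-activation z."""
    a = sigmoid(z)
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

rng = np.random.default_rng(0)
ZL = rng.standard_normal((1, 5))               # made-up output-layer pre-activations
Y = (rng.random((1, 5)) > 0.5).astype(float)   # made-up binary labels
AL = sigmoid(ZL)

# Route 1: general rule dZ = dA * g'(Z) with the cross-entropy dA and sigmoid g'
dAL = -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
dZ_general = dAL * AL * (1 - AL)

# Route 2: simplified output-layer formula
dZ_simple = AL - Y

# Independent check: central finite difference of the loss with respect to Z
eps = 1e-6
dZ_numeric = (loss(ZL + eps, Y) - loss(ZL - eps, Y)) / (2 * eps)

print(np.allclose(dZ_general, dZ_simple))
print(np.allclose(dZ_numeric, dZ_simple, atol=1e-6))
```

One practical reason to prefer A^L - Y at the output layer: the general route divides by A^L and 1 - A^L, which breaks down if a sigmoid output saturates to exactly 0 or 1 in floating point.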

paulinpaloalto · October 26, 2022, 5:25am

Actually there’s also a great thread that Eddy and Mubsi created quite a while ago that goes through a lot of these derivations. In case you haven’t seen it, it’s definitely worth a look.
