Post Activation Gradient – Week 4

In week three, back-prop starts with dz[2] (the derivative of the cost with respect to z[2]): dz[2] = a[2] - y. Prof Ng says that to compute dz[2] one should first compute da[2] (the post-activation gradient) and then use da[2] to compute dz[2]; however, he says it is equivalent to simply compute dz[2] = a[2] - y directly. In week four, back-prop is initialised by computing da[L], and then da[L] is used to compute dz[L].

What I want to clarify is why we don’t just use dz[L] = a[L] - y in week four, as in week three. Why is it fine in week three to start directly from the loss shortcut, but then change in week four to using the “post-activation gradient”? There may be some equivalence between the two that I have missed…

Please can you clarify?

Hi, Matt.

Perhaps I am simply missing your point, but the formula for dZ^{[L]} is the same:

dZ^{[L]} = A^{[L]} - Y

All this is just a big application of the Chain Rule. At the output layer you have the extra steps of the Loss (vector function) followed by the Cost (scalar average of the loss values). You just have to keep track of what the “numerator” is in Prof Ng’s notation. In this case:

dZ^{[L]} = \displaystyle \frac {\partial L}{\partial Z^{[L]}}

The notation is slightly ambiguous since (e.g.):

dW^{[L]} = \displaystyle \frac {\partial J}{\partial W^{[L]}}

Note the J versus L there. Have you seen Eddy’s thread deriving all this?
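
In case it helps, here is a compressed sketch of that derivation for a sigmoid output unit with the cross entropy loss (just the standard Chain Rule argument, not a substitute for Eddy’s thread). With a^{[L]} = \sigma(z^{[L]}) and L(a^{[L]}, y) = -\big(y \log a^{[L]} + (1 - y) \log(1 - a^{[L]})\big), the two factors are:

\displaystyle \frac {\partial L}{\partial a^{[L]}} = -\frac{y}{a^{[L]}} + \frac{1 - y}{1 - a^{[L]}} \qquad \qquad \frac {\partial a^{[L]}}{\partial z^{[L]}} = a^{[L]} \left(1 - a^{[L]}\right)

Multiplying them together per the Chain Rule:

\displaystyle \frac {\partial L}{\partial z^{[L]}} = \left(-\frac{y}{a^{[L]}} + \frac{1 - y}{1 - a^{[L]}}\right) a^{[L]} \left(1 - a^{[L]}\right) = -y\left(1 - a^{[L]}\right) + (1 - y)\,a^{[L]} = a^{[L]} - y

So whether you initialize back-prop with dA^{[L]} and then multiply by the sigmoid derivative, or write dZ^{[L]} = A^{[L]} - Y directly, you land in the same place.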

Thanks for this! I read the thread and also stepped through the code outside of the Jupyter environment, which cleared up my understanding of the notation. I see what’s going on now: there are two different initializations for back-prop in the source code, and both are correct in terms of the chain rule.
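
For anyone who finds this thread later, here is roughly the kind of throwaway check I ran (my own script with made-up numbers, not the course code), comparing the two initializations for a sigmoid output layer with the cross-entropy loss:

```python
import numpy as np

np.random.seed(0)

# Made-up output-layer values: AL = sigmoid activations, Y = 0/1 labels
ZL = np.random.randn(1, 5)
AL = 1 / (1 + np.exp(-ZL))            # sigmoid(ZL), values strictly in (0, 1)
Y = np.random.randint(0, 2, (1, 5))

# Week 3 style: initialize back-prop with dZ directly
dZ_direct = AL - Y

# Week 4 style: initialize with dA (post-activation gradient), then dZ = dA * g'(Z)
dAL = -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))  # dL/dA for cross-entropy
dZ_via_dA = dAL * AL * (1 - AL)                       # sigmoid'(Z) = A * (1 - A)

print(np.allclose(dZ_direct, dZ_via_dA))  # True
```

Both paths give the same dZ array, which matches the A^{[L]} - Y shortcut above.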