In the week 4 quiz, there’s a question like the attached screenshot.

I’d argue `dZL = AL - Y`

is only true when the activation function is the sigmoid function.

Since the question doesn’t specify the activation function, the answer should be False.
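As a quick numerical sketch of why the shortcut depends on the specific pairing of activation and loss (this example is mine, not from the quiz): keep the sigmoid output but swap the cross-entropy loss for a squared-error loss, and dZL no longer equals AL - Y.

```python
import numpy as np

# With a sigmoid output but a squared-error loss L = 0.5 * (a - y)^2,
# the chain rule gives dL/dz = (a - y) * a * (1 - a),
# which is NOT equal to a - y in general.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([0.5, -1.2, 2.0])
y = np.array([1.0, 0.0, 1.0])
a = sigmoid(z)

dz_squared_error = (a - y) * a * (1 - a)  # chain rule for squared error + sigmoid
print(np.allclose(dz_squared_error, a - y))  # False: the shortcut fails here
```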

Make sense?

The choice of activation function does not affect how the Chain Rule is applied, but the derivative of the activation function is one of the factors in the Chain Rule product, right? The point here is that dZ^{[L]} is the derivative of the loss w.r.t. the linear input to the activation at the last layer. In other words:

dZ^{[L]} = \displaystyle \frac {\partial \mathbb{L}}{\partial Z^{[L]}}

When you apply the Chain Rule to the RHS of that expression, you’ll need to calculate the derivative of the loss w.r.t. the output of the activation and the derivative of the activation w.r.t. its input, right? So it will matter that g^{[L]} is the sigmoid function and that the loss is the cross-entropy loss. This is the derivation shown in this famous thread from Eddy.
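To make those chain-rule steps concrete, here is the standard derivation for the sigmoid + cross-entropy case (written per example for brevity; this is the usual textbook calculation, not something specific to the quiz):

\mathbb{L} = -\big(Y \log A^{[L]} + (1 - Y)\log(1 - A^{[L]})\big)

dA^{[L]} = \displaystyle \frac{\partial \mathbb{L}}{\partial A^{[L]}} = -\frac{Y}{A^{[L]}} + \frac{1 - Y}{1 - A^{[L]}}

g^{[L]\prime}(Z^{[L]}) = A^{[L]}(1 - A^{[L]})

dZ^{[L]} = dA^{[L]} \cdot g^{[L]\prime}(Z^{[L]}) = -Y(1 - A^{[L]}) + (1 - Y)A^{[L]} = A^{[L]} - Y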

But up to this point in the course where the quiz is positioned, we’ve literally never seen a case in which the output activation is anything other than sigmoid and the loss is anything other than cross entropy loss, right? So maybe @WinniePooh is correct that this answer is not True in every possible case, but it’s True as far as we know at this point.

The choice of the activation function definitely affects the way derivatives are calculated in the back propagation.

By definition,

dZL = dAL * gL'(ZL)

When the activation function is the sigmoid function and the loss is cross-entropy (so that dAL = -(Y/AL) + (1 - Y)/(1 - AL)), using the property that the derivative of sigmoid = sigmoid * (1 - sigmoid), you can expand dAL * gL'(ZL) by the definition and eventually get AL - Y.
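The expansion above can be checked numerically. This is a small sketch (my own example values, not from the course) confirming that for a sigmoid output with cross-entropy loss, dAL * gL'(ZL) does simplify to AL - Y:

```python
import numpy as np

# Verify numerically that dA^[L] * g^[L]'(Z^[L]) equals A^[L] - Y
# when the output activation is sigmoid and the loss is cross-entropy.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([0.3, -0.7, 1.5, -2.0])
y = np.array([1.0, 0.0, 1.0, 0.0])
a = sigmoid(z)

dA = -(y / a) + (1 - y) / (1 - a)   # dL/dA for cross-entropy loss
g_prime = a * (1 - a)               # derivative of sigmoid at Z
dZ = dA * g_prime                   # chain rule product

print(np.allclose(dZ, a - y))  # True: the product collapses to A - Y
```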

I think this question doesn’t state the assumption that the activation function is sigmoid.

Of course you are welcome to your opinion, but I disagree. That is what I meant by this part of my previous response: