The activation function choice does not affect how the Chain Rule is applied, but the derivative of the activation function is one of the factors in the Chain Rule product, right? The point here is that dZ^{[L]} is the derivative of the loss w.r.t. the linear activation input to the last layer. In other words:

dZ^{[L]} = \displaystyle \frac {\partial \mathbb{L}}{\partial Z^{[L]}}

When you apply the Chain Rule to the RHS of that expression, you’ll need to calculate the derivative of the loss w.r.t. the output of the activation and the derivative of the activation w.r.t. its input, right? So it will matter that g^{[L]} is the sigmoid function and that the loss is cross entropy loss. This is the derivation shown on this famous thread from Eddy.
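To make that concrete, here is the derivation for the case at hand (sigmoid output activation and cross entropy loss), written for a single example to keep the notation simple. With A^{[L]} = \sigma(Z^{[L]}) and the cross entropy loss:

\mathbb{L} = -\left( Y \log A^{[L]} + (1 - Y) \log \left(1 - A^{[L]}\right) \right)

the two Chain Rule factors are:

\displaystyle \frac {\partial \mathbb{L}}{\partial A^{[L]}} = - \frac {Y}{A^{[L]}} + \frac {1 - Y}{1 - A^{[L]}} \qquad \qquad \frac {\partial A^{[L]}}{\partial Z^{[L]}} = A^{[L]} \left(1 - A^{[L]}\right)

When you multiply them, the A^{[L]}(1 - A^{[L]}) factors cancel:

dZ^{[L]} = \displaystyle \frac {\partial \mathbb{L}}{\partial A^{[L]}} \cdot \frac {\partial A^{[L]}}{\partial Z^{[L]}} = -Y\left(1 - A^{[L]}\right) + (1 - Y) A^{[L]} = A^{[L]} - Y

which is the famously simple result that makes this activation/loss pairing so convenient.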

But up to this point in the course where the quiz is positioned, we’ve literally never seen a case in which the output activation is anything other than sigmoid and the loss is anything other than cross entropy loss, right? So maybe @WinniePooh is correct that this answer is not True in every possible case, but it’s True as far as we know at this point.
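If anyone wants to convince themselves numerically, here is a quick sketch that checks the A^{[L]} - Y formula against a finite-difference gradient of the loss w.r.t. Z^{[L]}. The function names and shapes here are my own for illustration, not from the course notebooks; note that because I average the loss over the m examples, the analytic gradient picks up a 1/m factor, just as it does in the course's cost function J.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mean_cross_entropy(z, y):
    # Cross entropy loss averaged over the m examples
    a = sigmoid(z)
    return -(y * np.log(a) + (1 - y) * np.log(1 - a)).mean()

rng = np.random.default_rng(0)
Z = rng.normal(size=5)                          # linear outputs of the last layer
Y = rng.integers(0, 2, size=5).astype(float)    # binary labels

# Analytic gradient: (A - Y) / m, the 1/m coming from the mean
analytic = (sigmoid(Z) - Y) / Z.size

# Central-difference numerical gradient, one component at a time
eps = 1e-6
numeric = np.zeros_like(Z)
for i in range(Z.size):
    Zp, Zm = Z.copy(), Z.copy()
    Zp[i] += eps
    Zm[i] -= eps
    numeric[i] = (mean_cross_entropy(Zp, Y) - mean_cross_entropy(Zm, Y)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-6))  # True
```

Of course this only confirms the formula for the sigmoid + cross entropy pairing; with a different output activation or loss, the two Chain Rule factors change and the cancellation no longer happens.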