W4_QUIZ_dZL = AL - Y

The choice of activation function does not affect how the Chain Rule is applied, but the derivative of the activation function is one of the factors in the Chain Rule product, right? The point here is that dZ^{[L]} is the derivative of the vector loss w.r.t. Z^{[L]}, the linear output of the last layer, which is the input to the activation function. In other words:

dZ^{[L]} = \displaystyle \frac {\partial \mathbb{L}}{\partial Z^{[L]}}

When you apply the Chain Rule to the RHS of that expression, you’ll need to calculate the derivative of the loss w.r.t. the output of the activation and the derivative of the activation w.r.t. its input, right? So it will matter that g^{[L]} is the sigmoid function and that the loss is cross entropy loss. This is the derivation shown on this famous thread from Eddy.
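
For reference, here is a sketch of how that Chain Rule product works out in this specific case. It assumes g^{[L]} is sigmoid, that \mathbb{L} is the per-sample cross entropy loss, and that all products and divisions are elementwise. It is only meant as a summary, not a replacement for the full derivation:

\mathbb{L} = -\left( Y \log A^{[L]} + (1 - Y) \log (1 - A^{[L]}) \right)

\displaystyle \frac {\partial \mathbb{L}}{\partial A^{[L]}} = -\frac{Y}{A^{[L]}} + \frac{1 - Y}{1 - A^{[L]}} = \frac{A^{[L]} - Y}{A^{[L]} (1 - A^{[L]})}

\displaystyle \frac {\partial A^{[L]}}{\partial Z^{[L]}} = A^{[L]} (1 - A^{[L]})

dZ^{[L]} = \displaystyle \frac {\partial \mathbb{L}}{\partial A^{[L]}} \cdot \frac {\partial A^{[L]}}{\partial Z^{[L]}} = A^{[L]} - Y

The A^{[L]} (1 - A^{[L]}) factor from the sigmoid derivative cancels the denominator from the cross entropy derivative, which is why the result is so simple for this particular pairing.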

But up to this point in the course where the quiz is positioned, we’ve literally never seen a case in which the output activation is anything other than sigmoid and the loss is anything other than cross entropy loss, right? So maybe @WinniePooh is correct that this answer is not True in every possible case, but it’s True as far as we know at this point.
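
If you want to convince yourself of both halves of that statement, here is a small numerical sketch using finite differences. The helper names (`numeric_dz`, `squared_error`) are made up for illustration and are not from the course notebooks, and squared error is just one arbitrary alternative loss picked for comparison:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(a, y):
    # Per-sample binary cross entropy loss (elementwise).
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

def squared_error(a, y):
    # Hypothetical alternative loss, used only for comparison.
    return 0.5 * (a - y) ** 2

def numeric_dz(loss, z, y, eps=1e-6):
    # Central difference estimate of d(loss)/dz with a sigmoid output layer.
    # Each sample's loss depends only on its own z, so this works elementwise.
    return (loss(sigmoid(z + eps), y) - loss(sigmoid(z - eps), y)) / (2 * eps)

rng = np.random.default_rng(0)
z = rng.normal(size=(1, 5))               # stand-in for Z^{[L]} with 5 samples
y = np.array([[1.0, 0.0, 1.0, 1.0, 0.0]])  # made-up labels
a = sigmoid(z)                            # stand-in for A^{[L]}

# Sigmoid + cross entropy: the numeric gradient should agree with A - Y.
ce_matches = np.allclose(numeric_dz(cross_entropy, z, y), a - y, atol=1e-6)

# Sigmoid + squared error: the extra factor a * (1 - a) does not cancel.
se_matches = np.allclose(numeric_dz(squared_error, z, y), a - y, atol=1e-6)

print("cross entropy matches A - Y:", ce_matches)
print("squared error matches A - Y:", se_matches)
```

With sigmoid and cross entropy the numeric gradient should agree with A - Y, while swapping in squared error leaves an extra a * (1 - a) factor, so the simple formula no longer holds.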
