In the week 4 quiz, there’s a question like the attached screenshot.

I’d argue `dZL = AL - Y`

is only true when the activation function is the sigmoid function.

Since the question doesn’t specify the activation function, the answer should be False.
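As a quick numerical sketch of why the shortcut depends on the specific pairing of activation and loss (this example is mine, not from the quiz): keep the sigmoid output but swap the cross-entropy loss for a squared-error loss, and dZL no longer equals AL - Y.

```python
import numpy as np

# With a sigmoid output but a squared-error loss L = 0.5 * (a - y)^2,
# the chain rule gives dL/dz = (a - y) * a * (1 - a),
# which is NOT equal to a - y in general.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([0.5, -1.2, 2.0])
y = np.array([1.0, 0.0, 1.0])
a = sigmoid(z)

dz_squared_error = (a - y) * a * (1 - a)  # chain rule for squared error + sigmoid
print(np.allclose(dz_squared_error, a - y))  # False: the shortcut fails here
```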

Make sense?

The choice of activation function does not affect how the Chain Rule is applied, but the derivative of the activation function is one of the factors in the Chain Rule product, right? The point here is that dZ^{[L]} is the derivative of the loss w.r.t. the linear input to the activation at the last layer. In other words:

dZ^{[L]} = \displaystyle \frac {\partial \mathbb{L}}{\partial Z^{[L]}}

When you apply the Chain Rule to the RHS of that expression, you’ll need to calculate the derivative of the loss w.r.t. the output of the activation and the derivative of the activation w.r.t. its input, right? So it will matter that g^{[L]} is the sigmoid function and that the loss is the cross-entropy loss. This is the derivation shown in this famous thread from Eddy.
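To make those chain-rule steps concrete, here is the standard derivation for the sigmoid + cross-entropy case (written per example for brevity; this is the usual textbook calculation, not something specific to the quiz):

\mathbb{L} = -\big(Y \log A^{[L]} + (1 - Y)\log(1 - A^{[L]})\big)

dA^{[L]} = \displaystyle \frac{\partial \mathbb{L}}{\partial A^{[L]}} = -\frac{Y}{A^{[L]}} + \frac{1 - Y}{1 - A^{[L]}}

g^{[L]\prime}(Z^{[L]}) = A^{[L]}(1 - A^{[L]})

dZ^{[L]} = dA^{[L]} \cdot g^{[L]\prime}(Z^{[L]}) = -Y(1 - A^{[L]}) + (1 - Y)A^{[L]} = A^{[L]} - Y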

But up to this point in the course where the quiz is positioned, we’ve literally never seen a case in which the output activation is anything other than sigmoid and the loss is anything other than cross entropy loss, right? So maybe @WinniePooh is correct that this answer is not True in every possible case, but it’s True as far as we know at this point.

The choice of the activation function definitely affects the way derivatives are calculated in the back propagation.

By definition,

dZL = dAL * gL'(ZL)

When the activation function is the sigmoid function and the loss is cross-entropy (so that dAL = -(Y/AL) + (1 - Y)/(1 - AL)), using the property that the derivative of sigmoid = sigmoid * (1 - sigmoid), you can expand dAL * gL'(ZL) by the definition and eventually get AL - Y.
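The expansion above can be checked numerically. This is a small sketch (my own example values, not from the course) confirming that for a sigmoid output with cross-entropy loss, dAL * gL'(ZL) does simplify to AL - Y:

```python
import numpy as np

# Verify numerically that dA^[L] * g^[L]'(Z^[L]) equals A^[L] - Y
# when the output activation is sigmoid and the loss is cross-entropy.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([0.3, -0.7, 1.5, -2.0])
y = np.array([1.0, 0.0, 1.0, 0.0])
a = sigmoid(z)

dA = -(y / a) + (1 - y) / (1 - a)   # dL/dA for cross-entropy loss
g_prime = a * (1 - a)               # derivative of sigmoid at Z
dZ = dA * g_prime                   # chain rule product

print(np.allclose(dZ, a - y))  # True: the product collapses to A - Y
```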

I think this question doesn’t state the assumption that the activation function is sigmoid.

Of course you are welcome to your opinion, but I disagree. That is what I meant by this part of my previous response: