Sigmoid Function in Layer L


Thanks for reading my message.
My question is about the implementation of the linear_activation_backward function.

In this assignment, we have multiple ReLU activation functions in the hidden layers and one sigmoid function in the output layer.

There is a function provided for the backward propagation of the sigmoid function. Knowing that the sigmoid is used at the output layer (L) in this case, dZ[L] would be A[L] - Y, which is the derivative of the cost with respect to Z at layer L.

What is the sigmoid_backward function calculating then, and why does it need dA as a parameter in addition to the activation_cache (which is Z in this case)?


Hello @nfattal,

Note that Z and A are vectors and all multiplications are element-wise.

Therefore, we need Z to compute g'(Z), and we need dA to compute dZ = dA * g'(Z).
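To make that concrete, here is a minimal sketch (my own illustration, not necessarily the assignment's exact code) of what sigmoid_backward computes from its two inputs:

```python
import numpy as np

def sigmoid_backward(dA, activation_cache):
    """Compute dZ = dA * g'(Z) for the sigmoid activation.

    activation_cache holds Z, the pre-activation values, because
    g'(Z) = sigmoid(Z) * (1 - sigmoid(Z)) depends on Z.
    """
    Z = activation_cache
    s = 1 / (1 + np.exp(-Z))   # g(Z), the sigmoid of Z
    dZ = dA * s * (1 - s)      # element-wise chain rule: dZ = dA * g'(Z)
    return dZ
```

So both inputs are genuinely needed: Z to evaluate the derivative of the activation, and dA to apply the chain rule.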


Dear NFattal,

There’s a similar query at this link; it should address most of your doubts.

Thanks Raymond and Rashmi for the prompt reply.
I will review the answer provided and comment back if needed.


Just a follow-up on that: I understand the equation, but at layer L (the output layer), dZ = A[L] - Y.
That is not the equation you kindly posted.

That is also what we used in Week 3's assignment and the course material:
dZ[L] = A[L] - Y when calculating the derivative at the output layer.


My reply addresses those questions:

The point is that sigmoid_backward and relu_backward are calculating the formula that Raymond showed. Remember that these functions are intended to be general: it’s perfectly possible to use sigmoid in hidden layers as well, although it just happens that we don’t do that here.
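For comparison, relu_backward follows exactly the same dZ = dA * g'(Z) pattern, just with a different g'(Z). A sketch of my own (not the graded code):

```python
import numpy as np

def relu_backward(dA, activation_cache):
    """Compute dZ = dA * g'(Z) for ReLU.

    g'(Z) is 1 where Z > 0 and 0 elsewhere, so the gradient simply
    passes through active units and is zeroed for inactive ones.
    """
    Z = activation_cache
    dZ = np.array(dA, copy=True)   # start from dA (multiplying by g'(Z) = 1)
    dZ[Z <= 0] = 0                 # zero the gradient wherever the unit was inactive
    return dZ
```

This is why both functions share the same signature: each one only differs in how it evaluates the derivative of its activation from the cached Z.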

The formula you show, A[L] - Y, is a special case: it applies only at the output layer, because the derivative of the sigmoid has already been folded into it. In general, sigmoid is used only at the output layer. See the derivation of that on the famous thread from Eddy.
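To see how the special case falls out: with the cross-entropy cost, dA[L] = -(Y/A[L] - (1-Y)/(1-A[L])), and multiplying by the sigmoid derivative A[L](1 - A[L]) collapses to A[L] - Y. A quick numerical check of that claim (variable names are my own, not the assignment's):

```python
import numpy as np

# Random binary labels and pre-activations for the output layer
rng = np.random.default_rng(0)
Z = rng.normal(size=(1, 5))
Y = rng.integers(0, 2, size=(1, 5)).astype(float)

AL = 1 / (1 + np.exp(-Z))             # sigmoid activation A[L]
dAL = -(Y / AL - (1 - Y) / (1 - AL))  # derivative of cross-entropy cost w.r.t. A[L]
dZ_general = dAL * AL * (1 - AL)      # general rule: dZ = dA * g'(Z)
dZ_special = AL - Y                   # the shortcut formula

print(np.allclose(dZ_general, dZ_special))  # prints True
```

So the two formulas agree; the shortcut just pre-multiplies the cost derivative by the sigmoid derivative algebraically.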

@paulinpaloalto, @rmwkwok
Thank you gentlemen for the time and effort you put in replying to the queries.



Just thought of mentioning that it might be clearer, and might result in fewer questions on this topic, if the derivation were shown in the class notes posted in the course material.

Please see the derivation in the attached file, which is essentially the same as the one in the link provided by @paulinpaloalto. It may be worth updating the class notes to include it.

My two cents…

Week 4 - Backward Propagation Formulas for Deep Learning Networks.pdf (495.0 KB)

Thanks again…