Thanks for reading my message.
My question is about the implementation of the linear_activation_backward function.
In this assignment, we have multiple ReLU activation functions in the hidden layers and one sigmoid function in the output layer.
There is a function provided for the backward propagation through the sigmoid activation. Since the sigmoid in this case is used only at the output layer (layer L), dZ[L] = A[L] - Y, which is the derivative of the cost with respect to Z[L].
What is the sigmoid_backward function calculating, then?
And why does it need dA as a parameter and the activation_cache (which is Z in this case)?
The point is that sigmoid_backward and relu_backward compute the general chain-rule formula that Raymond shows, dZ[l] = dA[l] * g'(Z[l]), where g is the layer's activation function. Remember that these functions are intended to be general: it is perfectly possible to use sigmoid in the hidden layers as well, although it just happens that we don't do that here.
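For concreteness, here is a minimal sketch of what helpers along these lines typically compute. The exact code provided in the assignment may differ, so treat this as an illustration of the formula rather than the official implementation:

```python
import numpy as np

def sigmoid_backward(dA, activation_cache):
    # Sketch only: dZ = dA * g'(Z) for g = sigmoid.
    Z = activation_cache                 # Z cached during forward prop
    s = 1 / (1 + np.exp(-Z))             # recompute sigmoid(Z)
    return dA * s * (1 - s)              # sigmoid'(Z) = s * (1 - s)

def relu_backward(dA, activation_cache):
    # Sketch only: dZ = dA * g'(Z) for g = ReLU.
    Z = activation_cache
    dZ = np.array(dA, copy=True)         # ReLU'(Z) = 1 where Z > 0
    dZ[Z <= 0] = 0                       # and 0 elsewhere
    return dZ
```

That is also why both functions need dA (the gradient flowing back from the linear step of the next layer) and the cached Z (to evaluate g'(Z)): neither piece on its own determines dZ.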
The formula you show, dZ[L] = A[L] - Y, is a special case: it applies only at the output layer, and it arises because the derivative of the sigmoid has already been folded in together with the derivative of the cross-entropy cost. In these assignments the activation is sigmoid only at the output layer. See the derivation of that on the famous thread from Eddy.
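For anyone who prefers not to chase the link, here is a compact sketch of that derivation (my own write-up, assuming the binary cross-entropy cost used in the assignment; layer superscripts dropped until the last step):

```latex
% Sigmoid output unit with binary cross-entropy loss
\[
a = \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad
\mathcal{L}(a, y) = -\bigl(y \log a + (1 - y)\log(1 - a)\bigr)
\]
\[
\frac{\partial \mathcal{L}}{\partial a} = -\frac{y}{a} + \frac{1 - y}{1 - a}
  = \frac{a - y}{a(1 - a)}, \qquad
\frac{\partial a}{\partial z} = \sigma'(z) = a(1 - a)
\]
\[
\frac{\partial \mathcal{L}}{\partial z}
  = \frac{\partial \mathcal{L}}{\partial a}\cdot\frac{\partial a}{\partial z}
  = \frac{a - y}{a(1 - a)}\, a(1 - a) = a - y
  \;\;\Longrightarrow\;\; dZ^{[L]} = A^{[L]} - Y
\]
```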
Just thought I would mention that it may be clearer, and may result in fewer questions on this topic, if we show the derivation in the class notes posted in the course material.
Please see the derivation added to the course notes (attached file), which is basically the same as the one in the link provided by @paulinpaloalto.
It may be better to update the class notes and show the derivation there.
Please see attached.