Including the sigmoid activation in the final layer is not considered best practice. It would instead be accounted for in the loss, which improves numerical stability. This will be described in more detail in a later lab.
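To make the numerical-stability claim concrete, here is a small pure-Python sketch (not taken from the lab) that computes binary cross-entropy for one example in two ways: by squashing the logit z through a sigmoid and then taking logs, and directly from z using the rearranged form max(z, 0) - z*y + log(1 + exp(-|z|)). The function names and the clipping constant EPS are assumptions chosen for illustration.

```python
import math

EPS = 1e-7  # assumed clipping constant, a common guard against log(0)

def sigmoid(z: float) -> float:
    # Branch on the sign so math.exp never overflows for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def bce_from_probability(y: float, z: float) -> float:
    """Sigmoid first, then log: what happens with sigmoid in the output layer."""
    p = min(max(sigmoid(z), EPS), 1.0 - EPS)  # without the clip, log(0) would fail
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))

def bce_from_logit(y: float, z: float) -> float:
    """The same loss rewritten in terms of the logit z, avoiding sigmoid round-off."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))

for z in (0.5, 40.0):
    print(f"z={z}: from probability {bce_from_probability(0, z):.4f}, "
          f"from logit {bce_from_logit(0, z):.4f}")
```

For a moderate logit such as z = 0.5 the two agree. For z = 40 with label y = 0, sigmoid(z) rounds to exactly 1.0 in double precision, so the probability-based version only survives because of the clip and returns about 16.1, while the logit-based version returns the correct value of 40.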

Utsav_Sharma1 · January 20, 2023, 6:45am

This was mentioned in the Coffee Roasting lab using TensorFlow. I don’t get why, because we ourselves are using the sigmoid activation function in the last layer.

rmwkwok · January 20, 2023, 7:23am

Hello @Utsav_Sharma1,

Indeed, in “C2_W1_Lab02_CoffeeRoasting_TF”, we are using sigmoid in the output layer. I think we use it there as a continuation of what we have been learning about logistic regression, which also ends in a sigmoid.
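For reference, a model in the style of that lab looks roughly like the sketch below: sigmoid in the output layer and a plain BinaryCrossentropy() loss, so model.predict returns probabilities directly. The layer sizes, input shape, layer names, and random inputs are assumptions for illustration, not copied from the lab.

```python
import numpy as np
import tensorflow as tf

# Sigmoid in the output layer: predictions are already probabilities.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(3, activation="sigmoid", name="layer1"),
    tf.keras.layers.Dense(1, activation="sigmoid", name="layer2"),
])

# Plain BinaryCrossentropy() expects probabilities (from_logits defaults to False).
model.compile(
    loss=tf.keras.losses.BinaryCrossentropy(),
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
)

x = np.random.default_rng(0).normal(size=(4, 2)).astype("float32")  # hypothetical inputs
probs = model.predict(x, verbose=0)  # values in [0, 1], one per input row
print(probs)
```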

The line you quoted is just a “spoiler” of what you are going to learn in this video. When you get to that one, you will hear why we sometimes prefer not to use sigmoid in the output layer, and how we instead account for the sigmoid without specifying it in the output layer.

Cheers,
Raymond

paulinpaloalto · January 20, 2023, 5:41pm

I think the point being made there is that we are still using sigmoid, but we don’t have to code it directly: we let the loss function do it for us by using the from_logits parameter. Doing it that way a) means less code to write and b) gives better (more numerically stable) results, so what is not to like about that?
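Here is a minimal TensorFlow sketch of that approach. The output layer is linear, so the model emits logits; BinaryCrossentropy(from_logits=True) applies the sigmoid inside the loss; and tf.nn.sigmoid converts logits to probabilities when you need them. The network shape, layer names, and toy data are assumptions for illustration. The last lines check that, for moderate logits, the two loss formulations give the same value.

```python
import numpy as np
import tensorflow as tf

# Output layer is linear: the model emits raw logits z, not probabilities.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(3, activation="sigmoid", name="layer1"),
    tf.keras.layers.Dense(1, activation="linear", name="layer2"),
])

# from_logits=True tells the loss to apply the sigmoid internally,
# using a numerically stable formulation.
model.compile(
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
)

# Hypothetical toy data, only to show that training and prediction run.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 2)).astype("float32")
y = (x[:, :1] > 0).astype("float32")
model.fit(x, y, epochs=1, verbose=0)

logits = model.predict(x, verbose=0)    # raw scores, any real value
probs = tf.nn.sigmoid(logits).numpy()   # convert to probabilities yourself

# For moderate logits, both loss formulations agree.
y_check = tf.constant([[0.0], [1.0]])
z_check = tf.constant([[0.5], [-1.2]])
loss_from_logits = tf.keras.losses.BinaryCrossentropy(from_logits=True)(y_check, z_check)
loss_from_probs = tf.keras.losses.BinaryCrossentropy()(y_check, tf.nn.sigmoid(z_check))
print(float(loss_from_logits), float(loss_from_probs))
```

The trade-off is that model.predict now returns logits rather than probabilities, so you have to apply the sigmoid yourself, or threshold the logits at 0, which corresponds to a probability of 0.5.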
