Including the sigmoid activation in the final layer is not considered best practice. It would instead be accounted for in the loss which improves numerical stability. This will be described in more detail in a later lab

This was mentioned in the coffee roasting lab using TensorFlow. I don’t get why that is, because we ourselves are using the sigmoid activation function in the last layer.


Hello @Utsav_Sharma1,

Indeed, in “C2_W1_Lab02_CoffeeRoasting_TF”, we are using sigmoid in the output layer. I think we use it there as a continuation of what we have been learning about logistic regression, where the output always passes through a sigmoid.

That line that you have quoted is just a “spoiler” of what you are going to learn from this video. When you get to that one, you will hear why sometimes we prefer not to use sigmoid in the output layer, and how we instead account for the sigmoid without specifying it in the output layer.



I think the point being made there is that we are still using sigmoid, we just don’t have to code it directly in the output layer: we leave that layer linear and let the loss function apply the sigmoid for us by setting the from_logits parameter. Doing it that way is a) less code to write and b) gives better (more numerically stable) results, so what is not to like about that? :smiley:
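To make the "more numerically stable" part concrete, here is a small pure-Python sketch of the idea. It is only an illustration, not TensorFlow's actual implementation: the helper names `sigmoid`, `bce_from_probability` and `bce_from_logit` are made up for this post. In Keras itself the usual pattern is a final `Dense(1, activation='linear')` layer together with `tf.keras.losses.BinaryCrossentropy(from_logits=True)`, and then `tf.nn.sigmoid` applied to the model output when you want probabilities. The stable formula below is the same rearrangement described in the documentation for `tf.nn.sigmoid_cross_entropy_with_logits`.

```python
import math

def sigmoid(z):
    # Logistic function, written so that exp() never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def bce_from_probability(y, p):
    # Naive route: the sigmoid was already applied, the loss only sees p.
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def bce_from_logit(y, z):
    # Stable route: sigmoid and log are combined algebraically,
    # loss = max(z, 0) - z*y + log(1 + exp(-|z|)).
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))

# For a moderate logit both routes agree.
print(bce_from_probability(1, sigmoid(2.0)), bce_from_logit(1, 2.0))

# For a large logit with the "wrong" label, sigmoid(40.0) rounds to exactly 1.0,
# so the naive route ends up taking log(0) and fails.
try:
    print(bce_from_probability(0, sigmoid(40.0)))
except ValueError as err:
    print("naive route failed:", err)

# The logit-based route still returns a finite loss (about 40).
print(bce_from_logit(0, 40.0))
```

The difference comes from rounding: once the sigmoid output has been rounded to 1.0, the information about how large the logit was is gone, while the loss computed from the logit never forms that intermediate value.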
