Why don’t we add a sigmoid activation function at our prediction layer?
Also, why do we have only one neuron and not two since we have two classes?
I am referencing our notebook for my own project; I have 3 classes in my dataset, and this is my code:
I am trying to use ResNet50.
Sorry if this question sounds silly and if the answer is obvious
Please look at the comment about the dense layer with 1 unit. It says right there that for a binary classification problem, 1 unit is sufficient.
The loss function for the model has its from_logits flag set to True, so the loss is computed after internally converting the output of the dense layer to the probability scale. See here for the documentation.
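As a minimal sketch of that binary setup (the layer sizes and input shape here are placeholders, not the notebook's actual values):

```python
import tensorflow as tf

# Binary classification: a single output unit with no activation,
# with the sigmoid folded into the loss via from_logits=True.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(10,)),
    tf.keras.layers.Dense(1),  # 1 unit is sufficient for 2 classes
])
model.compile(
    optimizer='adam',
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    metrics=['accuracy'],
)
```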
For multi-class prediction (classes > 2), the output layer has to have a number of units equal to the number of classes.
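Since you have 3 classes and are using ResNet50, a head like this would work (a sketch only; the input size, pooling choice, and optimizer are assumptions, not your exact code):

```python
import tensorflow as tf

# ResNet50 backbone with a 3-class classification head.
base = tf.keras.applications.ResNet50(
    include_top=False, weights='imagenet', input_shape=(224, 224, 3))
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(3, activation='softmax'),  # one unit per class
])
model.compile(
    optimizer='adam',
    loss=tf.keras.losses.CategoricalCrossentropy(),  # expects one-hot labels
    metrics=['accuracy'],
)
```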
Note that the point Balaji makes about the from_logits=True mode for the loss function also applies in cases where we have a multiclass output. The way you have written the code, with the explicit softmax activation, is the other way to do it. Your method is also correct, but bundling the activation with the loss calculation is preferred because it is more numerically stable.
So instead of softmax and categorical cross entropy as a loss function, I could have used categorical cross entropy only, without softmax?
If you don’t use softmax in the output layer, specify from_logits=True in the loss function. See this link.
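For example, here is a quick sketch with made-up logits and one-hot labels, showing that both ways give the same loss value:

```python
import tensorflow as tf

logits = tf.constant([[2.0, 0.5, -1.0]])  # raw outputs of a Dense(3) layer
labels = tf.constant([[1.0, 0.0, 0.0]])   # one-hot ground truth

# Way 1: explicit softmax in the model, loss computed on probabilities
probs = tf.nn.softmax(logits)
loss_probs = tf.keras.losses.CategoricalCrossentropy()(labels, probs)

# Way 2: no softmax in the model; the loss applies it internally
loss_logits = tf.keras.losses.CategoricalCrossentropy(from_logits=True)(labels, logits)

print(loss_probs.numpy(), loss_logits.numpy())  # same value up to float error
```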
Ohhhhh okay, if softmax is used, then y_pred is turned into a probability distribution; if softmax is not used, then y_pred is a logits tensor, right? This is interesting, I am going to test it right now! Thank you! @balaji.ambresh
That is correct. Here is more on logit and its relationship to the probability scale.
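If you want to see it numerically, here is a tiny sketch with made-up values:

```python
import tensorflow as tf

# A logit is the log-odds: logit = log(p / (1 - p)).
logit = tf.constant([0.8])
p = tf.math.sigmoid(logit)          # logit -> probability (binary case)
back = tf.math.log(p / (1.0 - p))   # probability -> logit (recovers 0.8)

# Softmax generalizes this to more than two classes.
logits = tf.constant([2.0, 0.5, -1.0])
probs = tf.nn.softmax(logits)       # probabilities that sum to 1
```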