Softmax as activation function for output layer of DenseNet121 model

Akoji_Timothy · July 27, 2023, 6:05pm

I decided to train the chest X-ray model from the 1st assignment of course 1. When running predictions, I noticed that the scores (values for p) when summed together exceed 1. I know that this is because the activation function for the final output layer is the sigmoid function.

I’d love to have a scenario where I can output the model’s probability score as a percentage (this means the total scores should sum up to 1). I know that I can replace the sigmoid function with softmax but I’d like to know why the sigmoid activation function was used here and if it has any advantages over the softmax function in the context of the DenseNet model.

Thank you.

TMosh · July 28, 2023, 4:09am

Sigmoid is great if you only care about the output with the highest value (the most likely label).

harshder03 · August 13, 2023, 10:38am

Hi @Akoji_Timothy ,

I understand your question. The answer is that the sigmoid function gives you classification values, i.e. it gives you predicted values, whereas the softmax function gives you the probabilities of the labels.

If you have any further questions, please let me know.

Thank you.

ai_curious · August 13, 2023, 11:08am

I notice that the previous reply of @TMosh above uses the word value singular, while the reply directly above of @harshder03 uses the word values plural for the sigmoid function output. Any thoughts or clarification we can provide for the OP?

Can we also tie it to the functional objective of the overall program? I don’t have the code in front of me, but maybe it is designed to make a diagnostic prediction on a single medical condition or problem?

Deepti_Prasad · August 13, 2023, 11:48am

Hi Timothy,

Sigmoid was used in this assignment because each output is a binary classification: for every label there are only 2 classes, present or absent. A sigmoid output is also already a probability score that you can report as a percentage, which is why the sigmoid function is the right choice for what you want. A weighted sum of inputs is passed through an activation function, and that output serves as the input to the next layer. When the activation function of a neuron is a sigmoid, the output of that unit is guaranteed to lie between 0 and 1. The sigmoid function also introduces non-linearity into the model, which allows the neural network to learn more complex decision boundaries.

Why softmax was not used:
Softmax is the usual output activation for multi-class classification problems, where each example belongs to exactly one of more than two classes.

As you will have seen, the assignment builds a model on chest X-ray images that gives a binary classification for each of the 14 labelled pathologies. See the screenshot from the same assignment describing the dataset.
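To make the difference concrete, here is a minimal NumPy sketch. The logit values are made up for illustration and are not taken from the assignment model, and only three labels are shown instead of 14. It shows that sigmoid scores each label independently, so the scores need not sum to 1, while softmax forces the scores to compete and sum to 1.

```python
import numpy as np

def sigmoid(z):
    # Element-wise: each label is scored independently of the others.
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Subtract the max for numerical stability; outputs sum to 1.
    exp_z = np.exp(z - np.max(z))
    return exp_z / exp_z.sum()

# Hypothetical logits for three pathologies (illustrative values only).
logits = np.array([2.0, 1.5, -1.0])

sigmoid_scores = sigmoid(logits)
softmax_scores = softmax(logits)

print("sigmoid:", sigmoid_scores, "sum =", sigmoid_scores.sum())
print("softmax:", softmax_scores, "sum =", softmax_scores.sum())
```

With sigmoid, two pathologies can both get a high score for the same X-ray. With softmax, raising one score necessarily lowers the others, which only makes sense when exactly one label can be true.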

Hope this clarifies your doubt

Regards
DP

ai_curious · August 13, 2023, 2:09pm

Thanks @Deepti_Prasad for the additional background and insight. @Akoji_Timothy, my takeaway is that picking an activation function and interpreting the output is actually rather nuanced for this kind of medical decision making. There are 14 pathological conditions, 8 diseases, and they may be hierarchical or otherwise not mutually exclusive (softmax assumes the classes are mutually exclusive). I think algorithm design comes down to exactly which question(s) you are trying to answer, and how much you can or choose to build into the network itself versus post-processing on the network output.
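As one possible illustration of the post-processing route, here is a small sketch that rescales sigmoid outputs so they sum to 1. The helper name `to_relative_shares` and the score values are hypothetical, not part of the assignment. Note that the rescaled numbers are only relative shares for display; they are no longer per-label probabilities, so any thresholding should still use the original sigmoid scores.

```python
import numpy as np

def to_relative_shares(scores):
    # Hypothetical helper: divide each score by the total so the result sums to 1.
    scores = np.asarray(scores, dtype=float)
    total = scores.sum()
    if total == 0:
        return np.zeros_like(scores)
    return scores / total

# Illustrative sigmoid outputs for three labels (made-up values).
sigmoid_scores = np.array([0.88, 0.82, 0.27])

shares = to_relative_shares(sigmoid_scores)
print("shares:", shares, "sum =", shares.sum())
```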

Deepti_Prasad · August 13, 2023, 2:18pm

Actually, it is more about statistical analysis in medical models: for most medical conditions it is never simply x = y, but x = y with cofactors such as habits, age, medical history, genetic conditions, gender, etc.
