I decided to train the chest X-ray model from the 1st assignment of course 1. When running predictions, I noticed that the scores (values for p) when summed together exceed 1. I know that this is because the activation function for the final output layer is the sigmoid function.
I’d love to have a scenario where I can output the model’s probability score as a percentage (this means the total scores should sum up to 1). I know that I can replace the sigmoid function with softmax but I’d like to know why the sigmoid activation function was used here and if it has any advantages over the softmax function in the context of the DenseNet model.
Sigmoid is great if you only care about the output with the highest value (the most likely label).
Hi @Akoji_Timothy,
I understand your question. The sigmoid function gives you classification values, i.e. an independent predicted value between 0 and 1 for each label, whereas the softmax function gives you a probability distribution over the labels that sums to 1.
If you have any further questions, please let me know.
I notice that the previous reply of @TMosh above uses the word value singular, while the reply directly above of @harshder03 uses the word values plural for the sigmoid function output. Any thoughts or clarification we can provide for the OP?
Can we also tie it to the functional objective of the overall program? I don’t have the code in front of me, but maybe it is designed to make a diagnostic prediction on a single medical condition or problem?
Sigmoid was used in this assignment because each output is a binary classification: for any one pathology there are only 2 classes, present or absent. Your wish to report the model's probability score as a percentage is precisely why sigmoid is the right choice, since each output can be read directly as a percentage for its own label. In a neural network, a weighted sum of inputs is passed through an activation function, and the result serves as input to the next layer. When a unit's activation function is a sigmoid, its output is guaranteed to lie between 0 and 1. The sigmoid also introduces non-linearity into the model, which allows the neural network to learn more complex decision boundaries.
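To make the "weighted sum, then sigmoid" step concrete, here is a minimal sketch in plain Python. The weights, inputs, bias and logits are made-up numbers for illustration, not values from the assignment's DenseNet model.

```python
import math


def sigmoid(z: float) -> float:
    """Map any real number into the open interval (0, 1)."""
    # Two branches so that math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def unit_output(weights, inputs, bias):
    """One output unit: weighted sum of the inputs, then sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


# Hypothetical weights, inputs and bias for a single unit.
single = unit_output([0.5, -1.0], [2.0, 1.0], 0.25)  # z = 0.25

# Hypothetical pre-activation values (logits) for three output units.
logits = [2.0, 1.0, -1.0]
scores = [sigmoid(z) for z in logits]
total = sum(scores)

print(single)
print(scores)  # each score is between 0 and 1
print(total)   # the total is not constrained to 1
```

Each score stays between 0 and 1 on its own, but nothing ties the scores to each other, which is why the total can exceed 1 as in the original question.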
Why softmax was not used
Softmax is used as the activation function for multi-class classification problems, where there are more than two class labels and each example belongs to exactly one of them.
As you will have seen, the assignment builds a model on chest X-rays that gives a binary classification for each of the 14 labelled pathologies. See the screenshot from the same assignment describing the dataset.
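A small side-by-side comparison may help here. This is only a sketch in plain Python with three made-up logits standing in for three of the labels. It imagines an X-ray where two findings are present at once.

```python
import math


def sigmoid(z: float) -> float:
    """Independent score in (0, 1) for one label."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softmax(logits):
    """Scores that compete with each other and always sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits: strong evidence for the first two labels, little for the third.
logits = [3.0, 3.0, -2.0]

per_label = [sigmoid(z) for z in logits]  # about 0.95, 0.95, 0.12
shared = softmax(logits)                  # about 0.50, 0.50, 0.003

print(per_label)
print(shared, sum(shared))
```

With sigmoid, both findings can score high at the same time. With softmax, the same evidence is split between them, so neither can go above roughly 50%, which is misleading when the labels are not mutually exclusive.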
I hope this clears up your doubt.
Thanks @Deepti_Prasad for the additional background and insight. @Akoji_Timothy, my takeaway is that picking an activation function and interpreting its output is rather nuanced for this kind of medical decision making. There are 14 pathological conditions, 8 diseases, and they may be hierarchical or otherwise not mutually exclusive (and mutual exclusivity is what softmax assumes). I think algorithm design comes down to exactly which question(s) you are trying to answer, and how much you can or choose to build into the network itself versus post-processing on the network output.
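As one possible illustration of the post-processing route, here is a sketch that leaves the sigmoid outputs alone and derives two different views from them. The label names and scores are hypothetical, and the helper functions are my own, not part of the assignment.

```python
def as_percentages(scores):
    """Per-label view: each sigmoid score read as its own percentage."""
    return {label: round(100.0 * s, 1) for label, s in scores.items()}


def relative_shares(scores):
    """Normalized view: rescale the scores so they sum to 1.

    Assumption: this is only a relative ranking aid. It is not a
    calibrated probability and it hides that several labels can be
    present at once.
    """
    total = sum(scores.values())
    if total == 0:
        return {label: 0.0 for label in scores}
    return {label: s / total for label, s in scores.items()}


# Hypothetical sigmoid outputs for three labels.
scores = {"label_a": 0.90, "label_b": 0.60, "label_c": 0.30}

percent = as_percentages(scores)
shares = relative_shares(scores)

print(percent)
print(shares, sum(shares.values()))
```

The per-label percentages answer "how likely is this condition on its own?", while the normalized shares answer "which condition stands out most relative to the others?". Which one to report depends on the question being asked.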
Actually, it is more about statistical analysis in medical models, since for most medical conditions it is never x = y but x = y with cofactors such as habits, age, medical history, genetic conditions, gender, etc.