Why Softmax function?

The trick is the `from_logits=True` argument to the loss function. It tells the loss to apply the activation itself (softmax for the categorical cross-entropy losses; sigmoid in the case of BCE), which is numerically more stable. That is explained on this thread. It does mean that you need to apply softmax manually when you make predictions with the final model after training is complete. Note, though, that softmax is monotonic: the largest input always produces the largest output, so you can tell which class is the prediction even without applying softmax.
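A minimal sketch of that monotonicity point, using NumPy rather than any particular framework: the argmax of the raw logits is always the same as the argmax of their softmax.

```python
import numpy as np

def softmax(logits):
    # Subtract the row max for numerical stability before exponentiating
    z = logits - np.max(logits, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical raw logits from a final Dense layer with no activation
logits = np.array([[2.0, -1.0, 0.5],
                   [0.1, 3.2, -0.7]])

probs = softmax(logits)

# Softmax is monotonic, so the predicted class is identical either way
print(np.argmax(logits, axis=-1))  # → [0 1]
print(np.argmax(probs, axis=-1))   # → [0 1]
```

You only need the softmax itself when you want calibrated probabilities, not just the predicted class.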

So you actually are using softmax; it just is not included directly in the output layer of your defined network architecture.
