Softmax output layer vs. k sigmoid units in the output layer

Why can't we just use k sigmoid units in the output layer for k-class classification and predict the class as the argmax of those k outputs?
In this case we could train the neural network on one-hot targets, e.g. y = [1,0,0,0,0,0,0,0,0,0] for class 1 and y = [0,0,0,0,1,0,0,0,0,0] for class 5, and so on for 10-class classification.
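The one-hot encoding described above can be sketched in a few lines (a NumPy sketch; the `one_hot` helper name and zero-indexed classes are illustrative, not from the original post):

```python
import numpy as np

def one_hot(label, num_classes=10):
    """Return a one-hot target vector for an integer class label (0-indexed)."""
    y = np.zeros(num_classes)
    y[label] = 1.0
    return y

print(one_hot(0))  # class 0 -> [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(one_hot(4))  # class 4 -> [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
```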

Does a softmax layer give better results than k sigmoid output units for k-class classification?

Hi @sandeep_kumar13,

You could use k sigmoids instead of softmax, but softmax is usually the better fit for single-label (mutually exclusive) classification. Firstly, softmax produces a proper probability distribution: its outputs are positive and sum to 1 across all classes. K sigmoids produce k independent probabilities that need not sum to 1, which makes them harder to interpret as "the probability of class i" — independent sigmoids are really a multi-label formulation. Secondly, softmax pairs naturally with categorical cross-entropy, giving a loss that is smooth and convex in the logits with simple, well-behaved gradients. With k sigmoids, each unit is effectively trained as a separate one-vs-rest binary problem, so nothing in the loss forces the classes to compete with each other.
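The normalization difference is easy to see numerically (a minimal NumPy sketch; the 3-class logit values are made up for illustration):

```python
import numpy as np

def softmax(z):
    z = z - z.max()            # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([2.0, 1.0, 0.1])

p_soft = softmax(logits)       # one distribution over the 3 classes
p_sig = sigmoid(logits)        # 3 independent per-class probabilities

print(p_soft.sum())            # exactly 1: a proper distribution
print(p_sig.sum())             # about 2.14 here: not a distribution
```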

Thirdly, note that the two layers have the same number of parameters (k units, each with its own weights and bias); the difference is in how the outputs interact. Softmax couples the outputs: raising the score of one class necessarily lowers the probabilities of the others, which matches the mutually exclusive nature of k-class classification. With independent sigmoids, the network can happily assign high probability to several classes at once, and each unit's gradient ignores the other classes.
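One concrete property that makes the softmax + cross-entropy pairing pleasant to optimize: the gradient of the loss with respect to the logits simplifies to p − y (predicted probabilities minus the one-hot target). A quick numerical check of that identity (a sketch; the 5-class random logits are illustrative):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())    # stable softmax
    return e / e.sum()

rng = np.random.default_rng(0)
z = rng.normal(size=5)         # random logits for a 5-class example
y = np.zeros(5)
y[2] = 1.0                     # one-hot target: true class is 2

def loss(z):
    return -np.log(softmax(z)[2])   # cross-entropy with the one-hot target

analytic = softmax(z) - y      # the claimed gradient: p - y

# Central-difference numerical gradient for comparison
eps = 1e-6
numeric = np.array([
    (loss(z + eps * np.eye(5)[i]) - loss(z - eps * np.eye(5)[i])) / (2 * eps)
    for i in range(5)
])

print(np.allclose(analytic, numeric, atol=1e-5))  # True
```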

Overall, while k sigmoids are a workable option (and are the right choice for multi-label problems, where classes are not mutually exclusive), softmax is usually preferred for single-label k-class classification: it yields a normalized probability distribution, pairs with cross-entropy for clean gradients, and is the default that most frameworks and practitioners reach for.


Hi @Mujassim_Jamal,
Thank you very much for such a wonderful explanation.