Week3 - Does Softmax outperform the Sigmoid for multiple classes?

G11 · December 8, 2021, 8:37am

Hi,

In one of Prof. Ng's previous courses (Machine Learning by Stanford), the Sigmoid is used in a Deep NN for classifying handwritten digits 0-9, i.e. the Sigmoid is used for a multiple class case. In this current course, I understand that we use Softmax for multiple classes. Does Softmax outperform the Sigmoid for multiple classes?

Thanks.

paulinpaloalto · December 8, 2021, 4:40pm

How would you apply sigmoid to a multi-class classification? Because it only gives yes/no answers, right? If you recall how Prof Ng did that in the original Machine Learning course, it was to use the “one vs all” approach. So if you have 10 classes, as in your example, what you do is run the training literally 10 times: once for 0 vs all the others, once for 1 vs all the others, and so forth. You end up with 10 separate models. To predict the value of a given input, you run all 10 models and then select the class for which the corresponding model has the highest output.
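To make the prediction step concrete, here is a minimal sketch of "one vs all" inference in plain Python. It assumes each per-class model is a simple logistic unit stored as a `(weights, bias)` pair; the function name `one_vs_all_predict` and the hand-picked parameters are hypothetical and only for illustration, not taken from the course.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def one_vs_all_predict(models, x):
    """Run every binary classifier on x and return (best class, all scores).

    models: list of (weights, bias) pairs, one per class (assumed layout).
    """
    scores = []
    for weights, bias in models:
        z = sum(w * xi for w, xi in zip(weights, x)) + bias
        scores.append(sigmoid(z))
    # Pick the class whose classifier is the most confident.
    best = max(range(len(scores)), key=lambda k: scores[k])
    return best, scores

# Hypothetical parameters: 3 classes, 2 input features.
models = [([2.0, -1.0], 0.0), ([-1.0, 2.0], 0.0), ([-1.0, -1.0], 0.5)]
predicted_class, scores = one_vs_all_predict(models, [0.2, 1.5])
print(predicted_class, scores)
```

Note that the per-class sigmoid outputs are independent of each other, so they generally do not sum to 1. That is one practical difference from a softmax output layer.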

I have not tried a real experiment to compare the results of sigmoid with “one vs all” versus softmax on a particular problem, so I don’t know whether there is a difference in the accuracy of the resulting models. But the one thing we can say for sure is that the cost of training the model is significantly higher in the “one vs all” case: we have to run the complete training 10 times (or whatever the number of classes is) versus once. Of course there may be more subtleties there (e.g., maybe you need fewer iterations in each case for “one vs all”, but it could just as easily be more iterations), but the overall point is that “one vs all” sounds a lot more expensive. Once you have softmax and understand how to use it, it makes everything a lot more straightforward.

G11 · December 8, 2021, 7:23pm

@paulinpaloalto I really appreciate your answer, thank you so much for clearing that up!

/G

paulinpaloalto · December 8, 2021, 7:32pm

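One more point worth mentioning on this is that the mathematics of softmax and sigmoid are very closely related. Both are trained with the cross-entropy loss, and the derivative of that loss with respect to the input of the activation has the same form in both cases: the activation output minus the label. You can think of softmax as the multiclass generalization of sigmoid; with only two classes, it reduces to sigmoid.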
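A small numerical check of that relationship, as a sketch in plain Python. It assumes the usual definitions of sigmoid, softmax, and cross-entropy loss; the specific input values are arbitrary. The first part compares a two-class softmax over `[z, 0]` with `sigmoid(z)`, and the second part compares a finite-difference gradient of the softmax cross-entropy loss with the closed form "activation minus one-hot label".

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(zs, label):
    return -math.log(softmax(zs)[label])

# 1) Two-class softmax with the second logit fixed at 0 matches sigmoid.
z = 0.7
p_sigmoid = sigmoid(z)
p_softmax = softmax([z, 0.0])[0]

# 2) Gradient of the loss w.r.t. the logits is (activation - one-hot label).
zs, label, eps = [1.0, -0.5, 0.3], 2, 1e-6
a = softmax(zs)
analytic = [a[k] - (1.0 if k == label else 0.0) for k in range(len(zs))]
numeric = []
for k in range(len(zs)):
    up, down = list(zs), list(zs)
    up[k] += eps
    down[k] -= eps
    numeric.append((cross_entropy(up, label) - cross_entropy(down, label)) / (2 * eps))

print(p_sigmoid, p_softmax)
print(analytic, numeric)
```

The two printed probabilities should agree to floating-point precision, and the analytic and numeric gradients should agree up to the finite-difference error.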
