Softmax layer at last layer

LimXiuXian96 · August 4, 2021, 11:33am

I'd like to ask about multiclass classification problems. Usually we put softmax as the last layer, with the equation e^a/sum(e^a).

My questions are: what are the advantages of softmax, and would an equation like a^2/sum(a^2) most likely work as well? Here a stands for the output from the last layer's activation.

paulinpaloalto · April 15, 2022, 7:46pm

Yes, softmax is the preferred activation function for the output layer of a network that is doing “multiclass” classification, that is to say classification in which there are multiple possible answers, not just “yes/no”. What softmax does is convert the output values into positive numbers that sum to 1, so they can be thought of as the probability of each of the possible answers for a given input sample. It turns out that you can think of softmax as the multiclass generalization of sigmoid, and the “cross entropy” loss function also works for softmax. The mathematical behavior of the losses and gradients is the same in both cases.
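To make the "generalization of sigmoid" point concrete, here is a small sketch in plain Python. It is not taken from the course materials; the function names and the sample inputs are my own choices for illustration. It checks that softmax outputs sum to 1 and that a two-class softmax with the second input fixed at 0 gives the same value as sigmoid.

```python
import math

def softmax(a):
    """Return e^a_i / sum_j e^a_j for a list of floats."""
    # Subtracting the max does not change the result but avoids overflow.
    m = max(a)
    exps = [math.exp(x - m) for x in a]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Standard logistic function 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))

# Three-class example (arbitrary illustrative inputs).
probs = softmax([2.0, 1.0, -1.0])
print(probs, sum(probs))

# Two-class case: softmax over [z, 0] matches sigmoid(z).
z = 1.5
two_class = softmax([z, 0.0])
print(two_class[0], sigmoid(z))
```

The outputs keep the same ordering as the inputs, so the largest input always gets the largest probability.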

Prof Ng will cover softmax in Course 2 of this series, so please stay tuned for that.

I am not familiar with the other function you suggest. You can try some experiments using that and see how it works. Of course you’ll need to pick a loss function as well, but since the values are between 0 and 1, you could try “cross entropy” loss for that.
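If you want a starting point for that experiment, here is a rough sketch in plain Python that puts the two functions side by side. The name squared_norm and the test inputs are made up for illustration, and this only compares outputs; it does not train a network. Two things are worth watching for: squaring throws away the sign of the input, and the function is undefined when every input is 0.

```python
import math

def softmax(a):
    """Return e^a_i / sum_j e^a_j for a list of floats."""
    m = max(a)  # shift by the max for numerical stability
    exps = [math.exp(x - m) for x in a]
    total = sum(exps)
    return [e / total for e in exps]

def squared_norm(a):
    """Hypothetical alternative from the question: a_i^2 / sum_j a_j^2."""
    squares = [x * x for x in a]
    total = sum(squares)
    if total == 0.0:
        # 0/0: the function has no value when all inputs are zero.
        raise ValueError("squared_norm is undefined when every input is 0")
    return [s / total for s in squares]

def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=lambda i: values[i])

a = [-2.0, 1.0, 0.5]  # the most negative input is at index 0
print("softmax     :", softmax(a), "argmax =", argmax(softmax(a)))
print("squared_norm:", squared_norm(a), "argmax =", argmax(squared_norm(a)))

# Sign is lost: 1.0 and -1.0 get the same probability.
print(squared_norm([1.0, -1.0]))

# An input of exactly 0 gets probability 0, and cross entropy
# would then need log(0) if that class were the true label.
print(squared_norm([3.0, 0.0]))
```

With these inputs softmax picks index 1 (the largest input), while the squared version picks index 0 (the most negative input), so the two functions can disagree about which class is most likely.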
