How does Softmax work?

Hi there,

For multi-class classification problems using "Neural Networks with Softmax", how do I interpret the output from the final layer?

Let’s say the outcome variable has 10 classes and the training set has 1000 rows. In the final layer, do we predict the probability P(Y = k | X) for every observation in the training set? If so, each observation will have 10 probability values, one for each class. I am unable to understand how to make sense of the final layer's output.



For each example, the class with the highest probability among the 10 outputs is used as the prediction.
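To make that concrete, here is a small sketch in NumPy. The probability values are made up for illustration (3 examples, 10 classes), and the names `probs`, `predicted_class`, and `confidence` are just placeholders, not anything from a particular library or course.

```python
import numpy as np

# Hypothetical softmax outputs: one row per example, one column per class.
# These numbers are invented for illustration; each row sums to 1.
probs = np.array([
    [0.02, 0.01, 0.70, 0.05, 0.02, 0.05, 0.05, 0.04, 0.03, 0.03],
    [0.10, 0.10, 0.10, 0.10, 0.10, 0.10, 0.10, 0.10, 0.05, 0.15],
    [0.60, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.04, 0.03, 0.03],
])

# Index of the largest probability in each row is the predicted class.
predicted_class = np.argmax(probs, axis=1)

# The largest probability itself tells you how confident the model is.
confidence = probs.max(axis=1)

print(predicted_class)
print(confidence)
```

With 1000 training rows the array would simply have shape (1000, 10), and `predicted_class` would hold 1000 labels. Note that the second row is nearly uniform, so the prediction there is much less confident than in the first row.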


Hello @Uma_Savili,

Just a reminder that after you apply softmax, for any sample, the sum of the probabilities of all k classes is equal to 1.
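If it helps, this can be checked numerically. The snippet below is only a sketch: the `softmax` function is a plain NumPy version written for this post, and the random `logits` are a stand-in for whatever raw scores a real final layer would produce for 1000 samples and 10 classes.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax: turns raw scores into probabilities."""
    # Subtracting the row maximum does not change the result,
    # but it avoids overflow in np.exp for large scores.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

# Assumed stand-in for the final layer's raw outputs (1000 samples, 10 classes).
rng = np.random.default_rng(0)
logits = rng.normal(size=(1000, 10))

probs = softmax(logits)
row_sums = probs.sum(axis=1)

print(probs.shape)
print(np.allclose(row_sums, 1.0))
```

Every entry of `probs` is strictly between 0 and 1, and each row adds up to 1, which is why the 10 values for a sample can be read as a probability distribution over the classes.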

