“Unlike other activation functions, the softmax works across all the outputs.”
What does this statement mean?
It means that each softmax output depends on every input to the layer, not only its own. The outputs are normalized together so that, for each example, they are all positive and sum to 1.
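For reference, here is the standard softmax definition written out. The symbols are my own notation, not from the course: $z_1, \dots, z_K$ are the raw outputs (logits) of the layer for one example, and $K$ is the number of classes.

$$
\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \qquad i = 1, \dots, K
$$

The denominator sums over all $K$ outputs, which is why changing any single $z_j$ changes every $\mathrm{softmax}(z)_i$. Compare that with the sigmoid, $\sigma(z_i) = 1 / (1 + e^{-z_i})$, which only ever looks at its own $z_i$.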
Hi @Syamantak_Paliwal! Su here!
To put your question in simple terms:
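Softmax → Gives a probability for each possible class, computed from all of the network's outputs together, so the probabilities sum to 1
Other functions (e.g. sigmoid/ReLU) → Give a number for each output separately, without looking at the other outputs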
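If it helps to see this in numbers, here is a small NumPy sketch. The function names and the example logits are made up for illustration; this is not how any particular framework implements it. It changes only the last logit and compares what happens to the first output under softmax versus sigmoid.

```python
import numpy as np


def softmax(z):
    # Subtracting the max is a common numerical-stability trick;
    # it does not change the result.
    exps = np.exp(z - np.max(z))
    # Every output is divided by the sum over ALL outputs.
    return exps / np.sum(exps)


def sigmoid(z):
    # Applied element-wise: each output only depends on its own input.
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical logits for a 3-class example.
logits = np.array([2.0, 1.0, 0.1])
# Same logits, but only the last one is changed.
changed = np.array([2.0, 1.0, 3.0])

p1, p2 = softmax(logits), softmax(changed)
s1, s2 = sigmoid(logits), sigmoid(changed)

print("softmax before:", p1, "sum =", p1.sum())
print("softmax after: ", p2, "sum =", p2.sum())
print("sigmoid before:", s1, "sum =", s1.sum())
print("sigmoid after: ", s2, "sum =", s2.sum())
```

With softmax, the first output moves even though the first logit stayed the same, and the outputs still sum to 1. With sigmoid, the first output does not move at all, and the outputs have no reason to sum to 1.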