Softmax vs normal probability calculations


Why do we use softmax? Why can't we just use something like z1/(z1+z2+z3)?
What is the purpose of the exponentials here?


Hi @Pirzada

To answer in simple words, I would say that softmax accounts for “confidence”:

For example, with not so “confident” values:
softmax([1, 2, 3]) # not so confident
results in not so “confident” probabilities:
[0.09, 0.24, 0.67]

in contrast, z1/(z1+z2+z3) results in:
[0.17, 0.33, 0.5]

Now with larger, more widely separated values:
softmax([10, 20, 30]) # very confident
results in very “confident” probabilities:
[0.00000, 0.00005, 0.99995]

in contrast, z1/(z1+z2+z3) gives exactly the same result as before:
[0.17, 0.33, 0.5]

Notice that simple normalization cannot tell [1, 2, 3] apart from [10, 20, 30] (scaling all inputs by the same factor leaves the output unchanged), while softmax uses that scale as a measure of confidence. Simple normalization also breaks down when some z values are negative or the sum is zero, whereas exponentials are always positive.

To add:
There are other ways to account for “confidence”, but softmax “works well” with cross-entropy. If you want to dive deeper, Chapter 6 of the free Deep Learning book by Ian Goodfellow, Yoshua Bengio, and Aaron Courville addresses this question.
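One concrete reason softmax pairs well with cross-entropy: the gradient of the cross-entropy loss with respect to the logits z simplifies to softmax(z) - y (y being the one-hot target). This sketch verifies that closed form numerically with a finite-difference check:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def cross_entropy(z, y):
    # y is a one-hot target vector.
    return -np.sum(y * np.log(softmax(z)))

z = np.array([1.0, 2.0, 3.0])
y = np.array([0.0, 0.0, 1.0])

# The well-known closed-form gradient of cross-entropy w.r.t. the logits.
analytic = softmax(z) - y

# Finite-difference approximation of the same gradient.
eps = 1e-6
numeric = np.array([
    (cross_entropy(z + eps * np.eye(3)[i], y)
     - cross_entropy(z - eps * np.eye(3)[i], y)) / (2 * eps)
    for i in range(3)
])

print(np.allclose(analytic, numeric, atol=1e-5))  # True
```

This clean gradient is what makes training with softmax + cross-entropy numerically simple and well-behaved.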

I found this article, I hope it helps: Softmax Activation Function with Python.