I don’t understand what this part is trying to say. Maybe I’m missing some prerequisite knowledge. Please let me know, thanks in advance.

A probability is defined to satisfy P(x) \in [0, 1].

But the raw outputs of the last layer, called “logits”, can be arbitrary real numbers like 5, 10, … So if you want probabilities, you apply Softmax to normalize them into values that are non-negative and sum to 1.
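As a small sketch of that normalization (function and variable names are mine, just for illustration):

```python
import math

def softmax(logits):
    # Exponentiate each logit (making every value positive), then
    # divide by the sum so the outputs sum to 1 -- a valid
    # probability distribution.
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Logits like 5 and 10 become probabilities in [0, 1].
probs = softmax([5.0, 10.0, 2.0])
```

The largest logit gets the largest probability, and the exponential makes the gap between logits much more pronounced after normalization.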

The numerical stability concern is the exponential used inside Softmax, which overflows easily for large logits. That is well explained in the video that vignesh18 pointed to.

Hope this helps.