I don’t understand what this part is trying to say. Maybe I’m missing some background knowledge. Please let me know, thanks in advance.
@Nhat_Minh The Improved implementation of softmax video should help
Probability is defined as P(x) \in [0,1].
But the raw outputs of the last layer, called “logits”, can be arbitrary real numbers like 5, 10, … So, if you want probabilities, you apply Softmax to normalize them.
The numerical-stability concern is that Softmax uses the exponential function, which overflows easily for large logits. That is well explained in the video vignesh18 pointed to.
Hope this helps.
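For concreteness, here is a minimal sketch (my own Python/NumPy example, not taken from the video) of the standard max-subtraction trick for a numerically stable Softmax:

```python
import numpy as np

def softmax(logits):
    # Subtracting the max logit before exponentiating avoids overflow:
    # np.exp(1000.0) is inf, but exp(x - max) stays in range, and the
    # shift cancels out in the ratio, so the probabilities are unchanged.
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

# A logit of 1000 would overflow a naive exp(); the stable version is fine.
probs = softmax(np.array([5.0, 10.0, 1000.0]))
```

Here `probs` sums to 1 and puts essentially all the mass on the largest logit, which is exactly what the unstable version would compute if it didn’t overflow.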
