Purpose of using numerically accurate implementation of softmax

Yup, @Bio_J: it avoids some of the very small round-off errors and, when -z becomes large, the overflow problem in e^{-z}.

I say "some" of the round-off error because, as I explained in this post, the “numerically accurate implementation” gives us a mathematical simplification from

image

to

image
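Assuming, as the e^{-z} term suggests, that the loss in question is binary cross-entropy with the sigmoid a = 1/(1+e^{-z}) and a label y in {0, 1}, that kind of simplification can be written out like this. This is my reconstruction of the algebra, not a copy of the images:

$$
\begin{aligned}
L &= -y\log\frac{1}{1+e^{-z}} - (1-y)\log\left(1-\frac{1}{1+e^{-z}}\right) \\
  &= y\log\left(1+e^{-z}\right) + (1-y)\left[z + \log\left(1+e^{-z}\right)\right] \\
  &= \log\left(1+e^{-z}\right) + (1-y)\,z
\end{aligned}
$$

The second line uses 1 - 1/(1+e^{-z}) = e^{-z}/(1+e^{-z}). The sigmoid is never formed explicitly, but e^{-z} is still there inside the logarithm.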

And we still have one exponential term left. :wink:
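A minimal Python sketch of what this means in practice, assuming the sigmoid/binary cross-entropy case with logit z and label y. The function names are hypothetical and chosen only for illustration. `naive_bce` forms the probability first, `simplified_bce` uses log(1 + e^{-z}) + (1 - y) z, and `stable_bce` uses the rearrangement max(z, 0) - y z + log(1 + e^{-|z|}), which is the form given in the documentation of TensorFlow's `tf.nn.sigmoid_cross_entropy_with_logits`. Whether a particular framework's loss uses exactly this form internally is an assumption here.

```python
import math

# Hypothetical helper names for illustration; z is the logit, y is the label (0 or 1).

def naive_bce(z: float, y: int) -> float:
    """Compute a = sigmoid(z) first, then the cross-entropy from a."""
    a = 1.0 / (1.0 + math.exp(-z))  # exp(-z) overflows when -z is large
    # If a rounds to exactly 1.0, log(1 - a) is log(0) and math.log raises ValueError.
    return -y * math.log(a) - (1 - y) * math.log(1.0 - a)


def simplified_bce(z: float, y: int) -> float:
    """Algebraically simplified form: log(1 + e^{-z}) + (1 - y) * z."""
    # No round-off from forming 1 - a, but one exponential is still left.
    return math.log1p(math.exp(-z)) + (1 - y) * z


def stable_bce(z: float, y: int) -> float:
    """Rearranged form: max(z, 0) - y * z + log(1 + e^{-|z|})."""
    # The exponent -|z| is never positive, so exp() cannot overflow.
    return max(z, 0.0) - y * z + math.log1p(math.exp(-abs(z)))


if __name__ == "__main__":
    for z, y in [(0.5, 1), (40.0, 0), (-1000.0, 1)]:
        for loss in (naive_bce, simplified_bce, stable_bce):
            try:
                print(f"{loss.__name__}(z={z}, y={y}) = {loss(z, y)}")
            except (ValueError, OverflowError) as err:
                print(f"{loss.__name__}(z={z}, y={y}) failed: {type(err).__name__}")
```

With z = 40 and y = 0, e^{-40} is smaller than double-precision round-off relative to 1, so the sigmoid rounds to exactly 1.0 and the naive version fails on log(0), while the simplified one returns about 40. With z = -1000, the simplified version still overflows in e^{-z}, which the rearranged version avoids.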

Cheers,
Raymond
