Because attention is a weighted average and the attention weights should therefore sum to one. Softmax ensures that.
1 Like
Because attention is a weighted average and the attention weights should therefore sum to one. Softmax ensures that.