For a multiclass classification problem, we calculate the probabilities using softmax as follows:

$$p_i = \frac{e^{z_i}}{e^{z_1} + \dots + e^{z_n}}$$
My question is: can we use this formula instead?

$$p_i = \frac{z_i}{z_1 + \dots + z_n}$$

We can see that the outputs will be between 0 and 1 and will also sum up to 1. So I wonder whether using the exponential is just a convention and we could use this formula instead, or whether there is a reason we don't use it.
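A minimal pure-Python sketch of the two normalizations being compared, for experimenting with different scores. The function names `softmax` and `naive_normalize` are hypothetical, and subtracting `max(z)` inside `softmax` is a common numerical-stability trick (it cancels in the ratio) rather than something stated in the question.

```python
import math


def softmax(z):
    """Softmax: exp(z_i) / sum_j exp(z_j).

    Subtracting max(z) before exponentiating leaves the result unchanged
    (the common factor cancels in the ratio) but avoids overflow in exp().
    """
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def naive_normalize(z):
    """Hypothetical alternative from the question: z_i / sum_j z_j."""
    total = sum(z)
    return [v / total for v in z]


scores = [2.0, 1.0, 0.5]  # illustrative, all-positive scores
print(softmax(scores))
print(naive_normalize(scores))
```

With all-positive scores like these, both functions return values in (0, 1) that sum to 1, which is why the two formulas can look interchangeable at first glance.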
Note that the $z_i$ values are the output of a linear transformation, so all you can say about each value is:
$$-\infty < z_i < +\infty$$
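Since some of them may be positive and some negative, the sum in the denominator of your version of the expression could evaluate to 0. That would not end well. Even when the sum is nonzero, mixed signs mean the individual ratios are not confined to [0, 1]: for example, $z = (2, -1)$ gives a denominator of 1 and "probabilities" of 2 and -1. The exponential avoids both problems because $e^{z_i} > 0$ for every real $z_i$.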
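A small Python sketch of the failure modes described above, assuming arbitrary real-valued scores. `naive_normalize` is a hypothetical helper implementing the formula from the question, and `softmax` is included only for contrast.

```python
import math


def softmax(z):
    """Softmax with the usual max-subtraction for numerical stability."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]  # every term is strictly positive
    total = sum(exps)
    return [e / total for e in exps]


def naive_normalize(z):
    """Hypothetical z_i / sum_j z_j normalization proposed in the question."""
    total = sum(z)
    if total == 0:
        raise ZeroDivisionError("scores sum to 0; z_i / sum(z) is undefined")
    return [v / total for v in z]


# Mixed signs: the denominator is 1, but the "probabilities" are 2 and -1.
print(naive_normalize([2.0, -1.0]))  # [2.0, -1.0]

# Scores that cancel: the denominator is exactly 0.
try:
    naive_normalize([1.0, -1.0])
except ZeroDivisionError as err:
    print("undefined:", err)

# Softmax stays well defined for the same inputs.
print(softmax([2.0, -1.0]))
print(softmax([1.0, -1.0]))
```

Scores whose sum is close to, but not exactly, 0 are arguably worse: the naive ratios become arbitrarily large in magnitude without any error being raised.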