Shouldn't it be g(z_1) / (g(z_1) + g(z_2) + g(z_3)) and so on? Why is it only e^z in the softmax?
The ‘e’ in softmax is the natural exponential function, which transforms the z values into probabilities. I think you misinterpreted the formula.
Please watch the lecture video again.
g(z_1) is itself the softmax of z_1, which is equal to \frac{e^{z_1}}{e^{z_1} + \cdots + e^{z_N}}.
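If it helps, here is a minimal sketch of that formula (assuming NumPy; this is not code from the lecture):

```python
import numpy as np

def softmax(z):
    # Exponentiate each logit (shifted by the max for numerical stability,
    # which does not change the result), then divide by the sum.
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])
p = softmax(z)
print(p)        # ~[0.659 0.242 0.099]
print(p.sum())  # 1.0 -- a valid probability distribution
```

The exponential keeps every term positive, and dividing by the sum makes the outputs add up to 1.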
That is what I'm asking: in logistic regression we have 1/(1 + e^{-z}) as the probability, but now we have e^{z_i} / \sum_j e^{z_j}. Why is that?
The first one is the sigmoid (for binary classification) and the second one is the softmax (for multiclass classification).
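One way to see the connection: the sigmoid is exactly a two-class softmax with the second logit fixed at 0, since 1/(1 + e^{-z}) = e^z / (e^z + e^0). A quick numerical check (again just a sketch, assuming NumPy):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = 1.3  # an arbitrary logit
# sigmoid(z) matches the first component of a 2-class softmax over [z, 0]
print(sigmoid(z))                      # ~0.7858
print(softmax(np.array([z, 0.0]))[0])  # ~0.7858
```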
You could try a crazy experiment:
- f(i) = \frac{1}{1 + e^{-z_i}} (i.e. a sigmoid per class)
- then normalize them as follows: p(i) = \frac{f(i)}{\sum_j f(j)}
I don’t know if it will work, but it could be a different softmax.
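For what it's worth, here is a minimal sketch of that experiment (assuming NumPy): sigmoid each z_i, then renormalize, and compare against the standard softmax.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def normalized_sigmoids(z):
    # The experiment above: sigmoid each z_i, then renormalize to sum to 1
    f = 1.0 / (1.0 + np.exp(-z))
    return f / f.sum()

z = np.array([2.0, 1.0, 0.1])
print(softmax(z))              # ~[0.659 0.242 0.099]
print(normalized_sigmoids(z))  # ~[0.412 0.342 0.246] -- flatter distribution
```

Both outputs sum to 1, but they are different distributions: the normalized-sigmoid version is noticeably flatter, so it is not the same function as softmax.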