Shouldn't it be g(z1) / (g(z1) + g(z2) + g(z3)) and so on? Why is it only e^z in softmax?


‘e’ in softmax is the natural exponential function that transforms the z values into probabilities. I think you misinterpreted the formula.

Please watch the lecture video again.

g(z1) is itself the softmax function applied to z1, which is equal to \frac {e^{z_1}} { e^{z_1} + \dots + e^{z_N}}.

That is what I'm asking: in logistic regression we have 1/(1 + e^{-z}) as the probability, and now we have e^{z_i} / (sum of all e^{z_j}). Why is that?

The first one is sigmoid (for binary classification) and the second one is softmax (for multiclassification).
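To make the contrast concrete, here is a minimal sketch (using numpy, with illustrative scores I made up): sigmoid maps one score to a single probability, while softmax maps a whole vector of scores to a distribution that sums to 1.

```python
import numpy as np

def sigmoid(z):
    # Binary classification: squashes a single score into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Multiclass: exponentiate each score, then normalize so they sum to 1
    e = np.exp(z - np.max(z))  # subtracting the max is a common numerical-stability trick
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])  # example scores for 3 classes
p = softmax(z)
print(p)        # largest score -> largest probability
print(p.sum())  # the probabilities sum to 1
```

Note that softmax already contains the normalization the original question asks about: the denominator is the sum of e^{z_j} over all classes, so no extra division is needed.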

You could try a crazy experiment:

- f(i) = 1/(1 + e^{-z_i})
- then normalize them as follows: p(i) = f(i) / Sum( f(j) over all j )

I don’t know if it will work, but it could be a different softmax.