Hello Mentors,
I have a question regarding the derivation of the softmax function from the sigmoid. The softmax function is a generalized form of the sigmoid function, and I am curious how that generalization is derived.
I have also checked Raymond’s explanation in the topic below -
In the video, Professor Andrew Ng said that the softmax regression algorithm is a generalization of logistic regression.
How can the logistic regression model with the sigmoid function be derived from the softmax regression model?
Thanks and best regards,
Liyu
I was wondering how sigmoid(z_1 - z_2) is equivalent to sigmoid(z_1).
It would be great if someone could explain the derivation of the softmax function from the sigmoid function.
Thanks
Tamal
My response here:
Hello, @tamalmallick ,
Let’s consider two neural networks that share the same base network but have different heads: a softmax layer and a sigmoid layer, respectively.
[image]
My argument that sigmoid(z_1 - z_2) and sigmoid(z) are equivalent is the following:
z_1 is a linear combination of a_1, a_2, ..., a_n, and so is z_2
\implies z_1 - z_2 is also a linear combination of a_1, a_2, ..., a_n.
Because z is likewise a linear combination of a_1, a_2, ..., a_n, z and z_1 - z_2 are equivalent in form.
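Spelling this out (a small sketch in my own notation; the weight and bias symbols \mathbf{w}_i, b_i are not from the lectures):
\begin{align}
z_1 &= \mathbf{w}_1^\top \mathbf{a} + b_1, \qquad z_2 = \mathbf{w}_2^\top \mathbf{a} + b_2 \\
z_1 - z_2 &= (\mathbf{w}_1 - \mathbf{w}_2)^\top \mathbf{a} + (b_1 - b_2),
\end{align}
which has exactly the same form as the single sigmoid logit z = \mathbf{w}^\top \mathbf{a} + b, just with different parameter values.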
Cheers,
Raymond
Let’s see if this makes sense to everyone.
Cheers,
Raymond
TMosh, June 14, 2025, 10:55pm
I got lost in the algebra when I tried to write out a proof.
The other thread doesn’t work out the derivation of your formula either.
It would be really informative to see it worked out.
Hi @tamalmallick ,
For 2 classes, the softmax is identical to the sigmoid. Let {\bf z} = [z, 0]. Then:
\begin{align}
{\rm Softmax}({\bf z})_1 &= {e^z \over e^z + e^0} = {e^z \over e^z + 1} = {1 \over 1 + e^{-z}} = \sigma(z) \\
{\rm Softmax}({\bf z})_2 &= {e^0 \over e^z + e^0} = {1 \over e^z + 1} = 1 - \sigma(z)
\end{align}
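If it helps, here is a minimal NumPy sketch (my own helper names, not from the course materials) that checks this identity numerically:

```python
import numpy as np

def sigmoid(z):
    # Standard logistic (sigmoid) function.
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Plain softmax over a 1-D array of logits.
    e = np.exp(z)
    return e / e.sum()

z = 1.7  # an arbitrary logit
p = softmax(np.array([z, 0.0]))

print(p[0], sigmoid(z))        # softmax([z, 0])[0] equals sigma(z)
print(p[1], 1.0 - sigmoid(z))  # softmax([z, 0])[1] equals 1 - sigma(z)
```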
Hi @consell ,
Thanks for this, but I have a follow-up question. You mentioned {\bf z} = [z, 0]. I was wondering how the second logit becomes 0 in the 2-class case, when z is a linear expression z = W.X + b?
TMosh, June 17, 2025, 4:10pm
That seems to be a specific example where it is assumed that the output of the first unit is ‘z’, and the output of the second unit is zero.
I’d like to see a derivation where z1 and z2 are both any real numbers.
@tamalmallick ,
The softmax function is shift invariant for any constant c :
\begin{align}
{\rm Softmax}([z_1, z_2]) &= \left[ {e^{z_1} \over e^{z_1} + e^{z_2}}, {e^{z_2} \over e^{z_1} + e^{z_2}} \right] \\
&= \left[ {e^{z_1} e^{-c} \over e^{-c} (e^{z_1} + e^{z_2})}, {e^{z_2} e^{-c} \over e^{-c} (e^{z_1} + e^{z_2})} \right] \\
&= \left[ {e^{z_1 -c} \over e^{z_1 - c} + e^{z_2 - c}}, {e^{z_2 - c} \over e^{z_1 - c} + e^{z_2 - c}} \right] \\
&= {\rm Softmax}([z_1 - c, z_2 -c]).
\end{align}
(This fact is used in practice to ensure numerical stability, by subtracting the maximum logit from all logits.)
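For reference, a minimal sketch of that stability trick (my own code, not from the course):

```python
import numpy as np

def softmax_stable(z):
    # Subtract the maximum logit before exponentiating: by the shift
    # invariance shown above the result is unchanged, but exp() can
    # no longer overflow for large logits.
    e = np.exp(z - np.max(z))
    return e / e.sum()

print(softmax_stable(np.array([1000.0, 999.0])))  # no overflow warning
```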
So, for binary classification, we can shift the logits by setting c = z_2 :
{\rm Softmax}([z_1, z_2]) = {\rm Softmax}([z_1 - z_2, 0]).
Let z = z_1 - z_2. Then:
\begin{align}
{\rm Softmax}([z_1, z_2]) &= {\rm Softmax}([z, 0]) \\
&= [\sigma(z), 1 - \sigma(z)] \\
&= [\sigma(z_1 - z_2), 1 - \sigma(z_1 - z_2)].
\end{align}
This is similar to the result presented by Raymond @rmwkwok, and it shows that the binary softmax is equivalent to a sigmoid applied to the logit difference.
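A quick numerical check of this equivalence (again a sketch with assumed helper names):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))  # stable, shift-invariant softmax
    return e / e.sum()

z1, z2 = 2.3, -0.5  # two arbitrary logits
print(softmax(np.array([z1, z2])))               # binary softmax
print(sigmoid(z1 - z2), 1.0 - sigmoid(z1 - z2))  # sigmoid of the logit difference
```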
I am not sure what exactly you are trying to derive, because the softmax function converts a vector of logits z_i into a probability distribution over classes, whereas the sigmoid function maps a single logit to a probability between 0 and 1. In particular, softmax is used when you have multiple mutually exclusive classes, and it ensures that the outputs sum to 1 across all classes. Sigmoid, on the other hand, is used when you are modeling the probability of a single class or of multiple independent classes, and it does not enforce that the outputs sum to 1.
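To make the contrast concrete, here is a tiny sketch with made-up logits:

```python
import numpy as np

logits = np.array([2.0, 1.0, 0.1])  # made-up logits for three classes

# Softmax: one distribution over mutually exclusive classes.
e = np.exp(logits - logits.max())
probs_softmax = e / e.sum()

# Sigmoid: one independent probability per class (multi-label style).
probs_sigmoid = 1.0 / (1.0 + np.exp(-logits))

print(probs_softmax, probs_softmax.sum())  # entries sum to 1 (up to floating point)
print(probs_sigmoid, probs_sigmoid.sum())  # entries generally do not sum to 1
```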
TMosh, June 18, 2025, 1:45am
“Math processing errors”.
Please try to reload the page, or try another browser. This error sometimes occurs on my iPhone when the connection is slow.
[screenshot]
TMosh, June 18, 2025, 2:20am
I’m on Firefox on Windows 10. The “math processing error” seems to have resolved itself.