Derivation of softmax function from sigmoid function of logistic regression

Hello Mentors,

I have a question regarding the derivation of the softmax function from the sigmoid.

The softmax function is a generalized form of the sigmoid function, and I am curious about how it is derived.

I have also checked Raymond’s explanation in the topic below -

I was wondering: how is sigmoid(z1 - z2) equivalent to sigmoid(z1)?

It would be great if someone could explain the derivation of the softmax function from the sigmoid function.

Thanks
Tamal

My response here:

Let’s see if it makes sense to everyone :wink:

Cheers,
Raymond

I got lost in the algebra when I tried to write out a proof.

The other thread doesn’t work out the derivation of your formula either.

It would be really informative to see it worked out.


Hi @tamalmallick,

For 2 classes the softmax is identical to the sigmoid:

\begin{align} {\bf z} &= [z, 0] \\ {\rm Softmax}({\bf z})_1 &= {e^z \over e^z + e^0} = {e^z \over e^z + 1} = {1 \over 1 + e^{-z}} = \sigma(z) \\ {\rm Softmax}({\bf z})_2 &= {e^0 \over e^z + e^0} = {1 \over e^z + 1} = 1 - \sigma(z) \end{align}
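
If it helps to see this numerically, here is a minimal NumPy sketch (the `softmax` and `sigmoid` helpers below are my own throwaway functions, not course code):

```python
import numpy as np

def softmax(z):
    e = np.exp(z)
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = 1.7                                # any logit value
print(softmax(np.array([z, 0.0])))     # -> approx. [0.8455, 0.1545]
print(sigmoid(z), 1.0 - sigmoid(z))    # -> the same two numbers
```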

Hi @consell,

Thanks for this, but I have a follow-up question. You mentioned z = [z, 0]. How does the second entry become 0 in the 2-class case, when each logit is a linear expression z = W.X + b?

That seems to be a specific example where it is assumed that the output of the first unit is ‘z’, and the output of the second unit is zero.

I’d like to see a derivation where z1 and z2 are both any real numbers.


@tamalmallick,

The softmax function is shift invariant for any constant c:

\begin{align} {\rm Softmax}([z_1, z_2]) &= \left[ {e^{z_1} \over e^{z_1} + e^{z_2}}, {e^{z_2} \over e^{z_1} + e^{z_2}} \right] \\ &= \left[ {e^{z_1} e^{-c} \over e^{-c} (e^{z_1} + e^{z_2})}, {e^{z_2} e^{-c} \over e^{-c} (e^{z_1} + e^{z_2})} \right] \\ &= \left[ {e^{z_1 -c} \over e^{z_1 - c} + e^{z_2 - c}}, {e^{z_2 - c} \over e^{z_1 - c} + e^{z_2 - c}} \right] \\ &= {\rm Softmax}([z_1 - c, z_2 -c]). \end{align}

(This fact is used to ensure numerical stability, by subtracting the maximum logit from all logits.)
So, for binary classification, we can shift the logits by setting c = z_2:

{\rm Softmax}([z_1, z_2]) = {\rm Softmax}([z_1 - z_2, 0]).
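
As a quick numerical illustration of this shift invariance (a sketch with made-up logit values; the second print also uses the max-subtraction stability trick mentioned above):

```python
import numpy as np

def softmax(z):
    e = np.exp(z)
    return e / e.sum()

z = np.array([3.2, 1.1])
print(softmax(z))             # approx. [0.8909, 0.1091]
print(softmax(z - z.max()))   # same values: shifting every logit by a constant changes nothing
print(softmax(z - z[1]))      # same values again, and now the second logit is 0
```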

Let z = z_1 - z_2. Then:

\begin{align} {\rm Softmax}([z_1, z_2]) &= {\rm Softmax}([z, 0]) \\ &= [\sigma(z), 1 - \sigma(z)] \\ &= [\sigma(z_1 - z_2), 1 - \sigma(z_1 - z_2)], \end{align}

which matches the result presented by Raymond @rmwkwok and shows that the binary softmax is equivalent to a sigmoid of the difference of the logits.
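
Here is a small sketch that checks this identity for arbitrary real z_1 and z_2, which was the case asked about above (again, the helper functions are just my own, not from any assignment):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))          # safe because softmax is shift invariant
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
for _ in range(5):
    z1, z2 = rng.normal(size=2)        # two arbitrary real logits
    lhs = softmax(np.array([z1, z2]))
    rhs = np.array([sigmoid(z1 - z2), 1.0 - sigmoid(z1 - z2)])
    assert np.allclose(lhs, rhs)
print("binary softmax matches [sigmoid(z1 - z2), 1 - sigmoid(z1 - z2)] on random logits")
```
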
I am not sure what exactly you are trying to derive, because the softmax function converts a vector of logits z_i into a probability distribution over classes, whereas the sigmoid function maps a single logit to a probability between 0 and 1. In particular, softmax is used when you have multiple mutually exclusive classes, and it ensures that the outputs sum to 1 across all classes. Sigmoid, on the other hand, is used when you are modeling the probability of a single class or of multiple independent classes, and it does not enforce that the outputs sum to 1.
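
To make that last point concrete, here is a small sketch contrasting the two on the same made-up logits:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([2.0, -1.0, 0.5])    # three made-up logits

p_softmax = softmax(logits)
p_sigmoid = sigmoid(logits)

print(p_softmax, p_softmax.sum())      # sums to 1: one distribution over mutually exclusive classes
print(p_sigmoid, p_sigmoid.sum())      # each entry is in (0, 1), but the sum is not constrained to 1
```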

I am seeing “Math processing errors” where the equations should be.


Please try to reload the page, or try another browser. This error sometimes occurs on my iPhone when the connection is slow.

(Screenshot attached)

I’m on Firefox on Windows 10. The “math processing error” seems to have resolved itself.