Derivation from softmax regression to logistic regression

In the video, Professor Andrew Ng said, "The softmax regression algorithm is a generalization of logistic regression."

How can the logistic regression model with the sigmoid function be derived from the softmax regression model?

Thanks and best regards,
Liyu


Hi @liyu,

Suppose we use softmax to predict a binary target. We need 2 neurons in the output layer, representing the chances for target = 0 and target = 1. Let’s say their logit values are z_1 and z_2 respectively; then

softmax(z_1, z_2)_1 = \frac{\exp{(z_1)}}{\exp{(z_1)} + \exp{(z_2)}} = \frac{1}{1 + \exp{(z_2 - z_1)}} = sigmoid(z_1 - z_2)
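
A quick numerical check of this identity (a minimal NumPy sketch; the logit values are arbitrary):

```python
import numpy as np

def softmax(z):
    # shift by the max for numerical stability; does not change the result
    e = np.exp(z - np.max(z))
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

z1, z2 = 0.7, -1.3  # arbitrary logits for the two output neurons
print(softmax(np.array([z1, z2]))[0])  # first softmax output, i.e. softmax(z_1, z_2)_1
print(sigmoid(z1 - z2))                # identical value, via the identity above
```

Both prints give the same number (about 0.8808 for these logits).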

Cheers,
Raymond


oh, yes! Thanks Raymond!
Liyu


Hi @rmwkwok,

Could you explain how sigmoid(z_1 - z_2) is equivalent to sigmoid(z)?


Hello, @tamalmallick,

Let’s consider two neural networks sharing the same base network but with different heads: a softmax layer and a sigmoid layer, respectively.

My argument that sigmoid(z_1 - z_2) and sigmoid(z) are equivalent is:

z_1 is a linear combination of a_1, a_2, ..., a_n, and so is z_2
\implies z_1 - z_2 is a linear combination of a_1, a_2, ..., a_n

Because z is also a linear combination of a_1, a_2, ..., a_n, the softmax head's z_1 - z_2 can express exactly the same set of functions as the sigmoid head's z, so the two are equivalent.
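
Concretely, writing the linear combinations out (the weights w and biases b below are generic symbols introduced for illustration):

z_1 = w_{1,1} a_1 + ... + w_{1,n} a_n + b_1
z_2 = w_{2,1} a_1 + ... + w_{2,n} a_n + b_2
\implies z_1 - z_2 = (w_{1,1} - w_{2,1}) a_1 + ... + (w_{1,n} - w_{2,n}) a_n + (b_1 - b_2)

This has the same form as z = w_1 a_1 + ... + w_n a_n + b, so any z the sigmoid head can compute, the softmax head can match with z_1 - z_2, and vice versa.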

Cheers,
Raymond

Hi @rmwkwok,

True, both of them are linear. But must the values of the two linear expressions necessarily be the same?


Hello, @tamalmallick,

For the sake of discussing your last question, let me make two changes to my previous result/graph:

  1. Previously, I had softmax(z_1, z_2)_1 = sigmoid(z_1 - z_2); now I use softmax(z_1, z_2)_2 = \frac{\exp{(z_2)}}{\exp{(z_1)} + \exp{(z_2)}} = sigmoid(z_2 - z_1), which predicts the probability of being class 1. (This is the same algebra as before, with the roles of z_1 and z_2 swapped.)

  2. Given the new equation above, I add a new branch to my previous graph for a new neural network 0.

Now, if we build neural networks 0 and 1, initialize them to the same set of weights, and train them, then we will end up with very similar sets of weights for networks 0 and 1. I say “very similar” instead of “the same” because intermediate numerical round-offs may accumulate into some very small differences. You might really just build them in TensorFlow, let them go through the same training procedure, and then look at the trained weights. :wink:
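
If you'd like to try this concretely, here is a minimal Keras sketch of that kind of experiment. The toy dataset, layer sizes, optimizer, and epoch count are all illustrative assumptions, not from the course; since the two heads have different shapes, the sketch matches the initial weights by setting the sigmoid head to the column differences of the softmax head, per the identity above:

```python
import numpy as np
import tensorflow as tf

# Toy binary dataset (illustrative): label is 1 when the features sum above 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(512, 4)).astype("float32")
y = (X.sum(axis=1) > 0).astype("int32")

# Network with a 2-neuron softmax head; column 1 is the class-1 probability.
base_sm = tf.keras.layers.Dense(8, activation="relu")
head_sm = tf.keras.layers.Dense(2, activation="softmax")
net_softmax = tf.keras.Sequential([tf.keras.Input(shape=(4,)), base_sm, head_sm])
net_softmax.compile(optimizer="sgd", loss="sparse_categorical_crossentropy")

# Network with the same base architecture but a 1-neuron sigmoid head.
base_sg = tf.keras.layers.Dense(8, activation="relu")
head_sg = tf.keras.layers.Dense(1, activation="sigmoid")
net_sigmoid = tf.keras.Sequential([tf.keras.Input(shape=(4,)), base_sg, head_sg])
net_sigmoid.compile(optimizer="sgd", loss="binary_crossentropy")

# Match the initial weights: copy the base layer directly, and set the sigmoid
# head to the column differences of the softmax head, so that at initialization
# sigmoid(z_2 - z_1) equals softmax(z_1, z_2)_2 exactly.
base_sg.set_weights(base_sm.get_weights())
W, b = head_sm.get_weights()  # W has shape (8, 2), b has shape (2,)
head_sg.set_weights([W[:, 1:2] - W[:, 0:1], b[1:] - b[:1]])

def max_gap():
    # largest difference between the two networks' class-1 probabilities
    p2 = net_softmax.predict(X, verbose=0)[:, 1]
    p1 = net_sigmoid.predict(X, verbose=0)[:, 0]
    return float(np.abs(p2 - p1).max())

print(max_gap())  # ~0 before training, by construction
net_softmax.fit(X, y, epochs=100, verbose=0)
net_sigmoid.fit(X, y, epochs=100, verbose=0)
print(max_gap())  # expected to stay small as both networks converge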

If you are interested in giving the following question a try: what about networks 0 and 2? What conclusion can we make about them?

Cheers,
Raymond