Derivation from softmax regression to logistic regression

Hello, @tamalmallick,

Let’s consider two neural networks that share the same base network but have different heads: a softmax layer for one and a sigmoid layer for the other.
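
As a reminder of where sigmoid(z_1 - z_2) comes from, here is a short derivation. It assumes the softmax head has exactly two outputs with logits z_1 and z_2; p_1 (the probability of the first class) and \sigma (the sigmoid) are my own notation:

```latex
p_1 = \frac{e^{z_1}}{e^{z_1} + e^{z_2}}
    = \frac{1}{1 + e^{-(z_1 - z_2)}}
    = \sigma(z_1 - z_2)
```

The middle step just divides the numerator and denominator by e^{z_1}.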

My argument for why sigmoid(z_1 - z_2) and sigmoid(z) are equivalent is as follows:

z_1 is a linear combination of a_1, a_2, ..., a_n, and so is z_2
=> z_1 - z_2 is also a linear combination of a_1, a_2, ..., a_n

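Because z is also a linear combination of a_1, a_2, ..., a_n with freely learned weights, it can represent exactly what z_1 - z_2 represents: choosing the sigmoid head's weights to be the difference of the two softmax heads' weights gives z = z_1 - z_2. In that sense, z and z_1 - z_2 are equivalent.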
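
If it helps, here is a small numerical check of the argument in NumPy. It is only a sketch under my own assumptions: the activations a, the softmax head parameters W and b, and the sizes are random placeholders rather than values from any real network, and I have included bias terms, which the argument above does not mention but which subtract in the same way.

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; the result is unchanged.
    e = np.exp(z - z.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n = 5                          # number of base-network activations (assumed)
a = rng.normal(size=n)         # a_1, ..., a_n from the shared base network

# Softmax head: two logits, each a linear combination of a (plus a bias).
W = rng.normal(size=(2, n))
b = rng.normal(size=2)
z1, z2 = W @ a + b
p_softmax = softmax(np.array([z1, z2]))[0]

# Sigmoid head: one logit whose weights are the difference of the two rows.
w = W[0] - W[1]
c = b[0] - b[1]
z = w @ a + c
p_sigmoid = sigmoid(z)

print(np.isclose(z, z1 - z2), np.isclose(p_softmax, p_sigmoid))
```

Both comparisons should hold up to floating-point error, since z here is constructed to equal z_1 - z_2.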

Cheers,
Raymond