Hello, @tamalmallick,
Let’s consider two neural networks that share the same base network but have different heads: a softmax layer in one and a sigmoid layer in the other.
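My argument that sigmoid(z_1 - z_2) and sigmoid(z) are equivalent goes as follows:
z_1 is a linear combination of a_1, a_2, ..., a_n, and so is z_2.
It follows that z_1 - z_2 is also a linear combination of a_1, a_2, ..., a_n, whose weights are the differences of the weights of z_1 and z_2.
Because z is itself a linear combination of a_1, a_2, ..., a_n, z can take exactly the same form as z_1 - z_2 with a suitable choice of weights, so the two heads can represent the same function.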
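
Here is a small numerical sketch of that argument, in case it helps. It relies on the two-class softmax identity e^{z_1} / (e^{z_1} + e^{z_2}) = sigmoid(z_1 - z_2). The names (`linear`, `softmax2`, `w1`, `b1`, and so on), the bias terms, and the random values are my own assumptions for illustration and are not taken from any particular model.

```python
import math
import random

def sigmoid(z):
    # Logistic function.
    return 1.0 / (1.0 + math.exp(-z))

def softmax2(z1, z2):
    # Two-class softmax; subtracting the max keeps exp() from overflowing.
    m = max(z1, z2)
    e1, e2 = math.exp(z1 - m), math.exp(z2 - m)
    return e1 / (e1 + e2), e2 / (e1 + e2)

def linear(w, b, a):
    # Linear combination of the activations a, plus a bias (assumed here).
    return sum(wi * ai for wi, ai in zip(w, a)) + b

random.seed(0)
n = 5

# Hypothetical activations a_1..a_n coming out of the shared base network.
a = [random.gauss(0, 1) for _ in range(n)]

# Softmax head: one weight vector and bias per class.
w1, b1 = [random.gauss(0, 1) for _ in range(n)], random.gauss(0, 1)
w2, b2 = [random.gauss(0, 1) for _ in range(n)], random.gauss(0, 1)
z1 = linear(w1, b1, a)
z2 = linear(w2, b2, a)

# Sigmoid head whose weights are the differences of the softmax weights.
w = [x - y for x, y in zip(w1, w2)]
b = b1 - b2
z = linear(w, b, a)

p_softmax = softmax2(z1, z2)[0]  # probability of class 1 from the softmax head
p_sigmoid = sigmoid(z)           # probability from the sigmoid head

print("z1 - z2   =", z1 - z2)
print("z         =", z)
print("p_softmax =", p_softmax)
print("p_sigmoid =", p_sigmoid)
```

The two printed probabilities should agree up to floating-point rounding. The sketch only checks one direction (a softmax head rewritten as a sigmoid head); going the other way, you can take w1 = w and w2 = 0.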
Cheers,
Raymond
