I’m curious to know why the sigmoid function is used for logistic regression. If the goal is to output a value between 0 and 1, I believe there are other functions that can do the same. Why is the sigmoid function so special?
I believe we get a little more: it lets us estimate the probability of the positive or negative class, for example the likelihood that a sentiment is positive.
By no means a subject matter expert; I’m also a student.
I think it’s probably because the sigmoid has a couple of convenient characteristics. It returns a value between 0 and 1, which can be interpreted as a probability. When its input z_i is negative, its output is < 0.5, and when positive, > 0.5. It also introduces a non-linearity: the more positive or negative the input, the closer the output saturates toward 1 or 0, so the model expresses far higher confidence on extreme inputs than a strictly linear model would.
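A quick sketch of those properties in plain Python (the function name `sigmoid` and the sample inputs are just for illustration):

```python
import math

def sigmoid(z):
    # Logistic function: 1 / (1 + e^{-z}), written in a
    # numerically stable way for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow in exp(-z) for very negative z
    return e / (1.0 + e)

# The characteristics described above:
print(sigmoid(0.0))    # 0.5, the decision boundary
print(sigmoid(-2.0))   # < 0.5 for negative inputs
print(sigmoid(2.0))    # > 0.5 for positive inputs
print(sigmoid(10.0))   # saturates toward 1 for large positive z
```

Note the symmetry sigmoid(-z) = 1 - sigmoid(z), which is why the two class probabilities always sum to 1.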
Actually, I would like to see the mathematical proof or justification that the sigmoid function is the best fit. I have found some answers here, and one of them mentioned the maximum entropy principle. I’m not a math expert, but I’m very interested in making sense of all these concepts.
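Not the full maximum-entropy argument, but one standard derivation (a sketch, assuming the usual logistic regression setup where p is the probability of the positive class and z is a linear function of the features): if you model the log-odds as linear in z, the sigmoid falls out when you solve for p.

```latex
\log \frac{p}{1-p} = z
\quad\Longrightarrow\quad
\frac{p}{1-p} = e^{z}
\quad\Longrightarrow\quad
p = \frac{e^{z}}{1+e^{z}} = \frac{1}{1+e^{-z}} = \sigma(z)
```

So the sigmoid isn’t arbitrary: it’s exactly the inverse of the log-odds (logit) link. The maximum-entropy argument goes further and shows why the linear log-odds model itself is a natural choice.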
This link may prove useful
Why Sigmoid: A Probabilistic Perspective