If I have two classes whose linear-regression outputs are close to each other on the number line, how effective is logistic regression at classifying them?
For example, class 1 has an average y_hat of 5.5 and class 2 has an average y_hat of 3.3. After passing these values through the sigmoid, observations from both classes would be placed in class 1 if we use 0.5 as the threshold.
Also, what if z (the predicted value of y) is never negative? Then the probability of being in class 1 will always be at least 0.5.
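The reason a non-negative z always gives a probability of at least 0.5 follows directly from the definition of the sigmoid. This short derivation uses only that definition:

```latex
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad
z \ge 0 \;\Rightarrow\; e^{-z} \le 1 \;\Rightarrow\; 1 + e^{-z} \le 2 \;\Rightarrow\; \sigma(z) \ge \tfrac{1}{2}
```

Equality holds only at z = 0, where sigma(0) = 0.5 exactly.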
I’m not sure if I’m missing something here, so any clarification would be appreciated.
sigmoid(3.3) ≈ 0.964 and sigmoid(5.5) ≈ 0.996, which are both extremely close to 1, so both would be classified into the same group (1) given the 0.5 threshold.
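A quick way to check these numbers is to evaluate the sigmoid directly. This is a minimal sketch using only Python's standard library; `sigmoid` is a helper defined here, and the two inputs are the class averages from the example above.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function: maps any real z into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Average linear-regression outputs from the example
averages = {"class 1": 5.5, "class 2": 3.3}

for label, z in averages.items():
    p = sigmoid(z)
    # With the default 0.5 cut-off, anything at or above 0.5 goes to group 1
    print(f"{label}: z = {z}, sigmoid(z) = {p:.3f}, at or above 0.5: {p >= 0.5}")
```

Running it prints 0.996 for class 1 and 0.964 for class 2, with both at or above 0.5.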
My question is: would I have to change the threshold value to distinguish between these two data points if, in the context of the dataset, they belong to two separate groups?
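One way to see what changing the threshold means is to compare it with shifting z. This sketch assumes, purely for illustration, that the two class averages are the only points to separate, and it picks a hypothetical cut halfway between them on the z scale; `logit` and `z_cut` are names introduced here, not part of any library. It checks that thresholding sigmoid(z) at some t gives the same decisions as thresholding z - logit(t) at 0, which is the effect an intercept term has.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function: maps any real z into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p: float) -> float:
    """Inverse of the sigmoid: log-odds of a probability 0 < p < 1."""
    return math.log(p / (1.0 - p))

z_class1, z_class2 = 5.5, 3.3

# Hypothetical choice: cut halfway between the two averages on the z scale
z_cut = (z_class1 + z_class2) / 2  # 4.4
threshold = sigmoid(z_cut)         # roughly 0.988 instead of 0.5

for z in (z_class1, z_class2):
    # Thresholding the probability at `threshold`...
    by_probability = sigmoid(z) >= threshold
    # ...matches thresholding the shifted score z - logit(threshold) at 0
    by_shifted_z = (z - logit(threshold)) >= 0
    print(z, round(sigmoid(z), 3), by_probability, by_shifted_z)
```

With these inputs, 5.5 lands above the cut and 3.3 below it under both rules, so raising the threshold and subtracting a constant from z are two views of the same decision boundary.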