# Feasibility of Logistic Regression When Classes Have Similar Linear Regression Outputs

If I have two classes whose linear regression outputs sit close to each other on the number line, how effective is logistic regression at classifying them?

For example, class 1 has an average y_hat of 5.5 and class 2 has an average y_hat of 3.3. Both observations would be placed in class 1 if we use 0.5 as the threshold.

Also, what if z (the predicted value of y) is never negative? Then the probability of being in class 1 will always be greater than 0.5.

I’m not sure if I’m missing something here, so any clarification would be appreciated.

Those ‘y’ values are not classifications; they’re real numbers. That calls for linear regression, not classification.

Logistic regression is the basic method for classification.

Isn’t z a real number that is the output of a linear model, z = w*x + b?

Yes. Why does that matter?

The key point is: if your ‘y’ labels are real values, you’re doing linear regression, not classification.

If your ‘y’ labels are “true/false” or a category, then you’re doing logistic regression and classification.

If you are doing classification, then f_wb is the sigmoid of (w*x + b), and the classes are split by the threshold f_wb >= 0.5.

The sigmoid function is critical to classification.
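A minimal sketch of that decision rule in Python (the w and b values here are made up for illustration; in practice they are learned from the labeled training data):

```python
import math

def sigmoid(z):
    # Squashes any real z into the interval (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, w, b, threshold=0.5):
    # f_wb(x) = sigmoid(w*x + b); classify as 1 when f_wb >= threshold
    return int(sigmoid(w * x + b) >= threshold)

# Made-up parameters for illustration only
w, b = 1.0, -0.5
print(predict(2.0, w, b))   # z = 1.5 > 0, so sigmoid > 0.5 -> class 1
print(predict(-2.0, w, b))  # z = -2.5 < 0, so sigmoid < 0.5 -> class 0
```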

To clarify: in my posts above, I was using y_hat to refer to z. Sorry for the confusion.

If you include the sigmoid(), all the values will be between 0 and 1, and the true/false threshold is 0.5.

Yes, I agree.

sigmoid(3.3) ≈ 0.964 and sigmoid(5.5) ≈ 0.996, which are both extremely close to 1 and would be classified into the same group (1) given the threshold of 0.5.

My question is: would I have to change the threshold value to distinguish between these two data points if, in the context of the dataset, they belong to two separate groups?

For example, by changing the threshold to 0.975.
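To make those numbers concrete, a quick check with Python's math module (0.975 is the threshold proposed above):

```python
import math

def sigmoid(z):
    # Standard logistic function
    return 1.0 / (1.0 + math.exp(-z))

print(round(sigmoid(3.3), 3))  # 0.964
print(round(sigmoid(5.5), 3))  # 0.996
# With the default 0.5 threshold, both land in class 1.
# A threshold of 0.975 would indeed put them on opposite sides:
print(sigmoid(3.3) >= 0.975, sigmoid(5.5) >= 0.975)  # False True
```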

Classification isn’t used for clustering.
That’s a different learning method.

You’re trying to apply a specific learning method to an entirely different task.
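For completeness: if those two groups do come with labels in the dataset, then this is the labeled-classification case described above, and logistic regression learns its own w and b from the labels rather than reusing a linear regression fit. The learned decision boundary then lands between the groups, so the standard 0.5 threshold separates them without any manual threshold tuning. A minimal sketch with made-up 1-D data and plain batch gradient descent:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Made-up 1-D data: class 0 clusters near 3.3, class 1 near 5.5
xs = [3.0, 3.2, 3.3, 3.5, 5.2, 5.4, 5.5, 5.8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

# Batch gradient descent on the logistic (log) loss
w, b = 0.0, 0.0
lr = 0.1
for _ in range(10000):
    dw = db = 0.0
    for x, y in zip(xs, ys):
        err = sigmoid(w * x + b) - y  # derivative of log loss w.r.t. z
        dw += err * x
        db += err
    w -= lr * dw / len(xs)
    b -= lr * db / len(xs)

boundary = -b / w  # the x where sigmoid(w*x + b) = 0.5
preds = [int(sigmoid(w * x + b) >= 0.5) for x in xs]
print(round(boundary, 2), preds)  # boundary falls between the two groups
```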