Feasibility of Log Reg When Classes Have Similar Lin Reg Outputs

Xavi422 · August 29, 2022, 12:26am

If I have two classes where the output value for linear regression is close to each other on the number line, how effective is logistic regression in classifying them?

For example, class 1 has an average y_hat of 5.5 and class 2 has an average y_hat of 3.3. Observations from both classes would be placed in class 1 if we use 0.5 as a threshold.

Also, what if z (predicted value of y) is never negative? Then the probability of being in class 1 will always be more than 0.5.

I’m not sure if I’m missing something here, so any clarification would be appreciated.

TMosh · August 29, 2022, 12:42am

Those ‘y’ values are not classifications. They’re real numbers. That calls for linear regression - not classification.

Logistic regression is the basic method for classification.

Xavi422 · August 29, 2022, 1:00am

Isn’t z a real number which is the output of a linear regression model? z = w \cdot x + b

TMosh · August 29, 2022, 1:18am

Yes. Why does that matter?

The key point is if your ‘y’ labels are real values, you’re doing linear regression. Not classification.

If your ‘y’ labels are “true/false” or a category, then you’re doing logistic regression and classification.

TMosh · August 29, 2022, 1:20am

If you are doing classification, then the f_wb is the sigmoid of (w*x + b), and the classes are split by >= 0.5.
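A minimal sketch of that prediction rule, assuming a single scalar feature x. The values of w and b below are hypothetical, picked only for illustration and not taken from any trained model:

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, w, b, threshold=0.5):
    """Return (probability, class label) for one scalar feature x."""
    z = w * x + b      # linear part
    p = sigmoid(z)     # f_wb(x)
    return p, int(p >= threshold)

# Hypothetical parameters, for illustration only.
w, b = 2.0, -1.0
p_neg, label_neg = predict(0.0, w, b)   # z = -1.0, so p < 0.5
p_pos, label_pos = predict(1.0, w, b)   # z = +1.0, so p > 0.5
```

With a 0.5 threshold, the label depends only on the sign of z: p >= 0.5 exactly when w*x + b >= 0.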

The sigmoid function is critical to classification.

Xavi422 · August 29, 2022, 1:26am

In this line, I’m referring to z as y_hat. Sorry for the confusion.

TMosh · August 29, 2022, 2:00am

If you include the sigmoid(), all the values will be between 0 and 1, and the true/false threshold is 0.5.

Xavi422 · August 29, 2022, 2:10am

Yes, I agree.

sigmoid(3.3) = 0.964 and sigmoid(5.5) = 0.996, which are both extremely close to 1 and would be classified as being in the same group (1) given the threshold 0.5.
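A quick way to check those two values, as a sketch using only the standard library (the variable names are illustrative):

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

p_low = sigmoid(3.3)    # about 0.9644
p_high = sigmoid(5.5)   # about 0.9959

# Both probabilities sit above 0.5, so both points get label 1.
labels = [int(p >= 0.5) for p in (p_low, p_high)]
```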

My question is, would I have to change the threshold value to distinguish between these two data points if, in the context of the dataset, they belong in two separate groups?

For example, change the threshold to 0.975.
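To see what a threshold of 0.975 means in terms of z, here is a small sketch. It relies on the fact that sigmoid is strictly increasing, so sigmoid(z) >= t is equivalent to z >= logit(t), where logit is the inverse of sigmoid. The helper name `logit` is my own choice:

```python
import math

def logit(p):
    """Inverse of the sigmoid: returns z such that sigmoid(z) == p."""
    return math.log(p / (1.0 - p))

# Cut-off on z that corresponds to a probability threshold of 0.975.
z_cut = logit(0.975)    # ln(39), about 3.66

# The two example values from this thread fall on opposite sides of it.
label_a = int(3.3 >= z_cut)   # below the cut-off
label_b = int(5.5 >= z_cut)   # above the cut-off
```

Since z >= logit(t) is the same condition as w*x + (b - logit(t)) >= 0, raising the threshold and lowering b by the same amount in z produce identical labels.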

TMosh · August 29, 2022, 2:24am

Classification isn’t used for clustering.
That’s a different learning method.

TMosh · August 29, 2022, 2:25am

You’re trying to apply a specific learning method to an entirely different task.
