Hello everyone,
Thanks for the wonderful course.
I tried applying logistic regression to some practical binary classification problems and observed that it works no better than simple linear regression, where the desired output (Y) equals either 1 or 0. After the regression coefficients are computed, I can calculate the probability by applying the sigmoid function: P = sigmoid(w·X + b).
So, my question is: what is the benefit of following the logistic regression algorithm, with its specific loss function, over the "simple" linear regression?

Will be grateful for the answer.
My best regards,
Vasyl,
Kyiv, Ukraine

Hey @vasyl.delta,
Welcome to the community. I suppose we are clear on the fact that Logistic Regression is nothing but Linear Regression + Sigmoid, at least as far as the way of predicting the target values is concerned.

So, your query boils down to "What's the advantage of using logistic regression with the logistic loss function instead of the squared error cost function?". In Week 3 of this course, you will find that Prof Andrew describes this advantage in the video entitled "Cost Function for Logistic Regression". When you use the squared error cost function with logistic regression, the cost function that you get may be a non-convex function, which could have multiple local optima; but when we use the logistic loss function, the cost function that you get is a convex function, i.e., it has a single optimum.
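A quick numerical illustration of this convexity point (a minimal numpy sketch with a single made-up training example x = 1, y = 0; not from the course materials): sweep the weight w and check the sign of the second differences of each cost curve.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One toy training example: x = 1, y = 0, model p = sigmoid(w * x).
w = np.linspace(-4.0, 4.0, 201)
p = sigmoid(w)

squared_error = (p - 0.0) ** 2   # squared-error cost as a function of w
log_loss = -np.log(1.0 - p)      # logistic (cross-entropy) cost, y = 0 branch

# A convex curve has non-negative numerical second differences everywhere.
print("squared error convex?", bool(np.all(np.diff(squared_error, 2) >= 0)))
print("log loss convex?     ", bool(np.all(np.diff(log_loss, 2) >= 0)))
```

Even with this single example, the squared error composed with the sigmoid flattens out for large w and is not convex, while the logistic loss stays convex; with many examples, the squared-error surface can pick up multiple local optima.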

Now, you definitely won't want only a locally optimal solution, hence we stick with the logistic cost function. Coming to your practical results, it might be possible that in your case the difference between a locally optimal solution and the globally optimal solution is negligible, and hence you don't see any difference between the two.

I hope this makes sense. Also, feel free to share your results here, so that they can help other learners who may stumble upon this query.

In general, logistic regression is preferred over ordinary linear regression for classification tasks where the dependent variable is binary (e.g., "yes" or "no"). This is because the logistic function is better suited to modeling binary outcomes, as it maps any input value to a value between 0 and 1, which is suitable for modeling probabilities.

However, it is possible to use ordinary linear regression for classification tasks by transforming the dependent variable into a binary form. For example, you could use linear regression to predict a probability of a binary outcome (e.g., the probability that a customer will convert on a website), and then use a threshold value (e.g., 0.5) to classify the outcome as "yes" or "no".
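As a minimal sketch of that "binarized" linear regression idea (numpy only, with made-up synthetic data, so the numbers are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D data: larger x makes the positive class more likely.
x = rng.normal(size=(200, 1))
y = (x[:, 0] + 0.3 * rng.normal(size=200) > 0).astype(float)  # 0/1 labels

# Ordinary least squares on the 0/1 targets (with an intercept column).
X = np.hstack([x, np.ones((200, 1))])
w, *_ = np.linalg.lstsq(X, y, rcond=None)

scores = X @ w                          # "probabilities" that can leave [0, 1]
y_hat = (scores >= 0.5).astype(float)   # threshold at 0.5 to classify
accuracy = (y_hat == y).mean()

print("score range:", scores.min(), "to", scores.max())
print("accuracy:", accuracy)
```

Running this, the raw scores typically fall below 0 and above 1 for extreme x, which is exactly the calibration problem described next, even though the thresholded labels themselves can be quite accurate.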

While this approach may be sufficient for some classification tasks, it is generally not recommended as it does not take into account the specific characteristics of the binary outcome. In particular, the linear regression model may produce probabilities outside the range 0 to 1, which is not meaningful in the context of a binary outcome.

In summary, while it is possible to use ordinary linear regression for classification tasks, logistic regression is generally preferred because it is specifically designed for modeling binary outcomes and is more powerful and flexible in this context.

Thank you very much for the answer! So valuable to have feedback. It would be very interesting to find an example where binary logistic regression is really more effective in comparison with "binarized" linear regression.
Vasyl

Imagine that you are working for a company that sells products online and you want to build a model to predict whether a customer will make a purchase (i.e., a binary outcome). You have data on the following features for each customer:

Age

Income

Gender

Education level

You decide to use both linear regression and logistic regression to build models to predict the probability of a purchase based on these features.

For the linear regression model, you first train the model using the age, income, gender, and education level features. The output of the linear regression model is a continuous value, which could take on any value within a given range. To transform this value into a probability, you apply the sigmoid function to the output of the linear regression model. The resulting probability can then be used to make predictions about the likelihood of a purchase.

For the logistic regression model, you also train the model using the age, income, gender, and education level features. The output of the logistic regression model is a probability that can be used to make predictions about the likelihood of a purchase.

Now, let's compare the outcomes of the two models:

For a given customer, the linear regression model might predict a continuous value of 0.6, which would be transformed into a probability of 0.65 after applying the sigmoid function. The logistic regression model might predict a probability of 0.70 for the same customer.

For another customer, the linear regression model might predict a continuous value of -0.3, which would be transformed into a probability of 0.42 after applying the sigmoid function. The logistic regression model might predict a probability of 0.45 for the same customer.

Overall, you can see that the outcomes of the two models are similar, but the logistic regression model tends to predict slightly higher probabilities compared to the linear regression model with the sigmoid function applied at the end. This is because logistic regression is specifically designed to predict binary outcomes and may be more effective at modeling this type of data.

Hey @pastorsoto,
Don't you think that the difference in the outputs of the two models would be due to the difference in the cost functions only? I believe there is nothing "inherent" apart from the cost function that makes logistic regression different from linear regression, assuming that we use the sigmoid function on the output of linear regression, as stated in the query.

I believe that if we use the sigmoid on linear regression's output, and the same cost function, either the squared error or the logistic loss, to train both models, then there should be no difference in the outputs of the two models, at least from the mathematical point of view. The way the libraries implement these two models may cause certain differences, but if we keep all the factors the same, shouldn't the two models produce the same outputs?
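To make that point concrete, here is a small numpy sketch (my own toy setup, not from the course): with the sigmoid on the output and the log loss, "linear regression + sigmoid" and "logistic regression" are literally the same computation, so training them identically gives identical parameters. Note that the gradient of the log loss, X.T @ (p - y) / n, is exactly the logistic regression update.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, lr=0.1, steps=2000):
    """Gradient descent on the log loss for p = sigmoid(X @ w + b).

    Call it "linear regression + sigmoid" or "logistic regression":
    the parameters, the loss, and the gradient are the same."""
    w, b, n = np.zeros(X.shape[1]), 0.0, len(y)
    for _ in range(steps):
        p = sigmoid(X @ w + b)
        w -= lr * X.T @ (p - y) / n
        b -= lr * (p - y).mean()
    return w, b

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = (X[:, 0] - X[:, 1] > 0).astype(float)   # toy linearly separable labels

w1, b1 = train(X, y)   # "linear regression + sigmoid" trained on log loss
w2, b2 = train(X, y)   # "logistic regression" -- the same computation
print(np.allclose(w1, w2), b1 == b2)        # identical parameters
```

Any difference between the two in practice has to come from somewhere else: a different cost function, a different optimizer, or library implementation details.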

This is more complex and includes a non-linear transformation.

The optimization algorithms used to minimize the loss functions are also different: linear regression typically uses gradient descent, while logistic regression often uses stochastic gradient descent.

I think all three factors (architecture, cost function, optimization algorithm) contribute to the difference between the outputs. Perhaps for some problems you will be able to replicate the results, but for others you might not.

Just to confirm, the only difference is in the use of the sigmoid function here, isn't it? And we are assuming that we are using the sigmoid function on the output of Linear Regression, so don't you think that there will be no difference in the model architecture any more?

Also, as far as the optimization algorithm goes, they are interchangeable, aren't they? The choice of whether to use GD or SGD depends on us, and not on the choice of model, doesn't it?

So, don't you think the remaining factor is the cost function only?

Yes. But changing the cost function alone won't be enough to reproduce the results; you also need to change the architecture and the optimization. If you change the first two, then yes, you are left with the cost function only; but if you change just the cost function, you also need to apply the transformation and the optimization algorithm.

Thank you very much!
Just to catch the opportunity to ask: I do not think that we can apply the sigmoid function directly to the result of a prediction by linear regression.
I mean, if linear regression produced the prediction y = -0.3, then sigmoid(y) = 0.42 is not a faithful probability for the output, since y = -0.3 is quite a strong suggestion that the output is zero. Rather, we should at least transform y to a SIGNED value, e.g. 2y - 1 = -1.6, and then get sigmoid(2y - 1) ≈ 0.17.

You may take the output of your linear regression model as a feature, and fit that feature to a logistic regression model, then use the trained logistic regression model to do that transformation.
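A minimal numpy sketch of that two-step idea (this kind of calibration is often called Platt scaling; the data here is made up): fit ordinary least squares on the 0/1 labels, then fit a one-feature logistic regression p = sigmoid(a·s + c) on the linear model's score s.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
x = rng.normal(size=(300, 1))
y = (x[:, 0] + 0.5 * rng.normal(size=300) > 0).astype(float)  # noisy 0/1 labels

# Step 1: ordinary least squares on the 0/1 labels.
X = np.hstack([x, np.ones((300, 1))])
w, *_ = np.linalg.lstsq(X, y, rcond=None)
s = X @ w                     # raw linear-regression scores (may leave [0, 1])

# Step 2: one-feature logistic regression p = sigmoid(a * s + c),
# fitted by gradient descent on the log loss.
a, c = 1.0, 0.0
for _ in range(5000):
    p = sigmoid(a * s + c)
    a -= 0.5 * ((p - y) * s).mean()
    c -= 0.5 * (p - y).mean()

p = sigmoid(a * s + c)        # calibrated probabilities, strictly inside (0, 1)
```

The learned a and c do the job the hand-picked 2y - 1 transform above was trying to approximate, but they are fitted to the data instead of guessed.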

Here is my guess: it is not common to model a binary classification problem with a linear regression, but if you do, one solution is to use logistic regression to model the outcome of the linear regression.

As I understand it, simple (non-logistic) linear regression will have an advantage over logistic regression in the case of non-binary classification (for instance, when we have to classify into three categories). (Non-logistic) linear regression easily encompasses any number of possible "y" values, whereas for logistic regression… I do not know at the moment, but binary logistic regression is not generalized to the case of more than two decisions.

the simple (non-logistic) linear regression will have advantage over logistic regression in the case of non-binary classification (for instance, when we have to classify onto three categories)

I think in general this statement is not true. As a counter-example, let's assume you want to build an early-warning system to determine whether a piece of information is correct or not [you could also think of anomaly detection].

You could go for logistic regression, use the probability p that a piece of information is false, and define thresholds to derive a multi-class decision (similar to what you would do with a linear model, as far as I understood), e.g. with the following logic:

p < 0.2 → OK: seems to be fine

0.2 ≤ p < 0.7 → suspicion: some action is recommended, like getting an expert review

p ≥ 0.7 → NOT OK: information seems to be wrong.
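That three-band rule is easy to sketch in code (the labels and thresholds are just the illustrative ones from above):

```python
# Map the model's probability p that a piece of information is false
# to one of the three bands described above.
def triage(p: float) -> str:
    if p < 0.2:
        return "OK"        # seems to be fine
    elif p < 0.7:
        return "REVIEW"    # suspicion: get an expert review
    return "NOT OK"        # information seems to be wrong

print(triage(0.05), triage(0.4), triage(0.9))  # OK REVIEW NOT OK
```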

This approach might not necessarily perform worse than a linear regression model, which you would also use for threshold-based conclusions, right?

I would say the model suitability really depends on the business problem and how well a certain model fits a data set and can generalise on it.

Here are some threads that you might find useful to check out:

Thank you so much for the quick answer, Christian! Дякую! (Thank you!)
I will look through the links you kindly sent.
Could you please tell me what is meant by "probability p for false information"?
From the course I understood that a logistic regression model provides just the probabilities of "class = 0" and "class = 1". How is the mentioned probability p connected to them?

Sure: it's just the definition.
What I meant was basically the definition by cases in the example from the previous post: if p = 1, this would semantically correspond to the model providing a 100% probability of a failure case as output; see also this visualization (right side of plot):

Note: if you have multi-class labels, a multi-class logistic regression could also be applied, e.g.

if you solve a binary problem by fitting the model for each label

or alternatively, if the loss minimised is the multinomial loss fit across the entire probability distribution.
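A minimal numpy sketch of the first (one-vs-rest) option on made-up 3-class data: one binary logistic regression per label, then pick the class with the highest probability.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_binary(X, y, lr=0.1, steps=3000):
    """Plain binary logistic regression by gradient descent on the log loss."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = sigmoid(X @ w + b)
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * (p - y).mean()
    return w, b

# Toy 2-D data: three classes scattered around three different centers.
rng = np.random.default_rng(3)
centers = np.array([[0.0, 2.0], [-2.0, -1.0], [2.0, -1.0]])
X = np.vstack([c + 0.5 * rng.normal(size=(50, 2)) for c in centers])
labels = np.repeat([0, 1, 2], 50)   # class indices for A, B, C

# One-vs-rest: train one "this class vs the rest" model per class.
models = [fit_binary(X, (labels == k).astype(float)) for k in range(3)]

# Predict the class whose binary model assigns the highest probability.
probs = np.column_stack([sigmoid(X @ w + b) for w, b in models])
pred = probs.argmax(axis=1)
print("training accuracy:", (pred == labels).mean())
```

Each binary problem here has the form "class k vs all the rest"; the multinomial alternative instead fits a single model whose three softmax outputs sum to 1.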

Dear Christian,
Regarding this point:
"if you solve a binary problem by fitting the model for each label"
Could you please make it more clear.
Suppose, I want to classify variable X into three classes: A, B or C.
Does this mean that I make three binary classification tasks:
X belongs to A or X belongs to (B+C)
X belongs to B or X belongs to (A+C)
X belongs to C or X belongs to (A+B)
and then take the maximum among the three obtained probabilities to choose a specific class?
Will be grateful to you for the answer.
Vasyl.