Within the logistic regression algorithm, why do we use the output of the linear regression model, i.e., f(x)=wx+b, as the input of the sigmoid function? Why don’t we just directly input the feature itself into the sigmoid function? Can anyone explain this with intuition?

Because f(x) is a linear combination of the input features (it captures a weighted sum of the features). This linear combination reflects how much each feature contributes to the prediction. Directly inputting the features into the sigmoid function wouldn’t capture each feature’s individual contribution, or combine them into a single score, in a linear, separable way. The sigmoid function then turns this linear combination into a probability value between 0 and 1, making the result ready for binary classification.
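To make the “weighted sum” concrete, here is a minimal NumPy sketch. The weights, bias, and feature values are made-up numbers, just for illustration:

```python
import numpy as np

def sigmoid(z):
    # squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical learned parameters for 3 features
w = np.array([0.8, -1.2, 0.5])
b = 0.1

x = np.array([1.0, 0.5, 2.0])  # one example with 3 features
z = np.dot(w, x) + b           # linear combination: weighted sum + bias
p = sigmoid(z)                 # probability of the positive class
```

Each weight scales its feature (0.8 * 1.0, -1.2 * 0.5, 0.5 * 2.0), the products are summed with the bias, and only then does sigmoid turn that one number into a probability.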

Hope it helps! Feel free to ask if you need further assistance.

Thanks for your response, but I already got the exact same response from ChatGPT. Still, it doesn’t make sense to me. It’s kind of confusing to visualize mentally, like what is meant by a weighted sum of features. I mean, it’s all fuzzy until I see it in a graph or visual.

As Alireza said, the function f(x) tries to measure the effect of changes in x on y, just as we do in simple linear regression.

And since this is a linear combination of variables, with no constraints, the output can be anywhere in (-infinity, infinity). I’m side-stepping a few mathematical details here to keep it concise.

This is fine in a regression setting, where we’re trying to predict a numerical variable. But in logistic regression we’re predicting the probability of a class, so we want our result to lie within [0, 1].

Therefore, the problem we are trying to solve is to measure the effect of X on Y while making sure, at the same time, that the result lies within [0, 1]. We do this in two steps to facilitate the computation.

So, from the linear combination, we get:

f(x) = Y = wx + b

Then to get the probabilistic result, we use the sigmoid function, which maps a result in (-inf, inf) to (0,1).

sigmoid(Y) = 1 / (1 + exp(-Y))

So, as you can see, the features do end up inside the sigmoid function, just indirectly, through the linear combination.
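If it helps to see the squashing behavior numerically, here is a quick plain-Python sketch of that mapping (the sample Y values are arbitrary):

```python
import math

def sigmoid(y):
    # maps any real number in (-inf, inf) into (0, 1)
    return 1.0 / (1.0 + math.exp(-y))

# Y = wx + b can be any real number; sigmoid squashes each one
for y in [-10.0, -1.0, 0.0, 1.0, 10.0]:
    print(f"{y:6.1f} -> {sigmoid(y):.5f}")
```

Note that sigmoid(0) = 0.5, and large positive or negative Y get pushed toward 1 and 0 respectively, which is exactly the probability-like behavior we want.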

A very good explanation by Nick!

To add: features have different effects on the outcome, so we use a linear combination (f(x) = wx + b) to capture these different contributions (e.g., some words have a greater impact on spam email detection than others). Feeding this combined value into the sigmoid function gives us the probability needed for binary classification. Directly inputting the features into the sigmoid function would lead to poor results, as there would be nothing to learn from the data (the sigmoid function itself has no learnable parameters).
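A tiny sketch of that “nothing to learn” point, with made-up toy values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([2.0, -1.0])  # one example with two features (toy values)

# Without weights, sigmoid(x) is completely determined by the data:
# training could never adjust anything, and we also get one number
# per feature rather than a single prediction.
fixed = sigmoid(x)

# With weights and a bias, training can move w and b so that the
# single output probability fits the labels. (Values are made up.)
w, b = np.array([0.5, 0.5]), 0.0
p = sigmoid(np.dot(w, x) + b)  # one tunable probability
```

The first version has no knobs to turn; the second gives the model w and b to learn from labeled examples.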

I’ll try to find visual examples of logistic regression if you’re still confused!

Well, does linear regression make sense to you? There you take the linear combination of the weights and input features to compute a final output, which is just a real number between -\infty and \infty, e.g. the predicted price of a house or stock price or the temperature at noon tomorrow. The weights and bias value are learned by the model based on our training data and we hope that the training works and we get good predictions.

Well, in logistic regression what we are trying to produce is a “classification” instead of an output real number. For example, does a picture contain a cat or not? So what we do is take the same linear combination we used in linear regression and convert it into the probability of a “yes” answer by feeding it to the sigmoid function. We need a single “yes/no” answer, which is why we can’t apply sigmoid to the individual features. Once we have defined the model in this way, then we train it based on our training data (e.g. pictures with “cat” and “not a cat” labels).
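To tie the whole picture together, here is a tiny end-to-end sketch: logistic regression trained by gradient descent on a toy 1-D dataset. The data, learning rate, and iteration count are arbitrary choices for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# toy 1-D data: points below 0 labeled 0, points above 0 labeled 1
X = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = np.array([0, 0, 0, 1, 1, 1])

w, b = 0.0, 0.0   # the model learns these from the training data
lr = 0.5
for _ in range(1000):
    p = sigmoid(w * X + b)          # predicted probabilities
    # gradient of the binary cross-entropy loss w.r.t. w and b
    w -= lr * np.mean((p - y) * X)
    b -= lr * np.mean(p - y)

# threshold the probability at 0.5 to get a yes/no classification
preds = (sigmoid(w * X + b) >= 0.5).astype(int)
```

The linear part (w * X + b) scores each example, sigmoid converts the score to a probability of “yes”, and thresholding at 0.5 gives the final class, exactly the two-step structure described above.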