Linearity of Logistic Regression

Hi,

I have more of a conceptual question about logistic regression. Why is it that logistic regression is known as a linear model? Since we are passing the output of the linear regression through the sigmoid, isn’t this a non-linear model now?

Thank you!

In logistic regression, the log-odds (also called the logit) of the probability of the binary outcome is modeled as a linear combination of the input features. This is the linearity aspect of logistic regression.

The linear part of the model does not produce the probability directly; it produces the log-odds, which are then transformed by the sigmoid function to obtain the final probability.
The sigmoid function makes the mapping from the input features to the predicted probability non-linear, although the log-odds, and therefore the decision boundary, remain linear in the features.
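
To make the log-odds point concrete, here is a minimal NumPy sketch. The weights, bias, and inputs are made-up values for illustration only, not taken from any fitted model. It computes the linear score, passes it through the sigmoid, and then checks that taking the logit of the resulting probability gives back the linear score.

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    """Log-odds of a probability; the inverse of the sigmoid."""
    return np.log(p / (1.0 - p))

# Hypothetical parameters and inputs, chosen only for illustration.
w = np.array([2.0, -1.0])
b = 0.5
X = np.array([[0.0, 0.0],
              [1.0, 2.0],
              [-1.0, 3.0],
              [2.0, 1.0]])

z = X @ w + b        # linear (affine) part: this is the log-odds
p = sigmoid(z)       # non-linear squashing into a probability
recovered = logit(p) # undo the sigmoid

# The log-odds of the predicted probability equal the linear score.
print(np.allclose(recovered, z))
```

So the model is linear in the log-odds scale, even though the probability itself is a non-linear function of the inputs.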

In essence, logistic regression has both a linear and a non-linear aspect. Hope this helps.

Yes, as Lukman says, the overall function that maps the input to the prediction in Logistic Regression is a non-linear function, because of the presence of sigmoid in the composition of the function. But as Lukman also pointed out, the first step of applying the weight and bias values to the input is just a standard linear transformation. Well, some people would call it an “affine” transformation when you add the bias value. Now consider what the decision boundary looks like: we interpret the answer as “yes” or “true” if the output of sigmoid is \geq 0.5 and “no” or “false” if the output of sigmoid is < 0.5. So the decision boundary is:

\sigma\left (\displaystyle \sum_{i = 1}^n (w_i * x_i) + b\right) = 0.5

But we also know that \sigma(0) = 0.5, so the above equation is equivalent to:

\displaystyle \sum_{i = 1}^n (w_i * x_i) + b = 0

If you are familiar with Linear Algebra in more than 2 dimensions, you’ll recognize that as the equation of a hyperplane with normal vector (w_1, w_2, \ldots, w_n) and with distance to the origin of |b| / \|w\|. So what we have just figured out is that the decision boundaries that can be expressed by Logistic Regression are linear, meaning hyperplanes in \mathbb{R}^n. That may also be why someone would refer to LR as a “linear” algorithm, even though the full function is non-linear. Of course not all data is linearly separable, which is why LR is not as powerful as full Neural Networks, which can express non-linear decision boundaries. It may do quite a good job with some kinds of classification problems, but as we saw in DLS Course 1 it’s not as good at finding pictures of cats as a real neural network. That task requires a more complex decision boundary.
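
If it helps, the argument can be checked numerically. This is only a sketch with hypothetical weights and randomly generated points, not a trained model. It confirms that thresholding the sigmoid output at 0.5 gives the same labels as checking the sign of the linear score, and that the closest point of the hyperplane to the origin lies at distance |b| / \|w\|.

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters, chosen so the arithmetic is easy to follow.
w = np.array([3.0, 4.0])
b = -10.0

# Random 2-D points standing in for input data.
rng = np.random.default_rng(0)
X = rng.uniform(-5.0, 5.0, size=(1000, 2))

z = X @ w + b
pred_from_sigmoid = sigmoid(z) >= 0.5  # threshold the probability
pred_from_linear = z >= 0              # threshold the linear score

# Both rules label every point the same way, so the boundary is w.x + b = 0.
same = np.array_equal(pred_from_sigmoid, pred_from_linear)
print(same)

# Closest point of the hyperplane to the origin, and its distance.
x0 = -b * w / np.dot(w, w)
distance = abs(b) / np.linalg.norm(w)
print(x0, distance)
```

The sigmoid only rescales the score monotonically, so it cannot bend the boundary; a non-linear boundary has to come from non-linear features or from hidden layers.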
