Mathematical Inquiry

Hi folks!

  • Week 3 Lab.

  • I have a basic question about something I can’t quite wrap my head around. Why do we use the logarithm when calculating the loss in logistic regression? Wouldn’t it be simpler to just calculate the difference between the predicted and actual values to get the same result? It seems like the log is doing something similar, but in a different form.

    Also, why do we use e (the natural exponential) in the sigmoid function? Why not just apply a simple threshold like in linear regression, where values above a certain threshold return 1 and values below return 0?

  • Could you recommend some math books to help build a strong grasp of these fundamentals?

Hey Abdullah,

In a linear regression model you assume there’s a linear relationship between the predictor(s) x and some numeric / continuous outcome y. Your model says: for each one-unit increase in x, y increases by w on average. That’s why the loss is mean squared error; you’re literally trying to minimize the squared distance between the predicted and actual numeric values.
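
To make that concrete, here’s a tiny NumPy sketch (my own toy example, not from the lab) of the mean squared error that linear regression minimizes:

```python
import numpy as np

def mse_loss(y_true, y_pred):
    """Mean squared error: the average squared distance between targets and predictions."""
    return np.mean((y_true - y_pred) ** 2)

# Toy data (made up for illustration): a one-feature linear model y_hat = w * x + b
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])
w, b = 2.0, 0.0

y_pred = w * x + b
print(mse_loss(y, y_pred))  # a small value means the line is close to the numeric targets
```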

In logistic regression, the y you are trying to predict isn’t continuous; it is binary (0 or 1). Because of that, you can’t model y directly as a linear function of x: a linear function can output any real number, but a probability has to stay between 0 and 1. So instead you model the log odds of y=1 (something happening) as a linear function of x. After modeling the log odds, you convert them back into a probability with the sigmoid function, and that’s exactly where e shows up.
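
Here’s a small sketch of that chain (again just my own illustration, with made-up weights and data): the model produces log odds as a linear function of x, the sigmoid maps the log odds to a probability, and the log loss penalizes confident wrong answers:

```python
import numpy as np

def sigmoid(z):
    """Map log odds (any real number) to a probability strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(y_true, p):
    """Binary cross-entropy: -log(p) when y = 1, -log(1 - p) when y = 0."""
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

# Toy weights and data (made up for illustration)
w, b = 1.5, -2.0
x = np.array([0.5, 1.0, 2.0, 3.0])
y = np.array([0, 0, 1, 1])

log_odds = w * x + b      # linear in x, can be any real number
p = sigmoid(log_odds)     # squashed into (0, 1), so it can be read as P(y = 1)
print(p)
print(log_loss(y, p))
```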

I hope that helps!

A great way to learn stats fast? Check out Josh Starmer’s YouTube channel, StatQuest.

3 Likes

The two types of regression have different goals.

  • Linear regression attempts to create a model that mimics the data, and the output range is all real numbers.

  • Logistic regression attempts to create a boundary that splits the data into two regions, and its output range is limited to 0.0 (False) to 1.0 (True); see the sketch below.
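
A minimal sketch of that boundary idea (hypothetical weights, just for illustration): the model outputs a probability between 0 and 1, and the 0.5 line is what splits the feature space into the two predicted classes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical fitted weights for a one-feature logistic model
w, b = 2.0, -4.0

x = np.linspace(0.0, 4.0, 9)
p = sigmoid(w * x + b)            # always between 0.0 and 1.0
labels = (p >= 0.5).astype(int)   # the 0.5 cut corresponds to w * x + b = 0, i.e. x = 2

for xi, pi, li in zip(x, p, labels):
    print(f"x={xi:.1f}  p={pi:.3f}  predicted class={li}")
```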

1 Like

Thanks for your response! Here’s what I noticed: I initially wondered why we can’t just take the difference between the predicted and actual values, but then I considered the case where the actual value is 1. If we use a linear error function like (1−x) * constant, the error never blows up: at a predicted value of 0 it just intersects the y-axis at that constant, whereas the log loss −log(x) goes to infinity there.

On the other hand, the gradient of this linear error function stays constant, whereas the logarithmic error function used in logistic regression behaves differently: its gradient starts out steep when the prediction is far from the actual value and gradually becomes smaller as we approach the optimal solution. This makes the log function more suitable, because it allows for larger updates initially (when the model is far from the optimum) and smaller, more refined updates as the model converges, which helps it approach the minimum error more effectively.
Also, thanks for recommending the YouTube channel, I really appreciate it. I’m excited to learn from it.
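
To see that difference numerically (my own sketch, with an arbitrary constant): for a linear error like (1−p) * c the slope with respect to p is the constant −c everywhere, while the slope of −log(p) is −1/p, which is huge when the prediction is badly wrong (p near 0 with a true label of 1) and small as p approaches 1.

```python
import numpy as np

# Predicted probabilities for examples whose true label is 1
p = np.array([0.01, 0.1, 0.5, 0.9, 0.99])

# Linear error (1 - p) * c: its gradient w.r.t. p is the constant -c everywhere
c = 5.0  # arbitrary scaling constant, just for illustration
linear_grad = np.full_like(p, -c)

# Log loss -log(p): its gradient w.r.t. p is -1/p, steep when the prediction is far off
log_grad = -1.0 / p

for pi, lg, gg in zip(p, linear_grad, log_grad):
    print(f"p={pi:.2f}  linear grad={lg:7.1f}  log-loss grad={gg:8.1f}")
```
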

2 Likes