Confusing loss function for logistic regression

Can anyone explain to me why the loss function of logistic regression is different from that of linear regression, but when implemented in code (specifically in Optional lab: Gradient descent for logistic regression) the gradient computation formula appears to be the same?

Hello @thaison23092003,
Welcome to the Discourse community, and thanks for bringing this question up. I am a mentor, and I will do my best to answer your question.

The loss function of logistic regression is different from that of linear regression because logistic regression is a classification algorithm, while linear regression is a regression algorithm. Logistic regression predicts the probability of a data point belonging to a particular class, while linear regression predicts a continuous value.

The gradient formula for logistic regression and linear regression looks the same because, when you differentiate each loss with respect to the parameters, both derivatives simplify to the same expression; the only difference is hidden inside the model's prediction $h(x)$. Both algorithms then use gradient descent, an iterative algorithm that updates the parameters of a model in the direction of the steepest descent of the loss function.
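As a toy illustration of the update rule itself (the example function and step count here are my own choices, not from the lab), a minimal gradient-descent loop looks like:

```python
# Gradient descent on f(w) = (w - 3)^2, which has its minimum at w = 3.
def grad(w):
    return 2.0 * (w - 3.0)  # derivative of (w - 3)^2

w = 0.0          # starting point
alpha = 0.1      # learning rate
for _ in range(100):
    w -= alpha * grad(w)  # step opposite the gradient (steepest descent)

print(round(w, 4))  # w has moved very close to the minimum at 3
```

The same update pattern, `params -= alpha * gradient`, is what both regression models use; only the gradient computation differs in what $h(x)$ means.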

Here is a more detailed explanation of the two loss functions:

  • Logistic regression loss function: The logistic regression loss function is called the binary cross-entropy loss function. It is defined as follows:

$$L = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log h(x_i) + (1 - y_i)\log\bigl(1 - h(x_i)\bigr)\right]$$

where $y_i$ is the ground truth label for data point $i$, $h(x_i)$ is the predicted probability of data point $i$ belonging to the positive class, and $n$ is the number of data points.

  • Linear regression loss function: The linear regression loss function is called the mean squared error (MSE) loss function. It is defined as follows:

$$L = \frac{1}{2n}\sum_{i=1}^{n}\bigl(h(x_i) - y_i\bigr)^2$$

where $y_i$ is the ground truth label for data point $i$ and $h(x_i)$ is the predicted value for data point $i$. (The factor of $\frac{1}{2}$ is a common convention that cancels when you take the derivative.)
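Here is a minimal NumPy sketch of both losses (the function and variable names are my own, not from the lab):

```python
import numpy as np

def bce_loss(y, h):
    # Binary cross-entropy: average of -[y*log(h) + (1-y)*log(1-h)]
    eps = 1e-12                    # clip to avoid log(0)
    h = np.clip(h, eps, 1 - eps)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def mse_loss(y, h):
    # Mean squared error, with the conventional 1/2 factor
    return np.mean((h - y) ** 2) / 2

y = np.array([1.0, 0.0, 1.0])      # ground truth labels
h = np.array([0.9, 0.2, 0.7])      # model predictions
print(bce_loss(y, h))
print(mse_loss(y, h))
```

Note that `bce_loss` expects predictions in $(0, 1)$, i.e. probabilities, while `mse_loss` works with any real-valued prediction.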

As you can see, the two loss functions penalize errors differently: the binary cross-entropy loss takes the logarithm of the predicted probability, while the MSE loss takes the squared difference between the predicted value and the ground truth label.

The reason for using the binary cross-entropy loss function for logistic regression is that it is more appropriate for a classification problem. The logarithm of the predicted probability assigns a very large penalty to confident predictions that are wrong, which is what we want when minimizing false positives and false negatives. Just as importantly, if you plugged the sigmoid prediction into the MSE loss instead, the resulting cost function would be non-convex, with many local minima; the cross-entropy loss is convex in the parameters, so gradient descent can reliably reach the global minimum.

The reason for using the MSE loss function for linear regression is that it is more appropriate for a regression problem. The squared difference between the predicted value and the ground truth label gives a higher penalty for predictions that are far from the ground truth label. This is because we want to minimize the error in our predictions.

Despite the differences in the loss functions, the gradient formula for both logistic regression and linear regression turns out to be identical in form. Differentiating either loss with respect to a weight $w_j$ gives

$$\frac{\partial J}{\partial w_j} = \frac{1}{n}\sum_{i=1}^{n}\bigl(h(x_i) - y_i\bigr)\, x_{i,j}$$

The only difference is the definition of $h$: for linear regression $h(x) = w \cdot x + b$, while for logistic regression $h(x) = \sigma(w \cdot x + b)$, where $\sigma$ is the sigmoid. That is why the code in the optional lab looks the same for both.
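You can see this shared structure in a small NumPy sketch (the function names and the toy data are my own, not from the lab):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradients(X, y, w, b, logistic=False):
    # The same formula serves both models; only the prediction h differs.
    z = X @ w + b
    h = sigmoid(z) if logistic else z   # sigmoid for logistic regression
    err = h - y                         # (h(x_i) - y_i)
    dw = X.T @ err / len(y)             # (1/n) * sum((h - y) * x)
    db = np.mean(err)
    return dw, db

X = np.array([[1.0, 2.0], [2.0, 0.5], [0.0, 1.5]])
y = np.array([1.0, 0.0, 1.0])
w = np.zeros(2)
b = 0.0
print(gradients(X, y, w, b, logistic=True))
print(gradients(X, y, w, b, logistic=False))
```

The two calls return different numbers, because `h` is computed differently, but they run through exactly the same gradient expression.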

I hope my answer was helpful to you. Please feel free to post a follow-up question if anything is still unclear.
Best,
Can Koz


Thank you very much for such a dedicated answer. :heart_eyes: