Why is MSE not a good loss function for logistic regression?

I know that MSE is used for a continuous target variable, but isn’t the deep blue region the global minimum here?

From the notebook C1_W3_Lab04_LogisticLoss_Soln

Logistic Regression uses a loss function more suited to the task of categorization where the target is 0 or 1 rather than any number.

Yes, but the output of the sigmoid function is not 0 or 1; it is a continuous value in the range (0, 1).

MSE isn’t designed for it. It’s suited to models that predict a continuous value, and with such a model it is difficult to know where to draw the line for classification. I mean, what if it returned a value between -20 and +1000? Where is the line between True and False, or cat vs. non-cat? Logistic regression uses a loss function better suited to a model that returns a number between 0 and 1, which makes it easier to say “I want to be at least 70% certain”, so you throw an if statement in there that returns 1 for any value greater than 0.7.
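To make the 0.7 cutoff concrete, here is a minimal sketch of that if statement. The names `sigmoid` and `predict` and the values of `w` and `b` are made up for illustration; they are not taken from the lab notebook.

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def predict(x, w, b, threshold=0.7):
    # Model output is read as the estimated probability that y = 1
    prob = sigmoid(w * x + b)
    # Return 1 only when the model is more than 70% certain
    return (prob > threshold).astype(int)

# Hypothetical parameters and inputs, chosen only for the demo
x = np.array([-2.0, 0.0, 3.0])
print(predict(x, w=1.0, b=0.0))
```

With these made-up numbers the three probabilities are roughly 0.12, 0.50 and 0.95, so only the last input clears the threshold.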

Another possible issue with MSE is that you can get multiple local minima. Take a look at your graph. Along the b axis, between -10 and -15, there is a small local minimum. What if you start around W of 8 and b of -5? You’ll find that local minimum between -10 and -15 before you find the true minimum between -15 and -20. Gradient descent will see the cost start to increase as b moves more negative, so it will stay in that local minimum.
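If you want to check the shape yourself, here is a rough sketch that compares the curvature of the two costs along w. It assumes a single made-up training example (x = 1, y = 1) with b fixed at 0, not the lab's dataset. It only shows that the squared error cost is not convex once a sigmoid is inside it, which is the precondition for local minima; it does not reproduce the local minimum in your plot.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical one-example dataset, bias fixed at 0 (assumption for the demo)
x, y = 1.0, 1.0
w = np.linspace(-6.0, 6.0, 201)
f = sigmoid(w * x)

# Squared error cost and logistic (log) loss as functions of w
squared_error = 0.5 * (f - y) ** 2
log_loss = -y * np.log(f) - (1 - y) * np.log(1 - f)

# Second differences approximate curvature on the grid;
# a negative value anywhere means the curve is not convex there
se_curvature = np.diff(squared_error, n=2)
ll_curvature = np.diff(log_loss, n=2)

print("squared error convex on grid:", bool(np.all(se_curvature >= 0)))
print("log loss convex on grid:", bool(np.all(ll_curvature >= 0)))
```

For this example the squared error curve bends downward for strongly negative w, where the sigmoid saturates, while the log loss curves upward everywhere on the grid.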


I see. Also, when we combine the two graphs of the loss function, the result has a shape similar to a convex curve. So the more convex the loss function, the fewer local minima gradient descent can get stuck in and the faster it converges.
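For reference, the two graphs can be written as one expression. This is a sketch of the standard logistic loss for a single example, where `f` is the sigmoid output and `y` is the 0 or 1 target; the function name `logistic_loss` and the sample values are my own.

```python
import numpy as np

def logistic_loss(f, y):
    # f: sigmoid output in (0, 1); y: target, either 0 or 1
    # When y = 1 only the first term survives, when y = 0 only the second
    return -y * np.log(f) - (1 - y) * np.log(1 - f)

f = np.array([0.1, 0.5, 0.9])  # made-up model outputs
print(logistic_loss(f, 1))  # same values as -log(f)
print(logistic_loss(f, 0))  # same values as -log(1 - f)
```

Each branch penalizes a confident wrong prediction heavily and a confident correct one lightly, which is why the combined cost behaves well for gradient descent.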