MSE isn’t designed for classification. It’s good at producing a continuous value, and that makes it hard to know where to draw the line between classes. What if it returned values anywhere between -20 and +1000? Where is the boundary between True and False, or cat vs. non-cat? Logistic regression uses a loss function better suited to the task (log loss, also called cross-entropy) together with a sigmoid, which squashes the output into a number between 0 and 1. That makes it easy to say “I want to be at least 70% certain”: you just add an if statement that returns 1 for any value greater than 0.7.
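A minimal sketch of that idea, assuming a sigmoid output and a hand-picked 0.7 threshold (the function names and the example scores are just for illustration):

```python
import math

def sigmoid(z):
    # Squash any real-valued score into the open interval (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def classify(z, threshold=0.7):
    # Require at least 70% certainty before predicting the positive class
    return 1 if sigmoid(z) >= threshold else 0

# Raw scores like -20 or +1000 become comparable probabilities
print(classify(-20))    # probability near 0 -> predict 0
print(classify(1000))   # probability near 1 -> predict 1
print(classify(3.0))    # sigmoid(3) is about 0.95, above 0.7 -> predict 1
```

Because the sigmoid is monotonic, thresholding the probability at 0.7 is equivalent to thresholding the raw score at a fixed cutoff; working in probability space just makes the “70% certain” statement direct.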

Another possible issue with MSE is that occasionally you get multiple local minima. Take a look at your graph: on the b side, between -10 and -15, there is a small local minimum. What if you start around W = 8 and b = -5? You’ll find that local minimum between -10 and -15 before you find the true minimum between -15 and -20, and gradient descent will see the cost start to increase as b moves more negative, so it stays stuck in that local minimum.
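You can see the same effect on a toy 1-D cost. This is not the MSE surface from your graph, just a made-up non-convex function with a shallow well and a deeper well, so where gradient descent ends up depends entirely on where it starts:

```python
def cost(b):
    # Hypothetical non-convex cost: a deeper minimum near b = -1
    # and a shallower (local) minimum near b = 2
    return (b + 1)**2 * (b - 2)**2 + 0.5 * b

def grad(b, eps=1e-6):
    # Numerical derivative via central differences
    return (cost(b + eps) - cost(b - eps)) / (2 * eps)

def descend(b, lr=0.01, steps=2000):
    # Plain gradient descent from a chosen starting point
    for _ in range(steps):
        b -= lr * grad(b)
    return b

b_good = descend(-2.0)  # starts in the deep well, finds the better minimum
b_stuck = descend(3.0)  # starts in the shallow well and never leaves it
print(b_good, cost(b_good))
print(b_stuck, cost(b_stuck))
```

Both runs converge (the gradient goes to zero), but the run started at 3.0 settles at a visibly higher cost: gradient descent only follows the local slope, so it has no way to know a better minimum exists on the other side of the hill.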

I see. Also, when we combine the two graphs of the loss function, it gives a shape similar to a convex one. So the more convex the loss function, the fewer local minima there are to get stuck in, and the faster the convergence.
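Right, and this is exactly why logistic regression pairs the sigmoid with log loss rather than MSE: for a single training point, log loss in the weight is convex, while squared error pushed through a sigmoid is not. A small numerical check of that, assuming one data point x = 1 with label y = 1 and testing convexity via discrete second differences on a grid (the function names are just for this sketch):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# One training point: input x = 1, label y = 1
x, y = 1.0, 1.0

def mse(w):
    p = sigmoid(w * x)
    return (p - y)**2

def log_loss(w):
    p = sigmoid(w * x)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def is_convex_on(f, lo, hi, n=200):
    # Convex functions have non-negative discrete second differences
    ws = [lo + (hi - lo) * i / n for i in range(n + 1)]
    vals = [f(w) for w in ws]
    return all(vals[i - 1] - 2 * vals[i] + vals[i + 1] >= -1e-9
               for i in range(1, n))

print(is_convex_on(log_loss, -10, 10))  # log loss: convex in w
print(is_convex_on(mse, -10, 10))       # sigmoid + MSE: not convex
```

Convexity guarantees the single minimum gradient descent finds is the global one, which is the property the combined graph is hinting at.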