Hi team,

Why don’t we get a convex function if we use MSE as the cost function in Logistic Regression?

Thanks in advance!

It turns out that distance-based cost functions do not perform well on classification problems. To MSE, the move from 0.49 to 0.51 is the same size as the move from 0.89 to 0.91, right? But in a classification problem where the label is True, those two changes do *not* have the same effect: the first one crosses the 0.5 decision boundary and the second does not. So giving them both the same reward or punishment is not a good strategy.
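That intuition can be checked with a few lines of plain Python (the probability values are just the ones from the example above): both moves shrink the distance to the label by exactly 0.02, but cross entropy gives a noticeably bigger reward to the move that crosses the decision boundary.

```python
import math

def distance(y, p):
    # plain distance between the label and the prediction
    return abs(y - p)

def log_loss(y, p):
    # cross-entropy ("log loss") for a single prediction
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

y = 1  # the true label in the example

for lo, hi in [(0.49, 0.51), (0.89, 0.91)]:
    d_dist = distance(y, lo) - distance(y, hi)
    d_ce = log_loss(y, lo) - log_loss(y, hi)
    print(f"{lo} -> {hi}: distance shrinks by {d_dist:.2f}, "
          f"log loss drops by {d_ce:.4f}")
```

The distance improvement is identical in both cases, while the log-loss improvement is not.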

My previous reply gives a way to think about why traditional distance metrics don’t work well for classification problems, but it may also help to see a visual demonstration. Here is a graph of what the loss surface looks like if you use the MSE cost function with the dataset from the Logistic Regression assignment in Prof Ng’s original Stanford Machine Learning course:

For comparison, here is what you get with the standard cross entropy or “log loss” cost function that is used for Logistic Regression:
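In case the plots don’t render here, the shape difference can also be verified numerically: a convex function has non-negative second differences everywhere. A toy one-weight model of my own choosing (a single example with x = 1 and label y = 1, so the prediction is sigmoid(w)) already shows MSE failing that test at some points while log loss passes everywhere:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Loss as a function of the single weight w, for one example with x = 1, y = 1
def mse_loss(w):
    return (1.0 - sigmoid(w)) ** 2

def ce_loss(w):
    return -math.log(sigmoid(w))

def second_diff(f, w, h=1.0):
    # discrete second difference; negative values indicate non-convexity
    return f(w - h) + f(w + h) - 2.0 * f(w)

for w in [-4.0, -3.0, -2.0, 0.0, 2.0, 3.0, 4.0]:
    print(f"w={w:+.0f}: MSE {second_diff(mse_loss, w):+.4f}, "
          f"log loss {second_diff(ce_loss, w):+.4f}")
```

The MSE column changes sign (it goes negative for large negative w), so the MSE surface cannot be convex; the log-loss column stays positive at every point checked, consistent with it being convex.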

A classic example of the traditional maxim: “A picture is worth a thousand words.”

These plots are courtesy of Olivier Philip who was a mentor for Stanford Machine Learning a few years back when I took that course.

Yeah, thanks for that clarification. But I’m still unclear about the idea behind the distances between 0.89/0.91 and 0.49/0.51. The idea is that the distance is calculated between the actual y and \hat{y}, but the y values will only ever be 0 or 1, right?

Hey,

Can you please tell me whether my query was clear?

Yes, you’re right that the loss is calculated between the label (y) and the prediction (\hat{y}). But the same argument still applies: Euclidean distance is not a good metric for a classification problem, for the reasons given in the intuition above.
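Concretely, because y is always 0 or 1, one of the two terms in the cross-entropy formula always vanishes, so the loss reduces to -\log(\hat{y}) when y = 1 and to -\log(1 - \hat{y}) when y = 0; it never behaves like a Euclidean distance. A quick sketch (the prediction value 0.8 is just an arbitrary example):

```python
import math

def log_loss(y, p):
    # full cross-entropy: -(y*log(p) + (1-y)*log(1-p))
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

p = 0.8  # an arbitrary example prediction

# With y = 1 the second term vanishes, leaving -log(p):
print(log_loss(1, p), -math.log(p))

# With y = 0 the first term vanishes, leaving -log(1 - p):
print(log_loss(0, p), -math.log(1 - p))
```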