My previous reply gives a way to think about why traditional distance metrics don’t work well for classification problems, but it may also help to see a visual demonstration. Here is a graph of what the loss surface looks like if you use the MSE cost function with the dataset from the Logistic Regression assignment in Prof Ng’s original Stanford Machine Learning course. Because the squared-error cost is applied to a sigmoid output, the surface is non-convex, with flat regions and local optima that can trap gradient descent:
For comparison, here is what you get with the standard cross-entropy or “log loss” cost function that is used for Logistic Regression. This cost is convex, so gradient descent has a single global minimum to find:
A classic illustration of the old maxim: “A picture is worth a thousand words.”
These plots are courtesy of Olivier Philip who was a mentor for Stanford Machine Learning a few years back when I took that course.
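If you want to reproduce the comparison yourself, here is a minimal sketch (not Olivier’s original plotting code). It makes some simplifying assumptions: it uses a small synthetic one-feature dataset rather than the course’s actual data, and it reduces the model to a single weight `w` and bias `b` so that each cost surface can be drawn in 3D.

```python
# Minimal sketch: compare the MSE and cross-entropy cost surfaces for a
# one-feature logistic regression model h(x) = sigmoid(w*x + b).
# The dataset below is synthetic; the original plots used the course dataset.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 50), rng.normal(2.0, 1.0, 50)])
y = np.concatenate([np.zeros(50), np.ones(50)])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Evaluate both cost functions over a grid of (w, b) values.
W, B = np.meshgrid(np.linspace(-6, 6, 200), np.linspace(-6, 6, 200))
H = sigmoid(W[..., None] * x + B[..., None])  # shape (200, 200, n_samples)
eps = 1e-12  # guard against log(0)

mse = np.mean((H - y) ** 2, axis=-1)
xent = -np.mean(y * np.log(H + eps) + (1 - y) * np.log(1 - H + eps), axis=-1)

fig = plt.figure(figsize=(10, 4))
for k, (Z, title) in enumerate([(mse, "MSE cost"), (xent, "Cross-entropy cost")]):
    ax = fig.add_subplot(1, 2, k + 1, projection="3d")
    ax.plot_surface(W, B, Z, cmap="viridis")
    ax.set_xlabel("w")
    ax.set_ylabel("b")
    ax.set_title(title)
plt.tight_layout()
plt.show()
```

Even on this toy dataset, the general shapes should echo the plots above: the MSE surface flattens out and bends non-convexly away from its minimum, while the cross-entropy surface forms a single convex basin.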