Hello, I was wondering why the example cost function squares the difference rather than using absolute value. It seems to me that this will overweight extreme differences, giving anomalies a disproportionate “pull” on the model, whereas using absolute value gives all the differences the same weight.

Hello Marley,

Welcome to our community!

I totally agree with you that, in a regression problem, the squared loss gives extra weight to “outliers”, the data points with the largest errors. Outliers are data points that do not follow our model assumptions. So we can’t assume the squared loss is the single best loss function for every dataset and every situation, nor can we assume the absolute loss is. There is no best loss function for everything.
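To make the “pull” concrete, here is a minimal sketch in plain Python. The data is made up for illustration, and the model is deliberately reduced to a single constant prediction: the constant that minimizes the squared loss is the mean, while a constant that minimizes the absolute loss is the median.

```python
from statistics import mean, median

# Made-up targets with one extreme value (illustrative data only).
targets = [1.0, 2.0, 3.0, 4.0, 100.0]

def squared_loss(pred, ys):
    """Mean squared error of a single constant prediction."""
    return sum((y - pred) ** 2 for y in ys) / len(ys)

def absolute_loss(pred, ys):
    """Mean absolute error of a single constant prediction."""
    return sum(abs(y - pred) for y in ys) / len(ys)

# The mean minimizes the squared loss; the median minimizes the absolute loss.
best_for_squared = mean(targets)
best_for_absolute = median(targets)

print("squared-loss fit:", best_for_squared)
print("absolute-loss fit:", best_for_absolute)
```

With this data the squared-loss fit is dragged far toward the single extreme value, while the absolute-loss fit stays with the bulk of the points. Which behaviour is preferable depends on whether that extreme value is noise or something the model should care about.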

I think the most appropriate loss function is (1) chosen to fit the objective we are building the model for, BUT (2) verified on the cv data.

The squared loss has better mathematical properties than the absolute loss (for example, it is differentiable everywhere, while the absolute loss has a kink at zero error). I personally think this makes it convenient for explaining neural network training in most regression cases, so the squared loss is a very good choice for the courses.
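As a rough illustration of one such property, the sketch below compares the derivative of each loss with respect to a single prediction. The function names are my own, and returning 0 at the kink of the absolute loss is just one conventional choice of subgradient.

```python
def squared_grad(pred, y):
    """Derivative of (pred - y)**2 with respect to pred."""
    return 2.0 * (pred - y)

def absolute_grad(pred, y):
    """A subgradient of |pred - y| with respect to pred.

    The absolute loss is not differentiable at pred == y;
    0 is returned there as a convention (an assumption of this sketch).
    """
    diff = pred - y
    return (diff > 0) - (diff < 0)  # sign of diff: -1, 0, or 1

# Far from the target and close to the target.
for pred in (3.0, 1.1):
    print(pred, squared_grad(pred, 1.0), absolute_grad(pred, 1.0))
```

The squared-loss gradient shrinks smoothly as the prediction approaches the target, so gradient descent steps naturally get smaller near the minimum. The absolute-loss gradient keeps the same magnitude no matter how close the prediction is, and it is undefined exactly at zero error.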

In a business setting, however, if I have a business objective that is measurable in terms of my model’s error, then I would build one model with the squared loss and another model with that business objective as the loss, and see which one does better on my cv set. And I certainly wouldn’t rule out trying the absolute loss as well.
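A minimal sketch of that comparison is below. Everything in it is hypothetical: the data, the constant-prediction “model”, the grid of candidate values, and the business objective itself (here I assume under-predicting costs three times as much as over-predicting).

```python
from statistics import mean

def business_cost(pred, ys, under_penalty=3.0):
    """Hypothetical objective: under-prediction costs 3x over-prediction."""
    total = 0.0
    for y in ys:
        err = y - pred
        total += under_penalty * err if err > 0 else -err
    return total / len(ys)

def fit_constant(ys, loss, candidates):
    """Pick the candidate constant with the lowest training loss."""
    return min(candidates, key=lambda c: loss(c, ys))

# Made-up training and cv targets (illustrative data only).
train = [10.0, 12.0, 11.0, 13.0, 30.0]
cv = [11.0, 12.0, 14.0, 10.0, 28.0]

candidates = [x / 2 for x in range(0, 81)]  # 0.0, 0.5, ..., 40.0

# Model 1: trained with the squared loss (the mean minimizes it).
model_squared = mean(train)
# Model 2: trained directly on the business objective.
model_business = fit_constant(train, business_cost, candidates)

# Both models are judged on the cv set with the business objective.
cv_scores = {
    "squared": business_cost(model_squared, cv),
    "business": business_cost(model_business, cv),
}
chosen = min(cv_scores, key=cv_scores.get)
print(cv_scores, "->", chosen)
```

The point is that the training loss is a choice, while the cv score under the objective I actually care about is the judge. Training directly on the business objective does not guarantee a win on the cv set, which is why I would check both.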

Cheers,

Raymond