An advantage of squaring the errors is that it keeps them all non-negative, so we can sum them up without positive and negative errors canceling each other out. Assume we have the following list of errors: [1, -1, 2, -2, 3, -3, 4, -4]. If we don't square them and just add them up, the cost becomes zero, which is unwanted because a zero cost hides the errors that are actually there.
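Here is a quick sketch in plain Python (the list is just the example errors above, and the variable names are only illustrative) that shows the cancellation:

```python
# The example errors from above: positives and negatives in equal measure.
errors = [1, -1, 2, -2, 3, -3, 4, -4]

raw_sum = sum(errors)                       # positive and negative errors cancel
squared_sum = sum(e ** 2 for e in errors)   # every squared term is non-negative

print(raw_sum)      # 0  -> looks like a "perfect" fit, which is misleading
print(squared_sum)  # 60 -> correctly reflects that errors exist
```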
We use the squared loss for linear regression, but it can also be used in other regression models, such as a regression neural network or XGBoost regression trees. Similarly, the log loss is not only used by logistic regression; it can also be used in other classification models, such as a classification neural network or XGBoost classification trees.
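As a rough sketch of why these losses are not tied to one specific model (the function and variable names below are just illustrative, not taken from any particular library), both losses only need the true targets and the model's predictions, no matter which model produced those predictions:

```python
import numpy as np

def squared_loss(y_true, y_pred):
    # Mean squared error: works for any model that outputs real-valued predictions.
    return np.mean((y_true - y_pred) ** 2)

def log_loss(y_true, p_pred, eps=1e-15):
    # y_true in {0, 1}, p_pred is the predicted probability of class 1.
    p = np.clip(p_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

# Regression predictions could come from linear regression, a neural network, or XGBoost.
print(squared_loss(np.array([3.0, -0.5, 2.0]), np.array([2.5, 0.0, 2.0])))

# Classification probabilities could come from logistic regression, a neural network, or XGBoost.
print(log_loss(np.array([1, 0, 1]), np.array([0.9, 0.2, 0.7])))
```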
Cheers,
Raymond