In the cost function formula video, Prof. Ng says:
"In machine learning different people will use different cost functions for different applications, but the squared error cost function is by far the most commonly used one for linear regression and for that matter, for all regression problems where it seems to give good results for many applications."
Why is the squared error the most popular cost function for regression? What are the other alternatives, and why are they not as popular?
There are other cost functions, like Root Mean Squared Error (RMSE, just the square root of MSE) and Mean Absolute Error (MAE, mean(abs(y_true - y_predicted))). MAE is not very sensitive to large errors, whereas MSE squares each error term, so the larger the error, the larger the MSE.
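To make the difference concrete, here is a minimal NumPy sketch with made-up numbers that compares MSE, RMSE, and MAE when one prediction has a large error:

```python
import numpy as np

# Hypothetical targets and predictions; the last prediction is badly off.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 6.5, 19.0])  # last prediction is off by 10

errors = y_true - y_pred

mse = np.mean(errors ** 2)      # squares each error, so big errors dominate
rmse = np.sqrt(mse)             # same ranking as MSE, but in the units of y
mae = np.mean(np.abs(errors))   # treats every unit of error the same

print(f"MSE:  {mse:.3f}")   # ~25.19, dominated by the single outlier
print(f"RMSE: {rmse:.3f}")  # ~5.02
print(f"MAE:  {mae:.3f}")   # ~2.88, the outlier barely stands out
```

The single large error pushes MSE far above MAE, which is exactly the "sensitivity to large errors" mentioned above.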
I am absolutely not the right person to talk about history… but I do think there is a historical reason why squared error is so popular. A method called "least squares" was invented about 200 years ago (see Wikipedia and a survey paper from 1887), and the idea behind it is basically the combination of linear regression and squared loss that we are learning today.
One reason neural networks are popular today is that we have the computational power to process big data, and a similar rationale applies to least squares, though in the opposite direction. The nice mathematical property of least squares is that the weights have a closed-form solution (the normal equations), so people could calculate them even without computers, using only basic arithmetic. Remember, we are talking about 100 years ago.
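For reference, here is a small NumPy sketch of that closed-form (normal equation) solution on made-up data; in the one-feature case, the sums inside it are the kind of thing people once computed by hand:

```python
import numpy as np

# Toy data: a single feature x and target y (made-up numbers).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

# Design matrix with a column of ones for the intercept term.
X = np.column_stack([np.ones_like(x), x])

# Normal equations: solve (X^T X) w = X^T y for the weights w.
w = np.linalg.solve(X.T @ X, X.T @ y)

print(f"intercept = {w[0]:.3f}, slope = {w[1]:.3f}")
```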
I hope someone who knows the history better can tell the story properly… All in all, I think squared error is popular today because of its wide range of applications over the years, especially during the time when computers were neither powerful nor widespread.