Hi,
I just finished watching the series on the cost function. I understand the need to square the error so that negative and positive errors don't cancel out. But is it a problem that by using the squared error we bias the fit toward high values?
For example, say we have a data set x = [1, 100], y = [2, 104]. According to the squared error, a fit of y_hat = [1, 104] is much better than a fit of y_hat = [2, 100], even though in the first case the y_hat value for x = 1 has a relative error of 50%, while, when x = 100 and y_hat = 100, the error is only about 4%.
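To make the numbers concrete, here is the arithmetic I have in mind (just an illustrative snippet I wrote, nothing from the course):

```python
import numpy as np

y     = np.array([2.0, 104.0])    # true values at x = 1 and x = 100
fit_a = np.array([1.0, 104.0])    # 50% relative error at x = 1, exact at x = 100
fit_b = np.array([2.0, 100.0])    # exact at x = 1, ~4% relative error at x = 100

for name, y_hat in [("fit_a", fit_a), ("fit_b", fit_b)]:
    mse = np.mean((y - y_hat) ** 2)           # mean squared error
    mre = np.mean(np.abs(y - y_hat) / y)      # mean relative (percentage) error
    print(f"{name}: MSE = {mse:.2f}, mean relative error = {mre:.1%}")

# MSE prefers fit_a (0.50 vs 8.00), while relative error prefers fit_b (25.0% vs ~1.9%).
```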
Would it not make sense to use percentages or some other metric that avoids this issue? Is this an issue at all, or do we use this metric just because it is easy to minimize? Thank you so much for any help!
@daniel6
Hi Daniel,
As you said, the squared error is one of the methods used to measure how our model is performing and to train it, as in gradient descent. There is also the absolute error; these two are the most common ways to measure the performance of the model and to train it (which is done by minimizing that cost).
Now, why do we use the squared error rather than the absolute error? The reason comes from the gradient descent algorithm itself, as we need to compute the derivative of the cost function in order to minimize the error:
x^{i+1} = x^{i} - \alpha \frac{\partial J}{\partial x}
Finding the derivative is much easier for the squared error than for the absolute error, because the derivative of the absolute error is not defined at zero.
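Concretely, for a single error term e = \hat{y} - y:

\frac{d}{de}\, e^2 = 2e \qquad \text{(smooth everywhere)}

\frac{d}{de}\, |e| = \operatorname{sign}(e) \qquad \text{(undefined at } e = 0\text{)}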
Now, as for using the squared values, it does not matter as long as we are decreasing the error and minimizing the cost function.
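Here is a minimal sketch of that (my own illustration, not code from the course), assuming a simple linear model y_hat = w * x + b trained by gradient descent on the squared error, using the toy data from the question:

```python
import numpy as np

# toy data from the question above
x = np.array([1.0, 100.0])
y = np.array([2.0, 104.0])

w, b = 0.0, 0.0      # parameters of the (assumed) linear model y_hat = w * x + b
alpha = 3e-4         # learning rate, picked by hand for this tiny example

for step in range(100_000):
    y_hat = w * x + b
    error = y_hat - y
    # gradients of the squared-error cost J = (1 / (2m)) * sum(error^2)
    dJ_dw = np.mean(error * x)
    dJ_db = np.mean(error)
    # gradient descent update, matching the formula above
    w -= alpha * dJ_dw
    b -= alpha * dJ_db

print(w, b)   # approaches w ≈ 1.03, b ≈ 0.97, the line through both points
```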
That makes sense! Thank you very much!
@daniel6 I hope I have made it clear to you.
Hello Daniel,
I just want to add that your concern is real, so if we want to stick with the squared loss, it’d be better for us to somehow screen out those “outliers”, or collect more data to balance out the outliers.
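For example, one simple way to do the screening (just a rough sketch; the function name and threshold are my own, not something from the course) is to flag points whose residual from an initial fit is unusually large:

```python
import numpy as np

def flag_outliers(y, y_hat, z_thresh=3.0):
    """Flag points whose residual lies more than z_thresh standard deviations
    from the mean residual (one simple screening heuristic, not the only one)."""
    residuals = y - y_hat
    z_scores = (residuals - residuals.mean()) / residuals.std()
    return np.abs(z_scores) > z_thresh

# usage sketch: mask = flag_outliers(y, y_hat); keep only x[~mask], y[~mask]
```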
Raymond