Loss function to check error

For the loss function, could we not take the absolute value of the error instead of squaring it? Taking the absolute value is computationally simpler than squaring.

Hello @Sanjaya_Nagabhushan, I think the reason Mean Squared Error is commonly used is its tendency to penalize large errors more than small ones (it weights outliers heavily), and squaring also eliminates the possibility of negative values.
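
For concreteness, here are the two candidates being discussed, in their standard textbook forms (the 1/2 in the squared version is a common convention that simplifies the derivative; the notation here is mine, not quoted from the course). With m examples, predictions \hat{y}^{(i)}, and targets y^{(i)}:

Mean Squared Error: J = \displaystyle \frac{1}{2m} \sum_{i=1}^{m} (\hat{y}^{(i)} - y^{(i)})^2

Mean Absolute Error: J = \displaystyle \frac{1}{m} \sum_{i=1}^{m} |\hat{y}^{(i)} - y^{(i)}|

Both are non-negative; the difference is how strongly each one weights a single large error.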

Thanks @Isaak_Kamau. Could you please elaborate with an example on what exactly you mean by ‘penalize large errors’?

Hello @Sanjaya_Nagabhushan, I am sorry I lost this thread.
I meant that “squaring emphasizes larger differences”: square an error like 2 and you get 4 (small impact), but square an error like 10 and you get 100 (huge impact). Remember, when we train a model the goal is to reduce the loss, which is computed from the error (predicted value - real value), so I think mean squared error helps a lot in spotting outliers and large errors. Please refer to this thread for more: link
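
Here is a quick numeric sketch of that effect (my own toy numbers, plain Python, not from the course):

```python
# Squaring amplifies large errors and shrinks small ones,
# relative to taking the absolute value.
for error in [0.1, 2.0, 10.0]:
    print(f"error={error:5.1f}  absolute={abs(error):5.1f}  squared={error ** 2:7.2f}")
# error=  0.1  absolute=  0.1  squared=   0.01
# error=  2.0  absolute=  2.0  squared=   4.00
# error= 10.0  absolute= 10.0  squared= 100.00
```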

Let me know if you have another question.

Happy Learning
Isaak Kamau

And squaring a number like 0.1 gives you 0.01, so it also de-emphasizes small errors. This, too, helps the quadratic loss function perform better than using the absolute value of the difference.

Exactly, this makes it even better :100: @paulinpaloalto, do you think it has any cons?

There are some cases in which people do use the absolute value, e.g. least absolute deviations regression or the L1 penalty in Lasso, but for things like linear regression the quadratic loss is a clear win.

Another way to see the implications of this behavior is to consider the derivatives. If you use:

f(z) = \displaystyle \frac {1}{2} z^2

Then f'(z) = z, of course. Whereas for

g(z) = |z|

the derivative is -1 for z < 0, +1 for z > 0 and undefined at z = 0, although it turns out in practice that the non-differentiability at 0 is not a problem.
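
Written out piecewise:

g'(z) = \displaystyle \begin{cases} -1 & \text{for } z < 0 \\ +1 & \text{for } z > 0 \end{cases}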

So think about the implications of those derivatives for how the gradients will work to push the parameters in the direction of a better solution:

In the quadratic case, the “force” of the correction supplied by the gradients is exactly proportional to the magnitude of the error.

In the absolute value case, the “force” of the correction is blind to the magnitude of the error.
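
Here is a minimal sketch of that contrast (a toy example of my own, not from the course): one gradient-descent step on a single error value z under each loss.

```python
# Compare the update "force" of the two losses at several error sizes.

def grad_squared(z):
    # derivative of (1/2) z^2 is z: the step scales with the error
    return z

def grad_absolute(z):
    # derivative of |z| is sign(z): the step ignores the error's size
    # (returns 0 at z == 0 by convention)
    return (z > 0) - (z < 0)

learning_rate = 0.1
for z in [0.1, 2.0, 10.0]:
    step_sq = learning_rate * grad_squared(z)
    step_abs = learning_rate * grad_absolute(z)
    print(f"error={z:5.1f}  squared-loss step={step_sq:5.2f}  "
          f"absolute-loss step={step_abs:5.2f}")
# error=  0.1  squared-loss step= 0.01  absolute-loss step= 0.10
# error=  2.0  squared-loss step= 0.20  absolute-loss step= 0.10
# error= 10.0  squared-loss step= 1.00  absolute-loss step= 0.10
```

Note how the squared-loss step shrinks as the error shrinks, so updates naturally slow down near the minimum, while the absolute-loss step is the same size no matter how wrong the prediction is.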

@paulinpaloalto This is great, I had never thought of it from this perspective. @Sanjaya_Nagabhushan I hope you now have a better understanding of your question?

Thanks very much @paulinpaloalto and @Isaak_Kamau. I appreciate your guidance.