week-module-1
I am watching "Supervised Machine Learning: Regression and Classification > Module 1 > Cost function formula"
and have a question: why does the cost function take the square of the error?
If the goal is just to make the error always positive, why not take the absolute value instead?
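For reference, the cost function from the video, as I understand it, is:

$$J(w,b) = \frac{1}{2m}\sum_{i=1}^{m}\left(f_{w,b}\big(x^{(i)}\big) - y^{(i)}\right)^2$$

and my question is about the squaring of the term $f_{w,b}(x^{(i)}) - y^{(i)}$.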
That is a valid method, and it is sometimes used (a cost built from absolute values is called the mean absolute error).
Its drawback appears when we compute the partial derivatives of the cost (i.e., the gradients) for gradient descent: the absolute value's derivative is undefined at 0, while the squared error is smooth everywhere.
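To see the difference concretely, here is a minimal NumPy sketch (not from the course; the variable names are my own) that evaluates both losses and their derivatives at a few error values:

```python
import numpy as np

# Minimal sketch (not from the course): compare squared error and absolute
# error, and their derivatives with respect to the error e = prediction - target.
errors = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])

squared_loss = errors ** 2    # derivative: 2 * e, defined and smooth everywhere
abs_loss = np.abs(errors)     # derivative: sign(e), undefined at e = 0

d_squared = 2 * errors        # shrinks smoothly to 0 as the error shrinks
d_abs = np.sign(errors)       # always -1 or +1; np.sign(0) returns 0, but
                              # mathematically there is no single slope at 0

for e, ds, da in zip(errors, d_squared, d_abs):
    print(f"error={e:+.1f}  d(squared)/de={ds:+.1f}  d(abs)/de={da:+.1f}")
```

Near the minimum the squared loss's gradient fades smoothly to zero, so gradient descent settles; the absolute loss's gradient keeps magnitude 1 right up to the minimum and then flips sign, which can make plain gradient descent overshoot back and forth.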
Thank you for your answer! So it can actually be used. I am not very familiar with derivatives, so I can't easily follow "the absolute value's derivative at 0 is undefined", but thanks anyway! I am now watching the next video, "Cost function intuition".
If the error is small (like 0.1), squaring it makes it even smaller (0.01), but if it’s large (like 10), it becomes much larger (100). This helps the model focus more on fixing big errors.
It is important to focus on big errors: if your target value is 2 and your model predicts 1.98, the prediction is almost perfect, but if your target value is 10 and your model predicts 7, the model is way off.
Then during gradient descent, when we take the derivative, large errors produce larger gradients, so the model updates its parameters more aggressively for them. In this way, larger errors are weighted more heavily than smaller ones.
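As a rough illustration (hypothetical numbers, reusing the targets from the example above), here is how the gradient of the squared error scales with the size of the error:

```python
import numpy as np

# Minimal sketch (hypothetical numbers): the gradient of the squared error
# 0.5 * (prediction - target)^2 with respect to the prediction is simply
# (prediction - target), i.e. the error itself.
targets = np.array([2.0, 10.0])
predictions = np.array([1.98, 7.0])

errors = predictions - targets   # about -0.02 and exactly -3.0
gradients = errors               # d/dprediction of 0.5 * error^2 = error

print(gradients)                           # [-0.02 -3.  ]
print(abs(gradients[1] / gradients[0]))    # roughly 150: the big error
                                           # dominates the update
```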