Why is the cost function in the “Optimization of squared loss - The one powerline problem” video the square of the distance and not just the distance?

Thanks

Hello @ibrahim97 and welcome to the Deep Learning community.

I think one of the reasons for using the squared loss function is that it penalizes larger errors more heavily than smaller ones. The squared error grows faster than the absolute error as the distance between the predicted and actual values increases: doubling the distance doubles the absolute error but quadruples the squared error.
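
To make this concrete, here is a minimal Python sketch. The function names and the distances are made up for illustration and are not taken from the video. It compares two sets of errors that have the same total distance, one spread evenly and one concentrated in a single large error.

```python
def absolute_loss(errors):
    """Sum of plain distances |e|."""
    return sum(abs(e) for e in errors)


def squared_loss(errors):
    """Sum of squared distances e^2."""
    return sum(e ** 2 for e in errors)


# Hypothetical distances: both lists add up to the same total distance (4.0).
small_errors = [1.0, 1.0, 1.0, 1.0]   # four small deviations
one_big_error = [0.0, 0.0, 0.0, 4.0]  # a single large deviation

# The absolute loss cannot tell the two cases apart.
print(absolute_loss(small_errors), absolute_loss(one_big_error))  # 4.0 4.0

# The squared loss is four times larger when the error is concentrated.
print(squared_loss(small_errors), squared_loss(one_big_error))    # 4.0 16.0
```

So with the plain distance both situations cost the same, while the squared distance clearly prefers several small errors over one large one.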