How does MSE ensure that the cost of the error for prediction will be minimum?
Any simple to understand mathematical proof available?
What guarantees finding a minimum is that the MSE cost function is convex, and the gradients (the partial derivatives of the MSE) will lead to the minimum.