I have a query about ML week 1, where Andrew talked about optimization. If the cost function is already at a local minimum, and the global minimum lies somewhere else, how can gradient descent move itself out of the local minimum to reach the global minimum?
If it's already at a local minimum and the performance is not satisfactory, then a change of hyperparameters, data, model, or something else is needed to make the optimization move to another minimum, hopefully the global one.
In practice it is unlikely to encounter a true local minimum in a high-dimensional space. Say we have 50 inputs and therefore 50 weights: for a point to be a local minimum, the loss has to curve upward in all 50 dimensions at once, which is rare. Most critical points in high dimensions are saddle points, which gradient descent can escape.
And mostly we choose convex loss functions, so that there are no local minima and there is only one global minimum.
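A toy sketch of that property (the quadratic loss `f(w) = (w - 3)**2` here is my own illustrative choice, not anything specific from the course): because the function is convex, gradient descent converges to the same single minimum no matter where it starts.

```python
def gradient_descent(start, lr=0.1, steps=200):
    """Plain gradient descent on the convex loss f(w) = (w - 3)**2."""
    w = start
    for _ in range(steps):
        grad = 2 * (w - 3)  # derivative of (w - 3)**2
        w -= lr * grad
    return w

# Very different starting points all end up at the unique minimum w = 3.
results = [gradient_descent(s) for s in (-10.0, 0.0, 25.0)]
print(results)  # each value is (numerically) 3.0
```

This is exactly why squared error is so convenient for linear regression: there is only one basin, so the starting point does not matter.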
Even if we do get stuck in a local minimum, we can re-run gradient descent from a different starting point, and hopefully one of the runs will converge to the global minimum.
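The random-restart idea can be sketched like this (the nonconvex test function `f(x) = x**4 - 3*x**2 + x`, the learning rate, and the number of restarts are all illustrative assumptions, not from the course). The function has two basins; running gradient descent from several random starts and keeping the lowest result usually finds the deeper one, near x ≈ -1.30:

```python
import random

def f(x):
    # Nonconvex function with a local minimum near x ~ 1.13
    # and the global minimum near x ~ -1.30.
    return x**4 - 3*x**2 + x

def grad(x):
    return 4*x**3 - 6*x + 1

def gd(start, lr=0.01, steps=500):
    """Plain gradient descent from a given starting point."""
    x = start
    for _ in range(steps):
        x -= lr * grad(x)
    return x

random.seed(0)
starts = [random.uniform(-2, 2) for _ in range(10)]  # random restarts
candidates = [gd(s) for s in starts]
best = min(candidates, key=f)  # keep the run with the lowest loss
print(best)  # close to -1.30, the global minimum
```

A single run that happens to start in the right-hand basin would settle into the shallower minimum near x ≈ 1.13; the restarts are what give us a chance at the global one.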