I’d be grateful if anyone could help me with the question below, please.

As we know, the goal of linear regression is to minimise the cost function J. What happens if J has multiple local minima and only one of them is the global minimum? How can we make sure that we end up at the global minimum and not at one of the local minima?

Regarding your question, I’d like to share a couple of links that discuss this same topic:

and

I encourage you to read these links carefully, as they develop this topic clearly. In particular, the second link also discusses a paper by an important group of researchers.

The other answer by @Juan_Olano contains some great information on general cost functions and cost functions for neural networks.

However, since you are asking specifically about linear regression, there is a much simpler answer. The cost function for linear regression is convex, so any local minimum is guaranteed to be a global minimum. Furthermore, you can prove that with a small enough step size, gradient descent on linear regression converges to a global minimum (this follows from convexity and from the fact that the gradient of the linear regression cost function is Lipschitz continuous).
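To sketch why convexity holds (using the standard mean-squared-error cost in my own notation, with design matrix X and n training examples):

```latex
% MSE cost for linear regression:
J(\mathbf{w}) = \frac{1}{2n}\,\lVert X\mathbf{w} - \mathbf{y}\rVert^2

% Its Hessian does not depend on w:
\nabla^2 J(\mathbf{w}) = \frac{1}{n} X^\top X

% X^T X is positive semidefinite, since for every vector v:
\mathbf{v}^\top X^\top X \mathbf{v} = \lVert X\mathbf{v}\rVert^2 \ge 0
```

A constant positive-semidefinite Hessian means J is convex everywhere, which is exactly why every local minimum is global.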

Note: these are fairly advanced mathematical concepts, but it may be enough to simply know that for linear regression (and, for that matter, logistic regression), gradient descent with a suitable step size is guaranteed to converge to a global minimum.
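You can also check this guarantee numerically. Here is a minimal sketch (my own toy data and variable names, not from the course): gradient descent on a linear regression cost, started from two very different initial points, with the step size chosen from the Lipschitz constant of the gradient. Both runs land on the same global minimum.

```python
import numpy as np

# Toy 1-D linear regression: y ≈ 1 + 2x plus a little noise.
rng = np.random.default_rng(0)
X = np.c_[np.ones(50), rng.uniform(-1, 1, 50)]  # design matrix with a bias column
y = X @ np.array([1.0, 2.0]) + 0.01 * rng.normal(size=50)

def grad(w):
    # Gradient of the MSE cost J(w) = ||Xw - y||^2 / (2n)
    return X.T @ (X @ w - y) / len(y)

# The gradient's Lipschitz constant is the largest eigenvalue of X^T X / n;
# any fixed step size at or below 1/L guarantees convergence for this convex cost.
L = np.linalg.eigvalsh(X.T @ X / len(y)).max()
alpha = 1.0 / L

def descend(w0, steps=2000):
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        w -= alpha * grad(w)
    return w

# Two very different starting points end at the same (global) minimum.
w_a = descend([100.0, -100.0])
w_b = descend([-50.0, 50.0])
print(w_a, w_b)  # both should agree with the least-squares solution
```

The key design choice is the step size: with a non-convex cost the two runs could settle in different basins, but here convexity plus a 1/L step make the endpoint independent of the initialization.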