What happens if we use linear regression and accidentally choose a local maximum as the starting point? At that point the slope of the function is zero, just as it is at a local minimum, so the algorithm would not work: it would stop there for the same reason that it eventually stops at a minimum.

There is no local maximum for the combination of a linear regression model and squared loss, because this combination gives you a cost function that is convex in the parameters.
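
To see why, here is a short sketch for the single-feature case. The symbols $b$ (bias), $m$ (number of examples), and $x^{(i)}, y^{(i)}$ (training data) are assumptions added for illustration; only $J$ and $w$ appear in this thread.

$$
J(w, b) = \frac{1}{2m}\sum_{i=1}^{m}\left(w x^{(i)} + b - y^{(i)}\right)^2,
\qquad
\nabla^2 J = \frac{1}{m}\sum_{i=1}^{m}
\begin{pmatrix} \left(x^{(i)}\right)^2 & x^{(i)} \\ x^{(i)} & 1 \end{pmatrix}
$$

$$
v^\top \left(\nabla^2 J\right) v = \frac{1}{m}\sum_{i=1}^{m}\left(v_1 x^{(i)} + v_2\right)^2 \ge 0
\quad \text{for every } v = (v_1, v_2)^\top
$$

The Hessian is positive semidefinite everywhere, so $J$ is convex, and any point where the gradient is zero is a global minimum rather than a local maximum.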

For other combinations of model and loss, it is possible. In that case, given that the update formula is $w := w - \alpha\frac{\partial J}{\partial w}$, then yes, no update will occur, because the gradient is zero. However, it is very unlikely that randomly initialized parameters land exactly on a local maximum.
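
A minimal sketch of that situation, assuming a made-up non-convex cost $J(w) = w^4 - 2w^2$ (not a cost from the course), which has a local maximum at $w = 0$ and minima at $w = \pm 1$. The function names and hyperparameters are hypothetical.

```python
def J(w):
    # Hypothetical non-convex cost: local maximum at w = 0, minima at w = -1 and w = 1
    return w**4 - 2 * w**2


def dJ_dw(w):
    # Derivative of the hypothetical cost above
    return 4 * w**3 - 4 * w


def gradient_descent(w, alpha=0.1, steps=200):
    # Plain update rule: w := w - alpha * dJ/dw
    for _ in range(steps):
        w = w - alpha * dJ_dw(w)
    return w


stuck = gradient_descent(0.0)    # starts exactly on the local maximum
nudged = gradient_descent(1e-6)  # starts a tiny distance away from it

print("start at 0.0  ->", stuck)
print("start at 1e-6 ->", nudged)
```

Starting exactly at $w = 0$ the gradient is zero and the parameter never moves, while even a tiny offset is enough for the updates to carry $w$ to the minimum near $w = 1$.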

As @rmwkwok mentioned, this would be a very rare occurrence. However, if by some unbelievable stroke of luck this were to happen right at the starting point, there is nothing stopping us from going back, re-initializing the weights with another set of random numbers, and then restarting the learning algorithm.
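
The re-initialization idea could look roughly like this. It is only a sketch: `init_with_retry`, its tolerance, and the cost $J(w) = w^4 - 2w^2$ are hypothetical choices for illustration, and the "random" draws are scripted so that the first one lands exactly on the local maximum.

```python
def dJ_dw(w):
    # Derivative of a hypothetical non-convex cost J(w) = w**4 - 2 * w**2
    return 4 * w**3 - 4 * w


def init_with_retry(draw, tol=1e-12, max_tries=10):
    # Keep drawing starting points until the gradient is not (numerically) zero
    for _ in range(max_tries):
        w = draw()
        if abs(dJ_dw(w)) > tol:
            return w
    raise RuntimeError("every draw had a zero gradient")


# Scripted draws for the demo: the first is the unlucky one, the second is fine.
# In practice `draw` would wrap a random number generator.
draws = iter([0.0, 0.7])
w0 = init_with_retry(lambda: next(draws))

print("accepted starting point:", w0)
```

The first draw is rejected because its gradient is zero, and the second draw is accepted as the starting point for training.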

Right! MSE (mean squared error) loss is essentially the squared Euclidean distance between the predictions and the targets, averaged over the examples. There is no such thing as the *maximum* distance between two points in $\mathbb{R}^n$: you have (literally) infinitely much room, so you can always move further away from the correct answer. As Raymond says, the loss function is convex in that case, meaning it looks like the multidimensional analog of an upward opening parabola.

But we will soon switch to using Neural Networks and there the cost functions are no longer convex, so you can encounter local maxima and “saddle points” where the gradients are 0. So Shanup’s answer will be the saving grace once we get to that more complex situation. We can always try again with different random initializations. The probability of exactly hitting a gradient of zero is also extremely small. As long as the gradient is not exactly zero, you’ll be able to move in some better direction, even if it takes a while to escape from the relatively flat area around the local optimum or saddle point.
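
Here is a small sketch of the saddle-point behavior, using a made-up two-parameter cost $J(w_1, w_2) = w_1^2 + w_2^4 - 2w_2^2$ rather than a real neural network cost. It has a saddle point at $(0, 0)$ and minima at $(0, \pm 1)$; all names and hyperparameters are assumptions.

```python
def cost(w1, w2):
    # Hypothetical cost with a saddle point at (0, 0) and minima at (0, -1) and (0, 1)
    return w1**2 + w2**4 - 2 * w2**2


def grad(w1, w2):
    # Partial derivatives of the hypothetical cost
    return 2 * w1, 4 * w2**3 - 4 * w2


def descend(w1, w2, alpha=0.1, steps=300):
    # Gradient descent that also records the cost after every update
    history = []
    for _ in range(steps):
        g1, g2 = grad(w1, w2)
        w1, w2 = w1 - alpha * g1, w2 - alpha * g2
        history.append(cost(w1, w2))
    return w1, w2, history


stuck = descend(0.0, 0.0)     # exactly on the saddle point: gradient is zero
escaped = descend(1.0, 1e-8)  # w2 is almost, but not exactly, zero

print("start on saddle   ->", stuck[:2])
print("start near saddle ->", escaped[:2])
print("cost after 30 steps:", escaped[2][29], "| final cost:", escaped[2][-1])
```

In the second run the cost hovers near the saddle value of 0 for a few dozen steps, because the gradient along $w_2$ is tiny there, and only then drops toward the minimum value of $-1$.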