I had a question about a local max on a graph. Imagine you choose a learning rate so large that gradient descent keeps overshooting and moving away from the local minimum. Suppose you somehow land exactly on a local maximum, where the slope of the graph is 0. Wouldn't the gradient descent update then give w = w and tell you that you had found your local minimum?
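For intuition, here's a minimal sketch of the situation you describe (the toy function and starting point are invented for illustration): at any stationary point the derivative is 0, so the update w := w − α·dJ/dw does indeed leave w unchanged, even at a local maximum.

```python
def f(w):
    # Non-convex toy function with a local maximum at w = 0
    return -w**2 + w**4

def df(w):
    # Derivative of f; equals 0 at w = 0 (the local max)
    return -2*w + 4*w**3

w = 0.0        # start exactly on the local maximum
alpha = 0.1    # learning rate

for step in range(5):
    w = w - alpha * df(w)   # gradient descent update: w = w - alpha*0 = w
    print(step, w)          # prints 0.0 every time; w never moves
```

So yes, in general gradient descent stops at any point with zero slope. The answer below explains why this can't happen for the cost function used in this course.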
Remember that when you're using a squared-error cost function with linear regression, the cost function doesn't have (and never will have) multiple local minima: it's a convex function. A convex function also has no local maxima, so there is no zero-slope point for gradient descent to get stuck on other than the single global minimum.
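As a quick check (the toy data here is made up for illustration), you can verify convexity directly for one-parameter linear regression: the cost J(w) = (1/2m) Σ(w·xᔹ − yᔹ)² has second derivative J''(w) = (1/m) Σxᔹ², a non-negative constant, so J is one bowl with a single minimum and no local maxima.

```python
xs = [1.0, 2.0, 3.0]   # hypothetical inputs
ys = [2.0, 4.1, 5.9]   # hypothetical targets
m = len(xs)

def cost(w):
    # Squared-error cost J(w) for the model y_hat = w * x
    return sum((w*x - y)**2 for x, y in zip(xs, ys)) / (2*m)

# J''(w) is a non-negative constant -- exactly what convexity means here
second_derivative = sum(x*x for x in xs) / m
print(second_derivative)   # > 0: strictly convex

# Sampling J along w shows a single bowl, no local maxima to get stuck on
for w in [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]:
    print(w, round(cost(w), 3))
```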