How could a large learning rate cause a constantly increasing cost?

My understanding was that a large learning rate could cause the cost function to go up and down, as gradient descent would “jump over” the minimum value. However, I didn’t think it was possible to get a constantly increasing cost purely by choosing a large learning rate; I thought this would only happen because of issues with the code.

What’s an example where a large learning rate would cause the cost to constantly go up?

Let’s say you’re trying to minimize a convex quadratic cost function with a single parameter, like in linear regression. If you use a very large learning rate, the algorithm might take steps that are too big, jumping over the minimum point of the cost function. As a result, instead of converging to the minimum, it might oscillate back and forth or even diverge, causing the cost to increase with each iteration.
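Here is a minimal sketch of that situation (my own illustration, not code from the course). It assumes the one-parameter cost J(w) = (w - 3)^2, a starting point of w = 0, and three hand-picked learning rates. For this particular cost, any alpha above 0.5 overshoots the minimum and any alpha above 1 overshoots by more than it corrects, so the cost grows on every iteration.

```python
# Gradient descent on the convex quadratic J(w) = (w - 3)^2 (minimum at w = 3).
# The target 3, the start w0 = 0 and the learning rates are arbitrary choices.

def cost(w):
    return (w - 3.0) ** 2

def gradient(w):
    # dJ/dw for J(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

def run_gradient_descent(alpha, w0=0.0, num_iters=6):
    """Return the cost recorded after each gradient descent update."""
    w = w0
    costs = []
    for _ in range(num_iters):
        w = w - alpha * gradient(w)
        costs.append(cost(w))
    return costs

# 0.1: small steps; 0.8: overshoots but converges; 1.1: overshoots and diverges
for alpha in (0.1, 0.8, 1.1):
    costs = run_gradient_descent(alpha)
    print(f"alpha={alpha}: " + ", ".join(f"{c:.3g}" for c in costs))
```

With alpha = 0.1 and 0.8 the printed costs shrink every iteration (with 0.8 the weight jumps back and forth across w = 3 while doing so), whereas with alpha = 1.1 each cost is roughly 1.44 times the previous one.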


There is just one more point I want to add: in that convex cost space, the gradient gets larger the farther we are from the optimum. If a large learning rate keeps pushing us farther from the optimum, things can get progressively worse, because the size of each step is the learning rate multiplied by the gradient: the learning rate stays large while the gradient keeps growing.
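To make the multiplication explicit, here is a short worked derivation for the simplest case, assuming a one-parameter quadratic cost $J(w) = a(w - w^*)^2$ with $a > 0$ and its minimum at $w^*$ (the symbols $a$ and $w^*$ are introduced here for illustration):

$$\frac{dJ}{dw} = 2a\,(w - w^*)$$

$$w_{t+1} = w_t - \alpha\,\frac{dJ}{dw}\Big|_{w_t} \;\Longrightarrow\; w_{t+1} - w^* = (1 - 2a\alpha)\,(w_t - w^*)$$

$$J(w_{t+1}) = (1 - 2a\alpha)^2\,J(w_t) \;\Longrightarrow\; J(w_t) = (1 - 2a\alpha)^{2t}\,J(w_0)$$

So for $\alpha < 1/(2a)$ the weight approaches $w^*$ from one side; for $1/(2a) < \alpha < 1/a$ it overshoots and oscillates, but the cost still shrinks; and for $\alpha > 1/a$ we have $|1 - 2a\alpha| > 1$, so the weight oscillates with growing amplitude and the cost is multiplied by the same factor $(1 - 2a\alpha)^2 > 1$ on every iteration, which is exponential growth in $t$.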

I think we could build such an example from one of the labs in Course 1 that implement gradient descent, by increasing its learning rate.


Thanks @rmwkwok for the added intuition.

Thanks for the insight.

There’s a quiz in the course where a learning curve shows an exponentially increasing cost, and the reason given was that the learning rate was too large.

My understanding matched @lukmanaj’s: I expected a learning rate that was too high to produce an oscillating cost, not an exponentially increasing one.

It would be great to see an example where the cost exponentially increases, because I can’t imagine a situation where this would happen.

You didn’t say which quiz it was, but I guess it is a cost-vs-iteration curve. If you disagree, please share which course and week the quiz belongs to, and the question number.

I believe @lukmanaj was talking about a cost-vs-weight curve.

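Oscillation on a cost-vs-weight curve can produce an exponential cost-vs-iteration curve: if every step overshoots the minimum and lands farther from it than the previous step did, the weight bounces from one side to the other while the cost grows on every iteration. The two are not contradictory.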
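A small sketch of both views at once (illustrative values of my own, not from the course): it assumes the cost J(w) = a * (w - w_star)^2 with a = 1, w_star = 0 and alpha = 1.5, so each update multiplies the distance to the minimum by 1 - 2*a*alpha = -2. The sign flip is the oscillation on the cost-vs-weight curve; the growing magnitude is the exponential cost-vs-iteration curve.

```python
# Illustrative values: with these, 1 - 2*a*alpha = -2 on every step.
a, w_star, alpha = 1.0, 0.0, 1.5

def cost(w):
    return a * (w - w_star) ** 2

w = 1.0
history = [(w, cost(w))]          # (weight, cost) before any update
for _ in range(5):
    w = w - alpha * 2 * a * (w - w_star)   # gradient descent update
    history.append((w, cost(w)))

for t, (w_t, j_t) in enumerate(history):
    print(f"iter {t}: w = {w_t:+.0f}, cost = {j_t:.0f}")
```

Running it prints the weights +1, -2, +4, -8, +16, -32 alongside the costs 1, 4, 16, 64, 256, 1024: the weight alternates sides of the minimum, and the cost is multiplied by 4 every iteration.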

@lukmanaj also said “it might oscillate back and forth or even diverge, causing the cost to increase with each iteration.”

Go through the labs in Course 1, find one that implements gradient descent and has a tunable learning rate, then set the learning rate to a larger value to see that increasing trend. @dpschramm, this is one way you can see it for yourself.
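If you want something self-contained to try before opening a lab, here is a rough sketch in the same spirit: single-feature linear regression trained with batch gradient descent. It is my own code, not the lab's; the function names and the two-point dataset are hypothetical, made up for illustration (a perfect fit exists at w = 200, b = 100).

```python
import numpy as np

# Hypothetical toy data: one feature, two examples.
x_train = np.array([1.0, 2.0])
y_train = np.array([300.0, 500.0])

def compute_cost(x, y, w, b):
    """Squared-error cost J(w, b) = (1 / 2m) * sum((w*x + b - y)^2)."""
    m = x.shape[0]
    err = w * x + b - y
    return np.sum(err ** 2) / (2 * m)

def compute_gradient(x, y, w, b):
    """Partial derivatives dJ/dw and dJ/db of the cost above."""
    m = x.shape[0]
    err = w * x + b - y
    dj_dw = np.sum(err * x) / m
    dj_db = np.sum(err) / m
    return dj_dw, dj_db

def gradient_descent(x, y, alpha, num_iters, w=0.0, b=0.0):
    """Run batch gradient descent; return the cost after each iteration."""
    cost_history = []
    for _ in range(num_iters):
        # Both gradients use the old (w, b): a simultaneous update.
        dj_dw, dj_db = compute_gradient(x, y, w, b)
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
        cost_history.append(compute_cost(x, y, w, b))
    return cost_history

for alpha in (1.0e-2, 8.0e-1):
    costs = gradient_descent(x_train, y_train, alpha, num_iters=10)
    print(f"alpha = {alpha}: first costs {np.round(costs[:4], 1)}, last cost {costs[-1]:.3g}")
```

With alpha = 0.01 the cost falls on every iteration. With alpha = 0.8 it rises on every iteration, from 85000 at the starting point (w = 0, b = 0) to 257800 after the first update, and keeps climbing geometrically from there.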