Why is the number of iterations in gradient descent specified?

In the optional labs, gradient descent is implemented with a specific number of iterations and a for loop.

Why can’t we use a while loop and just stop when the cost is no longer improving? Wouldn’t that lead to a better result?


Yes, that is a good observation: there are more sophisticated ways to manage gradient descent. But note that it’s not as simple as quitting the first time the cost “ticks up”, because convergence is not always monotonic. This is just the very introduction to the topic, and eventually we turn all of this over to “canned” packages like TensorFlow that have sophisticated internal implementations.
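To make the idea concrete, here is a minimal sketch of a while loop with a stopping rule, assuming single-feature linear regression with the squared error cost. The names `gradient_descent_with_stopping`, `tol`, `patience`, and `max_iters` are made up for this illustration and are not from the labs or from any library. It stops only after the cost has failed to improve by at least `tol` for `patience` consecutive steps rather than on the first bad step, and it keeps an iteration cap as a safety net.

```python
import numpy as np

def compute_cost(x, y, w, b):
    """Squared error cost J(w, b) = (1 / 2m) * sum((w*x + b - y)^2)."""
    err = w * x + b - y
    return float(np.mean(err ** 2) / 2.0)

def gradient_descent_with_stopping(x, y, alpha=0.05, tol=1e-9,
                                   patience=5, max_iters=100_000):
    """Hypothetical helper: loop until the cost stops improving.

    tol, patience and max_iters are illustrative choices, not standard values.
    """
    w, b = 0.0, 0.0
    prev_cost = compute_cost(x, y, w, b)
    cost = prev_cost
    stalled = 0   # consecutive steps without a meaningful improvement
    iters = 0
    while iters < max_iters:          # the cap guards against never stopping
        err = w * x + b - y
        dj_dw = float(np.mean(err * x))
        dj_db = float(np.mean(err))
        w -= alpha * dj_dw
        b -= alpha * dj_db
        cost = compute_cost(x, y, w, b)
        iters += 1
        if prev_cost - cost < tol:    # little or no improvement (or cost rose)
            stalled += 1
            if stalled >= patience:
                break
        else:
            stalled = 0
        prev_cost = cost
    return w, b, cost, iters

# Toy data generated from y = 2x + 1, so the optimum is w = 2, b = 1.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0
w, b, cost, iters = gradient_descent_with_stopping(x, y)
print(f"w={w:.4f}, b={b:.4f}, cost={cost:.2e}, iterations={iters}")
```

Note that the loop still needs `max_iters`: if the learning rate is too large, the cost may never settle, so a pure "stop when it stops improving" rule is not enough on its own.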

Thank you for the quick feedback. Looking forward to learning more!

When you say that convergence is not always monotonic, is that in relation to models …
a) that are not just simple linear regression with one feature/variable, and
b) that do not have a squared error cost function?

I can’t imagine how the cost could “tick up” here unless we overshoot the optimum.

Thanks for your support! Much appreciated.

Sorry, I should make the disclaimer that I have not taken MLS and don’t know what is covered in MLS C1 Week 1. If it is linear regression with the squared error cost function or logistic regression with the cross-entropy loss function, then in those cases the solution surfaces are convex, but it is still possible (as you say) to overshoot the minimum if you are using a fixed learning rate. I was speaking of the general case of neural networks, in which the solution surfaces are no longer convex and the paths you can take are much more complex and can exhibit much more varied behavior.
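As a small illustration of overshooting on a convex surface, here is a sketch that minimizes the toy cost J(w) = w² (my own example, not something from the course) with three fixed learning rates. The function name `run_gradient_descent` and the specific learning rates are assumptions chosen for the demo.

```python
def run_gradient_descent(alpha, w0=1.0, steps=5):
    """Minimize J(w) = w**2 (gradient 2*w) with a fixed learning rate alpha."""
    w = w0
    ws, costs = [w], [w ** 2]
    for _ in range(steps):
        w = w - alpha * 2.0 * w   # standard gradient descent update
        ws.append(w)
        costs.append(w ** 2)
    return ws, costs

# Small step: w approaches the minimum at 0 from one side.
small_ws, small_costs = run_gradient_descent(alpha=0.1)

# Larger step: w jumps across the minimum each time (the sign flips),
# yet the cost still goes down on every step.
over_ws, over_costs = run_gradient_descent(alpha=0.9)

# Too large: each overshoot lands farther away, so the cost goes up.
div_ws, div_costs = run_gradient_descent(alpha=1.1)

for name, ws, costs in [("alpha=0.1", small_ws, small_costs),
                        ("alpha=0.9", over_ws, over_costs),
                        ("alpha=1.1", div_ws, div_costs)]:
    print(name, [round(v, 3) for v in ws], [round(c, 3) for c in costs])
```

In this one-dimensional convex case, overshooting alone does not make the cost rise. The cost only "ticks up" once the learning rate is large enough that each step lands farther from the minimum than the last.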

Ah, thank you for the clarification. Now I understand.