Hi, I have some questions. In "Vectorizing LR Gradient Computation" it's mentioned that in order to run Gradient Descent you need to iterate, explicitly defining a for loop, say:

```python
for iter in range(1000):
    # perform gradient descent
```
- How do we choose how many iterations to go through?
- Is the reason that we assume convergence due to the fact that the LR cost function is convex?
There are two relevant hyperparameters that affect convergence: the “learning rate” and the number of iterations. The learning rate is the constant \alpha that multiplies the gradient values to determine how big a “step” is taken in each iteration. Even though the cost function for LR is convex, there is no guarantee that Gradient Descent will converge with an arbitrary value of the learning rate. With too large a value, you can get divergence or oscillation. You have to tune the learning rate together with the number of iterations in order to get good behavior. There is no single magic recipe that is guaranteed to work in all cases.
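To make that interplay concrete, here is a small sketch of my own (not from the course) that runs plain gradient descent on a simple convex function, f(w) = (w - 3)^2, whose minimum is at w = 3. The function, learning rates, and iteration counts are all illustrative choices, but the behavior mirrors what happens with the LR cost: a moderate learning rate converges, while too large a learning rate makes each step overshoot and the iterates diverge.

```python
# Minimize the convex function f(w) = (w - 3)^2 with basic gradient descent.
# Its gradient is f'(w) = 2 * (w - 3).

def gradient_descent(alpha, num_iters, w0=0.0):
    """Run num_iters gradient descent steps and return the final w."""
    w = w0
    for _ in range(num_iters):
        grad = 2.0 * (w - 3.0)  # derivative of (w - 3)^2
        w = w - alpha * grad    # the update step, scaled by the learning rate
    return w

# A well-chosen learning rate converges to the minimum at w = 3:
print(gradient_descent(alpha=0.1, num_iters=100))  # close to 3.0

# Too large a learning rate diverges: each step multiplies the error
# (w - 3) by (1 - 2 * alpha), which has magnitude > 1 when alpha > 1.
print(gradient_descent(alpha=1.1, num_iters=100))  # blows up
```

Even in this one-dimensional convex case you can see why the learning rate and the iteration count have to be tuned together: with alpha = 0.1 you need enough iterations for the error to shrink, and no number of iterations rescues alpha = 1.1.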
A couple other general points worth making here:
The convexity is specific to Logistic Regression and that will not be true once we graduate to real neural networks in Week 3.
Prof Ng is showing us how to build Gradient Descent from the ground up here, so that we have good intuitions about what is happening with iterative convergence and why training is expensive. But he is showing us the simplest and most straightforward version of Gradient Descent, just to keep the programming tractable and not distract us with too many details. There are more sophisticated conjugate gradient algorithms which manage the learning rate dynamically for you.
Thanks so much Paul, awesome explanation!