In the Learning Rate section of the Gradient Descent chapter, we first note that if we set the learning rate alpha too large, the algorithm may not converge. I agree with that.

By the end of the lecture we also note that even with a fixed learning rate, gradient descent will still converge, so we may not need to decrease the learning rate over time.

However, if alpha is large enough to overshoot the minimum and cause divergence, we have no remedy other than decreasing the learning rate.

The last remark about the fixed learning rate sounds contradictory to that, and I wanted to point it out.

Setting a large \alpha can make w bounce around and can even cause it to diverge instead of converging. However, if we can find a value of \alpha that is high enough to move quickly, yet not so high that it makes \frac{\partial J}{\partial w} flip sign (from positive to negative) on every iteration, then it should be okay.
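To see both behaviors concretely, here is a small sketch on an assumed toy cost J(w) = w^2 (my own example, not from the lecture), where \frac{\partial J}{\partial w} = 2w and the update is w \leftarrow w - \alpha \cdot 2w:

```python
def run_gd(alpha, w0=1.0, steps=20):
    """Run plain gradient descent on the toy cost J(w) = w**2."""
    w = w0
    for _ in range(steps):
        grad = 2 * w          # dJ/dw for J(w) = w**2
        w = w - alpha * grad  # each step multiplies w by (1 - 2*alpha)
    return w

# alpha = 0.4: |1 - 2*alpha| = 0.2 < 1, so w shrinks toward the minimum at 0.
print(run_gd(0.4))

# alpha = 1.1: |1 - 2*alpha| = 1.2 > 1, so w flips sign every step and its
# magnitude grows -- this is the bouncing/divergence described above.
print(run_gd(1.1))
```

On this cost, the sign of w (and hence of the gradient) flips every iteration exactly when \alpha > 0.5, which is the overshoot regime; divergence only sets in once \alpha > 1.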

Coming to your second point about keeping the learning rate fixed: once we identify a value of \alpha that steadily brings w to convergence, we don't need to worry about reducing \alpha. As w approaches the minimum, \frac{\partial J}{\partial w} drops to very small values, so the update \alpha \cdot \frac{\partial J}{\partial w} keeps shrinking on its own, even though \alpha itself stays fixed.
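A quick sketch of this shrinking-update effect, again on an assumed toy cost J(w) = (w - 3)^2 with the minimum at w = 3 (my own example):

```python
alpha = 0.1   # fixed learning rate, never decreased
w = 10.0      # starting point

for i in range(25):
    grad = 2 * (w - 3)   # dJ/dw for J(w) = (w - 3)**2
    step = alpha * grad  # the actual update, alpha * dJ/dw
    w = w - step
    if i % 8 == 0:
        # the step size keeps dropping even though alpha is constant,
        # because the gradient itself vanishes near the minimum
        print(f"iter {i:2d}: w = {w:.5f}, |step| = {abs(step):.6f}")
```

Each iteration multiplies the distance to the minimum by (1 - 2\alpha) = 0.8 here, so both the gradient and the effective step decay geometrically without any schedule on \alpha.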