Do we need to use a learning rate scheduler with adaptive optimizers like Adam or AdaGrad?

I searched for this question online and came across this blog post (A brief history of learning rate schedulers and adaptive optimizers), which says that we do not need a learning rate scheduler with optimizers like Adam. However, Prof. Ng says in this video that reducing the learning rate over time may help speed up learning.

I’d appreciate it if people in the community could share their thoughts on this topic.


Hi, @anujonthemove.

You will explore this question in exercise 7 of this week’s assignment.

In particular, you’ll see how learning rate decay scheduling allows Adam to achieve a similar accuracy faster.
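If it helps to see the idea before opening the lab, here is a minimal sketch of combining a decay schedule with Adam. It is not the assignment's code: the function names `inverse_time_decay` and `adam_minimize`, the toy quadratic loss, and all hyperparameter values are assumptions chosen for illustration. The schedule used is inverse time decay, `lr = lr0 / (1 + decay_rate * epoch)`, and for simplicity each "epoch" is a single Adam step.

```python
import numpy as np


def inverse_time_decay(lr0, epoch, decay_rate):
    """Return the learning rate for this epoch: lr0 / (1 + decay_rate * epoch)."""
    return lr0 / (1.0 + decay_rate * epoch)


def adam_minimize(grad_fn, w0, lr0, epochs, decay_rate=0.0,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Run Adam for `epochs` steps, rescaling the base learning rate each step.

    decay_rate=0.0 gives plain Adam with a constant base learning rate.
    """
    w = np.asarray(w0, dtype=float).copy()
    m = np.zeros_like(w)  # first-moment (mean of gradients) estimate
    v = np.zeros_like(w)  # second-moment (mean of squared gradients) estimate
    for epoch in range(epochs):
        lr = inverse_time_decay(lr0, epoch, decay_rate)
        t = epoch + 1  # Adam's step counter starts at 1
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)  # bias-corrected moments
        v_hat = v / (1 - beta2 ** t)
        # Adam adapts the step per parameter; the schedule scales it globally.
        w -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return w


# Toy problem (an assumption for this sketch): minimize f(w) = sum(w**2).
def quadratic_grad(w):
    return 2.0 * w


w_final = adam_minimize(quadratic_grad, w0=[1.0, -2.0], lr0=0.1,
                        epochs=500, decay_rate=0.01)
print(w_final)
```

Adam rescales each parameter's step using its gradient history, but the base learning rate still multiplies every update, so a schedule and an adaptive optimizer are not mutually exclusive. Rerunning with `decay_rate=0.0` lets you compare the two on your own problem.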

As always, remember that what works best may be problem specific.

Enjoy the lab :slight_smile: