Do we need to use a learning rate scheduler for adaptive optimizers like Adam and AdaGrad?

anujonthemove · July 25, 2021, 7:12am

I searched for this question online and came across a blog post, "A brief history of learning rate schedulers and adaptive optimizers", which says that we do not need a learning rate scheduler with optimizers like Adam. However, Prof. Ng says in this video (https://www.coursera.org/learn/deep-neural-network/lecture/hjgIA/learning-rate-decay) that reducing the learning rate over time may help speed up learning.

I’d like to ask people in the community to share their thoughts on the topic.

Thanks!

nramon · July 26, 2021, 8:44am

You will explore this question in exercise 7 of this week’s assignment.

In particular, you’ll see how learning rate decay scheduling allows Adam to achieve a similar accuracy faster.

As always, remember that what works best may be problem specific.
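
For readers who want to experiment outside the lab, here is a minimal sketch of combining Adam with a learning rate decay schedule. It is not the assignment's code: the function names (`inverse_time_decay`, `adam_minimize`), the toy quadratic objective, and all hyperparameter values are assumptions chosen for illustration. The schedule is the inverse-time form lr0 / (1 + decay_rate * epoch), and the update is standard Adam with bias correction, written in plain Python for a single scalar parameter.

```python
import math


def inverse_time_decay(lr0, decay_rate, epoch):
    """Learning rate at a given epoch: lr0 / (1 + decay_rate * epoch)."""
    return lr0 / (1.0 + decay_rate * epoch)


def adam_minimize(grad_fn, w0, lr0=0.1, decay_rate=0.0, epochs=200,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Run Adam on one scalar parameter, one update per epoch.

    decay_rate=0.0 keeps the base learning rate constant;
    a positive value shrinks it as epochs go by.
    """
    w, m, v = w0, 0.0, 0.0
    for epoch in range(epochs):
        lr = inverse_time_decay(lr0, decay_rate, epoch)
        g = grad_fn(w)
        # Exponential moving averages of the gradient and its square.
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        # Bias correction for the zero-initialised averages.
        t = epoch + 1
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


def grad(w):
    """Gradient of the toy objective f(w) = (w - 3)^2 (hypothetical example)."""
    return 2.0 * (w - 3.0)


w_const = adam_minimize(grad, w0=0.0, decay_rate=0.0)
w_decay = adam_minimize(grad, w0=0.0, decay_rate=0.05)
print("distance from optimum, constant lr:", abs(w_const - 3.0))
print("distance from optimum, decayed lr: ", abs(w_decay - 3.0))
```

Adam already rescales each step by its running gradient statistics, but the base learning rate still caps the step size, so a schedule can matter late in training. Whether it helps on this toy problem, or on yours, depends on the values chosen, so treat the comparison as something to try rather than a result.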

Enjoy the lab
