The Adam optimizer already adapts its step sizes, which should dampen noisy updates during convergence. So why do we still perform learning rate decay?
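For concreteness, here is a minimal sketch of what I mean by combining Adam with learning rate decay (assuming PyTorch and a StepLR schedule, which are just my choices for illustration, not something from the quoted source):

```python
import torch
import torch.nn as nn

# A tiny model just to have parameters to optimize (placeholder for a real network).
model = nn.Linear(10, 1)

# Adam adapts per-parameter step sizes, but the global learning rate
# is still controlled by the `lr` argument.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Learning rate decay on top of Adam: multiply the learning rate by
# `gamma` every `step_size` epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(90):
    # Stand-in batch; in practice this would come from a data loader.
    x, y = torch.randn(32, 10), torch.randn(32, 1)
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
    scheduler.step()  # apply the decay schedule once per epoch
```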
The common belief is that decaying the learning rate helps the network converge to a local minimum and avoid oscillations. However, experiments suggest that this belief is insufficient to explain the general effectiveness of lrDecay in training modern neural networks that are deep, wide, and nonconvex. We provide another novel explanation: an initially large learning rate suppresses the network from memorizing noisy data, while decaying the learning rate improves the learning of complex patterns.
Source: