One of the formulas suggested in the videos for learning rate decay is

$$\alpha_t = 0.95^t \,\alpha_0 .$$

In this case $\sum_{t=0}^{\infty} \alpha_t$ is finite (it is a geometric series, summing to $\alpha_0 / (1 - 0.95) = 20\,\alpha_0$), so with bounded gradients the parameters can only travel a bounded total distance. Isn't that a problem? Doesn't it prevent gradient descent from ever reaching the minimum?
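For concreteness, here is a quick sketch of how fast that geometric sum saturates (the $\alpha_0 = 0.1$ is just an assumed value, not something from the course):

```python
# Quick illustration (assumed alpha_0): with alpha_t = 0.95**t * alpha_0,
# the total "step budget" converges to alpha_0 / (1 - 0.95) = 20 * alpha_0.
alpha_0 = 0.1
decay = 0.95

limit = alpha_0 / (1 - decay)  # geometric series limit = 2.0 here
partial = 0.0
for t in range(201):
    alpha_t = decay ** t * alpha_0
    partial += alpha_t
    if t in (10, 50, 100, 200):
        print(f"epoch {t:3d}: alpha_t = {alpha_t:.2e}, "
              f"sum so far = {partial:.4f} (limit {limit:.4f})")
```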
Hi, @psv.
It could, depending on the decay rate and the number of epochs you train your model for (which is, of course, finite).
If it does, just tweak your hyperparameters.
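For example, here is a rough comparison (the $\alpha_0$ and epoch count are assumed values, not course code) of how the decay rate interacts with a finite training budget:

```python
# Rough comparison (assumed alpha_0 and epoch count): a slower decay keeps the
# learning rate usable for longer and leaves a larger finite-horizon step budget.
alpha_0 = 0.1
epochs = 100
for decay in (0.90, 0.95, 0.99):
    final_alpha = decay ** epochs * alpha_0
    # sum of alpha_t for t = 0 .. epochs (finite geometric sum)
    budget = alpha_0 * (1 - decay ** (epochs + 1)) / (1 - decay)
    print(f"decay={decay}: alpha at epoch {epochs} = {final_alpha:.2e}, "
          f"total step budget = {budget:.3f}")
```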
@psv
Ditto Ramon’s response.
In Adam optimization, the main hyperparameter you would be tuning is alpha (the learning rate), and the best value changes from one dataset to another. As far as I know, there is no one-size-fits-all solution, and good values can vary by orders of magnitude!
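If it helps, here is a minimal sketch of a log-scale random search for alpha; the range [1e-4, 1e-1] and the five samples are just illustrative assumptions:

```python
import numpy as np

# Minimal sketch of a log-scale random search for the learning rate alpha;
# the search range and sample count are assumptions, not a prescription.
rng = np.random.default_rng(0)
exponents = rng.uniform(-4, -1, size=5)   # sample the exponent uniformly
candidate_alphas = 10.0 ** exponents
print(candidate_alphas)  # values spread over several orders of magnitude
# Each candidate would then be tried as the Adam learning rate and the best
# one picked on the dev set.
```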
Couldn’t agree more. Thanks, @suki