Hi all,
Just making a note here that, as far as I can see, was missing from the otherwise amazing explanations in the assignment's notebook:
If anyone was wondering why all three algorithms work so much better once we introduce learning rate decay, including the one with no optimization and the one with momentum (both of which didn't do so well before):
It is because of the higher initial learning rate they started with (0.1 instead of the previous 0.0007).
Such a large initial value only works because the rate keeps decaying during training, so the steps eventually become small enough for the process to converge.
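To make the idea concrete, here is a minimal sketch of one common decay schedule (inverse-time decay); the function and parameter names are just illustrative, not necessarily the exact ones used in the notebook:

```python
import numpy as np

def decayed_lr(learning_rate0, epoch_num, decay_rate):
    # Inverse-time decay: the learning rate shrinks as epochs go by,
    # so you can start with a large value (e.g. 0.1) and still end up
    # taking small steps that let the parameters settle into a minimum.
    return learning_rate0 / (1 + decay_rate * epoch_num)

# Example: start at 0.1 and watch the rate shrink over training
lr0, decay_rate = 0.1, 1.0
for epoch in [0, 1, 10, 100, 1000]:
    print(epoch, decayed_lr(lr0, epoch, decay_rate))
```

Early on the steps are big (fast progress), and later they get tiny (no more overshooting around the minimum), which is why even plain gradient descent benefits so much from it.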