It seems that the normal process of approaching a minimum would already reduce dw and db, so wouldn't that make learning rate decay unnecessary?
You're right: as we approach a minimum during training, the shrinking gradients lead to smaller parameter updates even if the learning rate stays constant. However, relying on shrinking gradients alone isn't always enough for efficient convergence, and this is where learning rate decay still helps. It provides an additional mechanism to control the step size, improving convergence stability; for example, with mini-batch gradient descent the noisy gradients can keep the parameters oscillating around the minimum at a fixed learning rate, while a decayed rate lets them settle closer to it. It's widely used because it helps models reach better minima more efficiently.
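To make this concrete, here's a minimal sketch of gradient descent on a toy quadratic using inverse-time decay (lr = lr0 / (1 + decay_rate * epoch), the schedule from the lectures, if I recall correctly). The loss function, starting point, and hyperparameter values are all made up for illustration:

```python
def inverse_time_decay(lr0, decay_rate, epoch):
    # Inverse-time decay: the learning rate shrinks as epochs pass.
    return lr0 / (1 + decay_rate * epoch)

# Toy problem: minimize f(w) = (w - 3)^2 by gradient descent.
# The gradient dw = 2*(w - 3) already shrinks near the minimum,
# but decaying the learning rate also damps late-stage oscillation.
w = 10.0
lr0, decay_rate = 0.9, 0.05
for epoch in range(50):
    dw = 2 * (w - 3)
    lr = inverse_time_decay(lr0, decay_rate, epoch)
    w -= lr * dw
print(f"final w = {w:.4f} (true minimum at w = 3)")
```

With the (deliberately large) initial rate of 0.9 the updates overshoot and oscillate around w = 3 at first, but as the rate decays the steps settle onto the minimum.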
Agree with @nadtriana.
In addition, there is research on 'learning rate schedules' where the learning rate may be increased again at later epochs; this can help the optimizer escape one local minimum and converge to another. Since we usually checkpoint the model weights after each epoch, we can recover the weights from the epoch with the best loss. This Reddit thread has some discussion.
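For a concrete picture of a schedule that raises the rate again, here's a sketch of cosine annealing with warm restarts (the SGDR schedule of Loshchilov & Hutter). The `sgdr_lr` helper and all the numbers are my own illustrative choices, not from the course:

```python
import math

def sgdr_lr(lr_min, lr_max, epoch, cycle_len):
    # Cosine annealing with warm restarts: within each cycle the lr
    # decays from lr_max toward lr_min, then jumps back to lr_max at
    # the start of the next cycle, which can kick the optimizer out
    # of a poor local minimum.
    t = epoch % cycle_len
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / cycle_len))

for epoch in range(30):
    lr = sgdr_lr(lr_min=1e-4, lr_max=0.1, epoch=epoch, cycle_len=10)
    print(f"epoch {epoch:2d}: lr = {lr:.4f}")
    # In a real training loop you would train one epoch here and
    # checkpoint the weights whenever the validation loss improves,
    # so the best model survives any post-restart loss spike.
```

The checkpointing comment is the point made above: a restart may temporarily worsen the loss, so you keep the best weights seen so far rather than the final ones.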
Imagine a blind man who can only feel with his hands, walking and feeling his way toward the right place. Given the variety of geographical landscapes out there in real life, it's better to use caution and adjust the size of your steps based on what you feel, just so you don't fall into an abyss.
I just feel it is important to add, borrowing a little of @gent.spah's metaphor: at first you want to take big steps to get to the solution faster, but as you get closer you want to be gentler, tone it down, and take tinier and tinier steps. Otherwise you become like a bee buzzing around the hive but never quite arriving there (unless by luck).
The course programming example helped demonstrate the answer: yes, learning rate decay can help.