Why is there a need to decrease the learning rate over time? Won't gradient descent automatically take smaller steps, because the slope decreases as we go down the cost function curve, i.e. as we move towards the minimum?
Hello @Aryan06,
Let's look at this slide from one of the Week 2 videos:
A very common drawback of switching from batch GD to mini-batch GD is that the cost will oscillate. Overall the cost will decrease, but the oscillation won't disappear just because we are approaching the minimum, because the size of the oscillation depends on the mini-batch size: the smaller the mini-batch, the larger the oscillation we are likely to run into. Such oscillation is bad for us because it keeps the model from really converging; instead, the model wanders around the minimum.
To overcome this, we decrease the learning rate over time so that, by the time the model is close to the minimum, the learning rate has hopefully been reduced enough to effectively kill off the oscillation: if the learning rate is small, the weight update step is small, and therefore the change in the cost should also be small.
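In code, the idea could look like the minimal sketch below. It assumes a simple 1 / (1 + decay_rate * epoch) schedule and a toy least-squares cost; the function names, defaults, and cost are illustrative assumptions, not the exact code from the course.

```python
import numpy as np

def compute_gradient(w, X_batch, y_batch):
    """Gradient of a toy least-squares cost on one mini-batch (illustrative)."""
    predictions = X_batch @ w
    return X_batch.T @ (predictions - y_batch) / len(y_batch)

def train(X, y, alpha0=0.1, decay_rate=1.0, num_epochs=50, batch_size=32):
    w = np.zeros(X.shape[1])
    for epoch in range(num_epochs):
        # Decay the learning rate once per epoch so that, near the minimum,
        # the update steps become small enough to damp the oscillation.
        alpha = alpha0 / (1 + decay_rate * epoch)

        # Shuffle the training set and loop over mini-batches.
        permutation = np.random.permutation(len(y))
        for start in range(0, len(y), batch_size):
            batch_idx = permutation[start:start + batch_size]
            grad = compute_gradient(w, X[batch_idx], y[batch_idx])
            w -= alpha * grad  # smaller alpha => smaller step => smaller cost change
    return w
```

The key point is only the line computing `alpha`: early epochs take large steps to make fast progress, while later epochs take progressively smaller steps so the mini-batch noise no longer pushes the weights far from the minimum.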
Cheers,
Raymond
Thank you @rmwkwok for the explanation
You are welcome @Aryan06!