In Week 2, mini-batch learning rate decay was proposed with the formula:
alpha = (const / sqrt(t)) * initial_alpha, where t is the mini-batch number. Doesn't this mean that at the beginning of each epoch the learning rate alpha is reset to (const * initial_alpha)? I don't fully get how this can help as we advance through the training epochs.
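Here is a tiny sketch of how I'm reading the formula (the variable names and values are my own, just for illustration):

```python
import numpy as np

initial_alpha = 0.1
const = 1.0
batches_per_epoch = 100

for epoch in range(3):
    for t in range(1, batches_per_epoch + 1):   # does t restart at 1 every epoch?
        alpha = (const / np.sqrt(t)) * initial_alpha
    print(f"epoch {epoch}: alpha went from {const * initial_alpha:.3f} "
          f"down to {alpha:.4f}")
```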
Please provide a reference to the lecture / notebook where you see the mini-batch number in the denominator. Here's the formula from Course 2, Week 2, Assignment 2:
$$\alpha = \frac{1}{1 + decayRate \times epochNumber} \, \alpha_{0}$$
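Note that the denominator uses the epoch number, not the mini-batch number, so the rate only shrinks as training progresses and never resets within an epoch. A minimal sketch of that schedule (the function name here is my own, not the assignment's):

```python
def update_lr(alpha0, decay_rate, epoch_num):
    """Epoch-based decay: alpha = alpha0 / (1 + decay_rate * epoch_num)."""
    return alpha0 / (1 + decay_rate * epoch_num)

# Example: with alpha0 = 0.2 and decay_rate = 1.0, the learning rate
# decreases monotonically with the epoch number.
for epoch_num in range(5):
    print(epoch_num, update_lr(alpha0=0.2, decay_rate=1.0, epoch_num=epoch_num))
```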
Have you heard of *Cyclical Learning Rates for Training Neural Networks*?
So it’s basically meant to help the model escape plateau regions like saddle points, right?
Also, it's application dependent; it might work for some models and not for others, right?
You’re correct about escaping the plateau regions using cyclical learning rates.
Here’s the text from the conclusion section of the paper which states that the author plans to evaluate this approach on other architectures:
> This work has not explored the full range of applications for cyclic learning rate methods. We plan to determine if equivalent policies work for training different architectures, such as recurrent neural networks. Furthermore, we believe that a theoretical analysis would provide an improved understanding of these methods, which might lead to improvements in the algorithms.
Where does the paper state the kinds of architectures for which this approach is unsuitable?
As far as using this approach is concerned, here are the hyperparameters you would need to choose for a particular architecture (a small sketch follows the list):
- Minimum and maximum learning rates for your optimizer.
- Linear / exponential / any other approach for varying the learning rate.
- Number of batches across which the cycle should occur (this is usually set to the number of batches per epoch).
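For concreteness, here is a minimal sketch of the triangular policy from the paper built from those three choices; the parameter names and default values are my own, not prescribed by the course:

```python
import math

def cyclical_lr(batch_idx, min_lr=1e-4, max_lr=1e-2, step_size=100):
    """Triangular cyclical learning rate.

    batch_idx : global mini-batch counter (not reset each epoch)
    step_size : number of batches in half a cycle (often batches per epoch)
    """
    cycle = math.floor(1 + batch_idx / (2 * step_size))
    x = abs(batch_idx / step_size - 2 * cycle + 1)
    return min_lr + (max_lr - min_lr) * max(0.0, 1 - x)

# The rate rises linearly from min_lr to max_lr over step_size batches,
# then falls back to min_lr, and the cycle repeats.
for b in range(0, 401, 100):
    print(b, cyclical_lr(b))
```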
Thanks a lot
