Adam optimization

Anbu · March 1, 2024, 8:54am

Compared to the original gradient descent algorithm that you learned in the previous course, though, the Adam algorithm, because it can adapt the learning rate a bit automatically, is more robust to the exact choice of learning rate that you pick. Though it's still worth tuning this parameter a little bit to see if you can get somewhat faster learning.
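For reference, the Adam update is usually written as follows for a weight w with gradient g_t at step t (all operations elementwise, starting from m_0 = v_0 = 0, with commonly used defaults β₁ = 0.9, β₂ = 0.999 and a small ε such as 10⁻⁸). The learning rate α itself is a fixed scalar here; the part that adapts is the factor m̂_t / (√v̂_t + ε), which rescales each parameter's step by a running estimate of that parameter's gradient magnitude.

```latex
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2 \\
\hat{m}_t &= \frac{m_t}{1-\beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1-\beta_2^t} \\
w_t &= w_{t-1} - \alpha \, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\end{aligned}
```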

Regarding the statement above from the lecture video, we have a couple of doubts. Could you please help clarify them?

Doubt 1: What does it mean for Adam to be "more robust to the exact choice of learning rate that you pick"?

Doubt 2: If the default learning rate is 0.001, does Adam use that default value throughout all epochs, or does it adapt the learning rate away from the default?

TMosh · March 4, 2024, 4:58am

Answer 1: The Adam method adapts the effective learning rate (the step size used for each parameter) during training, so you don't have to guess as accurately in advance what the best learning rate will be.
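One way to see why this makes Adam less sensitive to the learning rate: in plain gradient descent the step is α·g, so a learning rate that suits gradients of size 1 is far too small for gradients of size 0.001 and far too large for gradients of size 1000. Adam's first step, by contrast, is roughly α·sign(g) whatever the gradient's magnitude. The sketch below is a hypothetical illustration with a single scalar parameter and the usual default β values, not code from the course.

```python
import math

def gd_step(grad, lr):
    """Plain gradient descent: the step is proportional to the gradient."""
    return lr * grad

def adam_first_step(grad, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """Size of Adam's very first update (t = 1) for one scalar parameter."""
    m = (1 - beta1) * grad           # first-moment estimate, starting from m0 = 0
    v = (1 - beta2) * grad ** 2      # second-moment estimate, starting from v0 = 0
    m_hat = m / (1 - beta1 ** 1)     # bias correction at t = 1 recovers m_hat = grad
    v_hat = v / (1 - beta2 ** 1)     # and v_hat = grad ** 2
    return lr * m_hat / (math.sqrt(v_hat) + eps)

lr = 0.001
for grad in (0.001, 1.0, 1000.0):
    # GD's step changes by a factor of 10^6 across these gradients; Adam's barely moves.
    print(f"grad={grad:>8}: GD step={gd_step(grad, lr):.1e}, "
          f"Adam step={adam_first_step(grad, lr):.1e}")
```

Running it should show the GD step ranging from about 1e-06 to 1.0, while Adam's step stays close to 1e-03 for all three gradients. Later Adam steps are not exactly α, but because they are normalized by the running gradient magnitude they typically stay on the order of α, which is why one default value works across a wide range of problems.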

Answer 2: See Answer 1.
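To make Answer 2 more concrete: in the standard formulation, the learning rate α you pass in (for example 0.001) stays fixed for all epochs; what Adam changes is the per-parameter multiplier α / (√v̂_t + ε), sometimes called the effective learning rate. The hypothetical TinyAdam class below is a minimal pure-Python sketch (not TensorFlow's or PyTorch's implementation), run on a made-up two-parameter quadratic loss where one parameter's gradients are 100 times larger than the other's.

```python
import math

class TinyAdam:
    """Minimal Adam for a list of scalar parameters (illustrative sketch, not a library API)."""

    def __init__(self, n_params, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr = lr                      # base learning rate: never modified below
        self.beta1, self.beta2, self.eps = beta1, beta2, eps
        self.m = [0.0] * n_params         # running mean of gradients
        self.v = [0.0] * n_params         # running mean of squared gradients
        self.t = 0                        # step counter used for bias correction

    def step(self, params, grads):
        """Update params in place and return the per-parameter effective learning rates."""
        self.t += 1
        effective = []
        for i, g in enumerate(grads):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            eff_lr = self.lr / (math.sqrt(v_hat) + self.eps)  # multiplier applied to m_hat
            params[i] -= eff_lr * m_hat
            effective.append(eff_lr)
        return effective

# Hypothetical loss f(w) = 0.5 * (w0**2 + 100 * w1**2): w1's gradients are 100x larger.
w = [1.0, 1.0]
opt = TinyAdam(n_params=2, lr=0.001)
for _ in range(100):  # 100 update steps
    grads = [w[0], 100.0 * w[1]]
    eff = opt.step(w, grads)

print("base lr after training:", opt.lr)
print("effective lr per parameter:", eff)
print("parameters:", w)
```

After 100 steps, opt.lr is still 0.001, while the effective learning rate for w[1] is roughly 100 times smaller than for w[0]; as a result both parameters end up at essentially the same value even though their gradients differ by a factor of 100. If you actually want α itself to change over epochs, frameworks provide learning-rate schedules for that (for example, Keras's tf.keras.optimizers.schedules.ExponentialDecay), which is a separate mechanism from Adam's own adaptation.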

Related topics

Topic	Category	Tags	Replies	Views	Activity
Adam Optimization	Advanced Learning Algorithms	week-module-2	2	509	August 9, 2022
Do we need to use a learning rate scheduler for adaptive optimizers like Adam, AdaGrad?	Improving Deep Neural Networks: Hyperparameter tun	coursera-platform	1	607	July 26, 2021
Adaptive Learning Rates	AI Discussions		2	99	October 31, 2023
Why don't we put a vector of initial learning rate in adam optimization instead of a single one	Advanced Learning Algorithms	week-module-2	9	303	February 10, 2024
DLS Course 2 Week 2: adam is the worst algo	Improving Deep Neural Networks: Hyperparameter tun	coursera-platform	3	584	July 22, 2023