Adam optimization

Compared to the original gradient descent algorithm that you learned in the previous course, the Adam algorithm is more robust to the exact choice of learning rate that you pick, because it can adapt the learning rate somewhat automatically. That said, it may still be worth tuning this parameter a little to see if you can get somewhat faster learning.

Regarding the above statement from the lecture video, we have a couple of doubts. Could you please help clarify them?

Doubt 1: What does "more robust to the exact choice of learning rate that you pick" mean?

Doubt 2: If the default learning rate is 0.001, will Adam use that default throughout all epochs, or will it adapt the learning rate away from the default?


Answer 1: Adam adapts the effective step size for each parameter during training, dividing the base learning rate by a running estimate of the gradient's magnitude, so you don't have to guess the best learning rate as accurately in advance.
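To make "robust" concrete, here is a minimal NumPy sketch of the standard Adam update equations (the function name `adam_step` and the toy gradients are illustrative, not from the course): two parameters whose gradients differ by four orders of magnitude still receive effective steps of roughly the same size, because each step is rescaled by the running gradient magnitude.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Running averages of the gradient and its square.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias-corrected estimates (important in the first few steps).
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Effective step: lr scaled per parameter by 1 / sqrt(v_hat).
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Two parameters whose (toy, constant) gradients differ by 10,000x.
theta = np.array([1.0, 1.0])
m = np.zeros(2)
v = np.zeros(2)
for t in range(1, 11):
    grad = np.array([100.0, 0.01])
    theta, m, v = adam_step(theta, grad, m, v, t)

print(theta)  # both parameters moved ~10 * lr = 0.01, despite the gradient gap
```

This is why a mediocre base learning rate hurts less with Adam than with plain gradient descent: the raw gradient scale is largely divided out, so the update magnitude stays near `lr` per step rather than exploding or vanishing with the gradient.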

Answer 2: The base learning rate you set (0.001 by default) stays fixed for the whole run unless you attach a learning-rate schedule. What changes during training is the effective per-parameter step, which Adam scales by its running moment estimates of the gradients, so the actual update size moves away from 0.001 even though the hyperparameter itself does not.
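A small simulation of this point (my own sketch using the standard Adam update equations, not course code): the base learning rate is held at 0.001 throughout, yet the magnitude of the actual update depends on how consistent the gradients are.

```python
def adam_final_step(grads, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Run Adam on a scalar parameter; return |update| at the final step."""
    m = v = 0.0
    step = 0.0
    for t, g in enumerate(grads, start=1):
        m = beta1 * m + (1 - beta1) * g          # first-moment running average
        v = beta2 * v + (1 - beta2) * g * g      # second-moment running average
        m_hat = m / (1 - beta1 ** t)             # bias correction
        v_hat = v / (1 - beta2 ** t)
        step = lr * m_hat / (v_hat ** 0.5 + eps)
    return abs(step)

steady = adam_final_step([1.0] * 200)                       # consistent direction
noisy = adam_final_step([(-1.0) ** t for t in range(200)])  # direction flips each step

# The base lr never changed, but the effective step did:
print(steady)  # ~0.001: consistent gradients get the full base learning rate
print(noisy)   # much smaller: Adam damps oscillating gradients
```

So the answer to Doubt 2 is "both, in a sense": the hyperparameter stays at its default, while the step actually applied to each parameter adapts around it every iteration.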