I think what you’re really left with is W = W - learning_rate * sign(dW) (up to the epsilon term), which intuitively makes sense:
Adam combines the ideas behind Momentum and RMSprop. You’re removing the effect of Momentum by setting beta1 to 0, so you’re left with the RMSprop part.
RMSprop combines the ideas of using only the sign of the gradient and adapting the step size separately for each weight. By setting beta2 to 0, the adaptive scaling divides each gradient component by its own magnitude, so every weight moves by exactly learning_rate in the direction opposite to its gradient (which is not equivalent to vanilla gradient descent, where the step size scales with the gradient’s magnitude).
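Here is a minimal numpy sketch (my own toy example, not the course’s implementation; the name adam_step_zero_betas is just for illustration) of a single Adam step with both betas at 0. Bias correction is left out because dividing by 1 - beta**t = 1 changes nothing when the betas are 0:

```python
import numpy as np

def adam_step_zero_betas(W, dW, learning_rate=0.01, epsilon=1e-8):
    # One Adam step with beta1 = beta2 = 0 (bias correction is a no-op here).
    v = dW        # beta1 = 0  ->  v = 0 * v_prev + 1 * dW
    s = dW ** 2   # beta2 = 0  ->  s = 0 * s_prev + 1 * dW**2
    return W - learning_rate * v / (np.sqrt(s) + epsilon)

W  = np.array([1.0, -2.0, 0.5])
dW = np.array([0.3, -0.01, 4.0])

print(adam_step_zero_betas(W, dW))  # roughly [0.99, -1.99, 0.49]
print(W - 0.01 * np.sign(dW))       # [0.99, -1.99, 0.49]: same step, up to epsilon
```

Every weight moves by exactly learning_rate; only the direction of each move depends on the gradient.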
Very interesting topic, @rajsura82. I hope my intuition is right. It would be great to hear more opinions.
By the way, you do get standard gradient descent if you’re just applying Momentum and you set beta to 0. Maybe that’s why you were expecting to get that formula.
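Just to spell that out, here is a tiny sketch assuming the Momentum formulation with the (1 - beta) factor used in the course:

```python
# Momentum update with beta = 0; the values below are arbitrary examples.
beta, learning_rate = 0.0, 0.1
v_prev, dW = 0.0, 0.3

v = beta * v_prev + (1 - beta) * dW  # with beta = 0, v is just dW
W_step = -learning_rate * v          # == -learning_rate * dW, i.e. vanilla gradient descent
```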
Adam expands on the idea of adding momentum to the optimization process. Its formulation, however, does not reduce to simple gradient descent when both betas are set to 0. As @nramon mentioned, plain Momentum does reduce to simple gradient descent when its beta is set to 0.