GD with momentum versus ADAM

Hi there!

When going through the optimization methods in week 2, I wondered what the reasons would be to choose gradient descent with momentum over ADAM. The latter seems to be an improvement on the former that outperforms it, right? Is better memory usage the only advantage of the momentum method?

Thanks in advance
Antonio

It's not just memory: it also takes momentum into consideration, so the algorithm can steer towards the optimum with better odds!


Hi Gent

Thanks for your reply. Wouldn't ADAM, being sort of a mix of RMSprop and momentum, do the same? And if so, what is the advantage of one over the other?

Hi Antonio

You’re correct!

ADAM is a powerful algorithm, but in some cases gradient descent with momentum may be preferred:

  • Simplicity and ease of implementation compared to ADAM (see the sketch below)
  • More stable convergence behavior in some settings (for example, with very noisy gradients)
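
To make the comparison concrete, here's a minimal NumPy sketch (my own toy example, not the course's starter code; the function names and the quadratic objective are just for illustration) of one update step of each method. It also makes the memory point visible: momentum keeps a single velocity buffer per parameter, while ADAM keeps two moving-average buffers plus a step counter for bias correction.

```python
import numpy as np

def momentum_step(w, grad, v, lr=0.01, beta=0.9):
    """Gradient descent with momentum: one state buffer (v) per parameter."""
    v = beta * v + (1 - beta) * grad          # exponentially weighted average of gradients
    w = w - lr * v
    return w, v

def adam_step(w, grad, m, s, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """ADAM: two state buffers (m, s) per parameter plus bias correction via step t."""
    m = beta1 * m + (1 - beta1) * grad        # first moment (the momentum term)
    s = beta2 * s + (1 - beta2) * grad ** 2   # second moment (the RMSprop term)
    m_hat = m / (1 - beta1 ** t)              # bias-corrected estimates
    s_hat = s / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(s_hat) + eps)
    return w, m, s

# Toy objective f(w) = 0.5 * ||w||^2, whose gradient is simply w.
w_mom = w_adam = np.array([1.0, -2.0])
v = np.zeros_like(w_mom)                              # momentum: one buffer
m, s = np.zeros_like(w_adam), np.zeros_like(w_adam)   # ADAM: two buffers

for t in range(1, 101):
    w_mom, v = momentum_step(w_mom, w_mom, v)         # grad = w for this objective
    w_adam, m, s = adam_step(w_adam, w_adam, m, s, t)

print("momentum:", w_mom, "ADAM:", w_adam)
```

The point of the sketch is the state each optimizer carries rather than the final numbers: momentum only tracks v, while ADAM tracks m, s, and the step count.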

Overall, the choice between these algorithms depends on factors such as computational resources, convergence characteristics, and the specific properties of the optimization problem. Let me know if you have any further questions!
