GD with momentum versus ADAM

Hi there!

When going through the optimization methods in week 2, I wondered what the reasons would be to choose gradient descent with momentum over Adam. The latter seems to be an improvement on the former that outperforms it, right? Is lower memory usage the only advantage of the momentum method?

Thanks in advance

Not just memory: it also takes momentum into account, so the algorithm can steer toward the optimum with better odds!


Hi Gent,

Thanks for your reply. Wouldn't Adam, being essentially a mix of RMSprop and momentum, do the same? And, if so, what is the advantage of one over the other?

Hi Antonio,

You’re correct!

Adam is a powerful algorithm, but in some cases gradient descent with momentum may be preferred:

  • Simplicity and ease of implementation compared to Adam (one moving average and two hyperparameters instead of two moving averages and three)
  • More stable convergence behavior in some settings (for example, with noisy or sparse gradients)
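To make the comparison concrete, here is a minimal sketch of both update rules on a toy 1-D problem. The hyperparameter names (`beta`, `beta1`, `beta2`, `eps`) follow the usual notation; the toy objective f(w) = w² and all the specific values are my own choices for illustration, not anything from the course:

```python
import math

def momentum_step(w, grad, v, lr=0.1, beta=0.9):
    """One gradient-descent-with-momentum update."""
    v = beta * v + (1 - beta) * grad      # exponentially weighted average of gradients
    w = w - lr * v                        # single moving average, two hyperparameters
    return w, v

def adam_step(w, grad, v, s, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: momentum plus RMSprop-style scaling, with bias correction."""
    v = beta1 * v + (1 - beta1) * grad            # first moment (momentum part)
    s = beta2 * s + (1 - beta2) * grad ** 2       # second moment (RMSprop part)
    v_hat = v / (1 - beta1 ** t)                  # bias correction for early steps
    s_hat = s / (1 - beta2 ** t)
    w = w - lr * v_hat / (math.sqrt(s_hat) + eps)
    return w, v, s

# Minimize f(w) = w^2 (gradient 2w) from w = 5.0 with each optimizer.
w_m, v_m = 5.0, 0.0
w_a, v_a, s_a = 5.0, 0.0, 0.0
for t in range(1, 201):
    w_m, v_m = momentum_step(w_m, 2 * w_m, v_m)
    w_a, v_a, s_a = adam_step(w_a, 2 * w_a, v_a, s_a, t)

print(w_m, w_a)  # both end up close to the minimum at 0
```

Note that on this smooth toy problem momentum settles very close to the minimum, while Adam's adaptive step size can keep it hovering within roughly the learning rate of the optimum — one small illustration of why "more powerful" does not always mean "better everywhere".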

Overall, the choice between these algorithms depends on factors such as computational resources, convergence characteristics, and the specific properties of the optimization problem. Let me know if you have any further questions!