I really didn't understand what the difference is between momentum and RMSprop.
It seems to me that both do the same thing.
Hey @Ibrahim_Mustafa,
All the optimization algorithms have the same goal, i.e., to converge to the minimum value as fast as possible. The algorithm that works best in your case will depend on your application. As for the theoretical differences, I am sure Prof Andrew does a great job of explaining those; still, let me highlight the key distinction.
While "Gradient Descent with Momentum" computes a moving average of the gradients, "RMSProp" computes a moving average of the squared gradients; each algorithm then uses its average to update the weights and biases. I hope this resolves your query.
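For reference, the two update rules side by side look roughly like this (a sketch in the usual course notation, with $\beta$ as the decay rate, $\alpha$ as the learning rate, and $\epsilon$ a small constant for numerical stability):

$$
\text{Momentum:}\quad v_{dW} = \beta\, v_{dW} + (1-\beta)\, dW, \qquad W := W - \alpha\, v_{dW}
$$

$$
\text{RMSProp:}\quad s_{dW} = \beta\, s_{dW} + (1-\beta)\, dW^{2}, \qquad W := W - \alpha\, \frac{dW}{\sqrt{s_{dW}} + \epsilon}
$$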
Cheers,
Elemento
Hi @Ibrahim_Mustafa,
in addition to @Elemento's great reply, check out this exemplary viz:
RMSprop applies an adaptation based on squared gradients; see also the Keras Doc (and the small sketch after the two steps below):
- Maintain a moving (discounted) average of the square of gradients
- Divide the gradient by the root of this average
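In plain NumPy, those two steps look roughly like this (a minimal sketch; the function name, variable names and default hyperparameters are just illustrative assumptions, not the Keras implementation):

```python
import numpy as np

def rmsprop_step(w, grad, avg_sq_grad, lr=0.001, rho=0.9, eps=1e-7):
    """One illustrative RMSprop update (sketch)."""
    # 1) Maintain a moving (discounted) average of the square of gradients
    avg_sq_grad = rho * avg_sq_grad + (1 - rho) * grad ** 2
    # 2) Divide the gradient by the root of this average and take the step
    w = w - lr * grad / (np.sqrt(avg_sq_grad) + eps)
    return w, avg_sq_grad
```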
Momentum can also be implemented using a moving average, but over the past gradients directly; see also this source.
Momentum can be imagined as "memorising" inertia, so that the search does not get stuck in local minima and, in theory, hopefully makes it to the global optimum. (In practice, you often do not need to reach the global optimum as long as the model performance is robust and sufficient…)
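As a rough sketch of that idea (again, the names and defaults are just illustrative):

```python
import numpy as np

def momentum_step(w, grad, velocity, lr=0.01, beta=0.9):
    """One illustrative gradient-descent-with-momentum update (sketch)."""
    # Moving average over the past gradients directly (the "inertia")
    velocity = beta * velocity + (1 - beta) * grad
    # Step in the direction of the accumulated velocity
    w = w - lr * velocity
    return w, velocity
```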
If you want to see how RMSprop and momentum can be combined, check out this thread: Adam Optimization Question - #2 by Christian_Simonis
In summary, with respect to the usual goals:
- Momentum accelerates your search in the direction of the global minimum by "using the inertia" to carry you over local minima.
- RMSProp dampens the search in the direction of oscillations, since squaring the gradients penalizes outliers more strongly.
- Adam combines the heuristics of both Momentum and RMSProp (see the sketch below), as pointed out in this nice article:
Source: Intro to optimization in deep learning: Momentum, RMSProp and Adam, see also this thread: Why not always use Adam optimizer - #2 by Christian_Simonis
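To make the combination concrete, here is a simplified Adam-style update (a sketch only; the variable names and hyperparameter defaults are the commonly quoted ones, not pulled from any particular library):

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One illustrative Adam update combining the two ideas (sketch)."""
    m = beta1 * m + (1 - beta1) * grad        # momentum-style average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # RMSProp-style average of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction, t is the 1-based step count
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```

The bias correction mainly matters in the first few steps, while the moving averages are still close to zero.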
Hope that helps!
Best regards
Christian