Checking Intuition: RMSprop Normalization vs Speed Improvement (Post: RMSprop lecture)

b0otable · October 5, 2022, 11:44am

RMSprop was described as a way to speed up Gradient Descent.

However, the example described seems to be a special case.

RMSprop, as described, seems to have the effect of scaling the magnitudes of the gradient descent steps in each dimension closer to each other. It is more of a normalization approach.
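A minimal sketch of the per-parameter RMSprop update, written in the lecture's form S_dW = beta * S_dW + (1 - beta) * dW^2 and W := W - alpha * dW / (sqrt(S_dW) + eps), makes the normalization effect visible. The values alpha = 0.01, beta = 0.9, eps = 1e-8 and the constant gradients are assumptions chosen for illustration: two gradients that differ by a factor of 100 end up producing steps of almost exactly the same size.

```python
import math

def rmsprop_update(param, grad, s, lr=0.01, beta=0.9, eps=1e-8):
    """One RMSprop step for a single scalar parameter.

    Returns the updated parameter, the step taken, and the new running
    average of squared gradients.
    """
    s = beta * s + (1 - beta) * grad ** 2    # S_dW = beta*S_dW + (1-beta)*dW^2
    step = lr * grad / (math.sqrt(s) + eps)  # alpha * dW / (sqrt(S_dW) + eps)
    return param - step, step, s

# Hypothetical constant gradients: dW small, db large (the lecture's picture).
dW, db = 0.1, 10.0
W = b = 0.0
s_dW = s_db = 0.0
for _ in range(5):
    W, step_W, s_dW = rmsprop_update(W, dW, s_dW)
    b, step_b, s_db = rmsprop_update(b, db, s_db)

print(f"plain GD  |db| / |dW|         = {abs(db / dW):.1f}")
print(f"RMSprop   |step_b| / |step_W| = {abs(step_b / step_W):.4f}")
```

Because each parameter is divided by the root of its own squared-gradient average, a constant gradient of any size turns into a step of roughly alpha (slightly larger early on, while S_dW is still warming up from zero).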

Andrew picked a case where:
Magnitude of B was relatively high and more incorrect
Magnitude of W was relatively low and more correct

By normalizing:
The higher magnitude of B becomes relatively smaller (better)
The lower magnitude of W becomes relatively larger (better)

If we then increase the learning rate, we improve performance.
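A toy run of Andrew's picture, assuming the hypothetical elongated bowl J(w, b) = 0.5 * (w^2 + 100 * b^2) (steep and oscillation-prone along b, shallow along w) and a hypothetical starting point (-10, 1). With alpha = 0.025, plain gradient descent is past its stability limit along b (|1 - 0.025 * 100| = 1.5 > 1), so b blows up. RMSprop caps each step at about alpha / sqrt(1 - beta), roughly 0.079 here, so the same learning rate stays stable while w still moves toward the minimum.

```python
import math

def grad(w, b):
    # Hypothetical toy loss J(w, b) = 0.5 * (w**2 + 100 * b**2):
    # shallow along w, steep along b.
    return w, 100.0 * b

def run(method, lr, steps=100, beta=0.9, eps=1e-8):
    w, b = -10.0, 1.0          # hypothetical starting point
    s_w = s_b = 0.0            # RMSprop running averages of squared gradients
    for _ in range(steps):
        dw, db = grad(w, b)
        if method == "gd":
            w -= lr * dw
            b -= lr * db
        else:  # "rmsprop"
            s_w = beta * s_w + (1 - beta) * dw ** 2
            s_b = beta * s_b + (1 - beta) * db ** 2
            w -= lr * dw / (math.sqrt(s_w) + eps)
            b -= lr * db / (math.sqrt(s_b) + eps)
    return w, b

lr = 0.025  # above plain GD's stability limit of 2 / 100 = 0.02 along b
print("gd      (w, b):", run("gd", lr))
print("rmsprop (w, b):", run("rmsprop", lr))
```

This only shows the favorable case Andrew picked: the large-gradient direction (b) is the one that oscillates, so shrinking it is what allows the larger learning rate.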

It seems like we could easily make a counter case:

In this example:
Magnitude of B was relatively low and more incorrect
Magnitude of W was relatively high and more correct

In this case, by normalizing:
The lower magnitude of B becomes relatively larger (worse)
The higher magnitude of W becomes relatively smaller (worse)

Then if we increase the learning rate, we are actually decreasing performance.
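The counter case can be made concrete with a hypothetical gradient stream (values assumed for illustration): b's gradient is small and flips sign every step ("more incorrect"), while W's is large and consistent ("more correct"). RMSprop only looks at each parameter's own gradient magnitudes, so it scales both steps to the same size; relative to W, b's noisy steps become about 100 times larger than under plain gradient descent.

```python
import math

beta, lr, eps = 0.9, 0.01, 1e-8
s_w = s_b = 0.0
for t in range(50):
    dw = 10.0                          # hypothetical: large, consistent gradient
    db = 0.1 if t % 2 == 0 else -0.1   # hypothetical: small, sign-flipping gradient
    s_w = beta * s_w + (1 - beta) * dw ** 2
    s_b = beta * s_b + (1 - beta) * db ** 2
    step_w = lr * dw / (math.sqrt(s_w) + eps)
    step_b = lr * db / (math.sqrt(s_b) + eps)

gd_ratio = abs(db) / abs(dw)            # size of b's step relative to W's under plain GD
rms_ratio = abs(step_b) / abs(step_w)   # same ratio under RMSprop
print(f"plain GD |step_b| / |step_w| = {gd_ratio:.3f}")
print(f"RMSprop  |step_b| / |step_w| = {rms_ratio:.3f}")
```

Here b's steps mostly cancel out over time, so the point is only that RMSprop gives them the same weight as W's steps; it has no notion of which direction is the useful one.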

Is there any reason to expect the case Andrew presented, where the relatively larger magnitudes are in the relatively incorrect directions, to be the norm?

If not, it would seem that RMSprop is good at eliminating extreme magnitudes in incorrect directions, which helps prevent overshooting via normalization. Wouldn't this also have the disadvantage of eliminating extreme steps in the correct direction?

If that is the case, wouldn't a momentum approach be a better overall solution? Instead of trying to 'normalize' step sizes (both good and bad), momentum seems to actually eliminate the magnitudes of incorrect steps and leave only the steps in the relatively correct direction.

Once we are pointed in the correct direction, it seems that momentum would let us crank up the learning rate and get the most advantageous results.
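A minimal sketch of the momentum average used in the course, v_dW = beta * v_dW + (1 - beta) * dW, fed a hypothetical gradient stream: W's gradient is a consistent 1, while b's is a large 10 that flips sign every step. With beta = 0.9 (assumed), the averaged b component settles to a magnitude of about 1 / 1.9, roughly 0.53, while the consistent W component stays near 1. This is the "cancel the oscillation, keep the consistent direction" behavior the question describes.

```python
beta = 0.9
v_w = v_b = 0.0
for t in range(100):
    dw = 1.0                            # hypothetical: consistent gradient
    db = 10.0 if t % 2 == 0 else -10.0  # hypothetical: large, oscillating gradient
    v_w = beta * v_w + (1 - beta) * dw  # exponentially weighted average of dW
    v_b = beta * v_b + (1 - beta) * db  # exponentially weighted average of db

# The parameter update would then be W -= lr * v_w and b -= lr * v_b.
print(f"v_w = {v_w:.3f}, v_b = {v_b:+.3f}")
```

Note the contrast with RMSprop: momentum damps b because its gradient keeps changing sign, not because its gradient is large.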

reinoudbosch · October 10, 2022, 1:24pm

Hi b0otable,

Maybe this explanation clarifies the effects of using RMSprop instead of momentum.

tl;dr: RMSprop escapes saddle points more easily and faster than momentum.
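A toy check of the tl;dr under assumed settings: the hypothetical loss J(w, b) = w^2 + (b^2 - 1)^2 / 4 has a saddle at (0, 0) and minima at b = -1 and b = 1; the start (0.5, 1e-6), alpha = 0.01 and beta = 0.9 are also assumptions. Near the saddle the gradient along b is about -b, i.e. tiny. Momentum's step is proportional to that gradient, while RMSprop divides it by its own running RMS, so b starts moving by roughly alpha per step almost immediately.

```python
import math

def grad(w, b):
    # Hypothetical toy loss J(w, b) = w**2 + (b**2 - 1)**2 / 4:
    # saddle point at (0, 0), minima at (0, -1) and (0, 1).
    return 2.0 * w, b * (b * b - 1.0)

def run(method, steps=100, lr=0.01, beta=0.9, eps=1e-8):
    w, b = 0.5, 1e-6   # hypothetical start, almost exactly on the saddle in b
    v_w = v_b = 0.0    # momentum: averages of gradients
    s_w = s_b = 0.0    # RMSprop: averages of squared gradients
    for _ in range(steps):
        dw, db = grad(w, b)
        if method == "momentum":
            v_w = beta * v_w + (1 - beta) * dw
            v_b = beta * v_b + (1 - beta) * db
            w -= lr * v_w
            b -= lr * v_b
        else:  # "rmsprop"
            s_w = beta * s_w + (1 - beta) * dw ** 2
            s_b = beta * s_b + (1 - beta) * db ** 2
            w -= lr * dw / (math.sqrt(s_w) + eps)
            b -= lr * db / (math.sqrt(s_b) + eps)
    return w, b

print("momentum (w, b):", run("momentum"))
print("rmsprop  (w, b):", run("rmsprop"))
```

In this setup RMSprop's b ends up near the minimum at 1 after 100 steps, while momentum's b is still on the order of 1e-6; with other losses, starting points or learning rates the gap can be different.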
