RMSprop(C2W2L07)

j4t123 · March 11, 2022, 10:57am

As I understand this equation, a large dw means a slow update,
because a large dw makes s_dw large, and s_dw is in the denominator of the second equation.
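
Assuming the equations in question are the usual RMSprop update, they can be written as below. The symbols β (decay rate), α (learning rate), and ε (a small constant for numerical stability) are assumed here and are not taken from the post:

```latex
\begin{aligned}
s_{dw} &= \beta \, s_{dw} + (1 - \beta) \, dw^{2} \\
w &= w - \alpha \, \frac{dw}{\sqrt{s_{dw}} + \varepsilon}
\end{aligned}
```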

But as far as I know, a large dw means that w should be updated by a large amount.
I just wonder why the RMSprop equation does this.

Sorry for my poor English.

paulinpaloalto · March 11, 2022, 5:11pm

The problem RMSprop is trying to solve is to smooth out the gradient updates. A large value of dw means a fast update, right? Perhaps too fast and maybe not in the best direction because of the statistical behavior of the minibatches. So they are taking two approaches to smooth things out:

Using an EWA (exponentially weighted average) of the squared dw values.
Also using the square root of that average as a scale factor in the denominator to reduce the magnitude of the update.
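
To make the effect of the denominator concrete, here is a minimal sketch of a scalar RMSprop step, assuming the standard form of the update. The function names `rmsprop_step` and `run`, the hyperparameter values, and the constant gradients are all illustrative assumptions, not something from the course. It suggests that the step size ends up close to the learning rate whether dw is tiny or huge, so a large dw is not slowed down relative to a small one; the update is normalized by the recent gradient magnitude.

```python
import math

def rmsprop_step(w, dw, s_dw, lr=0.01, beta=0.9, eps=1e-8):
    """One RMSprop update for a single scalar parameter (hypothetical helper)."""
    # EWA of the squared gradient
    s_dw = beta * s_dw + (1.0 - beta) * dw ** 2
    # Divide the gradient by the root of that average before stepping
    w = w - lr * dw / (math.sqrt(s_dw) + eps)
    return w, s_dw

def run(grad, steps=50):
    """Apply RMSprop with a constant gradient and record each step size."""
    w, s_dw = 0.0, 0.0
    sizes = []
    for _ in range(steps):
        new_w, s_dw = rmsprop_step(w, grad, s_dw)
        sizes.append(abs(new_w - w))
        w = new_w
    return sizes

small = run(grad=0.01)   # very small gradient
large = run(grad=100.0)  # very large gradient

# Both settle near lr = 0.01, even though the gradients differ by a factor of 10,000.
print(small[-1], large[-1])
```

With plain gradient descent and the same learning rate, the two step sizes would be lr * |dw|, so they would differ by the same factor of 10,000 as the gradients. That difference is what the denominator removes.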
