As I understand this equation, a large dw means a slow update.
That is because a large dw makes s_dw large, and s_dw is in the denominator of the second equation.
But as far as I know, a large dw means w should be updated by a lot.
I just wonder why the RMSprop equation does this.
Sorry for my poor English.
The problem RMSprop is trying to solve is noisy gradient updates, which it tries to smooth out. A large value of dw means a fast update, right? Perhaps too fast, and maybe not in the best direction, because of the statistical behavior of the minibatches. So it takes two approaches to smooth things out:
- Using an EWA (exponentially weighted average) of the squared dw values, which is what s_dw holds.
- Also using that average as a scale factor in the denominator to reduce the magnitude of the update.
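
To make the two points above concrete, here is a minimal sketch for a single scalar weight. It assumes the commonly used form of the update, s_dw = beta * s_dw + (1 - beta) * dw^2 followed by w = w - alpha * dw / (sqrt(s_dw) + eps); the function names and the values of alpha, beta and eps are my own choices for illustration, not taken from the equations in the question.

```python
import math

def rmsprop_step(w, dw, s_dw, alpha=0.01, beta=0.9, eps=1e-8):
    """One RMSprop update for a scalar weight (assumed standard form)."""
    # EWA of the squared gradient
    s_dw = beta * s_dw + (1.0 - beta) * dw ** 2
    # The gradient is divided by the root of that average
    w = w - alpha * dw / (math.sqrt(s_dw) + eps)
    return w, s_dw

def steady_step(dw, steps=200):
    """Size of the last update after feeding the same gradient many times."""
    w, s_dw = 0.0, 0.0
    w_prev = w
    for _ in range(steps):
        w_prev = w
        w, s_dw = rmsprop_step(w, dw, s_dw)
    return abs(w - w_prev)

small = steady_step(0.1)    # consistently small gradient
large = steady_step(100.0)  # consistently large gradient
print(small, large)         # both come out very close to alpha
```

So under these assumptions a consistently large dw does not make the update slow. It makes the step roughly alpha in size, the same as for a consistently small dw, whereas plain gradient descent would take a step 1000 times bigger in the second case. The denominator only shrinks the update relative to the raw gradient, which is the damping effect described above.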