Hi everyone,

I have a question about the intuition behind RMSprop.

As shown in the lecture video of the Deep Learning Specialization by Prof. Andrew Ng, RMSprop helps reduce the oscillations along the vertical axis (the b direction in the example figure) and speeds up convergence to the minimum by taking larger steps along the horizontal axis (the w direction).

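This is achieved by updating the parameters as follows:

$$w := w - \alpha \frac{dw}{\sqrt{S_{dw}}}$$

$$b := b - \alpha \frac{db}{\sqrt{S_{db}}}$$

where $\alpha$ is the learning rate and $S_{dw}$, $S_{db}$ are exponentially weighted averages of the squared gradients, e.g. $S_{dw} := \beta S_{dw} + (1-\beta)\, dw^2$.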
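To make the update rule concrete, here is a minimal Python sketch of one RMSprop step for two scalar parameters w and b. The function name `rmsprop_step` and the default values of `alpha`, `beta` and `eps` are my own illustrative assumptions; `eps` is a small constant added to the denominator to avoid division by zero.

```python
import math

def rmsprop_step(w, b, dw, db, s_dw, s_db, alpha=0.01, beta=0.9, eps=1e-8):
    """One RMSprop update for scalar parameters w and b (illustrative sketch).

    alpha (learning rate), beta (decay rate) and eps (guards against
    division by zero) are assumed example defaults.
    """
    # Exponentially weighted averages of the squared gradients
    s_dw = beta * s_dw + (1 - beta) * dw ** 2
    s_db = beta * s_db + (1 - beta) * db ** 2
    # Each parameter's step is divided by the root of its own running average
    w = w - alpha * dw / (math.sqrt(s_dw) + eps)
    b = b - alpha * db / (math.sqrt(s_db) + eps)
    return w, b, s_dw, s_db

# Hypothetical first step: gradient in b is 100x larger than in w
w, b, s_dw, s_db = rmsprop_step(0.0, 0.0, dw=0.1, db=10.0, s_dw=0.0, s_db=0.0)
print(w, b)
```

With these hypothetical values, both parameters move by roughly 0.032 on this first step even though the b gradient is 100 times larger, because each gradient is normalized by the root of its own running average.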

So, if the gradient dw is small, then $\sqrt{S_{dw}}$ is small and w takes a relatively large step (moving forward in the horizontal direction); if db is large, then $\sqrt{S_{db}}$ is large and b takes a much smaller step (damping the movement in the vertical direction).
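The reasoning above can be checked numerically with a small sketch of the effective learning rate $\alpha / \sqrt{S}$ for each parameter. The helper `effective_lr` and the constant gradient histories below are hypothetical, chosen only to mimic "small gradients in w, large gradients in b"; `eps` is an assumed small constant to avoid division by zero.

```python
import math

def effective_lr(grads, alpha=0.01, beta=0.9, eps=1e-8):
    """Return alpha / (sqrt(S) + eps) after feeding a sequence of gradients
    for a single parameter through RMSprop's running average S."""
    s = 0.0
    for g in grads:
        s = beta * s + (1 - beta) * g ** 2
    return alpha / (math.sqrt(s) + eps)

# Hypothetical gradient histories: small in the w direction, large in b
lr_w = effective_lr([0.1] * 50)
lr_b = effective_lr([10.0] * 50)
print(lr_w > lr_b)
print(lr_w / lr_b)
```

Here the parameter with the smaller gradients ends up with an effective learning rate about 100 times larger, which is the scaling described in the paragraph above. Note that S is built from the gradients, so what matters is the size of dw and db rather than the values of w and b themselves.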

However, what if it is the other way around, with a large gradient in w and a small one in b? Would the optimization then fluctuate strongly or even diverge again?