Intuition for RMSProp

In RMSProp, we have the following update:

W = W - lr* dW / np.sqrt(dW**2)
b = b - lr* db / np.sqrt(db**2)

In these equations, if all the operations are element-wise (i.e., sqrt, square, and division are all element-wise), then dW / np.sqrt(dW**2) will be a vector of just 1’s and -1’s. Then we aren’t really moving in the direction of dW or db at all, which seems odd. Is my calculation correct? If yes, why bother computing dW / np.sqrt(dW**2) instead of just using the vector np.sign(dW)?
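A quick check of the calculation above, assuming NumPy and some made-up gradient values (the array contents are hypothetical, chosen so the squares and roots are exact in floating point):

```python
import numpy as np

# Hypothetical gradient values, for illustration only
dW = np.array([0.5, -3.0, 0.25, -2.0])

# The update term as written in the post: element-wise square, sqrt, then divide
ratio = dW / np.sqrt(dW**2)

print(ratio)                               # every entry is +1 or -1
print(np.array_equal(ratio, np.sign(dW)))  # identical to the sign vector
```

So, with the denominator written as np.sqrt(dW**2), the term is exactly np.sign(dW) for nonzero entries. One small difference: a zero entry gives 0/0 = nan (with a RuntimeWarning), whereas np.sign returns 0.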

Hello @ultimateabhi,

From this video in Course 2 Week 2

It is divided by the square root of S_{dW}, not by the square root of dW**2. What do you think now?
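For reference, here is a minimal sketch of one RMSProp step written with the running average S_dW, as I understand the course's formulation: S_dW = beta * S_dW + (1 - beta) * dW**2, then W = W - lr * dW / (sqrt(S_dW) + eps). The function name rmsprop_step and the default values for lr, beta, and eps are assumptions for illustration, and the running averages are assumed to start at zero.

```python
import numpy as np

def rmsprop_step(W, b, dW, db, S_dW, S_db, lr=0.01, beta=0.9, eps=1e-8):
    """One RMSProp update (hypothetical helper; defaults are assumed values)."""
    # Exponentially weighted average of the squared gradients (element-wise)
    S_dW = beta * S_dW + (1 - beta) * dW**2
    S_db = beta * S_db + (1 - beta) * db**2
    # Divide by the root of the running average, not by sqrt(dW**2);
    # eps guards against division by zero
    W = W - lr * dW / (np.sqrt(S_dW) + eps)
    b = b - lr * db / (np.sqrt(S_db) + eps)
    return W, b, S_dW, S_db

# Usage with made-up values; the running averages start at zero
W, b = np.zeros(3), 0.0
S_dW, S_db = np.zeros(3), 0.0
dW, db = np.array([0.5, -2.0, 4.0]), 1.0
W, b, S_dW, S_db = rmsprop_step(W, b, dW, db, S_dW, S_db)
print(W, b)
```

The state S_dW and S_db has to be carried from one iteration to the next; that memory of past gradients is what distinguishes this denominator from np.sqrt(dW**2).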


Right… this makes it a little less confusing. BUT SdW is still a moving average of dW**2 from the previous iterations, so the values of dW / np.sqrt(SdW) should still be close to np.signum(dW). Correct?
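One way to test the "close to the sign" guess is to track dW / np.sqrt(SdW) for a single parameter over time. This is a sketch with a made-up gradient history and an assumed beta of 0.9; it only shows how the ratio behaves, not what happens in any particular network.

```python
import numpy as np

beta = 0.9   # decay rate (assumed value)
S = 0.0      # running average of squared gradients, initialized to zero

# Made-up gradient history for ONE parameter:
# steady at 0.1, a single spike to 1.0, then back to 0.1
grads = [0.1] * 50 + [1.0] + [0.1] * 5

ratios = []
for g in grads:
    S = beta * S + (1 - beta) * g**2
    ratios.append(g / np.sqrt(S))  # the quantity dW / np.sqrt(SdW) from the post

print(np.round(ratios, 2))
```

With these numbers the ratio is about 3.16 on the first step (because S starts at zero), about 1.00 once the average has settled, about 3.03 at the spike, and about 0.32 on the step right after it. So the ratio is near ±1 only while the current gradient matches its recent RMS. With beta = 0 the average would be exactly dW**2 and the ratio would reduce to the sign, which is the case computed in the first post.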

I guess my question now is: what is the intuition behind RMSProp? I understand the intuition behind momentum, but not behind RMSProp.
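A common way to picture the intuition is an elongated bowl where one direction is much steeper than the other. The sketch below compares plain gradient descent with RMSProp on a hypothetical quadratic f(w, b) = 0.5 * (w**2 + 100 * b**2); the function, the start point, lr = 0.015, and 50 steps are all assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical elongated bowl: f(w, b) = 0.5 * (w**2 + 100 * b**2)
# The gradient in b is 100x steeper than in w.
def grad(p):
    w, b = p
    return np.array([w, 100.0 * b])

def gradient_descent(p, lr, steps):
    path = [p.copy()]
    for _ in range(steps):
        p = p - lr * grad(p)
        path.append(p.copy())
    return np.array(path)

def rmsprop(p, lr, steps, beta=0.9, eps=1e-8):
    S = np.zeros_like(p)  # running average of squared gradients, per coordinate
    path = [p.copy()]
    for _ in range(steps):
        g = grad(p)
        S = beta * S + (1 - beta) * g**2
        p = p - lr * g / (np.sqrt(S) + eps)  # each coordinate scaled by its own RMS
        path.append(p.copy())
    return np.array(path)

start = np.array([1.0, 1.0])
gd_path = gradient_descent(start, lr=0.015, steps=50)
rms_path = rmsprop(start, lr=0.015, steps=50)

# Size of the very first step in each coordinate (w, b)
gd_first = np.abs(gd_path[1] - gd_path[0])
rms_first = np.abs(rms_path[1] - rms_path[0])
print("GD first step (w, b):     ", gd_first)
print("RMSProp first step (w, b):", rms_first)
print("w after 50 steps, GD vs RMSProp:", gd_path[-1, 0], rms_path[-1, 0])
```

Here plain gradient descent needs lr < 0.02 to keep b from diverging (each b step multiplies b by 1 - 100 * lr), which leaves w shrinking by only 1.5% per step; its first step is 0.015 in w but 1.5 in b. RMSProp's first step is about 0.047 in both coordinates, because each gradient is divided by its own running RMS, so the steep direction is damped and the shallow one gets relatively larger steps.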

Hello @ultimateabhi,

I cannot find a function called np.signum.

The course dedicates a video to this in Course 2 Week 2, and I would suggest you watch it (again). After that, if you still have questions, it would be helpful if you shared your understanding from the video, and then we can discuss from there.