W = W - lr * dW / np.sqrt(dW**2)
and b = b - lr * db / np.sqrt(db**2)
In these equations, if all the operations are element-wise (i.e. sqrt, square, and division are all applied element by element), then dW / np.sqrt(dW**2) will be a vector of just 1’s and -1’s (assuming no entry of dW is exactly 0). Then we aren’t really moving in the direction of dW or db at all, which seems weird. Is my calculation correct? If yes, why bother computing dW / np.sqrt(dW**2) instead of just using the vector np.sign(dW)?
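A quick NumPy check of the element-wise claim, using a made-up gradient array `dW` (hypothetical values chosen so the arithmetic is exact, not taken from the course). For every nonzero entry, `dW / np.sqrt(dW**2)` reduces to `np.sign(dW)`; a zero entry gives 0/0, which is `nan` rather than 0.

```python
import numpy as np

def elementwise_ratio(g):
    # square, sqrt, and divide are each applied entry by entry,
    # so each entry becomes g_i / |g_i|
    return g / np.sqrt(g**2)

# Hypothetical gradient values, for illustration only
dW = np.array([0.5, -3.0, 2.0, -0.25])

print(elementwise_ratio(dW))  # [ 1. -1.  1. -1.]
print(np.sign(dW))            # [ 1. -1.  1. -1.]

# A zero entry is the one place the two differ: 0 / 0 is nan, sign(0) is 0
with np.errstate(invalid="ignore"):  # silence the 0/0 RuntimeWarning
    zero_case = elementwise_ratio(np.array([0.0]))
print(zero_case)  # [nan]
```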
Right… this makes it a little less confusing. BUT SdW is still a moving average of dW**2 over the previous iterations, so the values of dW / np.sqrt(SdW) should still be close to np.sign(dW). Correct?
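A minimal sketch for testing that intuition on a single scalar parameter, assuming the common RMSProp form `SdW = beta * SdW + (1 - beta) * dW**2` with `SdW` starting at 0 and no bias correction; `beta = 0.9`, `eps`, and both gradient sequences are illustrative choices. It shows that `dW / np.sqrt(SdW)` is close to `np.sign(dW)` only when the current gradient's magnitude matches its recent history.

```python
import numpy as np

def rmsprop_ratios(grads, beta=0.9, eps=1e-8):
    """Return dW / (sqrt(SdW) + eps) at each step of a scalar gradient sequence."""
    SdW = 0.0
    ratios = []
    for dW in grads:
        # Exponentially weighted moving average of the squared gradient
        SdW = beta * SdW + (1 - beta) * dW**2
        ratios.append(dW / (np.sqrt(SdW) + eps))
    return np.array(ratios)

# Case 1: constant gradient. The ratio starts above 1 (SdW begins at 0)
# and then settles near sign(dW) = 1.
steady = rmsprop_ratios([2.0] * 50)

# Case 2: 50 large gradients, then one small gradient of the same sign.
spike_then_small = rmsprop_ratios([10.0] * 50 + [1.0])

print(steady[0])             # about 3.16
print(steady[-1])            # close to 1
print(spike_then_small[-1])  # roughly 0.1, far from sign(1.0) = 1
```

Because `SdW` is tracked separately for each parameter, the step for each parameter is the current gradient divided by the typical size of that parameter's recent gradients, not just its sign.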
I guess my question now is: what is the intuition behind RMSProp? I understand the intuition behind momentum, but not behind RMSProp.
The course dedicates a video to this in Course 2 Week 2; I would suggest you watch it (again). After that, if you still have questions, it would help if you shared your understanding from the video, and we can discuss from there.