Help, I have a question about RMSprop

As Professor Ng said, we use the RMSprop algorithm to slow down learning along the more volatile dimensions. Therefore, I think the gradient should be multiplied by a dimensionless number, as I wrote in the picture.

Hi, @user280.

Sorry for the late reply :sweat:

I’m afraid I don’t know of a formal justification for why it works, but Adadelta does address the unit mismatch issue. You may find Section 3.2 of the Adadelta paper, "Idea 2: Correct Units with Hessian Approximation", interesting.
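In case it helps to see the difference concretely, here is a minimal NumPy sketch of the two update rules side by side. It is my own illustration, not code from the course or the paper: the function names `rmsprop_step` and `adadelta_step` and the default hyperparameters are assumptions. The point is that in RMSprop the ratio `grad / sqrt(s)` is dimensionless, so the step inherits whatever units the learning rate has, whereas Adadelta replaces the learning rate with a running RMS of past updates, which carries the units of the parameters.

```python
import numpy as np

def rmsprop_step(w, grad, s, lr=0.01, beta=0.9, eps=1e-8):
    """One RMSprop update (illustrative sketch; names and defaults are assumptions)."""
    # Running average of squared gradients.
    s = beta * s + (1 - beta) * grad**2
    # grad / sqrt(s) is dimensionless, so the step has the units of lr.
    w = w - lr * grad / (np.sqrt(s) + eps)
    return w, s

def adadelta_step(w, grad, s, d, rho=0.95, eps=1e-6):
    """One Adadelta update (illustrative sketch; names and defaults are assumptions)."""
    # Running average of squared gradients.
    s = rho * s + (1 - rho) * grad**2
    # The RMS of past updates in the numerator gives the step the units of w.
    delta = -np.sqrt(d + eps) / np.sqrt(s + eps) * grad
    # Running average of squared updates.
    d = rho * d + (1 - rho) * delta**2
    return w + delta, s, d
```

For example, calling `rmsprop_step` with a gradient and then with the same gradient multiplied by 1000 gives almost the same step, which is the "dimensionless factor" behaviour you describe in your post.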

Good luck with the specialization :slight_smile: