As Professor Ng said, we use the RMSprop algorithm to reduce the effective learning rate along the more volatile dimensions. I therefore think the gradient should be multiplied by a dimensionless number, as I wrote in the picture.
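For reference, here is a minimal sketch of the standard RMSprop update for a single parameter, assuming the usual form without bias correction. The names `rmsprop_steps`, `lr`, `beta`, and `eps`, and the sample gradients, are illustrative assumptions of mine, not taken from the picture. In this form the gradient is divided by the square root of a running average of squared gradients, so the ratio `g / sqrt(s)` carries no units of its own: a dimension whose gradients are 100 times larger ends up taking steps of about the same size.

```python
import math


def rmsprop_steps(grads, lr=0.01, beta=0.9, eps=1e-8):
    """Return the RMSprop step applied at each iteration for one parameter."""
    s = 0.0  # running average of squared gradients
    steps = []
    for g in grads:
        # Exponentially weighted average of g^2 (same units as g^2).
        s = beta * s + (1.0 - beta) * g * g
        # Dividing g by sqrt(s) cancels the gradient's units and scale.
        steps.append(lr * g / (math.sqrt(s) + eps))
    return steps


# Hypothetical gradients: a "volatile" dimension about 100x larger than a "calm" one.
volatile = [10.0, -12.0, 9.0, -11.0]
calm = [0.10, 0.12, 0.09, 0.11]

steps_volatile = rmsprop_steps(volatile)
steps_calm = rmsprop_steps(calm)

for sv, sc in zip(steps_volatile, steps_calm):
    print(f"volatile step = {sv:+.6f}   calm step = {sc:+.6f}")
```

Comparing the absolute values of the two columns shows that they agree up to the small `eps` term, which is one way to see how the larger gradients get scaled down.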