Vanilla gradient descent: the gradient sets both the direction of the step and its size. For a squared-error loss, for example, the step size is proportional to the errors (of predictions), scaled by the learning rate.
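To make the "step size is proportional to the errors" point concrete, here is a minimal sketch. The one-weight linear model, the toy data, the learning rate, and the name `gd_step` are all my own assumptions for illustration, not anything from the course.

```python
import numpy as np

def gd_step(w, x, y, lr=0.1):
    """One vanilla gradient descent step for the loss mean((w*x - y)**2) / 2."""
    errors = w * x - y            # prediction errors
    grad = np.mean(errors * x)    # the gradient is an average of the errors, weighted by x
    return w - lr * grad, grad

# Toy data (assumed): the true weight is 2.0
x = np.array([1.0, 2.0, 3.0])
y = 2.0 * x

w_far, grad_far = gd_step(0.0, x, y)    # far from the answer: large errors, large step
w_near, grad_near = gd_step(1.9, x, y)  # close to the answer: small errors, small step
```

Starting far from the true weight gives a much larger gradient, and hence a larger step, than starting close to it. Nothing else in the update controls the step size apart from the fixed learning rate.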

RMSProp: adds adaptiveness to vanilla gradient descent. It divides the gradient by the square root of an (exponentially) weighted average of squared gradients, and this denominator suppresses oscillation so that training can converge faster.
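Here is a rough sketch of what that denominator does. The two-weight quadratic loss, the curvature values, the hyperparameters, and the name `rmsprop_step` are assumptions I picked for illustration.

```python
import numpy as np

def rmsprop_step(w, grad, s, lr=0.01, beta=0.9, eps=1e-8):
    """One RMSProp step; s is the running average of squared gradients."""
    s = beta * s + (1 - beta) * grad**2        # weighted average of squared gradients
    w = w - lr * grad / (np.sqrt(s) + eps)     # the denominator rescales each direction
    return w, s

# Assumed toy loss: 0.5 * sum(curvature * w**2), much steeper along the second weight
curvature = np.array([1.0, 50.0])
w = np.array([1.0, 1.0])
grad = curvature * w

vanilla_update = 0.01 * grad                   # 50x larger along the steep direction
w_new, s = rmsprop_step(w, grad, s=np.zeros(2))
rmsprop_update = w - w_new                     # same size along both directions
```

With the plain gradient, the steep direction gets a step 50 times larger, which is where the oscillation comes from. With RMSProp, the first step has the same size in both directions, because each gradient is divided by its own running magnitude.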

Adam: takes RMSProp and replaces the gradient with its momentum-based version. As you said, momentum uses an (exponentially) weighted average of past gradients, and that makes the update more reluctant to change rapidly.
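A small sketch of that combination, in case it helps. The name `adam_step`, the hand-fed gradients, and the hyperparameter values are my own assumptions. The bias-correction lines are part of the usual Adam formulation, even though I did not mention them above.

```python
import numpy as np

def adam_step(w, grad, m, s, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad        # momentum: weighted average of gradients
    s = beta2 * s + (1 - beta2) * grad**2     # RMSProp: weighted average of squared gradients
    m_hat = m / (1 - beta1**t)                # bias correction for starting m at zero
    s_hat = s / (1 - beta2**t)                # bias correction for starting s at zero
    w = w - lr * m_hat / (np.sqrt(s_hat) + eps)
    return w, m, s

# Feed a gradient that flips sign between two steps (assumed values)
w0, m, s = 1.0, 0.0, 0.0
w1, m, s = adam_step(w0, grad=1.0, m=m, s=s, t=1)
w2, m, s = adam_step(w1, grad=-1.0, m=m, s=s, t=2)

step1 = w0 - w1   # full-size step in the first direction
step2 = w1 - w2   # much smaller step back, because m still remembers the first gradient
```

When the raw gradient flips sign, the averaged `m` only moves a little, so the second step is far smaller than the first. That is the "reluctant to rapid change" behaviour, sitting on top of the RMSProp denominator.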

So, I agree with your description of Momentum, but what RMSProp introduces should be about anti-oscillation. The error (of prediction) is already taken care of by the gradient.

Perhaps your “error” was my “oscillation”, which is why I listed my version at the beginning and clearly differentiated between my “error” and my “oscillation”.