Clarification about AMSgrad parameter in Adam

Besides the hyperparameters of the Adam optimizer discussed so far, I see there’s another parameter called `amsgrad`.

Can you please explain in simple terms what this parameter does?

For reference, here is the page I found:

Amsgrad — A variant of Adam using the maximum of past square gradients | by neuralthreads | Medium.
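From what I gather from that article, the only change AMSGrad makes is to keep a running *maximum* of the second-moment estimate, so the effective step size can never grow back after shrinking. A minimal NumPy sketch of a single update step (my own illustration of the formula, not any library’s actual implementation — real optimizers like PyTorch’s handle bias correction slightly differently):

```python
import numpy as np

def adam_step(param, grad, m, v, vhat_max, t,
              lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8, amsgrad=False):
    """One Adam/AMSGrad update step (illustrative sketch)."""
    # Standard Adam moment estimates
    m = beta1 * m + (1 - beta1) * grad          # first moment (mean of grads)
    v = beta2 * v + (1 - beta2) * grad ** 2     # second moment (mean of squared grads)
    m_hat = m / (1 - beta1 ** t)                # bias correction
    if amsgrad:
        # AMSGrad: use the running MAXIMUM of past second-moment estimates,
        # so the denominator is non-decreasing and the step size never increases
        vhat_max = np.maximum(vhat_max, v)
        v_hat = vhat_max / (1 - beta2 ** t)
    else:
        v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v, vhat_max
```

On the very first step the two variants coincide (the maximum of zero and `v` is just `v`); they only diverge later, whenever `v` would otherwise shrink.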
