Adam Optimization

Isn’t the Adam optimization algorithm just a modified version of gradient descent with dynamic learning-rate adjustment? Or am I missing something here?

Adam uses momentum and an adaptive learning rate.

Hello Dhawal,

Adam is gradient descent-based. As for whether it adjusts the “learning rate”, that really depends on how you define the learning rate in this context. In plain gradient descent, the weight update is a learning rate \alpha times the gradient. Adam also has an \alpha, and it stays fixed, but what gets multiplied by \alpha is no longer the raw gradient: it is a function of running estimates of the gradient’s first and second moments (the “momentums”). So the effective step size adapts per parameter even though \alpha itself never changes. You can check out the paper (Kingma & Ba, “Adam: A Method for Stochastic Optimization”) for the complete algorithm and see exactly what’s happening there.
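To make the distinction concrete, here is a minimal NumPy sketch of the standard Adam update from the paper. Note that `alpha` never changes; the per-parameter step is shaped by the bias-corrected moment estimates `m_hat` and `v_hat` (the function name `adam_step` and the toy quadratic example are my own choices, not from the thread):

```python
import numpy as np

def adam_step(theta, grad, m, v, t,
              alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. theta: parameters, grad: gradient at theta,
    m, v: running first/second moment estimates, t: 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad        # first moment (momentum)
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)              # bias correction for zero init
    v_hat = v / (1 - beta2 ** t)
    # alpha is fixed; the adaptive part is m_hat / sqrt(v_hat)
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# toy usage: minimize f(x) = x^2 starting from x = 5
theta = np.array([5.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 2001):
    grad = 2 * theta                          # gradient of x^2
    theta, m, v = adam_step(theta, grad, m, v, t, alpha=0.1)
```

Contrast this with vanilla gradient descent, where the update would simply be `theta = theta - alpha * grad`; there the only knob is `alpha`, while in Adam the ratio `m_hat / sqrt(v_hat)` rescales each coordinate on every step.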