I have an intuitive understanding of how RMSprop and Adam work, but can someone give me more detailed and concrete examples of how the working procedures of these two algorithms actually differ?
Adam takes both RMSprop and momentum and incorporates them into a single update equation, but what does that change?
Hi @Anish_Sarkar1
I believe your question is partially answered in this thread: Optimization algorithms - #3 by Christian_Simonis. See also this visualization, which shows how the optimizers behave differently on an example optimization problem.
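Adam maintains exponentially weighted moving averages of both the gradient (as in momentum) and the squared gradient (as in RMSProp), and combines the two in a single update. Adam also applies a bias correction. Because the moving averages are initialized at zero, they are biased toward zero during the first iterations; the bias correction compensates for this so that the early estimates are more accurate, see also this thread.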
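To make the difference concrete, here is a minimal NumPy sketch of one update step for each optimizer. The function names, argument names, and default values (`beta1=0.9`, `beta2=0.999` for Adam, a decay of `0.9` for RMSprop, `eps=1e-8`) are my own assumptions based on commonly used defaults, not code from the course or from any library.

```python
import numpy as np

def rmsprop_step(w, grad, s, lr=0.01, beta2=0.9, eps=1e-8):
    """One RMSprop update (hypothetical helper, assumed defaults)."""
    # Moving average of the squared gradient only; no momentum term.
    s = beta2 * s + (1 - beta2) * grad**2
    # The raw gradient is scaled by the root of that average.
    w = w - lr * grad / (np.sqrt(s) + eps)
    return w, s

def adam_step(w, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update at iteration t >= 1 (hypothetical helper, assumed defaults)."""
    # Momentum part: moving average of the gradient itself.
    m = beta1 * m + (1 - beta1) * grad
    # RMSprop part: moving average of the squared gradient.
    v = beta2 * v + (1 - beta2) * grad**2
    # Bias correction: undo the pull toward the zero initialization.
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    # The smoothed gradient (not the raw one) is scaled.
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# First step from w = 1.0 with gradient 4.0 and all averages at zero.
w0, g = 1.0, 4.0
w_rms, s = rmsprop_step(w0, g, s=0.0)
w_adam, m, v = adam_step(w0, g, m=0.0, v=0.0, t=1)
print("RMSprop first step size:", w0 - w_rms)
print("Adam first step size:   ", w0 - w_adam)
```

On the very first step, the bias correction makes Adam's step essentially equal to the learning rate, whereas this uncorrected RMSprop sketch divides by an underestimated average and so takes a step larger than the learning rate. In later steps the main difference is that Adam scales a smoothed gradient (`m_hat`) while RMSprop scales the raw gradient.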
In general: depending on what your data and the cost surface of the optimization look like, different algorithms may be particularly suitable.
To answer your question more specifically: Adam can be considered as combining the best of RMSprop and momentum in a quite robust and adaptive algorithm, where often not many hyperparameters need to be tuned.
I believe it's fair to say that Adam is a good starting point, but based on the performance within your optimization, you need to check whether it ultimately fulfills your requirements according to your metrics. See also this thread for some discussion on KPIs to track and evaluate: Underfitting and Overfitting - #2 by Christian_Simonis
That said, there are also cases where other algorithms are better suited to your optimization problem, see also: Why not always use Adam optimizer - #4 by Christian_Simonis
Best regards
Christian