Hello World,
Considering Adam uses both Momentum and RMSProp ideas in its implementation, why not always use Adam optimizer? In what scenarios would one use Momentum or RMSProp instead of Adam?
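To make the relationship in the question concrete, here is a minimal scalar sketch of the three update rules in plain Python. The function names are illustrative, not from any library. The hyperparameter names (`beta`, `beta1`, `beta2`, `eps`) follow the usual textbook notation. Momentum is written as an exponentially weighted average of gradients; PyTorch's `SGD`, for example, uses `v = beta * v + g` instead, which mainly rescales the learning rate.

```python
import math


def momentum_step(w, g, state, lr=0.01, beta=0.9):
    # Momentum: step along an exponentially weighted average of past gradients.
    v = beta * state.get("v", 0.0) + (1 - beta) * g
    state["v"] = v
    return w - lr * v


def rmsprop_step(w, g, state, lr=0.01, beta2=0.999, eps=1e-8):
    # RMSProp: divide the gradient by a running RMS of recent gradients.
    s = beta2 * state.get("s", 0.0) + (1 - beta2) * g * g
    state["s"] = s
    return w - lr * g / (math.sqrt(s) + eps)


def adam_step(w, g, state, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam: momentum-style first moment (m) plus RMSProp-style second moment (v),
    # both bias-corrected because they are initialised at zero.
    t = state.get("t", 0) + 1
    m = beta1 * state.get("m", 0.0) + (1 - beta1) * g
    v = beta2 * state.get("v", 0.0) + (1 - beta2) * g * g
    state.update(t=t, m=m, v=v)
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)


def minimize(step_fn, w=0.0, steps=3000):
    # Toy cost f(w) = (w - 3)^2 with gradient 2 * (w - 3); the minimum is at w = 3.
    state = {}
    for _ in range(steps):
        g = 2.0 * (w - 3.0)
        w = step_fn(w, g, state)
    return w


for fn in (momentum_step, rmsprop_step, adam_step):
    print(fn.__name__, round(minimize(fn), 4))
```

On a one-dimensional bowl like this all three end up near `w = 3`. A toy problem like this therefore says little about which optimizer to prefer; the differences show up on real cost surfaces.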
Hi there,
here is my take on this matter:
Different "cost spaces" (loss landscapes) favour different numerical approaches for finding an acceptable solution as fast as possible. I believe it's fair to say that Adam is a good choice to start with, but depending on how your optimization performs, you need to check whether it actually fulfils your requirements according to your metrics. See also this thread for some discussion of KPIs to track and evaluate: Underfitting and Overfitting - #2 by Christian_Simonis
In general, I have personally also had good experiences with Adam, as it has the favourable characteristics mentioned above.
Side note: saddle points can often be an issue in high-dimensional spaces. If you are interested, feel free to take a look at this paper from 2014: https://arxiv.org/pdf/1406.2572.pdf
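As a toy illustration of why saddle points slow down plain gradient descent (this is not the approach from the paper), the sketch below starts gradient descent and Adam right next to the saddle of `f(x, y) = x**2 - y**2`. It counts the steps until `|y|` exceeds 1. The function, starting point `(1.0, 1e-6)`, learning rate and escape threshold are arbitrary choices for illustration.

```python
import math


def grad(x, y):
    # f(x, y) = x**2 - y**2 has a saddle point at the origin.
    return 2.0 * x, -2.0 * y


def steps_to_escape_gd(x, y, lr=0.01, max_steps=10_000):
    # Plain gradient descent; returns the step at which |y| first exceeds 1
    # (or max_steps if it never escapes).
    for t in range(1, max_steps + 1):
        gx, gy = grad(x, y)
        x, y = x - lr * gx, y - lr * gy
        if abs(y) > 1.0:
            return t
    return max_steps


def steps_to_escape_adam(x, y, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8, max_steps=10_000):
    # Adam with per-coordinate moments; same escape criterion as above.
    p, m, v = [x, y], [0.0, 0.0], [0.0, 0.0]
    for t in range(1, max_steps + 1):
        g = grad(*p)
        for i in range(2):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] ** 2
            m_hat = m[i] / (1 - beta1 ** t)
            v_hat = v[i] / (1 - beta2 ** t)
            p[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
        if abs(p[1]) > 1.0:
            return t
    return max_steps


print("gradient descent:", steps_to_escape_gd(1.0, 1e-6))
print("Adam:", steps_to_escape_adam(1.0, 1e-6))
```

Near the saddle the gradient in `y` is tiny, so gradient descent crawls for several hundred steps. Adam's per-parameter normalisation, by contrast, turns even a tiny gradient into a step of roughly `lr`. This is one hand-picked case, not a general guarantee.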
Best regards
Christian
Hi, in addition to @Christian_Simonis's comments, I would like to add a few more points on why Adam is not always the best solution.
In addition to 2)
If Adam does not converge well, AMSGrad might be worth a look, see also:
https://johnchenresearch.github.io/demon/
The same page also explains some other algorithms, such as QHM (Quasi-Hyperbolic Momentum), which decouples the momentum term from the current gradient when updating the weights. This can also be beneficial!
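The two alternatives mentioned in this reply can be sketched as follows. These are minimal scalar versions written from the published update rules, not from any library's source, and the hyperparameter values are illustrative assumptions.

The AMSGrad sketch keeps a running maximum of the second-moment estimate, so the effective step size cannot grow back after a large gradient. It omits bias correction for brevity. Framework versions (e.g. `amsgrad=True` in PyTorch's `torch.optim.Adam` or in Keras' `Adam`) differ in such details.

The QHM sketch steps along a weighted mix `(1 - nu) * g + nu * buf` of the current gradient and the momentum buffer. Setting `nu = 0` gives plain SGD, and `nu = 1` gives (EMA-form) momentum.

```python
import math


def amsgrad(grads, w=0.0, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Scalar AMSGrad (no bias correction) over a fixed sequence of gradients.
    # Returns the final weight plus the histories of v (as Adam would use it) and v_max.
    m = v = v_max = 0.0
    v_hist, v_max_hist = [], []
    for g in grads:
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        v_max = max(v_max, v)  # key change vs. Adam: the denominator never shrinks
        v_hist.append(v)
        v_max_hist.append(v_max)
        w -= lr * m / (math.sqrt(v_max) + eps)
    return w, v_hist, v_max_hist


def qhm(grad_fn, w, lr=0.1, beta=0.9, nu=0.7, steps=200):
    # Quasi-Hyperbolic Momentum: weighted average of the plain gradient step
    # and the momentum step.
    buf = 0.0
    for _ in range(steps):
        g = grad_fn(w)
        buf = beta * buf + (1 - beta) * g
        w -= lr * ((1 - nu) * g + nu * buf)
    return w


# One large gradient followed by many small ones: v decays, v_max does not.
_, v_hist, v_max_hist = amsgrad([10.0] + [0.1] * 100)
print(v_hist[-1] < v_hist[0], v_max_hist[-1] == v_max_hist[0])

# QHM on the toy cost f(w) = (w - 3)^2.
print(round(qhm(lambda w: 2.0 * (w - 3.0), w=0.0), 4))
```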
Best regards
Christian