Hi there,
here is my take on this matter:
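- Momentum accelerates your search by "using the momentum" to get over local minima instead of getting stuck there.
- RMSProp, roughly speaking, keeps the search from moving in the direction of oscillations: it divides each gradient component by the root of a running average of its squared values, so directions with large (often oscillating) gradients take smaller steps.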
Adam combines the heuristics of both Momentum and RMSProp, as pointed out in this nice article:
Source: Intro to optimization in deep learning: Momentum, RMSProp and Adam
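Along the same lines, here is a minimal sketch of how Adam puts the two ideas together: a momentum-style running average of the gradient (first moment) divided by an RMSProp-style running root mean square (second moment), both bias-corrected for their zero initialisation. `beta1=0.9`, `beta2=0.999` and `eps=1e-8` are the defaults from the original Adam paper; the toy loss and `lr=0.05` are my own assumptions, chosen so the example converges quickly.

```python
import math

# Hypothetical toy loss (assumption for illustration): f(w) = 0.5 * (A*w0**2 + B*w1**2).
A, B = 10.0, 1.0


def loss(w):
    return 0.5 * (A * w[0] ** 2 + B * w[1] ** 2)


def grad(w):
    """Gradient of the toy loss."""
    return [A * w[0], B * w[1]]


def adam_step(w, m, v, t, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update (t counts from 1)."""
    g = grad(w)
    # Momentum part: running average of gradients.
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
    # RMSProp part: running average of squared gradients.
    v = [beta2 * vi + (1 - beta2) * gi ** 2 for vi, gi in zip(v, g)]
    # Bias correction, since m and v start at zero.
    m_hat = [mi / (1 - beta1 ** t) for mi in m]
    v_hat = [vi / (1 - beta2 ** t) for vi in v]
    w = [wi - lr * mh / (math.sqrt(vh) + eps) for wi, mh, vh in zip(w, m_hat, v_hat)]
    return w, m, v


w, m, v = [1.0, 1.0], [0.0, 0.0], [0.0, 0.0]
for t in range(1, 301):
    w, m, v = adam_step(w, m, v, t)
print(loss(w))
```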
So different "cost spaces" will call for different numerical approaches to find an acceptable solution as fast as possible. I believe it's fair to say that Adam is a good choice to start with, but based on the performance of your optimization, you need to check whether it finally meets your requirements in terms of your metrics. See also this thread for some discussion on KPIs to track and evaluate: Underfitting and Overfitting - #2 by Christian_Simonis
In general, I have also personally had good experiences with Adam, since it has the favourable characteristics mentioned above.
Side note: saddle points can often be an issue in high-dimensional spaces. If you are interested, feel free to take a look at this paper from 2014: https://arxiv.org/pdf/1406.2572.pdf
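A small two-dimensional example can illustrate why saddle points slow things down (the function, starting point and step size are my own assumptions, for illustration only): `f(x, y) = x**2 - y**2` has a saddle at the origin, where the gradient vanishes even though it is not a minimum. Plain gradient descent started almost exactly on the x-axis first crawls towards the saddle, where the gradient norm becomes tiny, and only escapes along y after many steps.

```python
import math


def grad(x, y):
    """Gradient of the hypothetical saddle f(x, y) = x**2 - y**2."""
    return 2.0 * x, -2.0 * y


def descend(x, y, lr=0.1, steps=100):
    """Plain gradient descent; returns the final point and the gradient norm at each step."""
    norms = []
    for _ in range(steps):
        gx, gy = grad(x, y)
        norms.append(math.hypot(gx, gy))
        x, y = x - lr * gx, y - lr * gy
    return (x, y), norms


# Start almost on the x-axis, so the escape direction (y) is barely excited.
(x_end, y_end), norms = descend(1.0, 1e-6)
print(min(norms), y_end)
```

The gradient norm drops to roughly a thousandth of its starting value near the saddle before y finally grows and the iterate leaves the plateau.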
Best regards
Christian