Why not always use Adam optimizer

Christian_Simonis · December 22, 2022, 3:17pm

Hi there,

here is my take on this matter:

Momentum accelerates your search: the accumulated "momentum" helps it carry over local minima instead of getting stuck there.
RMSProp, in a sense, damps the search in directions that oscillate, because it scales each step by a running average of the squared gradients.
Adam combines the heuristics of both Momentum and RMSProp, as pointed out in this nice article:
Source: Intro to optimization in deep learning: Momentum, RMSProp and Adam
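
To make the three heuristics above a bit more concrete, here is a minimal sketch of the update rules in plain Python. The function names, the `state` dictionary, the toy quadratic loss and all hyperparameter values are my own assumptions for illustration, not taken from the article or from any library.

```python
import math

def new_state(n):
    # v: running average of gradients, s: running average of squared gradients
    return {"v": [0.0] * n, "s": [0.0] * n, "t": 0}

def momentum_step(w, g, state, lr=0.01, beta=0.9):
    # Step along an exponentially weighted average of past gradients.
    state["v"] = [beta * v + (1 - beta) * gi for v, gi in zip(state["v"], g)]
    return [wi - lr * v for wi, v in zip(w, state["v"])]

def rmsprop_step(w, g, state, lr=0.01, beta=0.9, eps=1e-8):
    # Divide each gradient by the root of its running mean square,
    # so directions with large (oscillating) gradients take smaller steps.
    state["s"] = [beta * s + (1 - beta) * gi * gi for s, gi in zip(state["s"], g)]
    return [wi - lr * gi / (math.sqrt(s) + eps)
            for wi, gi, s in zip(w, g, state["s"])]

def adam_step(w, g, state, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    # Momentum-style average (v) plus RMSProp-style scaling (s),
    # both bias-corrected because they start at zero.
    state["t"] += 1
    t = state["t"]
    state["v"] = [beta1 * v + (1 - beta1) * gi for v, gi in zip(state["v"], g)]
    state["s"] = [beta2 * s + (1 - beta2) * gi * gi for s, gi in zip(state["s"], g)]
    v_hat = [v / (1 - beta1 ** t) for v in state["v"]]
    s_hat = [s / (1 - beta2 ** t) for s in state["s"]]
    return [wi - lr * v / (math.sqrt(s) + eps)
            for wi, v, s in zip(w, v_hat, s_hat)]

def loss(w):
    # Toy elongated bowl (an assumption): much steeper along w[1] than w[0].
    return 0.5 * (w[0] ** 2 + 25.0 * w[1] ** 2)

def grad(w):
    return [w[0], 25.0 * w[1]]

def run(step, steps=500):
    w = [1.0, 1.0]
    state = new_state(len(w))
    for _ in range(steps):
        w = step(w, grad(w), state)
    return w

if __name__ == "__main__":
    for name, step in [("momentum", momentum_step),
                       ("rmsprop", rmsprop_step),
                       ("adam", adam_step)]:
        print(name, loss(run(step)))
```

Running the three on your own cost function and comparing the loss curves is a cheap way to see which heuristic suits your problem.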

So different "cost spaces" call for different numerical approaches to find an acceptable solution as fast as possible. I believe it's fair to say that Adam is a good optimizer to start with, but you still need to check, based on the performance of your optimization and on your metrics, whether it actually fulfils your requirements. See also this thread for some discussion on KPIs to track and evaluate: Underfitting and Overfitting - #2 by Christian_Simonis

In general, I have personally also had good experiences with Adam, as it has the favourable characteristics mentioned above.

Side note: saddle points can often be an issue in high-dimensional spaces. If you would like more detail, feel free to take a look at this paper from 2014: https://arxiv.org/pdf/1406.2572.pdf
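
As a small illustration of why saddle points can slow things down, here is a sketch with plain gradient descent on the toy function f(x, y) = x² − y², which has a saddle at the origin. The function, the starting points and the learning rate are assumptions of mine and are not taken from the paper.

```python
def grad(x, y):
    # Gradient of f(x, y) = x**2 - y**2; it vanishes at the saddle (0, 0).
    return 2.0 * x, -2.0 * y

def steps_to_escape(y0, lr=0.01, x0=1.0, threshold=1.0, max_steps=100_000):
    # Count plain gradient descent steps until |y| exceeds the threshold,
    # i.e. until the iterate has clearly left the saddle region.
    x, y = x0, y0
    for step in range(1, max_steps + 1):
        gx, gy = grad(x, y)
        x -= lr * gx
        y -= lr * gy
        if abs(y) > threshold:
            return step
    return max_steps

if __name__ == "__main__":
    # The closer the start is to the ridge y = 0, the smaller the gradient
    # along the escape direction and the longer the search lingers.
    for y0 in (1e-2, 1e-4, 1e-6):
        print(y0, steps_to_escape(y0))
```

The escape direction exists the whole time, but near the saddle its gradient is tiny, so the search spends many steps there before it moves on.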

Best regards
Christian
