As I understand it, Adam is a combination of Momentum and RMSprop. Andrew said that Adam has been used successfully on a large variety of problems. My question is: does this mean that in real life we will usually opt for Adam? (So that Momentum / RMSprop were only taught to help us understand the exponentially weighted moving average principle?)
Or do we sometimes need to train with Momentum / RMSprop? In that case, is there any recipe for choosing which optimizer to use, or is it better to just try them all?
Hi @Doron_Modan,
I’m not sure which week of C2 you are currently in, but there will be an assignment where you’ll go through all of these and compare the results.
Best,
Mubsi
I’m not sure this answers my question. I was asking about real life.
I’m not an actual practitioner of ML/DL in an industrial setting, so I’m not really qualified to answer about “real life”. But I think your theory is right: in most of the examples we see in the rest of these courses, “Adam” is the preferred optimization method. The other overall message about hyperparameter choices here in Course 2, though, is that there is no single “silver bullet” answer that works best in all cases for most of the choices you have to make. So it never hurts to have more tools in your toolbox. You start with Adam, and if that doesn’t work well enough, you consider the others.
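To make the "try Adam first, then compare" idea a bit more concrete, here is a minimal NumPy sketch. Everything in it is an assumption for illustration: the toy quadratic loss, the `train` helper, the starting point, and the hyperparameter values are all made up, and the update rules follow the exponentially weighted average form used in the course lectures rather than any particular library's implementation.

```python
import numpy as np

# Hypothetical toy problem: a quadratic bowl that is much steeper
# along the second axis than along the first.
SCALES = np.array([1.0, 25.0])

def loss(w):
    return 0.5 * np.sum(SCALES * w ** 2)

def grad(w):
    return SCALES * w

def train(method, lr=0.01, steps=200, beta1=0.9, beta2=0.999, eps=1e-8):
    """Run one optimizer on the toy loss and return the final loss value."""
    w = np.array([2.0, 2.0])   # assumed starting point
    v = np.zeros_like(w)       # moving average of gradients (Momentum part)
    s = np.zeros_like(w)       # moving average of squared gradients (RMSprop part)
    for t in range(1, steps + 1):
        g = grad(w)
        if method == "momentum":
            v = beta1 * v + (1 - beta1) * g
            w = w - lr * v
        elif method == "rmsprop":
            s = beta2 * s + (1 - beta2) * g ** 2
            w = w - lr * g / (np.sqrt(s) + eps)
        elif method == "adam":
            # Adam keeps both averages and applies bias correction to each.
            v = beta1 * v + (1 - beta1) * g
            s = beta2 * s + (1 - beta2) * g ** 2
            v_hat = v / (1 - beta1 ** t)
            s_hat = s / (1 - beta2 ** t)
            w = w - lr * v_hat / (np.sqrt(s_hat) + eps)
        else:
            raise ValueError(f"unknown method: {method}")
    return loss(w)

# Same problem, same budget, three optimizers: compare the final losses.
results = {m: train(m) for m in ("momentum", "rmsprop", "adam")}
for name, final_loss in results.items():
    print(f"{name:>8}: final loss = {final_loss:.6f}")
```

Whichever optimizer comes out ahead on this toy problem says nothing about a real task. The point is only the workflow: hold the problem and the training budget fixed, swap the optimizer, and compare.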
Hi @Doron_Modan,
My apologies for not expanding on what I said above, but Paul has basically covered it already (thanks, Paul!).
What I meant was that you’ll do some comparisons and see what works best. Similarly, in the real world, given the scenario, you can train different models and see which one satisfies your needs. Adam is a popular choice, no doubt, but others might outperform it depending on the task at hand.
Best,
Mubsi