In the C2W3 video "Tuning Process", Professor Ng mentions that the momentum term is second priority for tuning, but he goes on to say that he almost never tunes the beta parameters for Adam optimization. This is confusing to me, because Adam optimization is just momentum combined with RMSProp. In this context, isn't beta1 the same as the momentum hyperparameter?
Hi Alan,
I will try to help you understand these concepts. This is what I found for you:
- beta1. The exponential decay rate for the first moment estimates (e.g. 0.9).
- beta2. The exponential decay rate for the second-moment estimates (e.g. 0.999). This value should be set close to 1.0 on problems with a sparse gradient (e.g. NLP and computer vision problems).
Reference: https://machinelearningmastery.com/adam-optimization-algorithm-for-deep-learning/
And on the tuning effect: "Deep Learning: How does beta_1 and beta_2 in the Adam Optimizer affect it's learning?" (Cross Validated).
Tuning both well can be tricky, and the defaults already work on most problems, so it is usually a better strategy to spend your tuning budget on other hyperparameters.
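To make the roles of the two betas concrete, here is a minimal sketch of one Adam update step (function and variable names are my own, not from the course): beta1 decays the first-moment estimate, which plays the same role as the momentum term, while beta2 decays the second-moment estimate, which is the RMSProp part.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for parameters w at step t (t starts at 1).

    beta1 controls the first-moment (momentum-like) average of gradients;
    beta2 controls the second-moment (RMSProp-like) average of squared gradients.
    """
    m = beta1 * m + (1 - beta1) * grad       # momentum part, driven by beta1
    v = beta2 * v + (1 - beta2) * grad**2    # RMSProp part, driven by beta2
    m_hat = m / (1 - beta1**t)               # bias correction for the warm-up phase
    v_hat = v / (1 - beta2**t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Toy usage: minimize f(w) = w^2 starting from w = 5
w, m, v = np.array([5.0]), np.zeros(1), np.zeros(1)
for t in range(1, 2001):
    grad = 2 * w                             # gradient of w^2
    w, m, v = adam_step(w, grad, m, v, t, lr=0.1)
```

So yes, beta1 is the momentum-style hyperparameter inside Adam; the point in the lecture is that its default of 0.9 (and 0.999 for beta2) rarely needs to change.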
Happy learning,
Rosa
Hi Rosa,
Thanks for your reply. Given your response and what Dr. Ng said in the lecture, I will treat the lecture slide as imprecise on this point and consider all of the beta parameters low priority for tuning.
Thanks