I might have missed something here… but why is there a +epsilon in the denominator of the final step of Adam optimization?
As far as I can tell, this term was not included in the RMSprop lesson.

Prof. Ng explained this in the lectures. The epsilon is there to avoid a divide-by-zero when the square-root term in the denominator is zero or very close to it.
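
To make that concrete, here is a minimal sketch of a single Adam step for one scalar parameter. It is not the course's reference code: the function name `adam_step` and its argument names are made up for illustration, and the defaults are just commonly quoted values. It assumes epsilon is added outside the square root.

```python
import math

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter w (illustrative sketch only)."""
    # Exponentially weighted averages of the gradient and the squared gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction for step t (t starts at 1).
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # eps keeps the denominator non-zero when v_hat is 0.
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# With a zero gradient on the first step, v_hat is 0 and eps is all that is left
# in the denominator, so the update is simply 0 and w stays unchanged.
w, m, v = adam_step(1.0, 0.0, 0.0, 0.0, t=1)
```

Calling the same function with `eps=0.0` and a zero gradient ends up dividing 0.0 by 0.0, which plain Python reports as a `ZeroDivisionError`. Array libraries would typically give NaN there instead.
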
Looks like I missed it. My apologies and thank you for the prompt response! (I suppose it’s obvious now why epsilon isn’t a major hyperparameter, haha)