Why do you add epsilon in the denominator of the final updating step of Adam optimization?

I might have missed something here… but why is there a +epsilon in the denominator of the final step of Adam optimization?

As far as I can tell, this term was not included in the RMSprop lesson.


Prof. Ng explained this in the lectures. It is there to avoid a “divide by zero” when the square root in the denominator is zero or extremely small.

Looks like I missed it. My apologies and thank you for the prompt response! (I suppose it’s obvious now why epsilon isn’t a major hyperparameter, haha)

It is there in RMSProp too, though there is a slight difference. In RMSProp it is inside the square root in the denominator, while in Adam it is outside it.
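
To make the two placements concrete, here is a minimal sketch for a single scalar parameter, assuming the placements described in this thread (epsilon inside the square root for RMSProp, outside it for Adam). The function names `adam_step` and `rmsprop_step` and the default hyperparameter values are my own choices for illustration, not taken from the course notebooks.

```python
import math

def adam_step(grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter. Returns (delta, m, v)."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment (momentum) average
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment (squared-gradient) average
    m_hat = m / (1 - beta1 ** t)              # bias correction, t starts at 1
    v_hat = v / (1 - beta2 ** t)
    # Epsilon is added OUTSIDE the square root.
    delta = lr * m_hat / (math.sqrt(v_hat) + eps)
    return delta, m, v

def rmsprop_step(grad, s, lr=0.001, beta2=0.999, eps=1e-8):
    """One RMSProp update for a scalar parameter. Returns (delta, s)."""
    s = beta2 * s + (1 - beta2) * grad ** 2
    # Epsilon is added INSIDE the square root (the placement described in this thread).
    delta = lr * grad / math.sqrt(s + eps)
    return delta, s

# With a zero gradient on the first step, the squared-gradient average is 0,
# so only epsilon keeps the denominator away from zero.
delta, m, v = adam_step(grad=0.0, m=0.0, v=0.0, t=1)
print(delta)  # the update itself is zero, and no error is raised

try:
    adam_step(grad=0.0, m=0.0, v=0.0, t=1, eps=0.0)
except ZeroDivisionError:
    print("without epsilon: division by zero")
```

Either placement protects the division. The difference only matters numerically when the squared-gradient average is tiny: with the same epsilon of 1e-8, the inside placement gives a denominator floor of sqrt(1e-8) = 1e-4, while the outside placement gives a floor of 1e-8.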