Why do you add epsilon in the denominator of the final updating step of Adam optimization?

I might have missed something here… but why is there a +epsilon in the denominator of the final step of Adam optimization?

As far as I can tell, this term was not included in the RMSprop lesson.

Prof Ng explained that in the lectures. It is to avoid “divide by zero”.
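To make the "divide by zero" point concrete, here is a minimal sketch for a single scalar parameter. It assumes the usual form of the final Adam step, `lr * v_hat / (sqrt(s_hat) + eps)`. The function name `adam_step_size` and the default values are illustrative choices, not taken from the lectures.

```python
import math

def adam_step_size(v_hat, s_hat, lr=0.001, eps=1e-8):
    """Size of one Adam update for one parameter (hypothetical helper)."""
    return lr * v_hat / (math.sqrt(s_hat) + eps)

# A parameter whose gradient has been exactly zero so far:
# both moment estimates are 0, so sqrt(s_hat) is 0 as well.
step_with_eps = adam_step_size(0.0, 0.0)  # 0.0 / 1e-8, well defined

try:
    adam_step_size(0.0, 0.0, eps=0.0)     # 0.0 / 0.0
    raised = False
except ZeroDivisionError:
    raised = True

print(step_with_eps, raised)
```

With plain Python floats the unprotected division raises `ZeroDivisionError`. With NumPy arrays it would typically produce `nan` or `inf` plus a warning instead, which then spreads through the weights.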

Looks like I missed it. My apologies and thank you for the prompt response! (I suppose it’s obvious now why epsilon isn’t a major hyperparameter, haha)

It is there in RMSProp too, though there is a slight difference. In RMSProp it is inside the square root in the denominator, while in Adam it is outside it.
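A quick way to see what that placement changes is the sketch below. It assumes the two denominators are `sqrt(s) + eps` (outside) and `sqrt(s + eps)` (inside), with `eps = 1e-8`. The function names are made up for illustration.

```python
import math

def denom_eps_outside(s, eps=1e-8):
    # Placement described above for Adam: sqrt(s) + eps
    return math.sqrt(s) + eps

def denom_eps_inside(s, eps=1e-8):
    # Placement described above for RMSProp: sqrt(s + eps)
    return math.sqrt(s + eps)

# Compare the two denominators for tiny and ordinary values of s.
for s in (0.0, 1e-12, 1.0):
    print(s, denom_eps_outside(s), denom_eps_inside(s))
```

Both versions prevent division by zero, and for ordinary values of `s` they are nearly identical. They differ when `s` is close to zero: the smallest possible denominator is `eps` (1e-8) with epsilon outside the root, but `sqrt(eps)` (about 1e-4) with epsilon inside it, so the inside placement caps the step size more tightly for the same `eps`.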