I might have missed something here… but why is there a +epsilon in the denominator of the final step of Adam optimization?

As far as I can tell, this term was not included in the RMSprop lesson.

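Prof Ng explained that in the lectures. It is there to avoid a “divide by zero”: if the running average of the squared gradients in the denominator is zero (or extremely close to it), the update would otherwise be undefined or blow up.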
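In case it helps to see it concretely, here is a minimal sketch of one Adam update for a single scalar parameter. The function name `adam_step` and its argument names are made up for illustration (they are not from the course code), and the default values are just the commonly quoted ones. The point is only to show what happens to the denominator when the gradient is exactly zero.

```python
import math

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter (illustrative sketch only)."""
    # Exponentially weighted averages of the gradient and the squared gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction for step t (t starts at 1).
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # eps keeps the denominator strictly positive even when v_hat is 0.
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# A zero gradient on the first step makes v_hat exactly 0.
w_safe, _, _ = adam_step(w=1.0, grad=0.0, m=0.0, v=0.0, t=1)

# The same step with eps=0 tries to compute 0.0 / 0.0.
try:
    adam_step(w=1.0, grad=0.0, m=0.0, v=0.0, t=1, eps=0.0)
    failed_without_eps = False
except ZeroDivisionError:
    failed_without_eps = True
```

With the default `eps` the parameter is simply left unchanged, while the `eps=0.0` call fails in plain Python floats. Array libraries would typically give `nan` there instead of raising, which is arguably worse because it propagates silently.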

Looks like I missed it. My apologies and thank you for the prompt response! (I suppose it’s obvious now why epsilon isn’t a *major* hyperparameter, haha)