Why do you add epsilon in the denominator of the final updating step of Adam optimization?

jeffreywang · July 18, 2021, 1:45am

I might have missed something here… but why is there a +epsilon in the denominator of the final step of Adam optimization?

As far as I can tell, this term was not included in the RMSprop lesson.

[Screenshot of the final Adam update step]

paulinpaloalto · July 18, 2021, 5:17am

Prof Ng explained that in the lectures. It is to avoid “divide by zero”.
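A minimal sketch of one Adam step for a single scalar parameter shows why ε matters. It assumes the usual bias-corrected form and common defaults (β1 = 0.9, β2 = 0.999, ε = 1e-8). The helper name `adam_step` is hypothetical and does not come from the course code. With a zero gradient on the first step, the second-moment estimate is exactly 0. Without ε the update would be 0 / 0. With ε the update is simply 0.

```python
import math

def adam_step(w, grad, v, s, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter w (hypothetical helper, for illustration).

    v and s are the running first- and second-moment estimates; t is the 1-based step.
    Returns the updated (w, v, s).
    """
    # Exponentially weighted averages of the gradient and of the squared gradient
    v = beta1 * v + (1 - beta1) * grad
    s = beta2 * s + (1 - beta2) * grad ** 2
    # Bias correction for the zero-initialized averages
    v_hat = v / (1 - beta1 ** t)
    s_hat = s / (1 - beta2 ** t)
    # eps is added outside the square root, so the denominator is always >= eps > 0
    w = w - lr * v_hat / (math.sqrt(s_hat) + eps)
    return w, v, s

# Zero gradient on the first step: s_hat == 0, but eps keeps the division finite.
w, v, s = adam_step(w=1.0, grad=0.0, v=0.0, s=0.0, t=1)
print(w)  # 1.0, parameter unchanged

# The same division without eps fails in plain Python.
try:
    print(0.0 / math.sqrt(0.0))
except ZeroDivisionError:
    print("division by zero without eps")
```

Because ε is tiny, it has almost no effect whenever the squared-gradient average is of normal size. This fits the remark below that ε is rarely tuned.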

jeffreywang · July 18, 2021, 5:18am

Looks like I missed it. My apologies and thank you for the prompt response! (I suppose it’s obvious now why epsilon isn’t a major hyperparameter, haha)

Sahil_Singh1 · April 23, 2023, 6:16am

It is there in RMSProp too, though there is a slight difference: in RMSProp it is inside the square root in the denominator, while in Adam it is outside it.
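The difference between the two placements can be checked numerically. This minimal sketch compares only the denominators and assumes ε = 1e-8 for both forms. The function names are hypothetical. With ε inside the root, the denominator can never drop below √ε (about 1e-4). With ε outside, it can get as small as ε itself (1e-8). The same ε value therefore caps the step size very differently.

```python
import math

EPS = 1e-8  # assumed value; a common default

def denom_inside(s):
    # RMSProp placement described in the thread: sqrt(S + eps)
    return math.sqrt(s + EPS)

def denom_outside(s):
    # Adam placement described in the thread: sqrt(S) + eps
    return math.sqrt(s) + EPS

# Compare both denominators as the squared-gradient average S shrinks toward 0
for s in (1.0, 1e-4, 1e-8, 0.0):
    print(f"S={s:g}: inside={denom_inside(s):.3g}, outside={denom_outside(s):.3g}")
```

For S much larger than ε the two forms are practically identical. They only diverge when the squared-gradient average is tiny, which is exactly when ε comes into play.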
