C2W3 Differential addition of epsilon in Batch Norm and RMSProp

jchia89 · July 31, 2021, 7:13am

I noticed that in RMSProp, epsilon is added after taking the square root of S_dW (and S_db), whereas in Batch Norm, epsilon is added to the variance of Z^[l] before taking the square root of the sum. In both cases, the role of epsilon is to preserve numerical stability (to avoid dividing by zero). Why is epsilon included differently in the two cases? For Batch Norm, why can't epsilon be added to the square root of the variance of Z^[l], instead of taking the square root of the sum of the variance and epsilon?
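A minimal NumPy sketch of the two formulas as described in the question, placed side by side so the different epsilon positions are easy to compare. It assumes the course convention that Z^[l] has shape (units, batch size), so the Batch Norm statistics are taken along axis 1. The function names and hyperparameter values (ALPHA, BETA, EPS) are hypothetical, chosen only for illustration, and Batch Norm's learned scale and shift (gamma, beta) are left out.

```python
import numpy as np

# Hypothetical hyperparameters, for illustration only.
ALPHA = 0.01  # learning rate
BETA = 0.9    # RMSProp decay rate
EPS = 1e-8    # small constant for numerical stability

def rmsprop_step(w, dw, s_dw, alpha=ALPHA, beta=BETA, eps=EPS):
    """One RMSProp update: epsilon is added AFTER the square root."""
    s_dw = beta * s_dw + (1 - beta) * dw ** 2
    w = w - alpha * dw / (np.sqrt(s_dw) + eps)
    return w, s_dw

def batchnorm_normalize(z, eps=EPS):
    """Normalize Z over the mini-batch (axis 1): epsilon is added BEFORE the square root."""
    mu = np.mean(z, axis=1, keepdims=True)
    var = np.var(z, axis=1, keepdims=True)
    return (z - mu) / np.sqrt(var + eps)

# Usage: a zero gradient and a constant (zero-variance) row both stay finite.
w, s_dw = rmsprop_step(np.array([1.0]), np.array([0.0]), np.array([0.0]))
z_norm = batchnorm_normalize(np.array([[1.0, 2.0, 3.0], [5.0, 5.0, 5.0]]))
```

In both functions the denominator is strictly positive even when S_dW or the variance is exactly zero, which is the numerical-stability role the question describes.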

nramon · August 25, 2021, 10:41am

Hi, @jchia89.

Sorry for the late reply.

That would avoid division by zero too, but I think what's important is being consistent; otherwise, you can run into problems.
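A small comparison of the two placements, sqrt(var + eps) versus sqrt(var) + eps, for a few variances. It assumes eps = 1e-8 (an illustrative value) and uses hypothetical helper names. Both denominators stay positive at var = 0, so either choice avoids division by zero; for variances much larger than eps the two are nearly identical, but near zero they differ by orders of magnitude. That gap is one concrete way mixing the two forms (for example, computing the normalization one way in one place and the other way elsewhere) could give inconsistent results.

```python
import numpy as np

EPS = 1e-8  # assumed value, for illustration only

def denom_inside(var, eps=EPS):
    """Batch Norm style: epsilon inside the square root."""
    return np.sqrt(var + eps)

def denom_outside(var, eps=EPS):
    """RMSProp style: epsilon outside the square root."""
    return np.sqrt(var) + eps

variances = np.array([0.0, 1e-12, 1e-4, 1.0])
for v in variances:
    print(f"var={v:.0e}  inside={denom_inside(v):.3e}  outside={denom_outside(v):.3e}")
```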

Good luck with the course!
