C2W3 Differential addition of epsilon in Batch Norm and RMSProp

I noticed that in RMSProp, epsilon is added to the square root of Sdw/Sdb, whereas in Batch Norm, epsilon is added to the variance of Z^[l] before the square root is taken. In both cases, the role of epsilon is to preserve numerical stability (to avoid dividing by zero). Why is epsilon included differently in the two cases? For Batch Norm, why can’t epsilon be added to the square root of the variance of Z^[l], instead of taking the square root of the sum of the variance and epsilon?
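For reference, here is how I read the two formulas being compared. The symbols follow the course lectures as I remember them (\alpha for the learning rate, \mu and \sigma^2 for the mini-batch mean and variance), so treat the exact notation as an assumption:

```latex
% RMSProp: epsilon is added after the square root
W := W - \alpha \, \frac{dW}{\sqrt{S_{dW}} + \epsilon}

% Batch Norm: epsilon is added before the square root
z_{\text{norm}}^{(i)} = \frac{z^{(i)} - \mu}{\sqrt{\sigma^2 + \epsilon}}

% The alternative asked about for Batch Norm
z_{\text{norm}}^{(i)} = \frac{z^{(i)} - \mu}{\sqrt{\sigma^2} + \epsilon}
```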

Hi, @jchia89.

Sorry for the late reply.

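Adding epsilon to the square root of the variance would avoid division by zero too. I think what’s important is being consistent about where you add it (for example, using the same placement at training and test time), otherwise you can run into problems, because the two placements give slightly different results.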
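Here is a minimal NumPy sketch of that point. The function names, the toy inputs, and the epsilon values are made up for illustration and are not taken from the course code. Both placements stay finite when the variance is zero, but they produce different numbers when the variance is small compared to epsilon:

```python
import numpy as np


def normalize_eps_inside(z, eps=1e-8):
    """Batch Norm style: divide by sqrt(var + eps)."""
    mu = z.mean(axis=0)
    var = z.var(axis=0)
    return (z - mu) / np.sqrt(var + eps)


def normalize_eps_outside(z, eps=1e-8):
    """Alternative placement: divide by sqrt(var) + eps."""
    mu = z.mean(axis=0)
    var = z.var(axis=0)
    return (z - mu) / (np.sqrt(var) + eps)


# Zero-variance feature: both versions avoid division by zero.
z_const = np.full((4, 1), 3.0)
inside_const = normalize_eps_inside(z_const)
outside_const = normalize_eps_outside(z_const)

# Small-variance feature with a larger eps: the two versions disagree.
z_small = np.array([[0.0], [1e-4], [2e-4], [3e-4]])
inside_small = normalize_eps_inside(z_small, eps=1e-5)
outside_small = normalize_eps_outside(z_small, eps=1e-5)

print(inside_small.ravel())
print(outside_small.ravel())
```

So if one placement is used during training and the other at test time, the normalized activations would not match, which is the kind of problem consistency avoids.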

Good luck with the course :slight_smile: