I noticed that in RMSProp, epsilon is added after taking the square root of Sdw/Sdb (the denominator is sqrt(Sdw) + epsilon), whereas in Batch Norm, epsilon is added to the variance of Z^[l] before the square root is taken (the denominator is sqrt(variance + epsilon)). In both cases, the role of epsilon is to preserve numerical stability by avoiding division by zero. Why is epsilon placed differently in the two cases? For Batch Norm, why can't epsilon be added to the square root of the variance of Z^[l], instead of taking the square root of the sum of the variance and epsilon?
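
To make the two placements concrete, here is a minimal NumPy sketch of just the denominators. The function names and the epsilon value of 1e-8 are my own assumptions for illustration; they are not taken from any particular library or from the course material.

```python
import numpy as np

EPS = 1e-8  # assumed value, chosen only for illustration


def rmsprop_denominator(s, eps=EPS):
    """RMSProp-style placement: epsilon is added AFTER the square root."""
    return np.sqrt(s) + eps


def batch_norm_denominator(var, eps=EPS):
    """Batch Norm-style placement: epsilon is added BEFORE the square root."""
    return np.sqrt(var + eps)


# Compare the two placements at zero, at a small value, and at a typical value.
values = np.array([0.0, 1e-4, 1.0])
outside = rmsprop_denominator(values)
inside = batch_norm_denominator(values)

for v, o, i in zip(values, outside, inside):
    print(f"value={v:g}  sqrt(v)+eps={o:.10g}  sqrt(v+eps)={i:.10g}")
```

Both forms stay strictly positive when the input is zero, so both avoid the division by zero. The floor differs, though: sqrt(0) + eps gives eps, while sqrt(0 + eps) gives sqrt(eps).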