Video: Normalizing activations in a network
Question: What is the purpose of adding epsilon when performing batch normalization?
Can you post a screen capture image that shows the issue?
Prof. Andrew already answers this question in the video you mentioned, at 4:12. In his words:
For numerical stability, we usually add epsilon to the denominator like that just in case sigma squared turns out to be zero in some estimate.
In other words, epsilon is a very small positive constant added to the variance so that the denominator can never be zero.
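If it helps to see it concretely, here is a minimal NumPy sketch of the normalization step. It is not the course's code: the function name `normalize`, the layout (rows are examples, columns are features) and the default `eps=1e-8` are my own assumptions for illustration. The second feature is constant across the batch, so its variance is exactly zero.

```python
import numpy as np

def normalize(z, eps=1e-8):
    """Normalize each feature (column) of z over the mini-batch (rows).

    eps is added to the variance so the denominator is never zero.
    The default value here is an illustrative assumption.
    """
    mu = z.mean(axis=0)        # per-feature mean over the batch
    var = z.var(axis=0)        # per-feature variance (sigma squared)
    return (z - mu) / np.sqrt(var + eps)

# The second column is identical for every example, so its variance is 0.
z = np.array([[1.0, 5.0],
              [2.0, 5.0],
              [3.0, 5.0]])

z_norm = normalize(z)          # finite everywhere; constant column becomes 0

# With eps = 0 the constant column turns into 0 / 0, which NumPy reports as NaN.
with np.errstate(invalid="ignore", divide="ignore"):
    z_bad = normalize(z, eps=0.0)
```

With epsilon, the constant feature simply normalizes to zeros. Without it, that feature becomes NaN, and the NaNs would then propagate through the rest of the network.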