Video: Normalizing activations in a network

Question: What is the purpose of adding epsilon when performing batch normalization?

Can you post a screen capture image that shows the issue?

Prof. Andrew already answers this question in the video you mentioned, at 4:12. In his words:

For numerical stability, we usually add epsilon to the denominator like that just in case sigma squared turns out to be zero in some estimate.

Epsilon is a very small positive constant added to the variance so that the denominator can never be zero, which avoids a division by zero.
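
If it helps to see it concretely, here is a minimal sketch in plain Python of the normalization step, z_norm = (z - mu) / sqrt(sigma^2 + epsilon). The function name `normalize`, the default `eps=1e-8`, and the sample batches are my own assumptions for illustration, not something taken from the video or from any particular framework.

```python
import math

def normalize(z, eps=1e-8):
    """Normalize a list of values to zero mean and (roughly) unit variance.

    eps is a hypothetical default chosen for this sketch; frameworks
    pick their own small constant.
    """
    m = len(z)
    mu = sum(z) / m                              # batch mean
    var = sum((zi - mu) ** 2 for zi in z) / m    # batch variance (sigma squared)
    # eps keeps the denominator strictly positive even when var == 0
    return [(zi - mu) / math.sqrt(var + eps) for zi in z]

constant_batch = [2.0, 2.0, 2.0, 2.0]   # every value equal, so variance is exactly 0
varied_batch = [1.0, 2.0, 3.0, 4.0]

z_norm_constant = normalize(constant_batch)   # all zeros, no error
z_norm_varied = normalize(varied_batch)

# Without epsilon the same constant batch divides by zero.
try:
    normalize(constant_batch, eps=0.0)
    failed_without_eps = False
except ZeroDivisionError:
    failed_without_eps = True
```

With epsilon the constant batch simply normalizes to zeros. With `eps=0.0` the same call raises `ZeroDivisionError`, which is the case the quote above is guarding against.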