Hi, I have a question about batch normalization on the train set vs. the test set. Why do we estimate the mean and variance differently? What is the reason for using an exponentially weighted average at test time? Why would the usual mean and variance not work?
Thanks.
Hi @mhyash
The main reason for estimating the mean and variance differently during training and testing is to ensure that the batch normalization layer generalizes well to unseen data.
During training, each mini-batch has its own mean and variance, and normalizing with those batch statistics is part of what makes batch norm effective. At test time, however, you may be processing a single example, or a batch whose composition is arbitrary, so you need fixed statistics that do not depend on any particular mini-batch.
The exponentially weighted averages solve this: as training proceeds, the layer accumulates a running mean and a running variance across all the mini-batches it has seen. These averages are a stable estimate of the statistics of the whole training distribution, and they let the layer normalize consistently at test time, when mini-batch statistics are unavailable or unreliable.
Using the plain mean and variance of a test mini-batch would make a prediction depend on which other examples happen to be in the batch, and it would fail entirely for a batch of one. You could instead compute the statistics over the entire training set after training, but the exponential moving averages give essentially the same estimate for free during training, while smoothing out the fluctuations between mini-batches. See the sketch below for how this plays out in code.
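To make this concrete, here is a minimal NumPy sketch of the idea (not from the course materials; the class name `BatchNorm1D`, the momentum value, and the toy data are all illustrative assumptions). During training it normalizes with the current batch's statistics and updates the running averages; at test time it reuses those averages, so even a single example can be normalized:

```python
import numpy as np

class BatchNorm1D:
    """Illustrative batch normalization over the feature dimension."""

    def __init__(self, num_features, momentum=0.9, eps=1e-5):
        self.gamma = np.ones(num_features)   # learnable scale
        self.beta = np.zeros(num_features)   # learnable shift
        self.momentum = momentum             # EWA decay factor (assumed value)
        self.eps = eps
        # Running statistics: updated during training, used at test time.
        self.running_mean = np.zeros(num_features)
        self.running_var = np.ones(num_features)

    def forward(self, x, training=True):
        if training:
            # Normalize with the current mini-batch's statistics...
            batch_mean = x.mean(axis=0)
            batch_var = x.var(axis=0)
            # ...and fold them into the exponentially weighted averages.
            self.running_mean = (self.momentum * self.running_mean
                                 + (1 - self.momentum) * batch_mean)
            self.running_var = (self.momentum * self.running_var
                                + (1 - self.momentum) * batch_var)
            mean, var = batch_mean, batch_var
        else:
            # At test time, reuse the averages accumulated during training,
            # so the output does not depend on the composition of a batch.
            mean, var = self.running_mean, self.running_var
        x_hat = (x - mean) / np.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta

bn = BatchNorm1D(num_features=4)
for _ in range(100):                         # toy training loop
    batch = np.random.randn(32, 4) * 2 + 5   # mini-batch of 32 examples
    _ = bn.forward(batch, training=True)

single_example = np.random.randn(1, 4) * 2 + 5
out = bn.forward(single_example, training=False)  # works for a batch of one
```

Notice that in the test-time branch no batch statistics are computed at all, which is exactly why the usual mini-batch mean and variance are not needed (and would not be meaningful) there.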
I hope this helps.
Best regards
elirod
I understand now. Thank you for your clear explanation!
It was my pleasure