Hi, I have a question about batch normalization on the train set vs. the test set. Why do we estimate the mean and variance differently? What is the reason for using an exponentially weighted average at test time? Why would the usual mean and variance not work?
Thanks.
Hi @mhyash
The main reason for estimating the mean and variance differently during training and testing is to ensure that the batch normalization layer generalizes well to unseen data.
During training, each mini-batch has its own mean and variance, and normalizing with those batch statistics is part of what makes batch norm effective. At test time, however, you may be processing a single example, or a batch whose composition is arbitrary, so you need fixed statistics that do not depend on any particular mini-batch.
The exponentially weighted averages solve this: as training proceeds, the layer accumulates a running mean and a running variance across all the mini-batches it has seen. These averages are a stable estimate of the statistics of the whole training distribution, and they let the layer normalize consistently at test time, when mini-batch statistics are unavailable or unreliable.
Using the plain mean and variance of a test mini-batch would make a prediction depend on which other examples happen to be in the batch, and it would fail entirely for a batch of one. You could instead compute the statistics over the entire training set after training, but the exponential moving averages give essentially the same estimate for free during training, while smoothing out the fluctuations between mini-batches. See the sketch below for how this plays out in code.
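To make this concrete, here is a minimal NumPy sketch of the idea (not from the course materials; the class name `BatchNorm1D`, the momentum value, and the toy data are all illustrative assumptions). During training it normalizes with the current batch's statistics and updates the running averages; at test time it reuses those averages, so even a single example can be normalized:

```python
import numpy as np

class BatchNorm1D:
    """Illustrative batch normalization over the feature dimension."""

    def __init__(self, num_features, momentum=0.9, eps=1e-5):
        self.gamma = np.ones(num_features)   # learnable scale
        self.beta = np.zeros(num_features)   # learnable shift
        self.momentum = momentum             # EWA decay factor (assumed value)
        self.eps = eps
        # Running statistics: updated during training, used at test time.
        self.running_mean = np.zeros(num_features)
        self.running_var = np.ones(num_features)

    def forward(self, x, training=True):
        if training:
            # Normalize with the current mini-batch's statistics...
            batch_mean = x.mean(axis=0)
            batch_var = x.var(axis=0)
            # ...and fold them into the exponentially weighted averages.
            self.running_mean = (self.momentum * self.running_mean
                                 + (1 - self.momentum) * batch_mean)
            self.running_var = (self.momentum * self.running_var
                                + (1 - self.momentum) * batch_var)
            mean, var = batch_mean, batch_var
        else:
            # At test time, reuse the averages accumulated during training,
            # so the output does not depend on the composition of a batch.
            mean, var = self.running_mean, self.running_var
        x_hat = (x - mean) / np.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta

bn = BatchNorm1D(num_features=4)
for _ in range(100):                         # toy training loop
    batch = np.random.randn(32, 4) * 2 + 5   # mini-batch of 32 examples
    _ = bn.forward(batch, training=True)

single_example = np.random.randn(1, 4) * 2 + 5
out = bn.forward(single_example, training=False)  # works for a batch of one
```

Notice that in the test-time branch no batch statistics are computed at all, which is exactly why the usual mini-batch mean and variance are not needed (and would not be meaningful) there.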
I hope this helps.
Best regards
elirod
I understand now. Thank you for your clear explanation!
It was my pleasure