I understand the value that the momentum term brings to an exponentially weighted average when information is added sequentially: the previously processed information as a whole is given more weight than the single latest piece being incorporated.
But when selecting a mean and variance to normalize the test examples, the random order of the mini-batches seen during training does not seem relevant. In that case the exponentially weighted average weights the batches unequally for no reason: the batches that randomly happened to come last dominate the estimate, while earlier batches are discounted exponentially.
Could a plain arithmetic average of the means and variances of all mini-batches be a better choice of statistics to use at test time (and in production)?
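To make the comparison concrete, here is a minimal NumPy sketch of the two aggregation schemes I have in mind. The momentum value beta = 0.9, the toy data, and the variable names are just illustrative assumptions, not anything prescribed by the course or a particular framework:

```python
import numpy as np

rng = np.random.default_rng(0)
# 100 toy mini-batches drawn from the same distribution
batches = [rng.normal(loc=5.0, scale=2.0, size=64) for _ in range(100)]

beta = 0.9  # illustrative momentum value
ema_mean, ema_var = 0.0, 0.0
batch_means, batch_vars = [], []

for x in batches:
    m, v = x.mean(), x.var()
    batch_means.append(m)
    batch_vars.append(v)
    # Exponentially weighted (running) average, updated batch by batch during training
    ema_mean = beta * ema_mean + (1 - beta) * m
    ema_var = beta * ema_var + (1 - beta) * v

# Plain arithmetic average over all mini-batches (the alternative I am asking about)
avg_mean = np.mean(batch_means)
avg_var = np.mean(batch_vars)

print(f"EMA estimate:        mean={ema_mean:.3f}, var={ema_var:.3f}")
print(f"Arithmetic estimate: mean={avg_mean:.3f}, var={avg_var:.3f}")
```

Since the toy mini-batches here are i.i.d., both estimates land close to the true statistics; my question is whether the uniform average is preferable in principle when the batch ordering carries no information.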