An ambiguity about batch normalization at test time

S.hejazinezhad · December 13, 2022, 8:11am

Hi everybody,
I would be pleased if you could confirm my understanding of batch normalization at test time. As I understand it, we keep track of mu and sigma squared for each mini-batch. Then we use them to calculate Z_normalized and Z_tilde. For instance, if we have N mini-batches in layer L, we will compute Z_normalized and Z_tilde N times for layer L. At the end, we have a vector of Z_tilde values for the various mini-batches. Am I right?
Best Regards,

Elemento · December 13, 2022, 12:32pm

Hey @S.hejazinezhad,
There are some very small gaps in your understanding. Let me try to fill those. Just to make sure that we are following the same underlying reference, I will be using the C2 W3 lectures.

Allow me to start from training time, since the “training” and “test” times are inter-linked. There are 4 important elements in Batch Normalization: \gamma, \beta, \mu and \sigma^2. Now, \gamma and \beta are learnable parameters, so they behave like any other “weight” during training and test times, i.e., they get updated via back-propagation during training, and are fixed during testing. So, let’s keep them aside. Additionally, z_{norm}^{(i)} and \tilde{z}^{(i)} can easily be determined once we have these 4 elements, using the equations described in the lecture, so I am keeping them aside as well.
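For reference, here is how those equations are usually written. This is the standard formulation as I recall it, not a copy of the lecture slides; m (the mini-batch size) and \epsilon (a small constant for numerical stability) are symbols I am assuming here.

```latex
\begin{aligned}
\mu &= \frac{1}{m}\sum_{i=1}^{m} z^{(i)} \\
\sigma^2 &= \frac{1}{m}\sum_{i=1}^{m}\left(z^{(i)} - \mu\right)^2 \\
z_{norm}^{(i)} &= \frac{z^{(i)} - \mu}{\sqrt{\sigma^2 + \epsilon}} \\
\tilde{z}^{(i)} &= \gamma\, z_{norm}^{(i)} + \beta
\end{aligned}
```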

Now, the only elements remaining are \mu and \sigma^2. During training, we compute them separately for each mini-batch, and use them to normalize that mini-batch’s z values. However, we can’t do the same for testing, for 2 reasons:

First, during testing, we may not have well-defined batches of inputs.
Second, and more importantly, the aim of BN at test time is to normalize the test inputs so that they resemble the training inputs more closely. If we computed the distribution statistics from the test-set samples only, don’t you think it would simply defeat the purpose of using BN in the first place?

So, for testing, what we do is keep running/moving averages of \mu and \sigma^2, updating them each time these are computed for a mini-batch during training. Then, at test time, we use those running estimates in place of per-batch statistics.
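To make the idea concrete, here is a minimal NumPy sketch of a BN layer that keeps running averages during training and reuses them at test time. It is only an illustration under my own assumptions, not the course's implementation: the class name `BatchNormSketch`, the `momentum` value of 0.9, and `eps` are hypothetical choices, and Z is assumed to have shape (units, examples) as in the course notation. The backward pass for \gamma and \beta is left out.

```python
import numpy as np


class BatchNormSketch:
    """Illustrative BN layer (forward pass only); names are hypothetical."""

    def __init__(self, n_units, momentum=0.9, eps=1e-5):
        # Learnable parameters (they would be updated by back-propagation).
        self.gamma = np.ones((n_units, 1))
        self.beta = np.zeros((n_units, 1))
        # Running estimates, updated in training and frozen at test time.
        self.running_mu = np.zeros((n_units, 1))
        self.running_var = np.ones((n_units, 1))
        self.momentum = momentum  # assumed value, not from the lecture
        self.eps = eps

    def forward(self, Z, training):
        # Z has shape (n_units, m): one column per example.
        if training:
            # Statistics of the current mini-batch only.
            mu = Z.mean(axis=1, keepdims=True)
            var = Z.var(axis=1, keepdims=True)
            # Exponentially weighted averages across mini-batches.
            self.running_mu = (self.momentum * self.running_mu
                               + (1 - self.momentum) * mu)
            self.running_var = (self.momentum * self.running_var
                                + (1 - self.momentum) * var)
        else:
            # Test time: reuse the estimates gathered during training.
            mu, var = self.running_mu, self.running_var
        Z_norm = (Z - mu) / np.sqrt(var + self.eps)
        return self.gamma * Z_norm + self.beta


# Usage: "train" on a few mini-batches, then evaluate one test example.
rng = np.random.default_rng(0)
bn = BatchNormSketch(n_units=4)
for _ in range(200):
    bn.forward(rng.normal(3.0, 2.0, size=(4, 64)), training=True)
single_output = bn.forward(rng.normal(3.0, 2.0, size=(4, 1)), training=False)
```

Note that at test time the output for an example no longer depends on which other examples happen to be in the batch, so even a single example can be processed.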

Let me know if this helps.

Cheers,
Elemento
