Weighted average batch norm

What is the equation of the weighted average in Batch Norm that is used to compute the mean for the test examples?

Does anyone know it?

My guess is:

\mu_{final} = \mu_{previous\; batch} * \beta + (1-\beta) * \mu_{last\; batch}

Hi @Areeg_Fahad,

If the test data is less than one batch (at inference we may have a single sample, not a mini-batch), how do we obtain the mean and variance in that case?


When we train a model we:

  • Calculate Mean and Variance: we compute and save these values for each batch during training
  • Normalize
  • Scale and Shift
  • Moving Average: in addition, Batch Norm also keeps a running Exponential Moving Average (EMA) of the mean and variance. During training it simply updates this EMA but does not otherwise use it. At the end of training, it saves this value as part of the layer’s state, for use during the inference phase (see the sketch just below this list).
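
A minimal NumPy sketch of those four steps for a single training batch (the names `mu_mov`, `var_mov`, and `momentum` are my own, for illustration only; I track the variance rather than the standard deviation):

```python
import numpy as np

def batchnorm_train_step(Z, gamma, beta, mu_mov, var_mov, momentum=0.9, eps=1e-5):
    # 1. Calculate the mean and variance of the current mini-batch
    mu_i = Z.mean(axis=0)
    var_i = Z.var(axis=0)

    # 2. Normalize using the *batch* statistics
    Z_hat = (Z - mu_i) / np.sqrt(var_i + eps)

    # 3. Scale and shift with the learnable parameters gamma and beta
    out = gamma * Z_hat + beta

    # 4. Update the exponential moving averages (saved for inference only)
    mu_mov = momentum * mu_mov + (1 - momentum) * mu_i
    var_mov = momentum * var_mov + (1 - momentum) * var_i

    return out, mu_mov, var_mov
```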

Here is where the two Moving Average parameters come in — the ones that we calculated during training and saved with the model. We use those saved mean and variance values for the Batch Norm during Inference.

Ideally, during training, we could have calculated and saved the mean and variance for the full data. But that would be very expensive as we would have to keep values for the full dataset in memory during training. Instead, the Moving Average acts as a good proxy for the mean and variance of the data. It is much more efficient because the calculation is incremental — we have to remember only the most recent Moving Average.
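
At inference time, even for a single sample, we simply plug in those saved moving averages instead of batch statistics. Continuing the same hypothetical names from the sketch above:

```python
import numpy as np

def batchnorm_inference(Z, gamma, beta, mu_mov, var_mov, eps=1e-5):
    # No batch statistics needed: even a single test sample is
    # normalized with the moving averages saved at the end of training
    Z_hat = (Z - mu_mov) / np.sqrt(var_mov + eps)
    return gamma * Z_hat + beta
```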

Cheers,
Abdelrahman

So, we compute the mean and variance of each batch based on the previous batch, and after computing all the batches we end with the last mean. Is that last mean then used as mean_0 for the first batch of the next iteration?

@Areeg_Fahad

If I understand your question right: when we train the model, in each batch we calculate the mean \mu_{i} and the variance \sigma_{i}, and update the variable \mu_{movi} according to

\mu_{movi} = \alpha \mu_{movi} + (1- \alpha )\mu_{i}

and the same for the variance:

\sigma_{movi} = \alpha \sigma_{movi} + (1- \alpha )\sigma_{i}

until the last batch. When we test, or want to predict on new values, we use these variables \mu_{movi} and \sigma_{movi} to normalize the test values.
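
Here is a rough end-to-end sketch of that update over the batches of one run, with toy data (everything here, including \alpha = 0.9, the zero/one initial values, and writing `var_mov` for \sigma_{movi}, is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
batches = [rng.normal(size=(32, 4)) for _ in range(100)]  # toy mini-batches
Z_test = rng.normal(size=(1, 4))                          # one test sample

alpha = 0.9                 # EMA decay (a typical choice)
mu_mov = np.zeros(4)        # common initialization (an assumption)
var_mov = np.ones(4)

for Z_batch in batches:
    mu_i = Z_batch.mean(axis=0)     # recomputed from scratch for every batch
    var_i = Z_batch.var(axis=0)
    mu_mov = alpha * mu_mov + (1 - alpha) * mu_i    # persists across batches
    var_mov = alpha * var_mov + (1 - alpha) * var_i

# At test time we normalize with the final moving averages
Z_test_hat = (Z_test - mu_mov) / np.sqrt(var_mov + 1e-5)
```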

Cheers!
Abdelrahman


Also, if you have any questions, feel free to ask.

Best regards!
Abdelrahman

I mean, we have iterations and batches; in each iteration we go through all the batches.
For instance, suppose we compute the \mu of the first iteration, as you mentioned.
What about the second iteration? Do we use the first iteration’s \mu as the initial value for the second iteration, and then compute \mu again in the second iteration?

Or do we just compute the \mu of the last iteration?

@Areeg_Fahad

At every iteration we recompute \mu_{i} and \sigma_{i} (they are not updated incrementally; they start from zero), because we normalize the values of Z, where Z = W*X + B. Since the weights (W and B) change at every iteration, Z changes too, so \mu_{i} and \sigma_{i} have to be recomputed each time.

But the values of \mu_{movi} and \sigma_{movi} are updated after every iteration; they do not start from zero like \mu_{i} and \sigma_{i}. They use these equations:

\mu_{movi} = \alpha \mu_{movi} + (1- \alpha )\mu_{i}

\sigma_{movi} = \alpha \sigma_{movi} + (1- \alpha )\sigma_{i}

Note that the batch normalization layer has some rules about how it should be used: for example, it shouldn’t come right after a pooling layer, and the best place for it is before the activation layer when you build the model, as in the sketch below.
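
For instance, in tf.keras that placement could look like this (just an illustrative sketch, not the only valid arrangement):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    # Linear part Z = W*X; the bias is redundant since Batch Norm's beta shifts Z anyway
    tf.keras.layers.Dense(64, use_bias=False, input_shape=(16,)),
    tf.keras.layers.BatchNormalization(),  # normalize Z before the activation
    tf.keras.layers.Activation("relu"),    # activation comes after Batch Norm
    tf.keras.layers.Dense(1),
])
```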

Cheers,
Abdelrahman
