Weighted average batch norm

What is the equation of the weighted average in Batch Norm that is used to compute the mean for the test examples?

Does anyone know it?

My guess is:

\mu_{final} = \mu_{previous\; batch} * \beta + (1-\beta) * \mu_{last\; batch}

Hi @Areeg_Fahad,

If the test data is less than one batch (at inference we may have a single sample, not a mini-batch), how do we obtain the mean and variance in that case?


When we train a model we:

  • Calculate Mean and Variance: we compute and save these values for each batch during training
  • Normalize
  • Scale and Shift
  • Moving Average: in addition, Batch Norm also keeps a running Exponential Moving Average (EMA) of the mean and variance. During training it simply updates this EMA but does not otherwise use it. At the end of training, it saves this value as part of the layer’s state, for use during the inference phase (see the sketch just below this list).
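
A minimal NumPy sketch of those four steps for a single training batch (the names `mu_mov`, `var_mov`, and `momentum` are my own, for illustration only; I track the variance rather than the standard deviation):

```python
import numpy as np

def batchnorm_train_step(Z, gamma, beta, mu_mov, var_mov, momentum=0.9, eps=1e-5):
    # 1. Calculate the mean and variance of the current mini-batch
    mu_i = Z.mean(axis=0)
    var_i = Z.var(axis=0)

    # 2. Normalize using the *batch* statistics
    Z_hat = (Z - mu_i) / np.sqrt(var_i + eps)

    # 3. Scale and shift with the learnable parameters gamma and beta
    out = gamma * Z_hat + beta

    # 4. Update the exponential moving averages (saved for inference only)
    mu_mov = momentum * mu_mov + (1 - momentum) * mu_i
    var_mov = momentum * var_mov + (1 - momentum) * var_i

    return out, mu_mov, var_mov
```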

Here is where the two Moving Average parameters come in — the ones that we calculated during training and saved with the model. We use those saved mean and variance values for the Batch Norm during Inference.

Ideally, during training, we could have calculated and saved the mean and variance for the full data. But that would be very expensive as we would have to keep values for the full dataset in memory during training. Instead, the Moving Average acts as a good proxy for the mean and variance of the data. It is much more efficient because the calculation is incremental — we have to remember only the most recent Moving Average.
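
At inference time, even for a single sample, we simply plug in those saved moving averages instead of batch statistics. Continuing the same hypothetical names from the sketch above:

```python
import numpy as np

def batchnorm_inference(Z, gamma, beta, mu_mov, var_mov, eps=1e-5):
    # No batch statistics needed: even a single test sample is
    # normalized with the moving averages saved at the end of training
    Z_hat = (Z - mu_mov) / np.sqrt(var_mov + eps)
    return gamma * Z_hat + beta
```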

Cheers,
Abdelrahman

So, we compute the mean and variance of each batch based on the previous batch, and after computing all the batches we end with the last mean. Is that last mean then used as mean_0 for the first batch of the next iteration?

@Areeg_Fahad

If I understand your question right: when we train the model, in each batch we calculate the mean \mu_{i} and the variance \sigma_{i}, and update the variable \mu_{movi} according to

\mu_{movi} = \alpha \mu_{movi} + (1- \alpha )\mu_{i}

and the same for the variance:

\sigma_{movi} = \alpha \sigma_{movi} + (1- \alpha )\sigma_{i}

until the last batch. When we test, or want to predict on new values, we use these variables \mu_{movi} and \sigma_{movi} to normalize the test values.
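
Here is a rough end-to-end sketch of that update over the batches of one run, with toy data (everything here, including \alpha = 0.9, the zero/one initial values, and writing `var_mov` for \sigma_{movi}, is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
batches = [rng.normal(size=(32, 4)) for _ in range(100)]  # toy mini-batches
Z_test = rng.normal(size=(1, 4))                          # one test sample

alpha = 0.9                 # EMA decay (a typical choice)
mu_mov = np.zeros(4)        # common initialization (an assumption)
var_mov = np.ones(4)

for Z_batch in batches:
    mu_i = Z_batch.mean(axis=0)     # recomputed from scratch for every batch
    var_i = Z_batch.var(axis=0)
    mu_mov = alpha * mu_mov + (1 - alpha) * mu_i    # persists across batches
    var_mov = alpha * var_mov + (1 - alpha) * var_i

# At test time we normalize with the final moving averages
Z_test_hat = (Z_test - mu_mov) / np.sqrt(var_mov + 1e-5)
```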

Cheers!
Abdelrahman


Also, if you have any questions, feel free to ask.

Best regards!
Abdelrahman

I mean, we have iterations and batches; in each iteration we go through all the batches.
For instance, suppose we compute the \mu of the first iteration, as you mentioned.
What about the second iteration? Do we use the first iteration’s \mu as the initial value for the second iteration, and then compute \mu again in the second iteration?

Or do we just compute the \mu of the last iteration?

@Areeg_Fahad

At every iteration we recompute \mu_{i} and \sigma_{i} (they are not updated incrementally; they start from zero), because we normalize the values of Z, where Z = W*X + B. Since the weights (W and B) change at every iteration, Z changes too, so \mu_{i} and \sigma_{i} have to be recomputed each time.

But the values of \mu_{movi} and \sigma_{movi} are updated after every iteration; they do not start from zero like \mu_{i} and \sigma_{i}. They use these equations:

\mu_{movi} = \alpha \mu_{movi} + (1- \alpha )\mu_{i}

\sigma_{movi} = \alpha \sigma_{movi} + (1- \alpha )\sigma_{i}

Note that the batch normalization layer has some rules about how it should be used: for example, it shouldn’t come right after a pooling layer, and the best place for it is before the activation layer when you build the model, as in the sketch below.
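
For instance, in tf.keras that placement could look like this (just an illustrative sketch, not the only valid arrangement):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    # Linear part Z = W*X; the bias is redundant since Batch Norm's beta shifts Z anyway
    tf.keras.layers.Dense(64, use_bias=False, input_shape=(16,)),
    tf.keras.layers.BatchNormalization(),  # normalize Z before the activation
    tf.keras.layers.Activation("relu"),    # activation comes after Batch Norm
    tf.keras.layers.Dense(1),
])
```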

Cheers,
Abdelrahman
