Batch norm at test time

Why do we need to use the exponentially weighted average of the mean and variance to make predictions on the test set? Why is there a need to recalculate Z at test time? Aren't we only supposed to use the learned parameters w and b from training to make predictions?

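If you didn’t apply the same transformations at test time, the values of the activations would probably be very different from those used to optimize the weights during training. The weights in later layers were learned on normalized values of Z, so they expect normalized inputs at test time too. You still have to compute Z for every new example, because it depends on the input. What you can't rely on at test time is the mini-batch mean and variance, since you may be predicting on a single example. The exponentially weighted averages collected during training stand in for them.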
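To make this concrete, here is a minimal NumPy sketch of the two modes. It is only an illustration under stated assumptions, not any framework's actual implementation: the function names `batchnorm_train` and `batchnorm_test`, the `momentum` of 0.9, the `eps` value, and the synthetic data are all made up for the example.

```python
import numpy as np


def batchnorm_train(z, gamma, beta, running_mean, running_var,
                    momentum=0.9, eps=1e-5):
    """Normalize z with the statistics of the current mini-batch and
    update the exponentially weighted averages (hypothetical helper)."""
    mu = z.mean(axis=0)    # per-unit mean over the mini-batch
    var = z.var(axis=0)    # per-unit variance over the mini-batch
    z_norm = (z - mu) / np.sqrt(var + eps)
    # Exponentially weighted averages, kept for use at test time.
    running_mean = momentum * running_mean + (1 - momentum) * mu
    running_var = momentum * running_var + (1 - momentum) * var
    return gamma * z_norm + beta, running_mean, running_var


def batchnorm_test(z, gamma, beta, running_mean, running_var, eps=1e-5):
    """Normalize z with the stored averages instead of batch statistics,
    so it works even for a single example (hypothetical helper)."""
    z_norm = (z - running_mean) / np.sqrt(running_var + eps)
    return gamma * z_norm + beta


rng = np.random.default_rng(0)
n_units = 4
gamma, beta = np.ones(n_units), np.zeros(n_units)
running_mean, running_var = np.zeros(n_units), np.ones(n_units)

# "Training": pre-activations Z with mean 5 and standard deviation 3
# (synthetic values chosen only for this illustration).
for _ in range(200):
    z_batch = rng.normal(loc=5.0, scale=3.0, size=(64, n_units))
    _, running_mean, running_var = batchnorm_train(
        z_batch, gamma, beta, running_mean, running_var)

# "Test": a single example. Its own batch variance would be zero, so the
# stored averages are what put it on the scale the next layer expects.
z_single = rng.normal(loc=5.0, scale=3.0, size=(1, n_units))
normalized = batchnorm_test(z_single, gamma, beta, running_mean, running_var)
print(running_mean, running_var)
print(normalized)
```

After the loop, the running mean and variance should sit close to the true values (about 5 and 9 here), so the single test example is shifted and scaled the same way the training batches were before gamma and beta are applied.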

