Why do we need to use the exponentially weighted averages of the mean and variance to make predictions on the test set? Why is there a need to recalculate Z at test time? Aren't we supposed to use only the learned parameters w and b from training to make predictions?
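If you didn’t apply the same normalization at test time, the values of the activations would probably be very different from the ones the weights were optimized for during training. The learned parameters (w and b, plus batch norm’s own scale and shift, gamma and beta) were fit to normalized Z values, so Z still has to be computed and normalized for every test input. And since a test example may arrive on its own, with no mini-batch to compute a mean and variance from, the exponentially weighted averages collected during training stand in for the batch statistics.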
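Here is a rough NumPy sketch of the idea, in case it helps to see it concretely. The function names (`batchnorm_train`, `batchnorm_test`), the `running` dictionary, the `momentum` value, and the (units x examples) layout are all assumptions made for illustration, not the code from any particular framework or assignment.

```python
import numpy as np

def batchnorm_train(z, gamma, beta, running, momentum=0.9, eps=1e-8):
    """Normalize a mini-batch z (units x examples) with its own statistics
    and update the exponentially weighted averages stored in `running`."""
    mu = z.mean(axis=1, keepdims=True)
    var = z.var(axis=1, keepdims=True)
    # Exponentially weighted averages of the per-batch mean and variance.
    running["mean"] = momentum * running["mean"] + (1 - momentum) * mu
    running["var"] = momentum * running["var"] + (1 - momentum) * var
    z_norm = (z - mu) / np.sqrt(var + eps)
    return gamma * z_norm + beta

def batchnorm_test(z, gamma, beta, running, eps=1e-8):
    """Normalize z with the stored averages instead of batch statistics."""
    z_norm = (z - running["mean"]) / np.sqrt(running["var"] + eps)
    return gamma * z_norm + beta

rng = np.random.default_rng(0)
n_units = 3
gamma = np.ones((n_units, 1))   # learned scale (kept at 1 for this sketch)
beta = np.zeros((n_units, 1))   # learned shift (kept at 0 for this sketch)
running = {"mean": np.zeros((n_units, 1)), "var": np.ones((n_units, 1))}

# "Training": feed mini-batches of pre-activations with mean 5 and std 2.
for _ in range(200):
    z_batch = rng.normal(loc=5.0, scale=2.0, size=(n_units, 64))
    batchnorm_train(z_batch, gamma, beta, running)

# "Test": a single example, so there is no batch to take statistics from.
z_single = rng.normal(loc=5.0, scale=2.0, size=(n_units, 1))
z_tilde = batchnorm_test(z_single, gamma, beta, running)
```

Note that normalizing `z_single` with its own mean and variance would give all zeros (a single value minus its own mean), which is why the stored averages are needed at test time.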
Does the above make sense?