Hi, I’m not quite getting why we need Batch Normalization at test time. I understand that BN is a technique for training our NN more effectively. But doesn’t testing just mean feeding our test examples to the NN (trained on the training set) to see how well it does on the test set? If so, why do we need BN for testing?
You’ll need batch normalization during training as well as during inference/testing.
The reason is that batch normalization changes the inputs to the (batch-normalized) layers, so the neurons of those layers are trained on those “changed” inputs. Those layers expect their inputs to be normalized, whether during training or during inference/testing.
This is similar in concept to applying transformations (or normalization) to your feature data - you’ll need to transform (normalize) the feature data during training as well as during inference.
The only difference between feature normalization and batch normalization is that the former normalizes the data before it enters the model, whereas the latter happens at layers within the model. In both cases, you must apply the same normalization/transformation during both training and inference/testing.
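To make the parallel concrete, here’s a minimal NumPy sketch (names and numbers are illustrative, not from the lesson): the statistics are computed on the training data, then reused unchanged on test data. Batch norm does the same thing, just to a layer’s pre-activations inside the model instead of to the raw features.

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(1000, 3))

# Feature normalization: statistics come from the training set...
feat_mean = X_train.mean(axis=0)
feat_std = X_train.std(axis=0)

def normalize_features(X):
    # ...and are reused as-is on test/inference data.
    return (X - feat_mean) / feat_std

X_test = rng.normal(loc=5.0, scale=2.0, size=(10, 3))
X_test_norm = normalize_features(X_test)

# A BN layer applies the same idea to its inputs inside the model,
# using statistics accumulated during training (see below).
```

The key point is that the test data is transformed with the *training* statistics; you never recompute the mean/std from the test set itself.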
Ok, now it makes sense. So whenever we use BN during training, we can’t use input data for inference as is, we should always apply BN to the input, right?
Also, in the lesson it is mentioned that the means used for BN at inference are computed as an exponentially weighted average across mini-batches. Does that mean that any time we change or extend the training set, we should compute new mean values to use for BN during inference?
Um… you basically apply the same normalizations across the board, whether during training or during inference/testing.
If you applied feature normalization during training, you must apply it again during inference/testing. If you did not apply it during training, then you must NOT apply it during inference/testing.
If you applied batch normalization during training, you must apply it again during inference/testing. If you did not apply it during training, then you must NOT apply it during inference/testing.
In many cases, yes: if you train on new or additional data, the exponentially weighted averages of the batch means and variances get updated, so the statistics used for BN at inference change too. However, exactly how to handle this is a design decision to be made by the AI engineer.
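Here’s a rough sketch of that exponentially weighted average, assuming a momentum value of 0.9 (an illustrative choice, not prescribed by the lesson). During training, each mini-batch nudges the running statistics; at inference, those running statistics stand in for the batch statistics:

```python
import numpy as np

beta = 0.9            # momentum for the exponentially weighted average
running_mean = np.zeros(3)
running_var = np.ones(3)

rng = np.random.default_rng(1)
for step in range(200):  # stand-in for mini-batches seen during training
    batch = rng.normal(loc=2.0, scale=1.5, size=(32, 3))
    mu, var = batch.mean(axis=0), batch.var(axis=0)
    # Update the running estimates a little toward each batch's statistics.
    running_mean = beta * running_mean + (1 - beta) * mu
    running_var = beta * running_var + (1 - beta) * var

def bn_inference(z, gamma=1.0, beta_shift=0.0, eps=1e-5):
    # At inference the running statistics replace the batch statistics,
    # then the learned scale (gamma) and shift (beta) are applied.
    z_hat = (z - running_mean) / np.sqrt(running_var + eps)
    return gamma * z_hat + beta_shift
```

Training on more mini-batches simply keeps updating `running_mean` and `running_var`, which is why the inference-time statistics shift when the training data changes.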