When applying batch normalization to the hidden layers, we use the mean and variance of that particular mini-batch only. But I am confused about the normalization of the input layer (X): should we normalize the input before creating the mini-batches, using the mean and variance of the whole training set, or after creating the mini-batches, using the mean and variance of each particular mini-batch, just like we do for the hidden layers?

Hope you got my point.

In practice, we normalize the inputs before feeding them to the NN, using the mean and standard deviation of the training set. Batching is done after normalizing the data.

The mean and standard deviation calculated from the training set are also used to normalize the validation and test sets.
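A minimal NumPy sketch of this flow (the arrays and values here are synthetic, just for illustration): statistics are computed from the training set only, normalization happens before batching, and the same train-set statistics are reused for validation data.

```python
import numpy as np

# Synthetic data for illustration: rows are examples, columns are features.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(1000, 3))
X_val = rng.normal(loc=5.0, scale=2.0, size=(200, 3))

# Statistics come from the training set only.
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)

# Normalize BEFORE creating mini-batches; the validation (and test)
# sets reuse the training-set statistics, never their own.
X_train_norm = (X_train - mu) / sigma
X_val_norm = (X_val - mu) / sigma

# Each training feature now has mean ~0 and std ~1; the validation
# features are only approximately standardized, which is expected.
print(X_train_norm.mean(axis=0))
print(X_train_norm.std(axis=0))
```

Mini-batches would then be drawn from `X_train_norm`, so every batch sees data on the same fixed scale.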

Hi Avinash,

Like Balaji mentioned, you should always normalize input data before you feed it into a neural network (with certain rare exceptions). Whether you apply batch normalization inside the network is a separate question. Even if you do apply batch normalization, the two are not equivalent: batch normalization only looks at the data within each mini-batch. Data normalized per mini-batch is very different from data normalized as a whole.

To understand this better, you need to understand the underlying objectives of the two operations. Normalization brings different features to a common scale while preserving the relationships between them, whereas batch normalization focuses on stabilizing the learning process. Batch normalization does not fulfill the objective achieved by input normalization.
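To see the difference concretely, here's a small sketch with synthetic data: per-batch statistics are noisy estimates of the whole-set statistics, so the same example gets transformed differently depending on which other examples happen to share its batch.

```python
import numpy as np

# Tiny synthetic dataset, purely for illustration.
rng = np.random.default_rng(1)
X = rng.normal(loc=10.0, scale=4.0, size=(8, 1))

# Whole-set normalization: one fixed mean/std applied to every example.
whole = (X - X.mean()) / X.std()

# Per-batch normalization: each mini-batch uses its own statistics.
batches = np.split(X, 2)  # two mini-batches of 4 examples each
per_batch = np.concatenate([(b - b.mean()) / b.std() for b in batches])

# The results differ because each batch's mean/std is a noisy
# estimate of the whole-set mean/std.
print(np.abs(whole - per_batch).max())
```

This is why per-batch normalization cannot substitute for normalizing the inputs as a whole: it does not give every example a consistent scale across the dataset.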

Thanks for the clarification, I get your point now.