Course 4 Week 2 Assignment 1: axis = 3

X = BatchNormalization(axis = 3)(X, training=training)

For exercise 2 (convolution block), we use axis = 3, and I read in earlier posts that this is because we are performing batch normalization on the channels.

Is it because we treat every channel as a mini-batch?

No. Normally a batch (or mini-batch) contains many examples from the dataset, each with all of its channels.
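A minimal NumPy sketch of that layout, assuming the channels-last (m, n_H, n_W, n_C) shape that Keras Conv2D produces by default; the sizes below are made up for illustration. Indexing the first axis gives one example with all of its channels, while indexing the last axis gives one channel taken from every example in the mini-batch, which is the pool of values batch normalization uses for that channel's statistics.

```python
import numpy as np

# Hypothetical mini-batch in channels-last (NHWC) layout: (m, n_H, n_W, n_C)
m, n_H, n_W, n_C = 4, 5, 5, 3
X = np.zeros((m, n_H, n_W, n_C))

one_example = X[0]       # a single example, with all of its channels
one_channel = X[..., 2]  # channel 2, taken from every example in the batch

print(one_example.shape)  # (5, 5, 3)
print(one_channel.shape)  # (4, 5, 5)

# With axis=3, the statistics for channel 2 are computed from all of these
# values: m * n_H * n_W of them, spread across every example in the batch.
print(one_channel.size)   # 100
```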

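The axis argument tells Keras which axis to keep. With axis = 3 (the channels axis in the (m, n_H, n_W, n_C) layout), the mean and variance are computed over the batch, height and width axes, separately for each channel, and each channel gets its own learned scale (gamma) and shift (beta). You normalize per channel like this because the magnitude of the values can differ a lot from one channel to another, so each channel needs its own statistics.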
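A rough NumPy sketch of the training-mode computation, assuming channels-last input and the Keras default epsilon of 1e-3. The function name batch_norm_nhwc is hypothetical, and the moving averages Keras keeps for inference are left out. The three channels are deliberately put on very different scales to show that each one is normalized with its own mean and variance.

```python
import numpy as np

def batch_norm_nhwc(x, gamma, beta, eps=1e-3):
    """Hypothetical sketch of training-mode batch norm for an NHWC tensor.

    Statistics are pooled over batch, height and width (axes 0, 1, 2),
    so the channel axis (axis 3) is the one that is kept.
    """
    mean = x.mean(axis=(0, 1, 2))            # shape (n_C,)
    var = x.var(axis=(0, 1, 2))              # shape (n_C,), biased variance
    x_hat = (x - mean) / np.sqrt(var + eps)  # broadcasts over the last axis
    return gamma * x_hat + beta              # gamma, beta: shape (n_C,)

rng = np.random.default_rng(0)
# Three channels on very different scales and offsets
x = rng.normal(size=(8, 5, 5, 3)) * np.array([1.0, 10.0, 100.0]) + np.array([0.0, 5.0, -50.0])
y = batch_norm_nhwc(x, gamma=np.ones(3), beta=np.zeros(3))

print(x.std(axis=(0, 1, 2)))  # roughly [1, 10, 100]
print(y.mean(axis=(0, 1, 2)))  # close to 0 for every channel
print(y.std(axis=(0, 1, 2)))   # close to 1 for every channel
```

Because the statistics are per channel, the layer ends up with n_C means, n_C variances and n_C (gamma, beta) pairs, no matter how large the batch or the feature maps are.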