C4 W1 A2: Why is axis given as 3 for BatchNormalization

I have started my assignment on CNNs (Convolution_model_Application), and while implementing the first function, happyModel, the instructions say that BatchNormalization should be given axis = 3. From the docs I understand that the axis that is normalized should typically be the features axis.

Here the shape of X_train is (600, 64, 64, 3), so axis = 3 refers to the number of channels, not the features. How is this correct? Please help me understand.

The Conv2D layer changes the shape of the data.
['Conv2D', (None, 64, 64, 32), 4736, 'valid', 'linear', 'GlorotUniform']

So the 3rd axis is the 32 outputs of the Conv2D layer.
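As a sanity check, the 4736 parameter count in that summary line is consistent with a Conv2D layer of 32 filters over 3 input channels, assuming a 7×7 kernel (the kernel size is an assumption here; the exact value comes from the assignment):

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # Each filter has kernel_h * kernel_w * in_channels weights plus 1 bias.
    return filters * (kernel_h * kernel_w * in_channels + 1)

# Assumed 7x7 kernel, 3 input channels, 32 filters:
print(conv2d_params(7, 7, 3, 32))  # 32 * (7*7*3 + 1) = 4736
```

The output shape (None, 64, 64, 32) then puts the 32 filter outputs on axis 3, which is exactly the axis BatchNormalization is told to normalize.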

Thank you. Got it now.

I am still confused. After calling happy_model.summary() I see that the number of parameters in the BN layer is 128!
However, according to the lectures in the previous courses, I thought it was supposed to be 64 × 64 × 32. BN was taught with respect to fully connected NNs, though.
The point is that, when used in CNNs, BN normalizes not only over the batch axis but also over the spatial axes, keeping parameters only per channel. Why is that?
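The short answer is weight sharing: a convolutional filter applies the same weights at every spatial position, so all activations within one channel are treated as samples of the same feature, and BN pools the batch and spatial dimensions together to estimate one mean and variance per channel. The bookkeeping can be sketched as follows (assuming Keras conventions, where BN keeps gamma and beta as trainable parameters plus a moving mean and moving variance per normalized feature):

```python
def bn_params(num_features):
    # gamma + beta (trainable) and moving mean + moving variance
    # (non-trainable) per normalized feature: 4 values each.
    return 4 * num_features

# Channel-wise BN (axis=3) over 32 channels:
print(bn_params(32))            # 128, matching happy_model.summary()

# If BN were instead applied per activation, as in a dense layer
# with 64*64*32 units, the count would explode:
print(bn_params(64 * 64 * 32))  # 524288
```

This is why the layer reports 128 parameters rather than something proportional to 64 × 64 × 32.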