Regarding batch normalization

In the week 1, assignment 2 Sequential model, we used BatchNormalization with the argument axis=3. What exactly does axis=3 mean, and why are we using batch normalization?
Also, after running happy_model.summary() I found 64 non-trainable parameters. What are these 64 non-trainable parameters?
Thanks!

Hi Nithin,

With regard to your first question, have a look here: [Week 2] what is the meaning of (axis = 3) in the BatchNormalization? - #6 by reinoudbosch

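The 64 non-trainable parameters are the moving mean and moving variance vectors of the batchnorm layer. They are updated from the batch statistics during training rather than by gradient descent, which is why the summary lists them as non-trainable. With 32 features (axis=3 of the conv2d layer output) you get 32*2 = 64 non-trainable parameters. You can also have a look here: Keras - number of parameters in BatchNorm Layer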
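If it helps, here is a small sketch of that bookkeeping in plain Python (no Keras needed). The helper name batchnorm_param_counts is made up for illustration; the four per-feature vectors are the usual Keras BatchNormalization weights, assuming the default settings where both scale and center are enabled.

```python
# Hypothetical helper: counts batch-norm parameters for a given number of features.
# Assumes Keras defaults (scale=True, center=True), so each feature gets
# gamma and beta (trainable) plus moving_mean and moving_variance (non-trainable).
def batchnorm_param_counts(num_features):
    trainable = {"gamma": num_features, "beta": num_features}
    non_trainable = {"moving_mean": num_features, "moving_variance": num_features}
    return sum(trainable.values()), sum(non_trainable.values())


# 32 features, as in the conv2d output discussed above.
trainable_count, non_trainable_count = batchnorm_param_counts(32)
print(trainable_count, non_trainable_count)  # 64 64
```

So the layer holds 128 parameters in total, and only the 64 that come from the moving statistics show up under non-trainable.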

Thanks for answering,
My second doubt got clarified, but…
Can you tell me the difference between axis=2 and axis=3?
Thanks!

Hi Nithin,

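Axis=0 refers to the training examples. Axes 1 and 2 are the height and width of the 2D arrays of activation values, each of which supports the extraction of a particular feature. Axis=3 is the axis along which the 2D arrays (one array per filter) are stacked. To normalize values per feature, you therefore pass axis=3: the layer keeps one mean and one variance per filter, each computed over axes 0, 1 and 2. With axis=2 it would instead keep one mean and variance per column position, mixing the values of all filters together.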
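To make the difference between the two axis choices concrete, here is a rough NumPy sketch. It is not the Keras implementation: it leaves out gamma, beta and the moving averages, and the tensor shape, the function names and the eps value are made up for illustration. It assumes the channels-last layout (examples, height, width, filters).

```python
import numpy as np

# Hypothetical activations: 8 examples, height 4, width 5, 3 filters.
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=(8, 4, 5, 3))


def batchnorm_stats(x, axis):
    """Mean and variance kept separately for each index along `axis`,
    computed over all the other axes."""
    reduce_axes = tuple(i for i in range(x.ndim) if i != axis)
    mean = x.mean(axis=reduce_axes, keepdims=True)
    var = x.var(axis=reduce_axes, keepdims=True)
    return mean, var


def normalize(x, axis, eps=1e-3):
    """Normalization step only (no learned scale or shift)."""
    mean, var = batchnorm_stats(x, axis)
    return (x - mean) / np.sqrt(var + eps)


mean3, _ = batchnorm_stats(x, axis=3)
mean2, _ = batchnorm_stats(x, axis=2)
print(mean3.shape)  # (1, 1, 1, 3): one statistic per filter
print(mean2.shape)  # (1, 1, 5, 1): one statistic per column position

# After normalizing with axis=3, every filter has roughly zero mean and unit variance.
y = normalize(x, axis=3)
print(y.mean(axis=(0, 1, 2)))
print(y.var(axis=(0, 1, 2)))
```

With axis=3 each filter gets its own statistics, which is what you want for conv2d outputs in this layout.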

My doubt is clarified
thank you!

Wait, why is it that in the Sequential API we use batch normalization but we don’t in the Functional API?

It likely has to do with the problem that is being solved, rather than the method being used. Perhaps batch normalization wasn’t required in the exercise that used the functional API.