Course 4, Week 1, Assignment 2: Why is batch normalization applied (only) to axis 3?

In Assignment 2 of Week 1 (Exercise 1, which uses the TensorFlow Sequential model), we are asked to apply batch normalization along axis 3. Why is that?

According to the documentation, axis should be the features axis. In this case, wouldn’t the features lie along two axes (1 and 2)?

Specifically, since the inputs are two-dimensional (for each channel), wouldn’t there be two axes along which we would have to normalize?

I am sure I am misunderstanding something here. Any help is appreciated.

Have you seen this page? Read the description of axis keeping in mind that in this assignment, channels are the last dimension.
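
To make that concrete, here is a minimal NumPy sketch of what normalizing with axis=3 means for a channels-last batch. It is not the Keras implementation: the batch shape (8, 64, 64, 3) and the eps value are assumptions for illustration, and the learned scale/offset and moving averages are left out. The point is that axis names the axis that is kept, so there is one mean and one variance per channel, each computed over the batch, height, and width axes together.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical batch: 8 images, 64x64 pixels, 3 channels (channels last).
x = rng.normal(loc=5.0, scale=2.0, size=(8, 64, 64, 3))

# axis=3 is the axis that is KEPT; statistics are reduced over all the others.
reduce_axes = (0, 1, 2)  # batch, height, width
mean = x.mean(axis=reduce_axes, keepdims=True)  # one mean per channel
var = x.var(axis=reduce_axes, keepdims=True)    # one variance per channel

eps = 1e-3  # small constant for numerical stability (assumed value)
x_hat = (x - mean) / np.sqrt(var + eps)

print(mean.shape)                     # one statistic per channel
print(x_hat.mean(axis=reduce_axes))   # close to 0 for every channel
```

So the two spatial axes (1 and 2) are not ignored. They are averaged over, together with the batch axis, separately for each channel.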

Thank you Balaji.

Yes, I had seen that, but working through the assignments, it seems that the channel axis, along which the features are stacked, needs to be specified.

This is what confused me:
"axis: Integer, the axis that should be normalized (typically the features axis). For instance, after a Conv2D layer with data_format="channels_first", set axis=1 in BatchNormalization."
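
One way to read that sentence is that axis is simply the position of the channel dimension, which depends on the data layout. The sketch below is a NumPy illustration under that reading, not the Keras code: batch_norm is a hypothetical helper, the tensor shapes are made up, and learned parameters and moving averages are omitted. It normalizes the same data once in channels-last layout with axis=3 and once in channels-first layout with axis=1, then checks that the results agree.

```python
import numpy as np


def batch_norm(x, axis, eps=1e-3):
    """Hypothetical helper: keep `axis`, reduce over every other axis."""
    axis = axis % x.ndim  # allow negative indices such as -1
    reduce_axes = tuple(i for i in range(x.ndim) if i != axis)
    mean = x.mean(axis=reduce_axes, keepdims=True)
    var = x.var(axis=reduce_axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


rng = np.random.default_rng(0)
x_last = rng.normal(size=(4, 5, 5, 3))        # (batch, height, width, channels)
x_first = np.transpose(x_last, (0, 3, 1, 2))  # (batch, channels, height, width)

out_last = batch_norm(x_last, axis=3)    # channels are the last axis
out_first = batch_norm(x_first, axis=1)  # channels are axis 1

# Move the channels-first result back to channels-last and compare.
same = np.allclose(out_last, np.transpose(out_first, (0, 2, 3, 1)))
print(same)
```

In both layouts the normalization is per channel. Only the index of the channel axis changes, which is why the assignment (channels last, 4D input) uses axis 3.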