[Week 2] What is the meaning of axis=3 in BatchNormalization?

Why are we normalizing only along the channels and not along the height and width, for example?

I’m having a hard time mapping the concepts from Prof. Andrew Ng’s lectures (where I couldn’t find any reference to the channels or dimensions of the images) to their application in the TensorFlow code.

Can anybody shed some light on this? :)

Many thanks in advance!


So when you batch normalise along the channels, what you do is make sure the values of the red channel, the blue channel, and the green channel are each normalised with respect to the batch.

Let’s consider a simple 40x40 monochromatic image (sorry about my drawing skills).
[Image: a hand-drawn 40x40 monochromatic image of a bird]
I think it is visible that it is somewhat of a bird.

To simulate applying a transformation along the height (it would work similarly along the width), I’ve inverted the colours of every other column:
[Image: the same bird with the colours of every other column inverted]
Pretty hard to tell this was a bird, right? And remember, every other column here had the same transform applied to it. If we batch normalise along the height, every column will have a different transform applied to it, which will remove even more information than this example did and make it harder for our model to extract anything useful. There’s no need to take information out of our data.

Now let’s see what happens when we apply the same transformation to the entire image at once.
[Image: the bird with the colours of the whole image inverted]
As you can see, we still have all of the image’s information intact, with no losses.

Each channel is exactly like a monochromatic image, so when we apply batch normalisation along the channel axis, it preserves our data while normalising it. Remember, batch normalisation is just a transform (much like the inversion I did here), so applying it per channel will not cause loss of information, unlike normalising along the height or width.
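
If you want to see this in code, here’s a minimal sketch (assuming tf.keras and a random batch of images, so the numbers are just placeholders) showing that axis=3 makes the layer compute one mean and variance per channel, over the batch, height and width:

```python
import numpy as np
import tensorflow as tf

# A batch of 8 RGB images of size 40x40: shape (batch, height, width, channels).
x = np.random.rand(8, 40, 40, 3).astype("float32")

# axis=3 (the channel axis) tells the layer to keep one gamma/beta per channel
# and to average over batch, height and width when computing the statistics.
bn = tf.keras.layers.BatchNormalization(axis=3)
y = bn(x, training=True)

print(bn.gamma.shape, bn.beta.shape)  # (3,) and (3,): one scale/shift per channel

# Equivalent "by hand": statistics are taken over axes (0, 1, 2).
mean = x.mean(axis=(0, 1, 2), keepdims=True)
var = x.var(axis=(0, 1, 2), keepdims=True)
x_hat = (x - mean) / np.sqrt(var + 1e-3)  # Keras' default epsilon is 1e-3
print(np.allclose(y.numpy(), x_hat, atol=1e-3))  # should print True
```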

Hopefully this helped you visualise the entire thing 🙂. If not, feel free to ask for clarification.


OK, so now I get it! We are supposed to normalize the whole image (height and width) on a per-channel basis, and the way to do that is to compute the normalization statistics over the batch, height, and width for each channel separately. That’s what axis=3 means.

Many thanks to both of you for taking the time to deliver explanations this detailed!

Hi Federico,

Excellent question.

Take as an example the happyModel in Assignment 2 of Week 1 (Convolutional_Model_Application). This model is structured as follows:

ZEROPAD2D → CONV2D → BATCHNORM → RELU → MAXPOOL → FLATTEN → DENSE

The CONV2D layer serves to extract features from the padded image by means of one filter per feature. The application of a filter outputs a 2D array with activation values that support the extraction of a particular feature. These are the values you want to normalize, i.e. per feature, so that the parameters can be learnt faster.

As you may recall from the videos, the 2D arrays per filter/feature are stacked along the channel dimension, so the number of channels equals the number of filters/features. In order to normalize values per feature, you therefore want to normalize along the channel axis. This is why batch norm is applied per channel, i.e. along axis 3.
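
As a rough sketch (the exact filter counts and sizes in the assignment may differ; the numbers below are only illustrative), the layer order above looks like this in tf.keras:

```python
import tensorflow as tf

# ZEROPAD2D -> CONV2D -> BATCHNORM -> RELU -> MAXPOOL -> FLATTEN -> DENSE
model = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 3)),
    tf.keras.layers.ZeroPadding2D(padding=3),
    tf.keras.layers.Conv2D(filters=32, kernel_size=7, strides=1),  # 32 filters -> 32 channels
    tf.keras.layers.BatchNormalization(axis=3),  # one mean/variance per feature map
    tf.keras.layers.ReLU(),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.summary()
```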

Hope this helps.


Thanks for the explanation, reinoudbosch. Is there any reference where I can read a little more about this? Is this a standard way to use batch normalization in CNNs? I’m still a bit confused. I recall that in C2, we normalize Z^(i) to Ẑ^(i) across the samples in the current batch. From what I’ve learned, since Z^(i) is a 2D matrix, this means that we normalize each component of Z^(i) separately. So restricting the normalization to each channel/feature seems a bit different from what was taught in C2. Thank you.

Hi kelvinn,

How to use batch normalization depends on the specific purpose you want it to fulfill. In the case of using filters, you want to extract features, so this is what the implementation of batch norm should optimize for. For other networks, this may differ. For a general introduction to batch normalization with references to interesting papers, you can have a look here.
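
If it helps, here’s a small NumPy sketch (with made-up shapes) contrasting the C2-style statistics, computed per unit across the batch only, with the conv-style axis=3 statistics, computed per channel across the batch and all spatial positions:

```python
import numpy as np

# Z for one conv layer: (batch, height, width, channels)
Z = np.random.rand(8, 5, 5, 16).astype("float32")

# C2-style (dense-layer) batch norm: each unit, here each (h, w, c) position,
# gets its own mean computed across the batch dimension only.
mu_dense = Z.mean(axis=0)         # shape (5, 5, 16)

# Conv-style batch norm (axis=3): each channel gets one mean computed across
# the batch AND all spatial positions, so the whole feature map is normalized
# with the same statistics everywhere.
mu_conv = Z.mean(axis=(0, 1, 2))  # shape (16,)

print(mu_dense.shape, mu_conv.shape)
```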

Thank you very much for the fast response. I’ll try to look at the reference.