Batch Normalization on Channels

Ayse_Burcu_Ozkaptan · May 28, 2022, 11:32am

Hello everyone,
Why do we have to run the normalization on the channels axis? If the input, say an RGB image, has 3 channels, wouldn’t it be better to normalize the values of each channel separately (i.e., normalize across axes 1 and 2)? Or am I getting it wrong?
Cheers,

balaji.ambresh · May 28, 2022, 11:59am

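For each image that passes through a Conv2D layer, the output has dimensions [new_height, new_width, num_filters_in_conv]. The last dimension corresponds to channels. When you run batch norm on the channel axis (axis=3 for a batch of shape [batch, height, width, channels]), batch norm keeps 4 variables per channel: $\gamma$, $\beta$, $\mu$ and $\sigma^2$ (the learnable scale and shift, plus the running mean and variance). The mean and variance for each channel are computed over the batch, height and width axes. Please see this link.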
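A minimal NumPy sketch of this, assuming a channels-last tensor of shape (batch, height, width, channels) and training-mode statistics. The function name `batch_norm_nhwc` is hypothetical, and `eps=1e-3` mirrors the Keras default. In Keras, `tf.keras.layers.BatchNormalization(axis=3)` stores `gamma`, `beta`, `moving_mean` and `moving_variance`, each with one entry per channel, which is what the shapes below reflect.

```python
import numpy as np

def batch_norm_nhwc(x, gamma, beta, eps=1e-3):
    """Training-mode batch norm for a channels-last tensor (N, H, W, C).

    Mean and variance are computed per channel, pooling over the batch,
    height and width axes (0, 1, 2). gamma and beta hold one value per channel.
    """
    mu = x.mean(axis=(0, 1, 2))            # shape (C,)
    var = x.var(axis=(0, 1, 2))            # shape (C,)
    x_hat = (x - mu) / np.sqrt(var + eps)  # broadcasts along the last (channel) axis
    return gamma * x_hat + beta, mu, var

rng = np.random.default_rng(0)
num_filters = 8
# Stand-in for a Conv2D output: 4 images, 5x5 spatial size, 8 channels.
x = rng.normal(loc=3.0, scale=2.0, size=(4, 5, 5, num_filters))
gamma = np.ones(num_filters)   # learnable scale, one per channel
beta = np.zeros(num_filters)   # learnable shift, one per channel

y, mu, var = batch_norm_nhwc(x, gamma, beta)
print(mu.shape, var.shape)  # (8,) (8,)
```

So the "channel axis" is the axis that is *kept*: every channel gets its own statistics, and the reduction runs over everything else.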

anon57530071 · May 28, 2022, 12:39pm

I suppose Balaji has already explained the key points, but here are a few additions.

BatchNormalization applies “normalization and scale/shift” to the output of the previous layer in order to stabilize training of the neural network. Filters used in Conv2D (and similar layers), on the other hand, try to extract features from images or text. Each filter produces one output, a channel, so, as you know, the number of output channels equals the number of filters. Even after the first Conv2D layer, the output channels no longer correspond to R, G and B.
The next question, then, may be: is there any advantage to handling a single RGB channel (say, R) separately? I think that, to extract features from an image, combinations of R, G and B matter more than any single channel. Applying a single filter across all three channels to get one output channel should yield more meaningful features, which can then be used for image detection and other tasks.
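To illustrate that each filter spans all input channels and produces exactly one output channel, here is a naive NumPy sketch of a "valid", stride-1 convolution (cross-correlation, which is what CNN layers compute). The function name `conv2d_valid` and the (F, kh, kw, C_in) filter layout are illustrative assumptions; Keras itself stores Conv2D kernels as (kh, kw, C_in, F).

```python
import numpy as np

def conv2d_valid(image, filters):
    """Naive 'valid' 2-D convolution (cross-correlation), stride 1, no bias.

    image:   (H, W, C_in), e.g. C_in = 3 for an RGB image
    filters: (F, kh, kw, C_in); every filter spans *all* input channels
    returns: (H - kh + 1, W - kw + 1, F); one output channel per filter
    """
    h, w, c_in = image.shape
    f, kh, kw, fc = filters.shape
    assert fc == c_in, "each filter must cover every input channel"
    out = np.zeros((h - kh + 1, w - kw + 1, f))
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            patch = image[i:i + kh, j:j + kw, :]  # (kh, kw, C_in)
            # Sum over height, width AND the R, G, B channels in one step.
            out[i, j, :] = np.tensordot(filters, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
rgb = rng.random((6, 6, 3))                # a tiny 6x6 RGB "image"
filters = rng.normal(size=(8, 3, 3, 3))    # 8 filters of size 3x3x3
out = conv2d_valid(rgb, filters)
print(out.shape)  # (4, 4, 8): 8 channels, none of which is "R", "G" or "B"
```

Each output value mixes all three color channels, which is why the channels that batch norm later normalizes are learned features rather than colors.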

Ayse_Burcu_Ozkaptan · May 28, 2022, 1:01pm

Thank you very much for your response! However, this is not quite what I was looking for. I was wondering why we normalize across channels rather than across the other dimensions of the convolution output, namely the height and width.

anon57530071 · May 28, 2022, 1:53pm

This is actually an interesting question. Other research has explored which group of dimensions to normalize over: Batch Norm, Layer Norm, Instance Norm, Group Norm, and so on.
Here is one figure from the paper by Yuxin Wu and Kaiming He, “Group Normalization”:
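The idea behind that comparison can be sketched in NumPy: the four methods differ only in which axes of an (N, H, W, C) tensor the mean and variance are computed over. This is a minimal sketch assuming a channels-last layout as in Keras; all helper names are hypothetical, and the learnable per-channel scale and shift are omitted for brevity. Instance norm here is exactly the "normalize each channel over axes 1 and 2" idea from the question.

```python
import numpy as np

def normalize(x, axes, eps=1e-5):
    """Zero-mean, unit-variance normalization over the given axes."""
    mu = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

# x has shape (N, H, W, C); learnable scale/shift omitted for brevity.
def batch_norm(x):       # one mean/var per channel, pooled over N, H, W
    return normalize(x, (0, 1, 2))

def layer_norm(x):       # one mean/var per example, over H, W and all channels
    return normalize(x, (1, 2, 3))

def instance_norm(x):    # one mean/var per example and channel, over H, W only
    return normalize(x, (1, 2))

def group_norm(x, groups):  # per example, over H, W and each group of channels
    n, h, w, c = x.shape
    assert c % groups == 0, "channels must divide evenly into groups"
    xg = x.reshape(n, h, w, groups, c // groups)
    return normalize(xg, (1, 2, 4)).reshape(n, h, w, c)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4, 4, 6))
# Group norm covers both extremes:
print(np.allclose(group_norm(x, groups=1), layer_norm(x)))     # True
print(np.allclose(group_norm(x, groups=6), instance_norm(x)))  # True
```

Only batch norm pools statistics across examples in the batch; the other three compute them per example, which is one reason they behave better with small batches.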

In some cases, the authors got better results with Group Normalization. If you are interested, please take a look at this paper. Hope this helps.

Related topics

Topic	Category	Tags	Replies	Views	Activity
TF batch norm for CNNs question	Convolutional Neural Networks	coursera-platform	5	477	May 24, 2023
[Week 2] what is the meaning of (axis = 3) in the BatchNormalization?	Convolutional Neural Networks	coursera-platform	6	1407	June 26, 2021
C4W2 Resnets - Why batchnormalization axis = 3	Convolutional Neural Networks	coursera-platform	3	616	August 20, 2021
Course 4, Week 1, Assignment 2: Why is batch normalization applied (only) to axis 3?	Convolutional Neural Networks	coursera-platform	2	524	May 10, 2022
Week 1 Assignment 2 - BatchNormalization Axis=3	Convolutional Neural Networks	week-module-1, coursera-platform	1	115	April 29, 2024
