Why do we run BatchNormalization after Conv2D?

In Week 1, Assignment 2, we are asked to add a BatchNormalization layer after Conv2D. This was not introduced in lecture. When batch normalization was introduced several courses back, we learned to normalize the input. Should we still be normalizing the input, as well as after every convolution? What is common practice in research and applications, and where can I read about it?

Thank you

Hey @moose,
The key thing to note here is that although Batch Normalization and input normalization both perform normalization, the reason for using each of them differs to some extent.

Normalization of inputs is done to make sure that the optimization algorithm doesn't have to take steep updates in one direction and tiny updates in another; that is, it ensures that the gradient descent updates are more or less uniform in every direction. The figure from the course depicting the roughly circular contour plot of the cost function for normalized inputs may come to mind.
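
To make this concrete, here is a minimal sketch of input normalization (the array shapes and names are purely illustrative, not the assignment's data): standardizing the pixel values per channel so every feature has a similar scale, which is what keeps the cost contours close to circular.

```python
import numpy as np

# Illustrative input normalization: standardize images per channel so that
# every feature has roughly zero mean and unit variance before training.
X_train = np.random.randint(0, 256, size=(64, 64, 64, 3)).astype("float32")

mean = X_train.mean(axis=(0, 1, 2), keepdims=True)  # per-channel mean
std = X_train.std(axis=(0, 1, 2), keepdims=True)    # per-channel std

X_train_norm = (X_train - mean) / (std + 1e-7)      # standardized inputs
```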

On the other hand, Batch Normalization is performed to make the updates independent of the different distribution statistics of each mini-batch of inputs. Since different batches might have different means and variances, Batch Normalization ensures that the activations across all batches follow roughly the same distribution, and the learned statistics and parameters can then govern the distribution of the test samples as well (to some extent).
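
As a rough sketch of how this plays out in Keras (the layer sizes and shapes below are illustrative, not the assignment's architecture): BatchNormalization placed after Conv2D normalizes with the batch statistics during training, learns gamma and beta, and at inference falls back on the moving averages it accumulated.

```python
import tensorflow as tf

# Conv2D followed by BatchNormalization. With training=True, BN uses each
# mini-batch's mean and variance; with training=False, it uses the moving
# averages it accumulated, so test samples follow the learned statistics.
inputs = tf.keras.Input(shape=(64, 64, 3))
x = tf.keras.layers.Conv2D(32, kernel_size=3, padding="same")(inputs)
x = tf.keras.layers.BatchNormalization(axis=-1)(x)  # normalize over channels
outputs = tf.keras.layers.ReLU()(x)
model = tf.keras.Model(inputs, outputs)

batch = tf.random.normal((8, 64, 64, 3))
train_out = model(batch, training=True)    # batch statistics
test_out = model(batch, training=False)    # moving-average statistics
```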

So, using both input normalization and Batch Normalization is not something that should hurt your model. If anything, chances are that your model will converge faster.

Now, where to use Batch Normalization, for instance after every Conv layer or after every other Conv layer, varies from model to model. For that, you can take a look at the well-known architectures and see how they use Batch Normalization throughout; a sketch of one common pattern follows below.
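
For example, architectures such as ResNet repeat a Conv → BatchNorm → ReLU block throughout the network. The helper below is only an illustrative sketch of that pattern, not code from the assignment or from any particular paper:

```python
import tensorflow as tf

# Illustrative Conv -> BN -> ReLU block, the pattern repeated in many
# well-known architectures. How often BatchNormalization is inserted
# (every Conv layer, every other one, etc.) is a design choice.
def conv_bn_relu(x, filters, kernel_size=3, strides=1):
    x = tf.keras.layers.Conv2D(filters, kernel_size, strides=strides,
                               padding="same", use_bias=False)(x)
    x = tf.keras.layers.BatchNormalization()(x)
    return tf.keras.layers.ReLU()(x)

inputs = tf.keras.Input(shape=(64, 64, 3))
x = conv_bn_relu(inputs, 32)
x = conv_bn_relu(x, 64, strides=2)
model = tf.keras.Model(inputs, x)
```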

I hope this helps.

Cheers,
Elemento

Or maybe we can restate this as just a question of the point of view you take: the output of a Conv layer is also the input to the next layer, right? So we're not so much applying BatchNorm to the output of the Conv layer as to the input of the next Conv layer, or whatever the next layer is. In other words, it's just a matter of perspective. Of course the two views are equivalent, but we have to look at it the right way to see the point.


Interesting insight @paulinpaloalto Sir, thanks for sharing it :nerd_face:

Cheers,
Elemento
