Residual_Networks - BN - Channel Axis

I would appreciate help understanding the following:

  1. In the Identity and Convolutional Blocks, Batch Normalization is used, but it is not applied at the initial stage of the block; it only appears after each CONV2D layer. Why?

Thank you.

This is an interesting question, actually.

The reason we include BatchNorm is to reduce the “internal covariate shift” in layer outputs, which can appear in a deeper network or even across individual mini-batches. And, if we look at the results, it works effectively.
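Just to make the “channel axis” part of your title concrete: in training mode, BatchNorm normalizes each channel using statistics computed over the batch and spatial axes. Here is a minimal NumPy sketch (the tensor shape and epsilon are just illustrative, and I am ignoring the learned scale/shift parameters and the running averages):

```python
import numpy as np

# Toy activations with shape (batch, height, width, channels) = (4, 2, 2, 3).
x = np.random.randn(4, 2, 2, 3) * 5.0 + 10.0

# Per-channel mean/variance over the batch and spatial axes (0, 1, 2) for NHWC.
mean = x.mean(axis=(0, 1, 2), keepdims=True)
var = x.var(axis=(0, 1, 2), keepdims=True)
x_hat = (x - mean) / np.sqrt(var + 1e-3)

print(x_hat.mean(axis=(0, 1, 2)))  # roughly 0 for every channel
print(x_hat.std(axis=(0, 1, 2)))   # roughly 1 for every channel
```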

But the best location for BatchNorm is really case-by-case. Since covariate shift appears in deeper networks, we may start by inserting it just after a big operation like Conv2D. But covariate shift can also appear across different batches (mini-batches), so we may want to insert it in more places…
Then, test, look at the results, and tune.
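For example, a common pattern is to insert BatchNorm immediately after each Conv2D and before the activation. Here is a rough Keras-style sketch (the function name and parameters are just illustrative, not the exact code from the assignment):

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_bn_relu(x, filters, kernel_size=3, strides=1):
    """One stage: Conv2D, then BatchNorm on the channel axis, then ReLU."""
    x = layers.Conv2D(filters, kernel_size, strides=strides, padding="same")(x)
    x = layers.BatchNormalization(axis=3)(x)  # axis=3 -> per-channel statistics for NHWC
    x = layers.Activation("relu")(x)
    return x

# Example usage on a dummy input:
inputs = tf.keras.Input(shape=(64, 64, 3))
outputs = conv_bn_relu(inputs, filters=32)
```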
Even in the case of the residual network, the authors tried putting it at different locations. For example, a key decision is whether BatchNorm should be placed before merging the shortcut or after… In their tests, they got a good result when they put BatchNorm before merging the shortcut.
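As a rough sketch (again, just illustrative names, not the exact assignment code), a two-layer identity block with BatchNorm placed before the shortcut merge could look like this, assuming `filters` matches the number of input channels so that the Add works:

```python
from tensorflow.keras import layers

def identity_block_sketch(x, filters, kernel_size=3):
    """Identity-block sketch: BatchNorm is applied before merging the shortcut."""
    shortcut = x

    x = layers.Conv2D(filters, kernel_size, padding="same")(x)
    x = layers.BatchNormalization(axis=3)(x)
    x = layers.Activation("relu")(x)

    x = layers.Conv2D(filters, kernel_size, padding="same")(x)
    x = layers.BatchNormalization(axis=3)(x)  # BN sits before the shortcut merge

    x = layers.Add()([x, shortcut])           # merge the shortcut
    x = layers.Activation("relu")(x)          # final activation after the merge
    return x
```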
Like this, a lot of it comes down to trial and error.

And, a more annoying thing is that “why BatchNorm works” is still an active research topic. A recent paper, Understanding Batch Normalization, raised doubts about the original paper’s explanation based on internal covariate shift, and another recent work by different researchers, High-Performance Large-Scale Image Recognition Without Normalization, does not use BatchNorm at all. And the performance of this NFNet (Normalizer-Free Net) is better than recent EfficientNet/LambdaNet.

So, I should say there is no concrete guideline. Let’s try different placements and select the best one. :wink:

Hi Nobu,

Thanks very much for your quick and clear answer; it made me realize why I could not understand it in the first place! There is no single definitive reason, it is just trial and error!

Much appreciated.

Have a great weekend.

Best regards,
Antonio