Without batch norm, the deeper layers learn W and b without outside intervention (we don’t tell the network what the weights should be).
I understand the purpose of batch norm is to give the deeper layers of the network more normalized inputs (the outputs of the shallower layers). But my intuition tells me that we’re starting to interfere with the network internals: instead of learning W and b independently, the network now has to learn W′ and b′ as a result of us meddling with the way it learns. The result is no different, just that W′ ≠ W and b′ ≠ b, so what’s the point?
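One detail worth noting about the "W′ vs. W" worry: batch norm adds its own learnable scale (γ) and shift (β) after normalizing, so the network is not losing expressive power. Here is a minimal NumPy sketch of the batch norm forward step (my own illustrative code, not the course's implementation); the comments mark what it demonstrates:

```python
import numpy as np

def batchnorm_forward(z, gamma, beta, eps=1e-5):
    # Normalize each feature over the mini-batch to zero mean, unit variance.
    mu = z.mean(axis=0)
    var = z.var(axis=0)
    z_norm = (z - mu) / np.sqrt(var + eps)
    # Learnable scale (gamma) and shift (beta): the network can set these
    # during training to restore any mean/variance it needs -- including
    # the original, un-normalized one -- so no representations are lost.
    return gamma * z_norm + beta

rng = np.random.default_rng(0)
# Pre-activations with an arbitrary mean and scale (illustrative values).
z = rng.normal(loc=5.0, scale=3.0, size=(64, 4))

# Default gamma=1, beta=0: output is normalized per feature.
out = batchnorm_forward(z, gamma=np.ones(4), beta=np.zeros(4))

# With gamma = sqrt(var + eps) and beta = mu, the layer exactly
# undoes the normalization and recovers z.
recovered = batchnorm_forward(z, np.sqrt(z.var(axis=0) + 1e-5), z.mean(axis=0))
```

So the normalization itself is invertible by the γ and β parameters; what changes is the geometry of the optimization problem, not the set of functions the network can learn.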
Hi, djdevilliers. I just saw your question, and it’s the same question I had.
I enrolled in this course and have kept learning. I had the same question after the “Batch Norm” lecture, but now I am a bit more sure about why it works.
First, we learned about input normalization. The goal of input normalization is to scale all the input features to the same, or a similar, range, right? I think you understood what input normalization is and why it works well; Andrew included a very intuitive figure to explain it. Once you understand why input normalization works, it is easy to understand BN (Batch Norm): from the perspective of the l-th hidden layer, the output of the previous layer (l-1) can be thought of as its input.
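To make the input-normalization analogy concrete, here is a small sketch (the feature values are made up for illustration): two features on very different scales get mapped to the same zero-mean, unit-variance range, which is exactly what BN then does to a hidden layer's outputs.

```python
import numpy as np

# Two input features on very different scales, e.g. house size in
# square feet vs. number of bedrooms (illustrative numbers only).
X = np.array([[2100.0, 3.0],
              [1600.0, 2.0],
              [2400.0, 4.0],
              [1400.0, 2.0]])

# Standard input normalization: per-feature zero mean, unit variance.
mu = X.mean(axis=0)
sigma = X.std(axis=0)
X_norm = (X - mu) / sigma
```

After this step both columns of `X_norm` live on the same scale, so gradient descent no longer has to cope with one feature dominating the cost surface; BN applies the same idea to the "inputs" that layer l receives from layer l-1.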
Second, in the lecture Andrew briefly compares BN and dropout. Dropout is a useful method for spreading out the weights and preventing reliance on any one feature. I think BN shares a bit of this property with dropout. Once you compute the output of layer l-1, some values may be much larger than the others. In that case, the output of layer l may come to depend heavily on those large features, which can cause overfitting. Once we apply BN to the output of layer l-1, we solve this problem and spread out the weights.
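You can see this "one large feature dominates" effect numerically. In this sketch (my own toy example, not from the lecture), one of three hidden activations is roughly 100× larger than the others, and we measure how much each feature contributes to a unit's weighted sum before and after per-feature normalization:

```python
import numpy as np

rng = np.random.default_rng(1)
# Outputs of layer l-1 for a mini-batch of 32: feature index 2 is ~100x larger.
a_prev = rng.normal(size=(32, 3)) * np.array([1.0, 1.0, 100.0])

# Weights of one unit in layer l (arbitrary illustrative values).
w = np.array([0.3, 0.5, 0.2])

# Average absolute contribution of each feature to the weighted sum.
contrib = np.abs(a_prev * w).mean(axis=0)
share = contrib / contrib.sum()          # feature 2 dominates the sum

# Normalize each feature (the core of BN), then re-measure contributions.
a_norm = (a_prev - a_prev.mean(axis=0)) / a_prev.std(axis=0)
contrib_bn = np.abs(a_norm * w).mean(axis=0)
share_bn = contrib_bn / contrib_bn.sum()  # contributions now balanced
```

Before normalization, the large feature carries almost the entire weighted sum regardless of its weight; after normalization, the contributions are governed by the weights themselves, which matches the "spread out the weights" intuition above.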
The above is just my own thinking and understanding from the lecture.
If there is any mistake, or you disagree with my opinion, please let me know what you think.
Hope to hear your ideas soon.
Hello @djdevilliers, this category is for the Machine Learning Specialization. Are you a learner from the Deep Learning Specialization asking about DLS course materials in course 2 week 3?
Yes, apologies if I posted in the wrong place. The UI made it very difficult for me to post in the correct place.
It’s fine @djdevilliers, please don’t worry about it. I have moved this thread to the DLS category.
The paper for Batch Normalization [1502.03167] Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift is a pretty good read.
And some papers analyzing the results of the first paper: https://proceedings.neurips.cc/paper/2018/file/36072923bfc3cf47745d704feb489480-Paper.pdf
TLDR: the main argument for using Batch Normalization is it allows us to use a larger learning rate without losing accuracy.
There are also alternatives to Batch Normalization such as Group Normalization.
The paper for Group Norm: [1803.08494] Group Normalization
And I like Yannic Kilcher’s explanations of Batch and Group Normalization on YouTube.
Hope this helps!
Thank you for the links; useful readings.