I have seen other topics here about how batch norm works, but I do not really see the answer to my question. So here it is:
In the video Andrew says that if we look at layer 2, the hidden unit values Z[2]1, Z[2]2, ... will keep changing during the updates, but with batch norm their mean and variance remain the same, namely beta[2] and gamma[2]. I do not understand this, since gamma and beta are also updated during training, which is the whole point of making them learnable. So that mean and variance also change. Or is the idea that they still change, just that they change smoothly?
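To make sure I have the mechanics right, here is a minimal NumPy sketch of the per-batch normalization from the lecture (the function name `batch_norm_forward` and the toy shapes are my own, just for illustration). It shows that however much the raw Z[2] distribution shifts between batches, the output's mean and variance stay pinned at beta and gamma^2 as long as gamma and beta themselves are held fixed:

```python
import numpy as np

def batch_norm_forward(z, gamma, beta, eps=1e-8):
    """Batch norm over a mini-batch of pre-activations z (shape: units x batch)."""
    mu = z.mean(axis=1, keepdims=True)      # per-unit batch mean
    var = z.var(axis=1, keepdims=True)      # per-unit batch variance
    z_norm = (z - mu) / np.sqrt(var + eps)  # zero mean, unit variance
    return gamma * z_norm + beta            # mean -> beta, variance -> gamma**2

# Two mini-batches whose raw statistics differ wildly
rng = np.random.default_rng(0)
z_a = rng.normal(0.0, 1.0, size=(3, 256))
z_b = rng.normal(5.0, 9.0, size=(3, 256))   # heavily shifted and rescaled input

gamma = np.array([[1.5], [0.5], [2.0]])
beta = np.array([[0.0], [1.0], [-1.0]])

for z in (z_a, z_b):
    out = batch_norm_forward(z, gamma, beta)
    print(out.mean(axis=1), out.var(axis=1))  # ~beta and ~gamma**2 both times
```

So as far as I can tell, the distribution of the normalized Z[2] only moves when gradient descent moves beta[2] and gamma[2] themselves, not when the earlier layers' weights move, and that is why I am asking whether "remain the same" really means "change only slowly via beta and gamma".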