Batch Normalization Intuition

EscapisGR · November 22, 2022, 3:33pm

Greetings everyone,

I’d like to ask a few questions about the intuition behind why Batch Normalization works, as the theory presented in class is a bit unintuitive to me.

It’s not clear to me why the distribution of the Z values would change in a more ‘controlled’ way during training when using Batch Norm, given that the β and γ parameters are trained alongside the weights (and we don’t know in advance how gradient descent will move them). Why wouldn’t backprop simply cancel out the effect of Batch Norm through β and γ?
When I think of the covariate shift analogy from image datasets (black versus colored cats), couldn’t we argue that a shift in the distribution of the features during training could favor better generalization, in the same way that training with both black and colored cats would?
Finally, it seems to me that a small batch size (e.g. 32 samples, as typically used in many applications) would introduce a significant amount of noise into the estimates of the mean and variance. Is there any reason why we wouldn’t use an exponentially weighted average (EWA) of both to calculate Znorm during training, as we do at test time? My intuition is that consistently using these converging values would make training match the statistics used at test time more closely (the mean/variance from the EWA over all samples).
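To make the last question concrete, here is a minimal NumPy sketch of the two modes being compared: normalizing with the current mini-batch statistics (training) versus normalizing with the exponentially weighted averages (test time). The function name `batchnorm_forward`, the `momentum` value of 0.9, and the synthetic data (mean 3, standard deviation 2) are assumptions made for illustration, not something taken from the course material.

```python
import numpy as np

def batchnorm_forward(z, gamma, beta, running_mean, running_var,
                      training, momentum=0.9, eps=1e-5):
    """Hypothetical helper: batch norm over the batch axis (axis 0)."""
    if training:
        # Training mode: use the statistics of the current mini-batch.
        mu = z.mean(axis=0)
        var = z.var(axis=0)
        # Update the exponentially weighted averages kept for test time.
        running_mean = momentum * running_mean + (1 - momentum) * mu
        running_var = momentum * running_var + (1 - momentum) * var
    else:
        # Test mode: use the accumulated averages instead of batch statistics.
        mu, var = running_mean, running_var
    z_norm = (z - mu) / np.sqrt(var + eps)
    return gamma * z_norm + beta, running_mean, running_var

rng = np.random.default_rng(0)
n_features = 4
gamma, beta = np.ones(n_features), np.zeros(n_features)
running_mean, running_var = np.zeros(n_features), np.ones(n_features)

batch_means = []
for _ in range(200):
    # Synthetic pre-activations with batch size 32 (assumed distribution).
    z = rng.normal(loc=3.0, scale=2.0, size=(32, n_features))
    out, running_mean, running_var = batchnorm_forward(
        z, gamma, beta, running_mean, running_var, training=True)
    batch_means.append(z.mean(axis=0))
batch_means = np.array(batch_means)

# Spread of the per-batch mean estimates versus the smoothed estimate.
print("std of per-batch means:", batch_means.std(axis=0))
print("running mean after 200 batches:", running_mean)
```

With a batch size of 32 the per-batch means scatter noticeably around the true mean, while the running average settles close to it, which is the noise the question refers to. The sketch only covers the forward pass; it does not show how gradients would flow through the statistics in either variant.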

I’d be very interested to hear any of your thoughts!

Best regards,
Konstantinos

sonnh1902 · November 22, 2022, 5:14pm

Hi EscapisGR,

Welcome to DeepLearning.AI!!!
My understanding of Batch Norm is that in a network with many layers, the input to each layer undergoes an internal covariate shift, which can push the signals in the network to an inappropriate scale. Since the output of one layer becomes the input of the next, we want the signals to stay in a normalized range, and that is where Batch Norm comes to the rescue. It helps balance the pre-activations and avoid vanishing or exploding gradients, so we can train deeper and deeper networks. To my knowledge, Batch Norm also has a regularizing effect. That’s why we should use Batch Norm to stabilize our networks.
For more information on Batch Normalization, you can check out the link below:
http://d2l.ai/chapter_convolutional-modern/batch-norm.html
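As a rough illustration of the point about signal scale, the sketch below pushes random data through a stack of randomly initialized ReLU layers and records the standard deviation of the pre-activations at each layer, with and without per-feature normalization. The function name `forward_std`, the layer width, the depth, and the deliberately small weight scale of 0.05 are all assumptions chosen to make the effect visible. It is a forward-pass experiment only, with no learned γ and β and no training.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_std(n_layers, width, weight_scale, normalize, eps=1e-5):
    """Hypothetical helper: return the std of pre-activations per layer."""
    a = rng.normal(size=(64, width))  # a batch of 64 random inputs
    stds = []
    for _ in range(n_layers):
        W = rng.normal(scale=weight_scale, size=(width, width))
        z = a @ W  # pre-activations of this layer
        if normalize:
            # Normalize each feature over the batch (gamma=1, beta=0).
            z = (z - z.mean(axis=0)) / np.sqrt(z.var(axis=0) + eps)
        stds.append(z.std())
        a = np.maximum(z, 0.0)  # ReLU
    return stds

plain = forward_std(20, 100, 0.05, normalize=False)
normed = forward_std(20, 100, 0.05, normalize=True)

print("last-layer std without normalization:", plain[-1])
print("last-layer std with normalization:   ", normed[-1])
```

With these assumed settings the unnormalized signal shrinks layer by layer toward zero, while the normalized one stays near unit scale at every depth. A careful weight initialization can also keep the scale under control, so this shows one mechanism rather than the full explanation of why Batch Norm helps.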

Best regards,
Son
