Batch normalization for regularization

vaibhavoutat · November 15, 2022, 10:44am

Andrew said that batch normalization has a regularization effect.
I understand that the mean and SD will be noisy, because they are computed over a mini-batch rather than over all the examples.
This mean is subtracted from the current Z, and the result is divided by the SD. Help me understand why this has a similar effect to dropout. In dropout, a neuron might get dropped entirely, so its output becomes 0. So it makes sense that the weights get spread out, since the model can’t rely on any single neuron. I cannot see any such effect in normalization.
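
For reference, the operation being asked about can be written out as below. This is a sketch in the usual course-style notation; the symbols $m$ (mini-batch size), $\epsilon$ (small constant for numerical stability), and the learnable $\gamma$, $\beta$ are assumptions here, since the thread itself does not spell them out.

```latex
\mu_B = \frac{1}{m}\sum_{i=1}^{m} z^{(i)}, \qquad
\sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} \left(z^{(i)} - \mu_B\right)^2

z_{\text{norm}}^{(i)} = \frac{z^{(i)} - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad
\tilde{z}^{(i)} = \gamma\, z_{\text{norm}}^{(i)} + \beta
```

Because $\mu_B$ and $\sigma_B$ depend on which examples happen to share the mini-batch, the same $z^{(i)}$ maps to a slightly different $\tilde{z}^{(i)}$ from one mini-batch to the next.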

Juan_Olano · November 15, 2022, 4:06pm

Hi @vaibhavoutat,

Let's review BN:

Batches are randomly created.
On each batch, BN divides each unit's value by a random quantity (the SD of the randomly generated batch), which amounts to multiplying it by a random scale factor.
BN also subtracts a random value from each unit (the mean of the randomly generated batch).

In the end, BN injects noise, as dropout does, which forces each layer to learn to handle variations in its inputs.
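
To make the noise concrete, here is a minimal NumPy sketch (not from the course; the data, the function name `normalized_value_of_first_example`, and all sizes are made up for illustration). It normalizes the same example inside many randomly drawn mini-batches and measures how much its normalized value moves around. The spread should shrink as the batch size grows, which matches the usual remark that larger mini-batches weaken this regularization effect.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pre-activations z of one hidden unit over a 10,000-example training set.
z_all = rng.normal(loc=2.0, scale=3.0, size=10_000)
eps = 1e-5  # small constant for numerical stability


def normalized_value_of_first_example(batch_size, n_trials=500):
    """Normalize example 0 inside many random mini-batches and return the results."""
    out = np.empty(n_trials)
    candidates = np.arange(1, z_all.size)
    for t in range(n_trials):
        # Example 0 plus (batch_size - 1) randomly chosen companions.
        others = rng.choice(candidates, size=batch_size - 1, replace=False)
        batch = np.concatenate(([z_all[0]], z_all[others]))
        mu, var = batch.mean(), batch.var()
        out[t] = (batch[0] - mu) / np.sqrt(var + eps)
    return out


# Value obtained with full-dataset statistics (no mini-batch noise).
z_full = (z_all[0] - z_all.mean()) / np.sqrt(z_all.var() + eps)

spread_small = normalized_value_of_first_example(batch_size=8).std()
spread_large = normalized_value_of_first_example(batch_size=512).std()

print(f"full-data normalized value: {z_full:.3f}")
print(f"spread with batch size 8:   {spread_small:.3f}")
print(f"spread with batch size 512: {spread_large:.3f}")
```

The next layer sees this jitter on every unit at every training step, so it cannot depend on the exact value of any one input.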

What do you think about this?

Juan

alvaroramajo · November 15, 2022, 6:31pm

Hi, @vaibhavoutat!

Just as a quick note, check out the paper "How Does Batch Normalization Help Optimization?". It argues against the original explanation that BN works by reducing internal covariate shift. Instead, the authors show that it

" […] makes the optimization landscape significantly smoother. This smoothness induces a more predictive and stable behavior of the gradients, allowing for faster training."

vaibhavoutat · November 24, 2022, 2:03pm

Let me know if I got this right. In dropout, we drop each node with some probability; this is the noise in dropout, and it forces the network to spread out the weights.
In BN, we scale by a noisy SD (and shift by a noisy mean). This does not drop the node entirely, but it certainly introduces some noise, and that pushes the next layer to spread out its weights, since it can’t rely totally on a node whose value is noisy.
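
A rough way to compare the two kinds of noise is to look at the random factor each one multiplies a unit by. The sketch below is only an illustration under assumed settings (`keep_prob = 0.8`, batch size 64, synthetic activations); it uses inverted dropout as taught in the course and, for BN, the ratio of the full-data SD to the mini-batch SD. It ignores the additive noise from the mini-batch mean.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical activations of one unit over the whole training set.
z_all = rng.normal(0.0, 1.0, size=10_000)
keep_prob, batch_size, n_trials = 0.8, 64, 1_000

# Inverted dropout: the unit is multiplied by 0 with probability 1 - keep_prob,
# and by 1 / keep_prob otherwise.
dropout_factor = (rng.random(n_trials) < keep_prob) / keep_prob

# Batch norm: relative to normalizing with the full-data SD, the unit is
# multiplied by sigma_full / sigma_batch, which varies from batch to batch.
sigma_full = z_all.std()
bn_factor = np.array([
    sigma_full / rng.choice(z_all, size=batch_size, replace=False).std()
    for _ in range(n_trials)
])

print("dropout factor values:", np.unique(dropout_factor))
print(f"BN factor range: {bn_factor.min():.3f} to {bn_factor.max():.3f}")
```

Dropout's factor is all-or-nothing, while BN's factor stays close to 1 and never reaches 0, which is consistent with BN's regularization effect being described as slight compared with dropout.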
