Dear Mentor,

Could you please help me understand this statement from the lecture?

“similar to dropout, batch norm therefore has a slight regularization effect. Because by adding noise to the hidden units, it’s forcing the downstream hidden units not to rely too much on any one hidden unit.”

Specific time in the lecture: 9:39 / 11:39

Why does Batch Norm work? | Coursera

I understand that dropout adds noise to hidden units by turning units off completely with some probability, so that downstream hidden units don’t get a chance to rely on them.

Could you give me an example of how batch norm forces “downstream hidden units not to rely too much on any one hidden unit”?

Thank you

I think that, rather than saying the two techniques are similar, Prof. Ng meant that batch norm, like dropout, has a slight regularization effect.

The mean and standard deviation are computed from each mini-batch, not from the entire dataset, so they are only noisy estimates. At every layer, subtracting the mini-batch mean therefore adds a little additive noise and dividing by the mini-batch standard deviation adds a little multiplicative noise, and this noise changes randomly from one mini-batch to the next. This random perturbation of the activations acts as a regularizer: the network has to predict the right labels from slightly corrupted information, which also makes the model more robust.
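To make this noise concrete, here is a minimal NumPy sketch on synthetic data. The helpers `batch_norm` and `normalized_values_of_x0` are hypothetical names for illustration, and the learnable γ/β are omitted. It follows one fixed example through many random mini-batches and records its normalized value each time: because the mean and standard deviation depend on which other examples share the batch, the same input comes out slightly different every time. It also compares a small and a large mini-batch, since larger batches give less noisy statistics.

```python
import numpy as np

def batch_norm(z, eps=1e-5):
    """Normalize each column (hidden unit) using only this mini-batch's statistics."""
    mu = z.mean(axis=0)    # per-unit mean over the mini-batch
    var = z.var(axis=0)    # per-unit variance over the mini-batch
    return (z - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
# Synthetic pre-activations of a single hidden unit over a "dataset" of 10,000 examples.
data = rng.normal(loc=2.0, scale=3.0, size=(10_000, 1))
x0 = data[:1]  # one fixed example that we follow across mini-batches

def normalized_values_of_x0(batch_size, n_batches=200):
    """Place x0 in n_batches random mini-batches and record its normalized value each time."""
    values = []
    for _ in range(n_batches):
        idx = rng.choice(len(data), size=batch_size - 1, replace=False)
        batch = np.vstack([x0, data[idx]])    # x0 is always row 0
        values.append(batch_norm(batch)[0, 0])
    return np.array(values)

small = normalized_values_of_x0(batch_size=8)
large = normalized_values_of_x0(batch_size=512)

# Same input, different normalized value in every mini-batch: that variation is the "noise".
print(f"batch size   8: spread of x0's normalized value = {small.std():.3f}")
print(f"batch size 512: spread of x0's normalized value = {large.std():.3f}")
```

Running it should show a clearly larger spread for the batch of 8 than for the batch of 512, which is why a bigger mini-batch weakens this regularizing side effect.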

The idea of dropout came into the picture because training one deep neural network with a large number of parameters on the data can lead to overfitting.

Ensembles of neural networks with different model configurations are known to reduce overfitting, but they require the additional computational expense of training and maintaining multiple models.

This is where dropout comes in. A single model can simulate a large number of different network architectures by randomly dropping out nodes during training. This gives a computationally cheap and effective regularization method that reduces overfitting and improves generalization error.
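A minimal NumPy sketch of that idea, using inverted dropout (the variant taught in the course). The function name `dropout_forward` and the `keep_prob` value are illustrative assumptions. Each forward pass draws a new random mask, so each training step effectively uses a different “thinned” sub-network; with 6 units there are 2^6 = 64 possible sub-networks sharing the same weights.

```python
import numpy as np

def dropout_forward(a, keep_prob, rng):
    """Inverted dropout: zero each unit with probability 1 - keep_prob,
    then divide by keep_prob so the expected activation is unchanged."""
    mask = rng.random(a.shape) < keep_prob   # True = unit kept on this pass
    return a * mask / keep_prob, mask

rng = np.random.default_rng(42)
a = np.ones((1, 6))   # activations of 6 hidden units for one example

for step in range(3):
    out, mask = dropout_forward(a, keep_prob=0.5, rng=rng)
    # A different subset of units survives on each step, i.e. a different sub-network is trained.
    print(f"step {step}: kept units = {np.flatnonzero(mask[0]).tolist()}, output = {out[0].tolist()}")
```

At test time no mask is applied; thanks to the division by `keep_prob` during training, the activations already have the right expected scale.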

Batch normalization keeps the gradients from being distracted by outliers and helps them flow toward the common goal, because the activations are normalized within the range of the mini-batch. This accelerates the learning process.

Dropout is an approach to regularization in neural networks that helps reduce interdependent learning among neurons.

Hey @JJaassoonn,

Building on what @Deepti_Prasad said, I will add one more example about batch norm.

Imagine a neural network with multiple hidden layers. Each hidden layer consists of several hidden units. **Without batch normalization**, the output of each hidden unit in a layer can vary significantly depending on the input data. This can lead to some hidden units becoming dominant and having a stronger influence on the downstream hidden units.

However, when batch normalization is applied, it normalizes the output of each hidden unit by subtracting the mean and dividing by the standard deviation computed over the mini-batch. Because those statistics come from a mini-batch rather than the whole dataset, this normalization also introduces some noise into the hidden units’ outputs. As a result, the downstream hidden units are less likely to rely too heavily on any one specific hidden unit: each unit’s value is slightly perturbed from one mini-batch to the next, so no single unit is a perfectly reliable signal to depend on.
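Here is a small NumPy sketch of the batch norm forward pass for one layer during training, applied to a hypothetical layer where unit 0 has a much larger scale than the others. The function name `batch_norm_forward` and the synthetic numbers are illustrative assumptions; `gamma` and `beta` are the learnable scale and shift from the lecture, initialized here to 1 and 0.

```python
import numpy as np

def batch_norm_forward(z, gamma, beta, eps=1e-5):
    """Batch norm for one layer in training mode.
    z has shape (batch_size, n_units); statistics are taken per unit over the mini-batch."""
    mu = z.mean(axis=0)
    var = z.var(axis=0)
    z_norm = (z - mu) / np.sqrt(var + eps)   # each unit: mean 0, variance ~1 within this batch
    return gamma * z_norm + beta             # learnable scale and shift

rng = np.random.default_rng(1)
# Hypothetical layer with 3 hidden units; unit 0 is on a far larger scale than units 1 and 2.
z = rng.normal(size=(64, 3)) * np.array([100.0, 1.0, 0.5]) + np.array([50.0, 0.0, -2.0])
gamma, beta = np.ones(3), np.zeros(3)

out = batch_norm_forward(z, gamma, beta)
print("std per unit before BN:", z.std(axis=0).round(2))
print("std per unit after BN: ", out.std(axis=0).round(2))
```

After normalization every unit has roughly unit spread within the batch. Since `gamma` and `beta` are learned, the network can still give a unit a larger scale if that helps; the regularizing part is the mini-batch noise described above.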

So that’s a simple example of how batch norm works and how its effect resembles dropout, which is where “similar to dropout, batch norm therefore has a slight regularization effect” comes from.

I hope it makes sense now; feel free to ask for more clarification.

Regards,

Jamal

Dear Ms Deepti_Prasad,

Thank you so much for your guidance.

Dear Mr Ahmed Gamal,

Thank you so much for your guidance.