In Week 3, Video 4: Normalizing Activations in a Network, it mentions the following near the beginning:
Batch normalization makes your hyperparameter search problem much easier, makes your neural network much more robust.
However, I don’t completely get how batch normalization helps with hyperparameter search. Is it just because the training process is faster, or is it something else? Could someone please elaborate on this? I would really appreciate it.
Let’s think about the input data first. If the input features are on very different scales, like x_1 in [0, 1] and x_2 in [-100000, 100000], we usually “normalize” them so that they have similar distributions. We also shuffle the input data to lower the chance of bias. Otherwise, the next step becomes quite unstable, being dominated by the x_2 values or by biased data.
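As a minimal sketch of this input normalization (plain NumPy with made-up numbers, not the course’s assignment code):

```python
import numpy as np

# Two input features on very different scales, as in the example above
X = np.array([[0.2, -80000.0],
              [0.9,  50000.0],
              [0.5, -20000.0],
              [0.1,  90000.0]])          # shape (m, 2)

# Standardize each feature to mean 0 and variance 1
mu = X.mean(axis=0)
sigma = X.std(axis=0)
X_norm = (X - mu) / (sigma + 1e-8)       # small epsilon avoids division by zero

print(X_norm.mean(axis=0))               # ~[0, 0]
print(X_norm.std(axis=0))                # ~[1, 1]
```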
The same is true for the hidden units in a neural network. Consider a network with 2 hidden layers. The input to the 2nd hidden layer is, of course, the output of the 1st hidden layer. If this output is quite “unbalanced”, then the operations in the 2nd hidden layer’s units become unstable, which makes it difficult to find the best hyperparameters easily and quickly. To avoid that situation, we “normalize” the output values of the previous hidden units to have mean 0 and variance 1. Then, with a normalized input, the 2nd hidden layer becomes stable. (From an algorithmic viewpoint, we do not use this normalized value as is. Instead, the authors of the original paper introduced \gamma and \beta to scale and shift the normalized data.)
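Here is a rough sketch of what one batch norm step does to a layer’s pre-activations (plain NumPy, assuming z has shape units x batch and gamma/beta are the learnable parameters, not the course’s assignment code):

```python
import numpy as np

def batch_norm_forward(z, gamma, beta, eps=1e-8):
    """Normalize a mini-batch of pre-activations z (shape: units x m),
    then scale and shift with the learnable parameters gamma and beta."""
    mu = z.mean(axis=1, keepdims=True)       # per-unit mean over the mini-batch
    var = z.var(axis=1, keepdims=True)       # per-unit variance over the mini-batch
    z_norm = (z - mu) / np.sqrt(var + eps)   # mean 0, variance 1
    z_tilde = gamma * z_norm + beta          # learned scale and shift
    return z_tilde

# Toy usage: 3 hidden units, mini-batch of 5 examples
z = np.random.randn(3, 5) * 50 + 10          # deliberately "unbalanced"
gamma = np.ones((3, 1))
beta = np.zeros((3, 1))
print(batch_norm_forward(z, gamma, beta).std(axis=1))  # ~1 for each unit
```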
If you watch the other video, “Why does Batch Norm work?”, you may find additional insights about “covariate shift”, which refers to a change in the input distribution.
Personally, I like to put BatchNorm into my neural networks.
Hey @Oscar_Guarnizo,
Welcome to the community. Allow me to add on to what Nobu has answered. Consider any simple 2-layer neural network (NN), and try to answer the question: what are the different hyper-parameters in a NN that we could search over? A lot comes to mind, some of them being:
- Number of layers
- Number of neurons in each layer
- Activation functions
- Different types of initialization of weights
Now, let’s say that we don’t normalize our data, and we find the training set error to be 20%. In this case, we don’t know whether we should attribute this high error to the inherent differences in the inputs (for instance, the different magnitude scaling described by Nobu), or to the fact that our NN has only a few layers and hence is a very simple model that needs more layers, or to one of a hundred other causes.
Therefore, normalizing your inputs is a great way to ensure that the model’s performance doesn’t depend on the inherent differences in the inputs, and by doing the same thing for the inputs of the hidden layers, Batch Normalization (BN) makes the problem of hyper-parameter search easier.
And if you are wondering why simple normalization alone doesn’t suffice, then as Nobu mentioned, BN has scaling and shifting parameters for each of the layers, which are a great way to ensure that the data’s distribution is preserved. Other advantages of BN can be seen in the lecture video mentioned by Nobu. I hope this helps.
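If you want to see those per-layer scale and shift parameters in practice, here is a hedged illustration using tf.keras (my own example, not the course’s assignment code; the layer sizes are arbitrary): a BatchNormalization layer after each Dense layer creates and learns gamma and beta automatically.

```python
import tensorflow as tf

# Hypothetical 2-hidden-layer model with batch norm after each hidden layer
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, use_bias=False, input_shape=(10,)),
    tf.keras.layers.BatchNormalization(),   # learns gamma (scale) and beta (shift)
    tf.keras.layers.Activation("relu"),
    tf.keras.layers.Dense(64, use_bias=False),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Activation("relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```

Note that `use_bias=False` is used here because BN’s beta parameter already plays the role of a bias for each unit.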
Regards,
Elemento
Thank you so much @anon57530071. This explanation was really great.
Thank you so much @Elemento. I think you perfectly complement the explanation of @anon57530071. It is really helpful, and now I understand.