Hello,
In the previous courses Andrew mentioned the importance of normalizing the input data, since it can significantly speed up learning.
My question is: do BatchNormalization layers in Keras perform that operation?
If so, why don’t we always add a BatchNormalization layer right after the input layer?
Can anyone please clarify?
Hi Dyxuki,
A BatchNormalization layer also includes gamma and beta, which should not be applied to the input data. See, e.g., this Q&A.
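For reference, per feature a BN layer roughly computes y = gamma * (x - mu_B) / sqrt(var_B + epsilon) + beta, where mu_B and var_B are the mini-batch mean and variance, and gamma and beta are learned parameters.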
Hello reinoudbosch,
Thank you very much for the answer.
However, I’m still not totally convinced by that.
I read the Q&A, the paper, and had a glance at some other articles.
It appears that the learnable parameters ‘gamma’ and ‘beta’ in BN are there to re-introduce some bias and variance, in order to maintain the expressive power of the network.
So in this sense, yes, gamma and beta aren’t useful for the input layer.
But even if they’re not necessary, I think having those extra parameters wouldn’t do much harm either (or am I totally wrong?). If that works for every layer besides the input layer, why wouldn’t it work for the input layer too?
From a mathematical point of view, isn’t the input layer x, i.e. a[0], very similar to a[1], a[2], etc.?
Also, based on the TensorFlow BN layer docs, it seems possible to not use gamma and beta: they can be disabled by setting the arguments center=False and scale=False.
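For example, something like this (just a rough sketch; the shapes are placeholders) should give plain standardization of the inputs without learnable gamma and beta:

```python
import tensorflow as tf

# Rough sketch: BatchNorm right after the input, with the learnable
# beta (center) and gamma (scale) disabled, so it only standardizes.
model = tf.keras.Sequential([
    tf.keras.layers.BatchNormalization(center=False, scale=False,
                                       input_shape=(10,)),  # placeholder shape
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),
])
```

Though I guess it would still normalize with mini-batch statistics during training and with moving averages at inference, which relates to my next question.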
I just got another related question.
In the case of regular normalization, should we always normalize over the whole dataset, or is it also OK to normalize mini-batch by mini-batch?
Hi Dyxuki,
I guess you can turn a BatchNorm layer into a standard normalization operation, but you can also use the latter directly.
I do not see why you would want to apply gamma and beta to the inputs, as these will only distort your input values. Gamma and beta serve to correct distortions to activations that are due to learning the parameters that determine the values of the activations. This is not applicable to the inputs themselves - as you want to simply normalize them, you do not want to learn parameters that determine them.
Hi Dyxuki,
I take it you are referring to the input layer here. If you normalized mini-batches of inputs to the model, you would distort your input values due to differences in the means and standard deviations across parts of the input data. I would expect doing this during training to lower the accuracy of your model during testing.
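A rough illustration with made-up numbers (here in NumPy, just to show the effect):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 100.0])  # toy values for a single input feature

b1, b2 = x[:2], x[2:]                  # two mini-batches of size 2
print((b1 - b1.mean()) / b1.std())     # normalized with batch-1 statistics
print((b2 - b2.mean()) / b2.std())     # normalized with batch-2 statistics
print((x - x.mean()) / x.std())        # normalized with full-dataset statistics
```

With per-mini-batch statistics every value in these tiny batches ends up at -1 or +1 regardless of its actual magnitude, whereas the full-dataset statistics preserve the relative differences.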
Actually, I was considering the BatchNorm layer because it’s a built-in layer in TensorFlow Keras that I can apply directly.
But yeah, I just realized that from TF v2.1 or higher there’s a layer, tf.keras.layers.experimental.preprocessing.Normalization, that does the regular normalization operation, so I could use that instead.
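Something like this, I think (a rough sketch, with random placeholder data standing in for the real training set):

```python
import numpy as np
import tensorflow as tf

x_train = np.random.rand(100, 10).astype("float32")  # placeholder training data

# Adapt the Normalization layer on the whole training set so it learns the
# per-feature mean and variance, then use it as the first layer of the model.
normalizer = tf.keras.layers.experimental.preprocessing.Normalization()
normalizer.adapt(x_train)

model = tf.keras.Sequential([
    normalizer,
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),
])
```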
Also, coming back to the second point: I believe that if I add a BatchNorm layer just after the input layer, BN will be applied mini-batch by mini-batch during training, right? So based on your answer it’s not optimal either.
Hi Dyxuki,
Applying the Normalization layer makes sense to me. Great to see you found a solution using built-in functions.
Thanks for your answers.