Hello,
In the previous courses Andrew mentioned the importance of normalizing the input data, since it can significantly speed up learning.
My question is: do BatchNormalization layers in Keras perform that operation?
If so, why don’t we always add a BatchNormalization layer right after the input layer?
Can anyone please clarify?
Hi Dyxuki,
A BatchNormalization layer also includes gamma and beta, which should not be applied to the input data. See, e.g., this Q&A.
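For reference, per feature a BN layer roughly computes y = gamma * (x - mu_B) / sqrt(var_B + epsilon) + beta, where mu_B and var_B are the mini-batch mean and variance, and gamma and beta are learned parameters.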
Hello reinoudbosch,
Thank you very much for the answer.
However, I’m still not totally convinced by that.
I read the Q&A, the paper, and had a glance at some other articles.
It appears that the learnable parameters ‘gamma’ and ‘beta’ in BN are there to re-introduce some bias and variance, in order to maintain the expressive power of the network.
So in this sense, yes, gamma and beta aren’t useful for the input layer.
But even if they’re not necessary, I think having those extra parameters wouldn’t do much harm either (or am I totally wrong?). If that works for every layer besides the input layer, why wouldn’t it work for the input layer too?
From a mathematical point of view, isn’t the input layer x, i.e. a[0], very similar to a[1], a[2], etc.?
Also, based on the TensorFlow BN layer docs, it seems possible to not use gamma and beta: they can be disabled by setting the arguments center=False and scale=False.
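For example, something like this (just a rough sketch; the shapes are placeholders) should give plain standardization of the inputs without learnable gamma and beta:

```python
import tensorflow as tf

# Rough sketch: BatchNorm right after the input, with the learnable
# beta (center) and gamma (scale) disabled, so it only standardizes.
model = tf.keras.Sequential([
    tf.keras.layers.BatchNormalization(center=False, scale=False,
                                       input_shape=(10,)),  # placeholder shape
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),
])
```

Though I guess it would still normalize with mini-batch statistics during training and with moving averages at inference, which relates to my next question.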
I just got another related question.
In the case of regular normalization, should we always normalize over the whole dataset, or is it also OK to normalize mini-batch by mini-batch?
Hi Dyxuki,
I guess you can turn a BatchNorm layer into a standard normalization operation, but you can also use the latter directly.
I do not see why you would want to apply gamma and beta to the inputs, as these will only distort your input values. Gamma and beta serve to correct distortions to activations that are due to learning the parameters that determine the values of the activations. This is not applicable to the inputs themselves - as you want to simply normalize them, you do not want to learn parameters that determine them.
Hi Dyxuki,
I take it you are referring to the input layer here. If you normalized mini-batches of inputs to the model, you would distort your input values due to differences in the means and standard deviations across parts of the input data. I would expect doing this during training to lower the accuracy of your model during testing.
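A rough illustration with made-up numbers (here in NumPy, just to show the effect):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 100.0])  # toy values for a single input feature

b1, b2 = x[:2], x[2:]                  # two mini-batches of size 2
print((b1 - b1.mean()) / b1.std())     # normalized with batch-1 statistics
print((b2 - b2.mean()) / b2.std())     # normalized with batch-2 statistics
print((x - x.mean()) / x.std())        # normalized with full-dataset statistics
```

With per-mini-batch statistics every value in these tiny batches ends up at -1 or +1 regardless of its actual magnitude, whereas the full-dataset statistics preserve the relative differences.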
Actually, I was considering the BatchNorm layer because it’s a built-in layer in TensorFlow Keras that I can apply directly.
But yeah, I just realized that from TF v2.1 or higher there’s a layer, tf.keras.layers.experimental.preprocessing.Normalization, that does the regular normalization operation, so I could use that instead.
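Something like this, I think (a rough sketch, with random placeholder data standing in for the real training set):

```python
import numpy as np
import tensorflow as tf

x_train = np.random.rand(100, 10).astype("float32")  # placeholder training data

# Adapt the Normalization layer on the whole training set so it learns the
# per-feature mean and variance, then use it as the first layer of the model.
normalizer = tf.keras.layers.experimental.preprocessing.Normalization()
normalizer.adapt(x_train)

model = tf.keras.Sequential([
    normalizer,
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),
])
```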
Also, coming back to the second point: I believe that if I add a BatchNorm layer just after the input layer, BN will be applied mini-batch by mini-batch during training, right? So based on your answer it’s not optimal either.
Hi Dyxuki,
Applying the Normalization layer makes sense to me. Great to see you found a solution using built-in functions.
Thanks for your answers.