Assignment-2 week-2, batch normalization layer

In the final "What to remember" section of this assignment:

  • When freezing layers, avoid keeping track of statistics (like in the batch normalization layer)

How does this layer keep track of statistics? Or am I interpreting this statement wrong?

I don’t have the code in front of me, but I believe it is referring to what is set by the training parameter. From the doc…

  • training : Python boolean indicating whether the layer should behave in training mode or in inference mode.
    • training=True : The layer will normalize its inputs using the mean and variance of the current batch of inputs.
    • training=False : The layer will normalize its inputs using the mean and variance of its moving statistics, learned during training.

My intuition is that you’re using False here and not updating the statistics from the new inputs. Does this make sense?
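To make the two modes concrete, here is a minimal NumPy sketch of a layer that behaves the way the doc excerpt describes. It is an illustration only, not the Keras implementation: the class name `SketchBatchNorm` is hypothetical, and the `momentum` and `eps` defaults are assumptions chosen to mirror common Keras defaults.

```python
import numpy as np


class SketchBatchNorm:
    """Toy batch-norm layer (illustrative only, not the Keras implementation)."""

    def __init__(self, num_features, momentum=0.99, eps=1e-3):
        self.momentum = momentum  # assumed default, for illustration
        self.eps = eps            # assumed default, for illustration
        # Moving statistics: updated only when training=True.
        self.moving_mean = np.zeros(num_features)
        self.moving_var = np.ones(num_features)
        # Learned scale and shift (left at their initial values in this sketch).
        self.gamma = np.ones(num_features)
        self.beta = np.zeros(num_features)

    def __call__(self, x, training):
        if training:
            # Normalize with the current batch and update the moving statistics.
            mean = x.mean(axis=0)
            var = x.var(axis=0)
            m = self.momentum
            self.moving_mean = m * self.moving_mean + (1 - m) * mean
            self.moving_var = m * self.moving_var + (1 - m) * var
        else:
            # Normalize with the stored statistics and leave them untouched.
            mean, var = self.moving_mean, self.moving_var
        x_hat = (x - mean) / np.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta


rng = np.random.default_rng(0)
batch = rng.normal(loc=5.0, scale=2.0, size=(32, 3))
layer = SketchBatchNorm(num_features=3)

before = layer.moving_mean.copy()
layer(batch, training=False)
unchanged_in_inference = np.array_equal(before, layer.moving_mean)

layer(batch, training=True)
changed_in_training = not np.array_equal(before, layer.moving_mean)
```

With `training=False` the stored statistics are only read, which is the "not keeping track of statistics" behavior the assignment note seems to refer to.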


Yes, I get it now: since we have set it to False, the layer won’t keep track of means and variances to update the moving averages used at test time. Instead, it will use the precomputed ImageNet means and variances.

But basically, this is transfer learning, and since the layers in the base model are frozen, the intermediate-layer means and variances (which batch normalization requires) are taken from the frozen layers that were trained on ImageNet. Hence they should not be, and won’t be, adjusted to our input of alpaca images. But if that is the case, how does normalization occur without tuning these intermediate beta (shift) and gamma (scale) values?
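For reference, it may help to write out the standard batch normalization formulas. There are four per-channel quantities: the moving mean and moving variance (tracked statistics), and gamma and beta (a learned scale and shift, which are separate from the mean and variance). The subscripts and the symbol $m$ for momentum are notation chosen here, not taken from the assignment.

```latex
% Training mode: normalize with the current batch, then update moving statistics
\hat{x} = \frac{x - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad
y = \gamma \hat{x} + \beta

\mu_{\text{mov}} \leftarrow m \, \mu_{\text{mov}} + (1 - m) \, \mu_B, \qquad
\sigma^2_{\text{mov}} \leftarrow m \, \sigma^2_{\text{mov}} + (1 - m) \, \sigma_B^2

% Inference mode (training=False): use the stored moving statistics, update nothing
y = \gamma \, \frac{x - \mu_{\text{mov}}}{\sqrt{\sigma^2_{\text{mov}} + \epsilon}} + \beta
```

In a frozen layer run in inference mode, all four quantities are constants, so the layer still normalizes every input. It just does so with fixed values from the original training rather than values adapted to the new images.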

Does the normalization effect go away or diminish in transfer learning if, say, different image dimensions are used?

You’re either training a given layer or you’re not, right? If you’re training that layer (not freezing it), then you want gradients computed and training = True if it’s a BatchNorm layer. But as you see here, there is a decision process that you need to go through where you decide how many of the earlier layers you are freezing and at what point you need to do additional training to adapt the pre-existing model to your new data. And of course any new layers you tacked on need to be trained.

Wait, what? If you change the dimensions, all bets are off. All these models (DNN or CNN) require input of a fixed size and type, right? If your data is of a different dimension or image type (BW vs RGB), then you need to preprocess your inputs into the form expected by the pre-trained model from which you are transferring.
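As a rough illustration of that preprocessing step, here is a small NumPy sketch that converts grayscale or RGB arrays into one fixed shape. The helper name `to_model_input` and the 160 x 160 target size are assumptions made for the example. In practice you would more likely rely on your framework's own resizing utilities than on a hand-rolled nearest-neighbor resize like this one.

```python
import numpy as np


def to_model_input(image, target_hw=(160, 160)):
    """Convert an H x W (grayscale) or H x W x 3 (RGB) array to target_hw x 3.

    Hypothetical helper for illustration; the default size is an assumption.
    """
    image = np.asarray(image)
    if image.ndim == 2:
        # Grayscale: repeat the single channel three times to mimic RGB.
        image = np.stack([image] * 3, axis=-1)
    if image.ndim != 3 or image.shape[-1] != 3:
        raise ValueError(f"unsupported image shape: {image.shape}")
    h, w = image.shape[:2]
    th, tw = target_hw
    # Nearest-neighbor resize: pick one source row/column per target pixel.
    rows = np.arange(th) * h // th
    cols = np.arange(tw) * w // tw
    return image[rows[:, None], cols]


gray = np.zeros((100, 120), dtype=np.uint8)
rgb = np.zeros((300, 200, 3), dtype=np.uint8)
print(to_model_input(gray).shape)  # (160, 160, 3)
print(to_model_input(rgb).shape)   # (160, 160, 3)
```

Whatever tool does the resizing, the point is the same: every image reaches the pre-trained model with the shape and channel layout it was trained on.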
