Questions Week2 Assignment2 "Transfer Learning"

ggggjin99 · February 13, 2023, 2:21pm

Q1. What does “avoid keeping track of statistics” actually mean?
My understanding is that when we use BN, we calculate the mean and variance of each mini-batch and these values are stored in memory, so explicitly writing “x = base_model(x, training=False)” means we are not going to save those values.
Is that right?

Q2. When we build the fine-tuning model from the previous “model2”, whose parameters we froze, we unfreeze some of the later layers. I think the default settings in the previous model are

“base_model.trainable = False”, “x = base_model(x, training=False).”

And we unfreeze the later layers in the fine-tuning model with

“layer.trainable=True”,

then what about the other flag? (x = base_model(x, training=True)?)
I mean, when we train the fine-tuning model, do we not care about the statistics of the activations?
Or is training=True applied only to the activations in the later unfrozen layers?
I think we have to store them in memory, because if we update the batch norm parameters in a later layer, backpropagation through BN needs the “cache” of the computation done in the forward pass (including, of course, the “statistics”).

(The activation I am referring to here has shape (m, d), where “m” is the number of training examples in the batch, which is exactly (m × h × w) in a CNN, and “d” is “C”, the number of channels.)

balaji.ambresh · February 13, 2023, 5:33pm

When a layer is not trainable, its internal state doesn’t change. The pretrained weights remain unchanged during the training process.
Only layers whose trainable flag is set to True will update their weights during training. Activations like ReLU don’t have any statistics.

paulinpaloalto · February 13, 2023, 5:48pm

The other thing that you need to be careful of here is that there are two different kinds of values that can get updated during training:

1. The normal weight and bias values.
2. The mean and variance values that Batch Normalization keeps track of.

The trainable flag (e.g. base_model.trainable or layer.trainable) controls whether back prop updates the weight and bias values.

The training flag is independent and controls whether BatchNorm updates its mean and variance values or uses stored values that were previously computed. Those are the “statistics” that the comment is talking about. Note that even when you are not training the network and only using it in “inference mode” to make predictions, you still have the option to enable the more dynamic behavior of BatchNorm. But doing that can cause some slightly odd results: the predictions you get on a given sample may vary a bit depending on what other samples you are bundling with it in a given inference call.
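To make the role of the training flag concrete, here is a minimal pure-Python sketch of a one-feature batch norm. It is a toy written for this thread, not the Keras implementation: the class name ToyBatchNorm, the momentum and eps defaults, and the sample values are all assumptions chosen for illustration, and gamma and beta stay fixed because no back prop is modeled.

```python
# Toy 1-D batch normalization, only meant to illustrate the `training` flag.
# NOT the Keras implementation; names and defaults here are hypothetical.

class ToyBatchNorm:
    def __init__(self, momentum=0.9, eps=1e-5):
        self.gamma = 1.0        # learnable scale (would be updated by back prop)
        self.beta = 0.0         # learnable shift (would be updated by back prop)
        self.moving_mean = 0.0  # stored statistics, updated only when training=True
        self.moving_var = 1.0
        self.momentum = momentum
        self.eps = eps

    def __call__(self, batch, training):
        if training:
            # Use the statistics of the current mini-batch ...
            mean = sum(batch) / len(batch)
            var = sum((x - mean) ** 2 for x in batch) / len(batch)
            # ... and keep track of them in the moving averages.
            m = self.momentum
            self.moving_mean = m * self.moving_mean + (1 - m) * mean
            self.moving_var = m * self.moving_var + (1 - m) * var
        else:
            # Inference mode: use the stored statistics, update nothing.
            mean, var = self.moving_mean, self.moving_var
        scale = self.gamma / (var + self.eps) ** 0.5
        return [scale * (x - mean) + self.beta for x in batch]


bn = ToyBatchNorm()
sample = 2.0

# training=False: the same sample gets the same output whatever it is batched with.
out_a = bn([sample, 0.0], training=False)[0]
out_b = bn([sample, 10.0], training=False)[0]
stats_after_inference = (bn.moving_mean, bn.moving_var)

# training=True: the output depends on the other samples in the batch,
# and the stored statistics change as a side effect of the forward pass.
out_c = bn([sample, 0.0], training=True)[0]
out_d = bn([sample, 10.0], training=True)[0]
stats_after_training = (bn.moving_mean, bn.moving_var)

print("inference:", out_a, out_b, stats_after_inference)
print("training: ", out_c, out_d, stats_after_training)
```

With training=False the two outputs for the same sample match and the stored statistics are untouched. With training=True they differ depending on the companion sample, which is the "slightly odd" behaviour described above. One caveat for real Keras code: in TF 2.x, BatchNormalization is a documented special case where setting trainable = False on the layer itself also makes it run in inference mode.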

The general TF/Keras docs are not very clear on the true meaning of the various flags discussed above, but here’s a more detailed post from F. Chollet (the creator of Keras), which is basically a chapter of his book on Keras and discusses all this in a much clearer form. But be warned that it’s not a 2 minute read.

ggggjin99 · February 14, 2023, 3:17am

Thank you!
Now I see the difference between the model.trainable flag and the inference/training mode flag.
And batch norm is tied to both of those concepts!
