Week 2 Assignment 2: Why do we set base_model(training=False)?

In the second assignment of Week 2, we implemented transfer learning from MobileNet. We import the MobileNet model with the following code:

base_model = tf.keras.applications.MobileNetV2(input_shape=input_shape,
                                               include_top=False,  # <== Important!!!
                                               weights='imagenet')  # From ImageNet

Then we set

base_model.trainable = False

This freezes all parameters in the base_model.
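To see what freezing actually does, here is a minimal sketch (I pass weights=None just to avoid the ImageNet download; the assignment uses weights='imagenet', and the input shape is illustrative):

```python
import tensorflow as tf

# Minimal sketch: weights=None avoids the ImageNet download here;
# the assignment passes weights='imagenet'
base_model = tf.keras.applications.MobileNetV2(input_shape=(160, 160, 3),
                                               include_top=False,
                                               weights=None)
base_model.trainable = False

# After freezing, no variables are trainable -- but they all still exist,
# including the batch-norm moving means/variances, which are never
# trainable to begin with
print(len(base_model.trainable_variables))      # 0
print(len(base_model.non_trainable_variables))  # many
```

Note that the batch-norm statistics live among the non-trainable variables, which is exactly why trainable=False alone does not settle the question below.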

Why then do we need to call base_model(x, training=False) in the code that follows? The comment says to "set training to False to avoid keeping track of statistics in the batch norm layer". I don't understand this explanation. I set training=True and it also gives a decent result.

My guess is that setting training=False makes the base_model act as an inference model. But we have already made base_model untrainable. Why do we need this extra step?


Hi @ken2022

I think it is because MobileNet contains batch normalization layers. A batch normalization layer is data-dependent, unlike a dense layer: during training it updates running statistics (moving mean and variance) from the batches it sees. Calling base_model(x, training=False) runs those layers in inference mode, so they keep the statistics they learned during pre-training instead of updating them with statistics from your new training set. This is different from base_model.trainable = False, which only freezes the layers' weights.

I think this link will give you a good intuition about it: python - What does training = False actually do for Tensorflow Transfer Learning? - Stack Overflow
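You can see the difference directly with a standalone BatchNormalization layer (a small demo I put together, not from the assignment): the moving statistics change only when the layer is called with training=True.

```python
import tensorflow as tf

# Standalone batch-norm layer; its moving_mean starts at zeros
bn = tf.keras.layers.BatchNormalization()
x = tf.ones((8, 4)) * 10.0  # batch mean is 10, far from the initial moving mean

bn(x, training=False)          # inference mode: statistics untouched
print(bn.moving_mean.numpy())  # still all zeros

bn(x, training=True)           # training mode: statistics updated
print(bn.moving_mean.numpy())  # nudged toward 10 by (1 - momentum)
```

With the default momentum of 0.99, one training-mode call moves the stored mean from 0 to about 0.1, while the inference-mode call leaves it untouched.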


On a more general note related to Transfer Learning:

In transfer learning, we start with a pre-trained model, and then we may want to fine-tune the model on our own dataset. This can be done by unfreezing some of the layers of the pre-trained model or adding new layers to the pre-trained model, and training those layers while keeping the rest of the layers frozen.

Setting training=False when calling a pre-trained model is typically done during the fine-tuning process. It puts layers such as batch normalization into inference mode, so their internal statistics are not disturbed by the new data; the weights themselves are frozen separately, via the trainable attribute.

By keeping the weights of the frozen layers fixed, we allow the model to learn from the new dataset using only the unfrozen layers. This allows us to take advantage of the knowledge and features learned by the pre-trained model and apply them to our new dataset.
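Putting the two mechanisms together, a fine-tuning setup might look like the sketch below (the cut-off index 120, the single-unit head, and weights=None are my illustrative choices, not values from the assignment):

```python
import tensorflow as tf

# Sketch of fine-tuning; weights=None here only to avoid the download --
# in practice you would load weights='imagenet'
base_model = tf.keras.applications.MobileNetV2(input_shape=(160, 160, 3),
                                               include_top=False,
                                               weights=None)

# Unfreeze the model, then re-freeze all but the last layers
base_model.trainable = True
for layer in base_model.layers[:120]:  # illustrative cut-off
    layer.trainable = False

inputs = tf.keras.Input(shape=(160, 160, 3))
# training=False keeps batch norm in inference mode even in unfrozen layers
x = base_model(inputs, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(1)(x)  # binary-classification head
model = tf.keras.Model(inputs, outputs)
```

Note that training=False and trainable=False act independently: the unfrozen layers above still receive gradient updates when the model is trained, but their batch-norm statistics stay fixed.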


Thank you for your reply. So when we set base_model.trainable=False, we only freeze the trainable parameters. BatchNormalization will still update its moving averages unless we also pass training=False. Is my understanding right?

Yes, your understanding is right. As I said before, the batch normalization layer is data-dependent, so its moving statistics are controlled by the training flag, independently of whether the layer is trainable.