Why specify twice that the base model is not trainable?

For transfer learning, we specify:

base_model.trainable = False

Why do we also need to pass training=False when calling the base_model?

@Meir, can you be more specific as to where you saw this setting of base_model.trainable? Was it in an assignment?

In transfer learning you want to reuse an existing trained model for tasks that are not necessarily the same as those it was trained on. You want to use the weights the model has already converged to for predicting what interests you. But you may want to adjust some of the model's weights, since your prediction task is not quite the same. Or you may want to add layers to the model and train only those with your data.
The way to do this in TensorFlow is to “freeze” model weights by marking them untrainable. You can do this for the entire model, or for parts of it. More on this in TensorFlow’s great documentation: Transfer learning and fine-tuning  |  TensorFlow Core.
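A minimal sketch of freezing an entire model (MobileNetV2 with weights=None is used here only as a lightweight stand-in; any base model works the same way):

```python
import tensorflow as tf

# Load a base model and freeze all of its weights in one line.
# weights=None keeps this sketch self-contained (no download needed).
base_model = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False, weights=None)
base_model.trainable = False  # freezes every layer in the base model

print(len(base_model.trainable_variables))  # 0: nothing left for the optimizer to update
```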

The pre-trained model might have been trained for a more general/broader use case. Further, these pre-trained models have many more layers, and the learning process (back-propagation) would take a lot of time and resources, since it has to learn many parameters (weights and biases).

To solve the specific use case at hand, transfer learning helps us augment the existing model’s knowledge with our task-specific learnable parameters, while reusing the knowledge in most of the learned parameters of the existing model.

The primary way to achieve this is to keep most of the existing layers (weights, biases, activations) from the base model and append additional layers (primarily an output layer) for the current task. In some cases, a few of the layers in the existing model can also be retrained (during back-propagation).

When used this way, the knowledge (trained parameters and activations) of the existing model can be preserved using base_model.trainable = False in TensorFlow (Keras). This ensures that the new training will not modify these existing parameters during back-propagation; only the newly added or reconfigured weights and biases are updated. This reduces training time, since only a few parameters, in the layers closest to the output, are relearned.
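The paragraph above can be sketched as a frozen base plus a small new head. MobileNetV2 here is just an illustrative stand-in for whatever base model you use:

```python
import tensorflow as tf

# Minimal sketch of the pattern described above: a frozen base plus a
# small trainable head, so back-propagation only updates the head.
base_model = tf.keras.applications.MobileNetV2(
    input_shape=(96, 96, 3), include_top=False, weights=None)
base_model.trainable = False

model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1),  # new task-specific output layer
])

# Only the Dense layer's kernel and bias remain trainable.
print(len(model.trainable_variables))  # 2
```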

Further, when freezing the pre-trained model, there are two options:

  1. Freeze the complete existing model by setting base_model.trainable = False.
  2. Freeze only specific layers of the existing model through each layer’s trainable attribute, looping over just the layers you want frozen:

for layer in base_model.layers:
    layer.trainable = False
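The per-layer option can be sketched like this (the choice to leave the last four layers trainable is arbitrary, purely for illustration, and MobileNetV2 with weights=None is only a lightweight stand-in):

```python
import tensorflow as tf

# Sketch of partial freezing: freeze only part of the base model.
# The cut-off (last 4 layers stay trainable) is an arbitrary illustrative choice.
base_model = tf.keras.applications.MobileNetV2(
    input_shape=(96, 96, 3), include_top=False, weights=None)

for layer in base_model.layers[:-4]:
    layer.trainable = False  # these layers are skipped by back-propagation

n_frozen = sum(1 for layer in base_model.layers if not layer.trainable)
print(n_frozen, len(base_model.layers))
```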

Or did you mean the interaction between trainable and compile? If so, the discussion in this thread can help: Incorrect number of Total Parameters when Loading a Saved Model with Trainable = False · Issue #37531 · tensorflow/tensorflow · GitHub

Note: In addition to the excellent course by the deeplearning.ai team, there are a few other references that can solidify this understanding. Assuming it’s not a violation of policy, the following resources may be useful for anyone learning ML/DL:

  1. 3Blue1Brown:
    1.1 NN: Neural networks - YouTube
    1.2: Linear Algebra: Vectors | Essence of linear algebra, chapter 1 - YouTube
    1.3: Differential calculus: The Essence of Calculus, Chapter 1 - YouTube

  2. DeepLizard: https://youtu.be/3ou0KYtDlOI?list=PLZbbT5o_s2xrwRnXk_yCPtnqqo4_u2YGL&t=499

Moderators, the intent in sharing these links is purely educational, and I have no financial or other interest in the channels above. If you find this a violation, please remove these references.

I refer to the given code in the second assignment of Week 2 of Course 4.

Dear all,

I reiterate Meir’s question, since none of the answers addressed it.

Re Course 4, W2, 2nd assignment.

Why do we need both of these statements to specify that the weights should not be altered by further training:

  1. base_model.trainable = False
  2. x = base_model(x, training=False)

Thank you.

Dear all,

I think I found the answer myself in this link:


Good point.

Dropout, BatchNormalization, and many other layers change behavior based on this “training” flag.
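This is easy to see with a single BatchNormalization layer; a minimal sketch (the flag switches between stored moving averages and batch statistics, independently of layer.trainable):

```python
import numpy as np
import tensorflow as tf

# A single BatchNormalization layer: the "training" argument switches
# between the stored moving averages (training=False) and the current
# batch statistics (training=True), independently of layer.trainable.
bn = tf.keras.layers.BatchNormalization()
x = tf.constant(
    np.random.normal(loc=10.0, scale=2.0, size=(32, 4)), dtype=tf.float32)

y_infer = bn(x, training=False)  # fresh layer: moving_mean=0, moving_variance=1
y_train = bn(x, training=True)   # normalizes with this batch's mean/variance

print(float(tf.reduce_mean(y_infer)))  # close to 10: inputs pass through nearly unchanged
print(float(tf.reduce_mean(y_train)))  # close to 0: batch statistics applied
```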

And one potential problem in the assignment C4W2A2 is that the data augmentation functions are also “training”-flag dependent. They transform the image in the training phase and do not in the prediction (inference) phase. If the training flag is not set explicitly, Keras determines the mode from the calling context, and in the worst case it behaves unexpectedly.

In C4W2A2, there is one cell that picks 9 images to visualize the data augmentation. If you see identical images, set “training=True”; then data augmentation should work (and vice versa).
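A sketch of that behavior (assumes TF >= 2.6, where RandomFlip lives under tf.keras.layers): in inference mode the augmentation layer is an identity; in training mode it may transform the image.

```python
import numpy as np
import tensorflow as tf

# Augmentation layers only transform inputs when called with training=True.
aug = tf.keras.Sequential([tf.keras.layers.RandomFlip("horizontal")])
img = tf.constant(
    np.arange(2 * 3 * 1, dtype=np.float32).reshape(1, 2, 3, 1))

out_infer = aug(img, training=False)  # inference mode: image is returned unchanged
out_train = aug(img, training=True)   # training mode: image may be flipped (random)
print(bool(tf.reduce_all(out_infer == img)))  # True
```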

Just my two cents…

Thank you Nobu, for your always valuable contribution!