Why specify twice that the base model is not trainable?

For transfer learning, we specify:

base_model.trainable = False

Why do we also need to pass training=False when calling the base_model?

@Meir, can you be more specific as to where you saw this setting of base_model.trainable? Was it in an assignment?

In transfer learning you want to reuse an existing trained model for tasks that are not necessarily close to the ones it was trained for. You want to use the same weights the model has already converged to for predicting what interests you. But you may want to adjust some of the model's weights, since the prediction task is not quite the same. Or you may want to add layers to the model and train only those with your data.
The way to do this in TensorFlow is to “freeze” model weights by setting them to “untrainable”. You can do this for the entire model, or for parts of it. More on this in TensorFlow's great documentation: Transfer learning and fine-tuning  |  TensorFlow Core.
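As a rough sketch of that pattern (the choice of MobileNetV2 and the new head below are just placeholders, not the exact assignment code):

import tensorflow as tf

# Load a pre-trained model without its classification head and freeze its weights.
base_model = tf.keras.applications.MobileNetV2(include_top=False, weights='imagenet')
base_model.trainable = False

# Add new layers on top; only these will be trained on your data.
inputs = tf.keras.Input(shape=(160, 160, 3))
x = base_model(inputs, training=False)   # run the frozen layers in inference mode
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(1)(x)
model = tf.keras.Model(inputs, outputs)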

@Meir,
The pre-trained model might have been trained for a more general/broader use case. Further, these pre-trained models have a lot more layers, and the learning process (back-propagation) would take a lot of time and resources, as it has to learn many parameters (weights and biases).

To solve the specific use case at hand, transfer learning lets us augment the existing model's knowledge with our own task-specific learnable parameters, while reusing most of the learned parameters in the existing model.

The primary way to achieve this is to keep most of the existing layers (weights, biases, activations) as they are in the base model and append additional layers (primarily an output layer) for the current task. In some cases, a few of the layers in the existing model can also be retrained (during back-propagation).

When used this way, the knowledge (trained parameters and activations) of the existing model can be preserved using base_model.trainable = False in TensorFlow (Keras). This ensures that the new training will not modify these existing parameters during back-propagation; only the newly added or configured weights and biases will be updated. This reduces the training time, as only a few parameters, mostly in the deeper layers closer to the output, are relearned.
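One quick way to confirm that effect (assuming base_model and model are named as in the sketch above):

base_model.trainable = False
print(len(base_model.trainable_variables))      # 0: nothing in the base model will be updated
print(len(base_model.non_trainable_variables))  # all of its frozen weights live here
# After adding the new head, model.trainable_variables contains only the new
# layers' weights and biases, so back-propagation updates just those.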

Further, while freezing the pre-trained model, there are 2 options:

  1. Freeze the complete existing model by setting base_model.trainable = False.
  2. Freeze only specific layers of the existing model through each layer's trainable attribute, e.g.

for layer in base_model.layers:
    layer.trainable = False

Only the layers you iterate over (here, all of them) are frozen this way, so you can restrict the loop to just the layers you want to keep fixed; a variant that unfreezes only the top layers is sketched just below.
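And if, as mentioned above, you do want a few of the existing layers to be retrained, a common fine-tuning pattern looks like this (the cut-off index and optimizer settings are only illustrative):

base_model.trainable = True
# Re-freeze everything except the last few layers; the index 100 is arbitrary.
for layer in base_model.layers[:100]:
    layer.trainable = False

# Re-compile with a small learning rate so the unfrozen weights change only slightly.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'])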

Or did you mean the interaction between trainable and compile? If so, the discussion in this thread can help: Incorrect number of Total Parameters when Loading a Saved Model with Trainable = False · Issue #37531 · tensorflow/tensorflow · GitHub
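In short, Keras captures the trainable flags when compile() is called, so changing them afterwards only takes effect once you compile again; roughly:

base_model.trainable = False
model.compile(optimizer='adam', loss='binary_crossentropy')  # the frozen state is captured here

base_model.trainable = True                                   # changing the flag later...
model.compile(optimizer='adam', loss='binary_crossentropy')   # ...only takes effect after re-compiling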

Note: In addition to the excellent course by the deeplearning.ai team, there are a few other references that can solidify this understanding. Assuming it's not a violation of policy, please find below some resources that will be useful for anyone learning ML/DL:

  1. 3Blue1Brown:
    1.1 NN: Neural networks - YouTube
    1.2 Linear Algebra: Vectors | Essence of linear algebra, chapter 1 - YouTube
    1.3 Differential calculus: The Essence of Calculus, Chapter 1 - YouTube

  2. DeepLizard: https://youtu.be/3ou0KYtDlOI?list=PLZbbT5o_s2xrwRnXk_yCPtnqqo4_u2YGL&t=499

Moderators, the intent of sharing these links is purely educational, and I don't have any financial or other interest in the channels shared above. If you find this a violation, please remove these references.

I refer to the given code in the second assignment of Week 2 of Course 4.

Dear all,

I reiterate Meir’s question, since none of the replies answered it.

Re Course 4, W2, 2nd assignment.

Why are these 2 statements needed to specify that the weights should not be altered by further training:

  1. base_model.trainable = False
  2. x = base_model(x, training=False)

Thank you.

Dear all,

I think I found the answer myself in this link:

https://keras.io/getting_started/faq/#whats-the-difference-between-the-training-argument-in-call-and-the-trainable-attribute

Good point.

Dropout, BatchNormalization, … many layers change behavior depending on this “training” flag.
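A tiny sketch (not from the assignment) of what the flag changes:

import numpy as np
import tensorflow as tf

drop = tf.keras.layers.Dropout(0.5)
x = np.ones((1, 4), dtype='float32')
print(drop(x, training=False).numpy())  # inference mode: the input passes through unchanged
print(drop(x, training=True).numpy())   # training mode: about half the values are zeroed (the rest scaled up)

# BatchNormalization is similar:
#   training=True  -> normalize with the current batch statistics (and update the moving averages)
#   training=False -> normalize with the stored moving mean and variance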

And one potential problem in the assignment C4W2A2 is that the data augmentation layers are also “training”-flag dependent. They transform the image in the training phase and leave it unchanged in the prediction (inference) phase. If the training flag is not explicitly set, Keras determines the mode by checking the surrounding context, and in the worst case it behaves unexpectedly.

In C4W2A2, there is a cell that picks 9 images to show the data augmentations. If you see identical images, set “training=True”; then data augmentation should work (and vice versa).
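Roughly, in that cell (data_augmentation and image are just placeholders for whatever the notebook names them):

# data_augmentation is assumed to be a Sequential of RandomFlip / RandomRotation layers
augmented = data_augmentation(tf.expand_dims(image, 0), training=True)
# With training=False (or during model.predict), those layers return the image unchanged.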

Just my two cents…

Thank you Nobu, for your always valuable contribution!