Pre-trained model - What is the parameter 'weights=None'?

Hi,

I am not able to understand why weights=None is used when loading the pre-trained model as the base model.
In an example from the documentation, it is specified as below:

base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE, include_top=False, weights='imagenet')

Whereas in the exercise notebook 'Apply transfer learning to Cats vs Dogs', it is specified as weights=None:
pre_trained_model = InceptionV3(input_shape=(150, 150, 3),
                                include_top=False,
                                weights=None)

Seeking more clarity on this difference.
Thanks


When you specify weights=None, the weights learned from ImageNet are not used for initializing the model, i.e. only the architecture is used.

It’s possible to load weights using the load_weights method after the model has been created (see the sketch after this list).
This could be done for a number of reasons, like:

  1. When using the Coursera platform to run code, they might not want to download the weights for every student submission. Having a shared location for the weights saves bandwidth.
  2. It’s possible that the author of this notebook took the base weights, tuned them for a few epochs, and then saved them. When you make use of these new weights, there’s less training left to do.
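For illustration, here is a minimal sketch of that pattern, using the local weights path that the Cats vs Dogs notebook uses:

from tensorflow.keras.applications.inception_v3 import InceptionV3

# Build the InceptionV3 architecture only; weights=None means nothing is downloaded.
pre_trained_model = InceptionV3(input_shape=(150, 150, 3),
                                include_top=False,
                                weights=None)

# Load weights from a file already on disk (the shared file used in the course notebook).
local_weights_file = '/tmp/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5'
pre_trained_model.load_weights(local_weights_file)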

Hello,

You can use one of 3 options (see the sketch after this list):
1 - The weights are initialized randomly if weights = None
2 - Pre-trained ImageNet weights are used if weights = 'imagenet'
3 - Or the weights are loaded from the path to a weights file
The default is 'imagenet'.
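A minimal sketch of the three options (the .h5 path in option 3 is a made-up placeholder):

from tensorflow.keras.applications.inception_v3 import InceptionV3

shape = (150, 150, 3)

# 1 - Random initialization: architecture only, no pre-trained weights.
model_a = InceptionV3(input_shape=shape, include_top=False, weights=None)

# 2 - ImageNet weights, downloaded automatically (the default).
model_b = InceptionV3(input_shape=shape, include_top=False, weights='imagenet')

# 3 - Weights loaded from a local file (placeholder path; adjust to yours).
model_c = InceptionV3(input_shape=shape, include_top=False,
                      weights='/tmp/my_inception_weights.h5')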
Why layer.trainable = False?
In a NN, parameters that don’t compute gradients are usually called frozen parameters. It is useful to “freeze” part of your model if you know in advance that you won’t need the gradients of those parameters (this offers some performance benefits by reducing gradient computations). This is known as finetuning: we freeze most of the model and typically only modify the classifier layers to make predictions on new labels.
You can see in the notebook that only the parameters of the layers after last_layer are updated; every layer of the base model is set to layer.trainable = False.
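A minimal sketch of that freezing pattern (the layer name 'mixed7' is the one the course notebook picks; adjust it if your copy differs):

from tensorflow.keras.applications.inception_v3 import InceptionV3

pre_trained_model = InceptionV3(input_shape=(150, 150, 3),
                                include_top=False, weights=None)

# Freeze every layer of the base so its weights receive no gradient updates.
for layer in pre_trained_model.layers:
    layer.trainable = False

# Use an intermediate layer's output as the feature extractor.
last_layer = pre_trained_model.get_layer('mixed7')
last_output = last_layer.output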


Thanks for the replies, @balaji.ambresh and @bisht.
Fine, now it’s clear why weights=None: we are using the architecture without the weights (no transfer learning), and when we specify weights='imagenet', we are using the weights trained on ImageNet data.

Now, in the exercise notebook, we have used
local_weights_file = '/tmp/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5'
So these could be weights from a custom-trained model, provided for learning on the Coursera platform, as @balaji.ambresh suggested.

I have more queries in this regard.

In the exercise 'Apply transfer learning to Cats vs Dogs', we are freezing the layers of the pre-trained model and finetuning it.

We have specified include_top=False, which means we are not including the fully connected dense layers at the top of the pretrained model.

Now, in the exercise notebook we have fine-tuned this pretrained model on our data by adding a Flatten layer, a fully connected Dense layer, and an output layer, roughly as below.
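Continuing from the frozen base sketched above, I mean something like this (the 1024-unit size is illustrative, not prescriptive):

from tensorflow.keras import layers, Model
from tensorflow.keras.optimizers import RMSprop

# Head from the exercise: Flatten, a fully connected Dense layer, and a
# sigmoid output for the binary cats-vs-dogs decision.
x = layers.Flatten()(last_output)
x = layers.Dense(1024, activation='relu')(x)  # illustrative size
x = layers.Dense(1, activation='sigmoid')(x)

model = Model(pre_trained_model.input, x)
model.compile(optimizer=RMSprop(learning_rate=0.0001),
              loss='binary_crossentropy',
              metrics=['accuracy'])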

But in some other examples, as also shown in the documentation, the finetuning was done by adding a Global Average Pooling layer and an output layer. So why is a fully connected Dense layer not added there? What is the difference between these 2 approaches?


When we transfer a model, we have the following options:

  1. Pick weights.
  2. Pick the subset of layers.
  3. Tune how many of those layers to train, after deciding on the subset of layers and the initial weights.
  4. Extend the base model.

Choice of model architecture is problem specific and requires a lot of experimentation (guided by metrics on train / validation datasets). There is no one correct answer for all problems.
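To make the pooling-based head concrete, here is a sketch assuming the same frozen InceptionV3 base as above. GlobalAveragePooling2D collapses each feature map to a single number, so this head has far fewer parameters than Flatten + Dense:

from tensorflow.keras import layers, Model

# Documentation-style head: average-pool the feature maps, then classify.
# Fewer parameters than Flatten + Dense, which can reduce overfitting.
y = layers.GlobalAveragePooling2D()(pre_trained_model.output)
y = layers.Dense(1, activation='sigmoid')(y)
model_pooled = Model(pre_trained_model.input, y)

Which head performs better is, again, something to validate against your own train / validation metrics.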

Thanks @balaji.ambresh