C2_W3_Lab_1_transfer_learning Question

In this Jupyter notebook we take the InceptionV3 model, remove its final layers, and replace them with our own DNN. What confuses me is how exactly we are doing this.

see: https://github.com/https-deeplearning-ai/tensorflow-1-public/blob/main/C2/W3/ungraded_lab/C2_W3_Lab_1_transfer_learning.ipynb

What I’m reading in this code looks redundant to me. It first appears that TensorFlow already ships the Inception model as tensorflow.keras.applications.inception_v3, but then we download a specific weights file with wget whose filename includes notop. If we are downloading it, why are we also importing the library at the top of the code?

from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras import Model

from tensorflow.keras.applications.inception_v3 import InceptionV3
from tensorflow.keras import layers

# Download the pre-trained weights. No top means it excludes the fully connected layer it uses for classification.
!wget --no-check-certificate \
    https://storage.googleapis.com/mledu-datasets/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5 \
    -O /tmp/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5

# Set the weights file you downloaded into a variable
local_weights_file = '/tmp/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5'

Now we are calling the imported InceptionV3 function and then loading the weights into it? There’s no explanation in the notebook for why this happens. Does the notop.h5 file provide only the weights, while the InceptionV3 call provides something else? Why couldn’t we provide these weights directly to a tensorflow.keras.Model constructor?

# Initialize the base model.
# Set the input shape and remove the dense layers.
pre_trained_model = InceptionV3(input_shape = (150, 150, 3),
                                include_top = False,
                                weights = None)

# Load the pre-trained weights you downloaded.
pre_trained_model.load_weights(local_weights_file)

# Freeze the weights of the layers.
for layer in pre_trained_model.layers:
  layer.trainable = False

Here we select a specific layer of the model, but it’s unclear whether doing so also brings along all the layers that come before it. I imagine we are just selecting a given layer, and that part of the layer object’s metadata is information about the prior layers it connects to, but that isn’t made clear.

# Choose `mixed7` as the last layer of your base model
last_layer = pre_trained_model.get_layer('mixed7')
print('last layer output shape: ', last_layer.output_shape)
last_output = last_layer.output

It now looks like we take the output from this selected layer and provide it as the input to the layers of our new DNN. We then create a new model from the pre_trained_model.input object, but here it’s not clear whether the layers that existed after the 'mixed7' layer in the original model are also included via pre_trained_model.input.

# Flatten the output layer to 1 dimension
x = layers.Flatten()(last_output)
# Add a fully connected layer with 1,024 hidden units and ReLU activation
x = layers.Dense(1024, activation='relu')(x)
# Add a dropout rate of 0.2
x = layers.Dropout(0.2)(x)
# Add a final sigmoid layer for classification
x = layers.Dense(1, activation='sigmoid')(x)

# Append the dense network to the base model
model = Model(pre_trained_model.input, x)

# Print the model summary. See your dense network connected at the end.
model.summary()

Our new model seems to include the added DNN layers but excludes any layers that existed after the 'mixed7' layer in the original model. I’m just not clear on how those layers are being excluded.

 mixed7 (Concatenate)           (None, 7, 7, 768)    0           ['activation_154[0][0]',         
                                                                  'activation_157[0][0]',         
                                                                  'activation_162[0][0]',         
                                                                  'activation_163[0][0]']         
                                                                                                  
 flatten (Flatten)              (None, 37632)        0           ['mixed7[0][0]']                 
                                                                                                  
 dense (Dense)                  (None, 1024)         38536192    ['flatten[0][0]']                
                                                                                                  
 dropout (Dropout)              (None, 1024)         0           ['dense[0][0]']                  
                                                                                                  
 dense_1 (Dense)                (None, 1)            1025        ['dropout[0][0]']                
                                                                                                  
==================================================================================================
Total params: 47,512,481
Trainable params: 38,537,217
Non-trainable params: 8,975,264

We are only downloading the weights from an external source. The layers are created by the InceptionV3 call itself, so there is no need to download the architecture.
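
To make that concrete, here is a minimal sketch (reusing the notebook's pre_trained_model and local_weights_file): the InceptionV3(...) call alone builds the full architecture with randomly initialized parameters, and load_weights() only overwrites the numeric values, it never adds or removes layers.

# The architecture comes from the InceptionV3(...) call itself;
# the .h5 file only supplies the trained parameter values.
pre_trained_model = InceptionV3(input_shape=(150, 150, 3),
                                include_top=False,
                                weights=None)
print(len(pre_trained_model.layers))     # every layer already exists (random weights)
print(pre_trained_model.count_params())  # parameter count is fixed by the architecture

pre_trained_model.load_weights(local_weights_file)  # only the values change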

The downloaded file is in HDF5 format, which can be inspected with h5py. Instead of a separate call to load_weights, you can also pass the file path directly to the constructor:
weights='/tmp/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5'
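
For example, this sketch (assuming the file has already been downloaded to /tmp as in the notebook) builds the model and loads the weights in one step, equivalent to passing weights=None and then calling load_weights():

# Pass the downloaded weights file straight to the constructor.
pre_trained_model = InceptionV3(
    input_shape=(150, 150, 3),
    include_top=False,
    weights='/tmp/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5')

# Optional: peek inside the HDF5 file with h5py to confirm it stores only
# per-layer weight arrays, not the architecture.
import h5py
with h5py.File('/tmp/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5', 'r') as f:
    print(list(f.keys())[:5])  # names of the first few layer groups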

pre_trained_model.layers is a property of the Model class that returns a list of layers. The Model class internally tracks the relationships between layers, so you don’t have to worry about them.
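
You can see those tracked relationships on the layer object itself, for example (a sketch using the notebook's pre_trained_model):

# The layer keeps references to the tensors it consumes and produces,
# so selecting it neither copies nor discards any other layers.
mixed7 = pre_trained_model.get_layer('mixed7')
print(mixed7.input)   # tensors produced by the preceding layers (a list here,
                      # since mixed7 is a Concatenate layer with several inputs)
print(mixed7.output)  # the tensor handed to the new dense head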

Read up on the Functional API to understand what .input and .output mean.
Executing model.summary(expand_nested=True, show_trainable=True) should also shed light on the final model architecture.
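
To see why the layers after 'mixed7' drop out, here is a toy Functional API example (hypothetical layer names, not from the notebook): Model(inputs, outputs) keeps only the layers reachable by tracing the graph backwards from outputs to inputs.

from tensorflow.keras import Input, Model, layers

inputs = Input(shape=(8,))
a = layers.Dense(4, name='a')(inputs)
b = layers.Dense(2, name='b')(a)    # plays the role of 'mixed7'
c = layers.Dense(1, name='c')(b)    # plays the role of the layers after 'mixed7'

truncated = Model(inputs, b)        # built by tracing back from b to inputs
print([layer.name for layer in truncated.layers])  # 'c' never appears

The same backwards trace happens in Model(pre_trained_model.input, x), which is why the original InceptionV3 layers after 'mixed7' never make it into the new model.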
