Week 2 - Transfer learning

Prof. Ng gave the example of using the weights of a model pretrained on cat images to detect whether a person's pet cat is Tigger, Misty, or neither.
I want to ask: in that example, is the role of the pretrained weights limited to giving our model a general idea of a cat (detecting the outlines of a cat, etc.), so that it can build on that using the images of Misty and Tigger to decide whether it is one of them, or does it work in some other way?

I’m not sure I completely understand the question, but I’ll give it a try and you can let me know if I’m not getting your point:

The assumption is that you are starting with a pretrained model that can recognize whether there is a cat in an image or not. So it has already learned the basics of how to spot a cat: the shapes of the various features like ears, face, eyes, tails and so forth and the different fur colors and all. Now you need to do some additional training to get it to distinguish between two specific cats. So the most likely approach to succeed would be to “freeze” the weights of the earlier layers in your pretrained network and then maybe replace the existing output layer with a couple more layers and then train the last few existing layers of the net and your newly added layers on your Misty and Tigger dataset.

In other words, the early layers of the network already know the basics and should be fine as is, but then you need to get the specialized training to make the model recognize the specific cats that you want it to.
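To make the "freeze the early layers, replace the output layer" idea concrete, here is a minimal sketch in Keras. The small `base` network is only a hypothetical stand-in for a real pretrained cat detector, and all layer names, sizes, and the 3-class head (Tigger, Misty, neither) are my own assumptions for illustration, not the course's code.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical stand-in for a network already trained to spot cats.
# In practice this would be a real pretrained model with loaded weights.
base = keras.Sequential(
    [
        keras.Input(shape=(64, 64, 3)),
        layers.Conv2D(8, 3, activation="relu", name="early_conv"),
        layers.MaxPooling2D(name="pool"),
        layers.Conv2D(16, 3, activation="relu", name="late_conv"),
        layers.GlobalAveragePooling2D(name="gap"),
    ],
    name="pretrained_base",
)

# Freeze the early layer (generic features such as edges and shapes);
# leave the later conv layer trainable so it can be fine-tuned.
base.get_layer("early_conv").trainable = False

# Replace the old output layer with a couple of new layers for the new task.
inputs = keras.Input(shape=(64, 64, 3), name="image")
features = base(inputs)
hidden = layers.Dense(32, activation="relu", name="new_hidden")(features)
outputs = layers.Dense(3, activation="softmax", name="tigger_misty_neither")(hidden)
model = keras.Model(inputs, outputs, name="cat_identifier")

# Kernel + bias for late_conv, new_hidden and the output layer are trainable;
# the kernel + bias of early_conv are not.
n_trainable = len(model.trainable_weights)
n_frozen = len(model.non_trainable_weights)
```

How many layers to freeze is a judgment call that depends on how much Misty and Tigger data you have; with very little data you might freeze the whole base and train only the new head.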

Okay, I got that point. I have one more doubt: after freezing certain layers, when our data is fed in to make the network learn our task, which layer does the data go into?

It is input data, right? There’s only one way to input it: to the first hidden layer. But TF/Keras provide mechanisms for freezing the coefficients at individual layers. Have you been through the MobileNet exercise yet?
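Here is a small sketch of what freezing means in practice: the data still enters at the input and flows forward through every layer, frozen or not; freezing only stops the weights of that layer from being updated. The tiny two-layer model, the random data, and all names are assumptions made up for this illustration.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Data always enters at the input and passes through the frozen layer too.
inputs = keras.Input(shape=(4,), name="features")
frozen = layers.Dense(5, activation="relu", name="frozen_dense")
head = layers.Dense(2, activation="softmax", name="trainable_head")
outputs = head(frozen(inputs))
model = keras.Model(inputs, outputs)

# Freeze before compiling so the setting takes effect during training.
frozen.trainable = False
model.compile(
    optimizer=keras.optimizers.SGD(learning_rate=0.1),
    loss="sparse_categorical_crossentropy",
)

# Made-up training data, just to run one gradient step.
rng = np.random.default_rng(0)
x = rng.normal(size=(16, 4)).astype("float32")
y = rng.integers(0, 2, size=(16,))

frozen_before = [w.copy() for w in frozen.get_weights()]
head_before = [w.copy() for w in head.get_weights()]
model.fit(x, y, epochs=1, batch_size=16, verbose=0)

# The frozen layer took part in the forward pass but its weights did not move.
frozen_unchanged = all(
    np.array_equal(a, b) for a, b in zip(frozen_before, frozen.get_weights())
)
head_changed = any(
    not np.array_equal(a, b) for a, b in zip(head_before, head.get_weights())
)
```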

Oh okay. No, I have yet to do the MobileNet exercise. Actually I am stuck on the first assignment.

Please take a quick look before deleting it and help me sort out the error. I have tried every solution I could think of.

{moderator edit - solution code removed}

Why is your input to the “shortcut” layer X? You are missing the point of the shortcut layer. The instructions don’t say very much, but look at the diagram of how the shortcut layer is intended to work.
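For readers who have not seen the diagram, here is a generic sketch of how a skip connection is wired. This is not the assignment's code: it is a simplified identity-style block with made-up layer names and sizes. The point is only that the shortcut is saved before the main path starts and is added back in just before the final activation.

```python
from tensorflow import keras
from tensorflow.keras import layers


def residual_block(x, filters, name):
    """Generic residual block: output = relu(main_path(x) + shortcut)."""
    shortcut = x  # saved BEFORE the main path transforms anything

    # Main path: two convolutions that keep height, width and channels.
    y = layers.Conv2D(filters, 3, padding="same", activation="relu",
                      name=f"{name}_conv1")(x)
    y = layers.Conv2D(filters, 3, padding="same", name=f"{name}_conv2")(y)

    # The saved shortcut, not the transformed tensor, is what gets added back.
    y = layers.Add(name=f"{name}_add")([y, shortcut])
    return layers.Activation("relu", name=f"{name}_out")(y)


inputs = keras.Input(shape=(32, 32, 16), name="block_input")
outputs = residual_block(inputs, filters=16, name="block1")
model = keras.Model(inputs, outputs)
layer_names = [layer.name for layer in model.layers]
```

When the main path changes the shape, the shortcut needs its own layer to match dimensions, and the input to that layer is the saved shortcut tensor.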

Okay, that was the error!
Thank you sir, I am done with my assignment.

Sir, after finishing the ResNet programming assignment I have a few doubts:
1) How does the Functional API differentiate between all the variables when only X is used as the variable name in every case?
2) What exactly are the parameters in a convolutional network? Is every value of every matrix simply considered a parameter, or is there some criterion for a value to count as a parameter? (By matrix I mean the matrices that result from convolution operations.)
3) How are there 50 layers in that model? I was able to count only up to 18 layers.

Maybe not complete answers, but here are some thoughts:

  1. The whole point of the Functional API is that the inputs are explicit, right? So they are whatever you say they are. X in some cases, X_shortcut in others. It’s whatever you need to achieve the goal.
  2. The parameters are the values you can learn. So all the coefficients in all the filters and the bias values for the conv layers. For a pooling layer, there are no parameters (nothing to learn). For fully connected layers, it is the W and b values.
  3. Look at the model summary. They print it out for you after the resnet50 cell.
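To illustrate point 2, here is a small sketch that asks Keras how many learnable parameters each kind of layer has. The toy model and its layer names are assumptions for illustration only; `count_params()` is the same number that `model.summary()` prints per layer.

```python
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(28, 28, 1), name="image")
x = layers.Conv2D(4, 3, name="conv")(inputs)   # filters + biases: 3*3*1*4 + 4
x = layers.MaxPooling2D(2, name="pool")(x)      # nothing to learn
x = layers.Flatten(name="flatten")(x)           # nothing to learn
outputs = layers.Dense(10, name="fc")(x)        # W and b: 13*13*4*10 + 10
model = keras.Model(inputs, outputs)

# Per-layer learnable parameter counts, keyed by layer name.
params = {layer.name: layer.count_params() for layer in model.layers}
total = model.count_params()
```

The activations that come out of a convolution are not counted anywhere here, because they are computed from the input and the filters rather than learned.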

On the 2nd point: aren't those simply the outcomes of the convolution operation? So there is no learning involved, is there?

You are missing my point: no, the filters are not the outputs of the convolution. They are the learned values that determine the outputs, right?

It's just that the filters are relatively few in number and small in dimension, so I am wondering how the number of parameters can reach that level.

As always, you have to “do the math”, right? The dimension of the filters at a given layer is:

$f \times f \times n_{C_{in}} \times n_{C_{out}}$

Of course it is not required that the height and width be the same, but suppose $f = 5$, $n_{C_{in}} = 3$ and $n_{C_{out}} = 8$; then the number of parameters in that layer is:

$5 \times 5 \times 3 \times 8 + 8 = 608$

That last +8 there is for the bias values (one scalar value per output filter). At that layer, each one of those 608 parameters is learnable through back propagation.
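The same arithmetic as a tiny helper, in case you want to "do the math" for other layers. The function name and its arguments are hypothetical, chosen just for this sketch; it allows a non-square filter since height and width need not match.

```python
def conv_param_count(f_h, f_w, n_c_in, n_c_out, use_bias=True):
    """Learnable parameters in one conv layer.

    Each of the n_c_out filters has f_h * f_w * n_c_in weights,
    plus one scalar bias per filter when use_bias is True.
    """
    weights = f_h * f_w * n_c_in * n_c_out
    biases = n_c_out if use_bias else 0
    return weights + biases


# The example above: 5x5 filters, 3 input channels, 8 output channels.
example = conv_param_count(5, 5, 3, 8)                    # 608
weights_only = conv_param_count(5, 5, 3, 8, use_bias=False)  # 600
```

The count grows with the product of input and output channels, which is why deep layers with hundreds of channels account for so many parameters even though each filter is small.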


That is a style that I also found very confusing. Turns out the answer lies in the graph model that TensorFlow is building from the information provided through the Functional API; it doesn’t actually use X at all in the graph. Here’s what I mean.

This is a very simple model defined using the Functional API, followed by its graph. Notice that what shows up in the printout are the string names I provided in the name="..." arguments, not the variable names on the left-hand side of the assignments.

from tensorflow import keras
from tensorflow.keras import layers

# Define an input layer
input_layer = keras.Input(shape=(784,), name="input_layer")

# Define a simple architecture using the Functional API
dense_layer_object1 = layers.Dense(64, activation="relu", name="dense_layer_1")
l1 = dense_layer_object1(input_layer)

dense_layer_object2 = layers.Dense(64, activation="relu", name="dense_layer_2")
l2 = dense_layer_object2(l1)

dense_layer_object3 = layers.Dense(10, activation="softmax", name="predictions_functional_def")
predictions = dense_layer_object3(l2)

functional_model_def = keras.Model(inputs=input_layer, outputs=predictions)

keras.utils.plot_model(functional_model_def, "visualize_functional_model.png", show_shapes=True)

Now here is the exact same model using different, less human-readable variable names. Notice that the printout of the model is exactly the same as above: no X1 or X2.

inputs = keras.Input(shape=(784,), name="input_layer")

X1 = layers.Dense(64, activation="relu", name="dense_layer_1")
X2 = X1(inputs)

X1 = layers.Dense(64, activation="relu", name="dense_layer_2")
X2 = X1(X2)

X1 = layers.Dense(10, activation="softmax", name="predictions_functional_def")
X2 = X1(X2)

functional_model_def = keras.Model(inputs=inputs, outputs=X2)

keras.utils.plot_model(functional_model_def, "visualize_functional_model.png", show_shapes=True)

And finally the most terse expression of all, which combines what was written as 2 expressions for each layer into 1 expression per layer, and calls everything just X:

inputs = keras.Input(shape=(784,), name="input_layer")

X = layers.Dense(64, activation="relu", name="dense_layer_1")(inputs)
X = layers.Dense(64, activation="relu", name="dense_layer_2")(X)
X = layers.Dense(10, activation="softmax", name="predictions_functional_def")(X)

functional_model_def = keras.Model(inputs=inputs, outputs=X)

keras.utils.plot_model(functional_model_def, "visualize_functional_model.png", show_shapes=True)

which produces the same model plot as the two versions above (image not reproduced here).

All three blocks of code produce the same model, with the same metadata. The human-readable variable name is only used to wire together the layer relationships. I don’t use the all-X style anymore, even when writing in the 3rd style that condenses the construction of each new layer into a single line of Python:

inputs = keras.Input(shape=(784,), name="input_layer")

l1 = layers.Dense(64, activation="relu", name="dense_layer_1")(inputs)
l2 = layers.Dense(64, activation="relu", name="dense_layer_2")(l1)
outputs = layers.Dense(10, activation="softmax", name="predictions_functional_def")(l2)

functional_model_def = keras.Model(inputs=inputs, outputs=outputs)

I find it more readable, and thus more testable and maintainable, to use explicit names. For example, if you make a cut and paste error, it stands out. But it seems to be common style to just use X everywhere. Hope this helps.
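If you cannot render the plot, a quick way to check the same thing is to compare the layer names that Keras stores in the model. This is a minimal sketch with a shortened two-layer version of the model above; the two builder function names are hypothetical.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_with_distinct_names():
    inputs = keras.Input(shape=(784,), name="input_layer")
    l1 = layers.Dense(64, activation="relu", name="dense_layer_1")(inputs)
    outputs = layers.Dense(10, activation="softmax", name="predictions")(l1)
    return keras.Model(inputs=inputs, outputs=outputs)


def build_with_reused_x():
    inputs = keras.Input(shape=(784,), name="input_layer")
    # X is just a Python name that gets rebound on each line; the tensor
    # it pointed to before is still connected inside the graph.
    X = layers.Dense(64, activation="relu", name="dense_layer_1")(inputs)
    X = layers.Dense(10, activation="softmax", name="predictions")(X)
    return keras.Model(inputs=inputs, outputs=X)


names_distinct = [layer.name for layer in build_with_distinct_names().layers]
names_reused = [layer.name for layer in build_with_reused_x().layers]
same_names = names_distinct == names_reused
```

Neither list contains "X", "l1" or "outputs": the Python variable names never reach the model, only the `name=` strings do.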