Transfer learning assignment week 2

Hi,

The assignment is really interesting, and it blew my mind. However, there are two lines of code that I don't understand how they work.
As you can see in the image below, I just don't know why we need to access the 5th element of model2.layers instead of writing base_model = model2 and then compiling. Second, is the variable fine_tune_at a hyperparameter that I need to find myself? If yes, do you have any advice on choosing an efficient number of layers to fine-tune?

Thanks for spending time!!

Hey @QUY_LE_MINH,
We are glad that you are enjoying the assignments. The answer to your question lies in the assignment itself. Look for the following lines of code in your assignment:

# summary() is the helper function provided with the assignment
for layer in summary(model2):
    print(layer)

When you run this code, you should see output along the following lines:

['InputLayer', [(None, 160, 160, 3)], 0]
['Sequential', (None, 160, 160, 3), 0]
['TFOpLambda', (None, 160, 160, 3), 0]
['TFOpLambda', (None, 160, 160, 3), 0]
['Functional', (None, 5, 5, 1280), 2257984]
['GlobalAveragePooling2D', (None, 1280), 0]
['Dropout', (None, 1280), 0, 0.2]
['Dense', (None, 1), 1281, 'linear']

Now, from this output, we can clearly see that the first layer is the input layer, and the following 3 layers handle the data augmentation and pre-processing. The 5th layer is the Functional model, which is nothing but our base_model (note that model2.layers[4] is the 5th element, since Python indexing starts at 0). This is the sub-model whose layers we want to freeze, hence the 2 lines of code. I hope this resolves your query.
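For reference, here is a minimal sketch of what those lines are doing, assuming the variable names used in the assignment (model2, fine_tune_at, and the suggested value of 120):

# Sketch, assuming the variable names from the assignment
base_model = model2.layers[4]   # 5th element: the Functional sub-model (MobileNetV2)
base_model.trainable = True     # unfreeze the base model as a whole

fine_tune_at = 120              # index of the first layer to fine-tune

# Re-freeze every layer before fine_tune_at, so only the later,
# more task-specific layers are updated during training
for layer in base_model.layers[:fine_tune_at]:
    layer.trainable = False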

Cheers,
Elemento


Yes, that is a hyperparameter that you can tune. The tradeoff is efficiency versus results: you could retrain all layers on your specific dataset, but that will be more expensive, and the question is whether it will give you better results in exchange for the greater cost in time and compute resources. The only way to find out is to try some experiments. Rather than exploring the entire space, it might be good to first try picking one point earlier than the 120 value that they suggest and one point later, and see if you can measure any differences in training cost versus results at the three points in the spectrum that you would then have. If you see that going one way or the other relative to 120 is better, then you can try some more fine-grained experiments.
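For instance, a coarse sweep like the one below could be a starting point. This is only a sketch: build_finetune_model is a hypothetical helper that mirrors the assignment's architecture (MobileNetV2 base + pooling + dropout + 1-unit linear head, omitting data augmentation for brevity), train_dataset and validation_dataset are assumed to be the tf.data pipelines from the assignment, and 100/140 are just example probe points.

import tensorflow as tf

# Hypothetical helper (not from the assignment): rebuild the
# fine-tuning model for a given unfreezing point.
def build_finetune_model(fine_tune_at, img_size=(160, 160)):
    base_model = tf.keras.applications.MobileNetV2(
        input_shape=img_size + (3,), include_top=False, weights='imagenet')
    base_model.trainable = True
    # Freeze everything before the fine_tune_at layer
    for layer in base_model.layers[:fine_tune_at]:
        layer.trainable = False
    inputs = tf.keras.Input(shape=img_size + (3,))
    x = tf.keras.applications.mobilenet_v2.preprocess_input(inputs)
    x = base_model(x, training=False)  # keep BatchNorm in inference mode
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dropout(0.2)(x)
    outputs = tf.keras.layers.Dense(1)(x)  # linear head, outputs logits
    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # small LR for fine-tuning
        loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
        metrics=['accuracy'])
    return model

# Coarse sweep: one point earlier than 120, 120 itself, one point later.
# train_dataset / validation_dataset: the tf.data pipelines from the assignment.
for fine_tune_at in (100, 120, 140):
    model = build_finetune_model(fine_tune_at)
    history = model.fit(train_dataset,
                        validation_data=validation_dataset,
                        epochs=5, verbose=0)
    print(fine_tune_at, history.history['val_accuracy'][-1])

Comparing the final validation accuracies (and wall-clock training times) at these three points should tell you which direction relative to 120 is worth exploring further.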


Thanks a lot, @paulinpaloalto Sir, for completing the reply. I overlooked the second part of the question :grimacing:

Cheers,
Elemento