After this, as per the assignment's recommendation, the layers >120 are unfrozen and the network is fine-tuned with a smaller learning rate for another 5 epochs:
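Roughly, that cell does the following (a sketch from memory; names like train_dataset, validation_dataset, initial_epochs, and history are my assumptions about the notebook's variables):

import tensorflow as tf

# Unfreeze the base model, then re-freeze everything before layer 120
base_model.trainable = True
fine_tune_at = 120
for layer in base_model.layers[:fine_tune_at]:
    layer.trainable = False

# Recompile model2 with a learning rate 10x smaller than the first phase
# (base_learning_rate was 0.01 in the first phase)
optimizer = tf.keras.optimizers.Adam(learning_rate=0.1 * base_learning_rate)
model2.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
               optimizer=optimizer,
               metrics=['accuracy'])

# Train 5 more epochs, continuing from where the first phase stopped
fine_tune_epochs = 5
total_epochs = initial_epochs + fine_tune_epochs
history_fine = model2.fit(train_dataset,
                          epochs=total_epochs,
                          initial_epoch=history.epoch[-1],
                          validation_data=validation_dataset)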
From what I can see, the loss does improve a little with fine-tuning, but the accuracy has actually gotten worse on both the training set and the validation set.
The model is hovering around ~50% accuracy on a binary classification task, which is as good as random guessing (pretty bad).
The notebook doesn't mention an expected accuracy range, so I'm wondering whether I did something wrong or whether this is in the expected ballpark. All tests are passing and I've followed all the instructions/comments in the notebook.
Can someone who has already completed and submitted the assignment please share your results? Or mentors, if you have a reference set of results, that would be quite helpful here.
Actually, I realized that I had not frozen my model in the first phase, after someone pointed it out on a different thread: Nbgrader error - Week 2 - Transfer Learning with MobileNet - #3 by Mubsi
Because my whole model was trainable by mistake, not just the last classifier layer, I was training the entire network on this tiny dataset. I'm guessing that because the dataset is relatively small and the learning rate is large enough, this was detrimental to the initial layers and messed up the pre-trained weights, resulting in bad accuracy.
After fixing the code to freeze all layers except the last one, my results for the first 5 epochs are much better.
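For anyone hitting the same issue, the fix is essentially one line in the model-building function (a sketch; IMG_SHAPE stands in for whatever input-shape variable the notebook uses):

import tensorflow as tf

# Load MobileNetV2 without its classification head and freeze it, so only
# the new classifier layer is trainable during the first training phase.
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights='imagenet')
base_model.trainable = False  # <-- the line I was missing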
Actually, while trying some more experiments, I realized that, the way the notebook is structured, we are not really unfreezing the model's layers in the fine-tuning section.
In the fine-tuning section, we set trainable=True on a few layers of the base_model object, but that base_model was defined in "3.1 - Inside a MobileNetV2 Convolutional Building Block". It is not the same object as the model2 we defined in "3.2 - Layer Freezing with the Functional API". We then just recompile model2 with a different learning rate. I double-checked this using the code below:
# Walk model2's top-level layers; when a layer is itself a Model
# (the nested MobileNetV2), list its sub-layers too.
for l in model2.layers:
    print("====>", l.name, l.trainable)
    if type(l) == type(model2):
        for sl in l.layers:
            print("====> ====>", sl.name, sl.trainable)
This means we just trained for 5 more epochs with a lower learning rate while keeping everything else the same. That's why there is only a minor improvement in the model.
I modified the code to actually unfreeze the layers >120 by adding the line below at the top of that cell:
base_model = model2.layers[4]
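With that line added, the freezing logic actually reaches the layers inside model2, and the top of the cell becomes (sketch):

base_model = model2.layers[4]   # rebind to the MobileNetV2 nested inside model2
base_model.trainable = True     # unfreeze the whole backbone first
fine_tune_at = 120
for layer in base_model.layers[:fine_tune_at]:
    layer.trainable = False     # then re-freeze everything before layer 120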
I verified, using the code snippet above, that model2's layers indeed became trainable as expected. But after training, my accuracy went for a toss and the model got completely ruined by this attempt at unfreezing the layers. Below are the results:
First 5 epochs with only the last layer trainable, everything else frozen, and a learning rate of 0.01. The model has reached a reasonably good state.
@paulinpaloalto can you help with the last 2 comments?
Am I doing something wrong that makes my model worse with fine-tuning rather than better?
I suspected that, even with the 0.1 factor, the learning rate was still too high for the unfrozen layers.
After a few more experiments, I found that lr = 0.003 * base_learning_rate and fine_tune_at = 117 gave me pretty good results.
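In code, that amounts to changing two values in the fine-tuning cell shown earlier (sketch; the optimizer mirrors the earlier compile step):

fine_tune_at = 117  # freeze layers below 117 instead of 120
optimizer = tf.keras.optimizers.Adam(learning_rate=0.003 * base_learning_rate)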
I'm not sure if there is any theoretical explanation or reasoning for why the learning rate has to be reduced by such a significant factor when layers are unfrozen versus when we're only training the last classifier layer.
I am trying to understand this line of code in the notebook:
base_model = model2.layers[4]
What does the [4] mean here? And why, after doing this, do we use model2 and not the base model? Are we only fine-tuning by running more epochs with a lower learning rate? What is the relationship between the base model and model2?
Hi, it's selecting the layer at index 4 of model2, which is the MobileNetV2 layer. That layer itself contains more than a hundred sub-layers, and we need to freeze everything before layer 120 by setting the trainable parameter to False.
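You can see this for yourself by enumerating model2's top-level layers (a quick sketch):

# The MobileNetV2 backbone shows up as a single nested Functional layer,
# at index 4 in this architecture.
for i, layer in enumerate(model2.layers):
    print(i, layer.name, type(layer).__name__)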
Hi… thanks for your answer. I understood that it's selecting the layer at index 4 of model2, but I'm not sure how/why/where it is used, since after that we compile model2 and not the base model. Shouldn't we compile base_model rather than model2?
If I'm not wrong, there is a mistake in Exercise 3. The model2 used here was built around the earlier base_model, in which all layers are frozen (Exercise 2). Exercise 3, however, unfreezes layers 120 to 155 of layer 4, which is assigned to a new base_model. That second base_model is a global variable, while the previous one is a local variable, since it was defined inside the function. So there are two different base_model objects, and model2 still reads from the local one. The fitting that happens right afterwards therefore never uses the modifications made to the global base_model. Still, the accuracy goes up, simply because the number of epochs is increased to 10 and the same old model2 continues training from epoch 5. I don't know, maybe I'm missing something here.
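One quick way to check this theory is an identity comparison (sketch):

# If this prints False, the base_model from the fine-tuning section is a
# different object from the MobileNetV2 instance wired into model2, so
# unfreezing its layers cannot affect model2's training.
print(base_model is model2.layers[4])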