[Week 2] [Transfer learning] Is fine-tuning really working?

While going through this assignment, I’m getting a pretty bad accuracy:

Below are the results for my alpaca/not-alpaca model with only the last single-neuron layer trained, for 5 epochs.

Epoch 1/5
9/9 [==============================] - 26s 3s/step - loss: 17.1979 - accuracy: 0.4885 - val_loss: 8.9154 - val_accuracy: 0.6154
Epoch 2/5
9/9 [==============================] - 26s 3s/step - loss: 7.5174 - accuracy: 0.4427 - val_loss: 4.9977 - val_accuracy: 0.6154
Epoch 3/5
9/9 [==============================] - 44s 5s/step - loss: 4.2315 - accuracy: 0.4847 - val_loss: 1.1966 - val_accuracy: 0.6154
Epoch 4/5
9/9 [==============================] - 38s 4s/step - loss: 1.6833 - accuracy: 0.4542 - val_loss: 1.1525 - val_accuracy: 0.6154
Epoch 5/5
9/9 [==============================] - 46s 5s/step - loss: 1.4461 - accuracy: 0.5382 - val_loss: 0.7279 - val_accuracy: 0.3846

After this, as per the assignment's recommendation, the layers from 120 onward are unfrozen and the network is fine-tuned with a smaller learning rate for another 5 epochs:

Epoch 5/10
9/9 [==============================] - 36s 4s/step - loss: 1.2457 - accuracy: 0.5267 - val_loss: 0.7440 - val_accuracy: 0.6154
Epoch 6/10
9/9 [==============================] - 47s 5s/step - loss: 1.1700 - accuracy: 0.4847 - val_loss: 0.8437 - val_accuracy: 0.6154
Epoch 7/10
9/9 [==============================] - 23s 3s/step - loss: 1.1663 - accuracy: 0.4847 - val_loss: 0.7152 - val_accuracy: 0.6154
Epoch 8/10
9/9 [==============================] - 45s 5s/step - loss: 1.1959 - accuracy: 0.4847 - val_loss: 0.6826 - val_accuracy: 0.3846
Epoch 9/10
9/9 [==============================] - 23s 3s/step - loss: 1.1003 - accuracy: 0.4771 - val_loss: 0.6943 - val_accuracy: 0.3846
Epoch 10/10
9/9 [==============================] - 36s 4s/step - loss: 1.0945 - accuracy: 0.5038 - val_loss: 0.7129 - val_accuracy: 0.3846

The overall accuracy & loss are as below:
[image: accuracy & loss plot]

From what I see, fine-tuning does improve the loss a little. But the accuracy has actually become worse on both the train set and the validation set after fine-tuning.
And the model is hovering around ~50% accuracy on a binary classification task, which is as good as random guessing (pretty bad).

The notebook doesn’t really mention an expected rough range, so I’m wondering whether I did something wrong or this is in the expected ballpark. All tests are passing and I’ve followed all the instructions/comments in the notebook.
Can someone who has already completed & submitted the assignment please share your results? Or mentors, if you have a reference set of results, that would be quite helpful here.

Same here: I got 100/100 but my model stayed the same.

Actually, I realized that I had not frozen my model in the first phase, after someone pointed it out on a different thread: Nbgrader error - Week 2 - Transfer Learning with MobileNet - #3 by Mubsi
Because my whole model was trainable by mistake (not just the last classifier layer), I was training the entire network on this tiny dataset. My guess is that, with a dataset this small and a learning rate this large, training corrupts the pretrained weights in the early layers, which results in bad accuracy.
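In case it helps anyone else: the phase-1 freezing step should look roughly like this. A minimal sketch, assuming the assignment's 160x160 input shape; I pass weights=None here only to keep the snippet download-free, while the notebook uses weights='imagenet'.

```python
import tensorflow as tf

IMG_SHAPE = (160, 160, 3)

# Pretrained backbone without the ImageNet classifier head
# (weights=None here just to avoid the download; the notebook
# uses weights='imagenet')
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights=None)

# Freeze the entire backbone so only the new classifier head learns
base_model.trainable = False

print(len(base_model.trainable_variables))  # 0 when properly frozen
```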

After fixing the code to freeze all layers except the last one, I get the results below for the first 5 epochs, which are much better:

Epoch 1/5
9/9 [==============================] - 9s 1s/step - loss: 0.6746 - accuracy: 0.6221 - val_loss: 0.3500 - val_accuracy: 0.7538
Epoch 2/5
9/9 [==============================] - 8s 848ms/step - loss: 0.3498 - accuracy: 0.8359 - val_loss: 0.0985 - val_accuracy: 0.9538
Epoch 3/5
9/9 [==============================] - 8s 845ms/step - loss: 0.3011 - accuracy: 0.8740 - val_loss: 0.1289 - val_accuracy: 0.9385
Epoch 4/5
9/9 [==============================] - 7s 832ms/step - loss: 0.2834 - accuracy: 0.8702 - val_loss: 0.1106 - val_accuracy: 0.9692
Epoch 5/5
9/9 [==============================] - 8s 845ms/step - loss: 0.3477 - accuracy: 0.8550 - val_loss: 0.0665 - val_accuracy: 0.9846

After this, fine-tuning layers 120 onward (Edit: see next comment) with a smaller learning rate yields decent results:

Epoch 5/10
9/9 [==============================] - 8s 924ms/step - loss: 0.2049 - accuracy: 0.9237 - val_loss: 0.0918 - val_accuracy: 0.9538
Epoch 6/10
9/9 [==============================] - 8s 836ms/step - loss: 0.1851 - accuracy: 0.9275 - val_loss: 0.0664 - val_accuracy: 0.9846
Epoch 7/10
9/9 [==============================] - 8s 846ms/step - loss: 0.1966 - accuracy: 0.9313 - val_loss: 0.0670 - val_accuracy: 0.9846
Epoch 8/10
9/9 [==============================] - 7s 802ms/step - loss: 0.1975 - accuracy: 0.9237 - val_loss: 0.0851 - val_accuracy: 0.9538
Epoch 9/10
9/9 [==============================] - 7s 802ms/step - loss: 0.1498 - accuracy: 0.9275 - val_loss: 0.0728 - val_accuracy: 0.9846
Epoch 10/10
9/9 [==============================] - 7s 802ms/step - loss: 0.1985 - accuracy: 0.9198 - val_loss: 0.0661 - val_accuracy: 0.9846

Here’s the accuracy & loss plotted:

Actually, while trying some more experiments, I realized that, the way the notebook is structured, we are not really unfreezing the model layers in the fine-tuning section.
In the fine-tuning section, we set trainable=True on a few layers of the base_model object, but that base_model was defined in “3.1 - Inside a MobileNetV2 Convolutional Building Block”. It is not the same object as the base model nested inside the model2 we defined in “3.2 - Layer Freezing with the Functional API”. Then we just recompile model2 with a different learning rate. I double-checked this using the code below:

# Print the trainability of every layer; nested Keras models
# (like the MobileNetV2 base) are expanded one level deep
for l in model2.layers:
    print("====>", l.name, l.trainable)
    if isinstance(l, tf.keras.Model):
        for sl in l.layers:
            print("====> ====>", sl.name, sl.trainable)

This means we just trained for 5 more epochs with a lower learning rate while keeping everything else the same. That’s why there’s only a minor improvement in the model.


I modified the code to actually unfreeze the layers from 120 onward by adding the line below at the top of that cell:

base_model = model2.layers[4]
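To make the fix concrete, here is a runnable sketch of the two-phase setup plus the unfreezing cell. It rebuilds a simplified model2 from scratch (no augmentation/preprocessing layers, weights=None instead of 'imagenet', and the hyperparameters are illustrative), so the nested base model lands at index 1 here rather than the notebook's index 4:

```python
import tensorflow as tf

# Phase 1: frozen backbone, trainable one-neuron head
base = tf.keras.applications.MobileNetV2(input_shape=(160, 160, 3),
                                         include_top=False, weights=None)
base.trainable = False

inputs = tf.keras.Input(shape=(160, 160, 3))
x = base(inputs, training=False)            # keep BatchNorm in inference mode
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(1)(x)
model2 = tf.keras.Model(inputs, outputs)

# Phase 2 fix: grab the nested base model OUT OF model2, then unfreeze its tail
base_model = model2.layers[1]               # index 1 in this sketch; index 4
                                            # in the notebook, which has extra
                                            # layers before the backbone
base_model.trainable = True
fine_tune_at = 120
for layer in base_model.layers[:fine_tune_at]:
    layer.trainable = False                 # keep the early layers frozen

# Recompile with a smaller learning rate before continuing training
model2.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
               optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
               metrics=['accuracy'])
```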

I verified that model2’s layers indeed became trainable as expected, using the code snippet mentioned above. But after training, my accuracy has gone for a toss and the model gets completely ruined by this attempt at unfreezing the layers. Below are the results:

First 5 epochs with only the last layer trainable, everything else frozen, and learning rate = 0.01. The model reaches a reasonably good state.

Epoch 1/5
9/9 [==============================] - 8s 868ms/step - loss: 1.0218 - accuracy: 0.5611 - val_loss: 0.3271 - val_accuracy: 0.9077
Epoch 2/5
9/9 [==============================] - 7s 791ms/step - loss: 0.5122 - accuracy: 0.7786 - val_loss: 0.1627 - val_accuracy: 0.9231
Epoch 3/5
9/9 [==============================] - 7s 811ms/step - loss: 0.4709 - accuracy: 0.8321 - val_loss: 0.3071 - val_accuracy: 0.8154
Epoch 4/5
9/9 [==============================] - 7s 789ms/step - loss: 0.3922 - accuracy: 0.8321 - val_loss: 0.1426 - val_accuracy: 0.9385
Epoch 5/5
9/9 [==============================] - 7s 789ms/step - loss: 0.2721 - accuracy: 0.8855 - val_loss: 0.0935 - val_accuracy: 0.9846

Next 5 epochs with layers before 120 frozen, everything else trainable, and learning rate = 0.001. The model is basically ruined!

Epoch 5/10
9/9 [==============================] - 9s 1s/step - loss: 3.3977 - accuracy: 0.5229 - val_loss: 0.6638 - val_accuracy: 0.3846
Epoch 6/10
9/9 [==============================] - 9s 946ms/step - loss: 0.7377 - accuracy: 0.5076 - val_loss: 0.7065 - val_accuracy: 0.6154
Epoch 7/10
9/9 [==============================] - 8s 933ms/step - loss: 0.7601 - accuracy: 0.5534 - val_loss: 0.6689 - val_accuracy: 0.6154
Epoch 8/10
9/9 [==============================] - 9s 968ms/step - loss: 0.7639 - accuracy: 0.4580 - val_loss: 0.6804 - val_accuracy: 0.3846
Epoch 9/10
9/9 [==============================] - 9s 1s/step - loss: 0.7150 - accuracy: 0.4885 - val_loss: 0.6772 - val_accuracy: 0.3846
Epoch 10/10
9/9 [==============================] - 8s 934ms/step - loss: 0.7832 - accuracy: 0.4733 - val_loss: 1.0239 - val_accuracy: 0.6154

Here’s the overall plot of accuracy & loss:
[image: accuracy & loss plot]

@paulinpaloalto can you help with the last 2 comments?
Am I doing something wrong due to which my model becomes worse with fine-tuning rather than improving?

I suspected that, even with the 0.1 factor, the learning rate was too high for the unfrozen layers.
After a few more experiments, I found that lr = 0.003 * base_learning_rate and fine_tune_at = 117 gave me pretty good results, as below:
[image: accuracy & loss plot]

I’m not sure if there is a theoretical explanation for why the learning rate has to be reduced by such a significant factor when layers are unfrozen, compared to when we’re only training the last classifier layer.

How to achieve this part?

# Add the new binary classification layers
# use global avg pooling to summarize the info in each channel
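One way to build that head, sketched under the assumption of a 160x160 MobileNetV2 backbone (weights=None here only to avoid the ImageNet download; the notebook also inserts data augmentation and preprocess_input before the base model, which I omit for brevity):

```python
import tensorflow as tf

base_model = tf.keras.applications.MobileNetV2(input_shape=(160, 160, 3),
                                               include_top=False,
                                               weights=None)
base_model.trainable = False

inputs = tf.keras.Input(shape=(160, 160, 3))
x = base_model(inputs, training=False)           # BatchNorm in inference mode

# Global average pooling collapses each 5x5 feature map to a single number,
# summarizing the info in each channel
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Dropout(0.2)(x)              # a little regularization
outputs = tf.keras.layers.Dense(1)(x)            # one logit for binary classification
model = tf.keras.Model(inputs, outputs)
```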

Hi,

I am trying to understand this line of code in the notebook:
base_model = model2.layers[4]
What does the [4] mean here? And why, after doing this, do we use model2 and not the base model? Are we only fine-tuning by running more epochs with a lower learning rate? What is the relationship between the base model and model2?

Thank you very much

Hi, it’s selecting the layer at index 4 of model2, which is the MobileNetV2 sub-model and itself contains more than a hundred layers. We then need to freeze everything before layer 120 by setting the trainable parameter to False.
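To see what sits at index 4, you can enumerate model2's top-level layers. Here is a toy, runnable illustration with a stand-in nested model (all names are illustrative, not the notebook's):

```python
import tensorflow as tf

# A small sub-model, standing in for MobileNetV2
sub_in = tf.keras.Input(shape=(4,))
sub_out = tf.keras.layers.Dense(4)(sub_in)
sub = tf.keras.Model(sub_in, sub_out, name="mobilenetv2_standin")

inp = tf.keras.Input(shape=(4,))
x = tf.keras.layers.Activation("linear")(inp)   # stand-ins for the augmentation
x = tf.keras.layers.Activation("linear")(x)     # and preprocessing layers that
x = tf.keras.layers.Activation("linear")(x)     # occupy indices 1-3 in the notebook
x = sub(x)                                      # the sub-model lands at index 4
out = tf.keras.layers.Dense(1)(x)
model2 = tf.keras.Model(inp, out)

# Enumerate the top-level layers; index 4 is the whole nested sub-model
for i, layer in enumerate(model2.layers):
    print(i, layer.name, type(layer).__name__)
```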

Hi, thanks for your answer. I also understood that it’s selecting the layer at index 4 of model2. But I’m not sure how/why/where it is used, since after that we compile model2 and not the base model. Shouldn’t we then compile the base model and not model2?

If I’m not wrong, there is a mistake in Exercise 3. model2 was built from the previous base_model, in which all layers are frozen (Exercise 2). Exercise 3, however, unfreezes layers 120 to 155 of a new base_model. The second base_model is a global variable, while the previous base_model was a local variable defined inside the function, so there are two different base models. model2 still uses the local base_model, and the fit() call that follows the global base_model doesn’t pick up the modifications made to it. The accuracy still goes up, but simply because the number of epochs is increased to 10 and the same old model2 continues training from epoch 5. I don’t know, maybe I’m missing something here.
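The scoping pitfall described above can be shown with a few lines of plain Python (all names are illustrative):

```python
# The function builds its own base model, so later rebinding a global
# variable of the same name does not affect the model it returned
def alpaca_model():
    base_model = ["frozen"]        # local; ends up captured inside the model
    return {"base": base_model}

model2 = alpaca_model()

base_model = ["unfrozen"]          # new global object; model2 never sees it
assert model2["base"] is not base_model
assert model2["base"] == ["frozen"]

# The fix: point the name at the object actually inside model2,
# then mutate that object
base_model = model2["base"]
base_model[0] = "unfrozen"
assert model2["base"] == ["unfrozen"]
```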