C4 general question about deciding which layers to retrain

Hi there.

When we fine-tune the Alpaca_model in Week 2, Assignment 2, Ex_3, we are told to set “fine_tune_at = 120”. This is on the basis that the base model has 155 layers, which we can find with the line of code “len(base_model.layers)”.
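For reference, the kind of layer inspection I have in mind is something like this (assuming the notebook’s “base_model” is already loaded):

```python
print(len(base_model.layers))  # 155 in the assignment
for i, layer in enumerate(base_model.layers):
    print(i, layer.name, layer.trainable)
```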

In the general case, where we want to decide which layer to start fine-tuning from, can you offer any guidance on how to examine the layers and then determine which ones to freeze and which ones to retrain? The code in Ex_3 shows how to freeze or retrain layers, but no intuition is given for how “layer 120” was arrived at. It would be very useful to understand a bit more about this and what kind of code we might employ to that end.

Many thanks in advance.

I don’t think there is any useful guidance on this besides “experimentation”.

It’s absolutely a legitimate and important question, but my take is that the reason they didn’t offer any more concrete guidelines is that there really aren’t any: it all depends on the particulars of your situation. In the limit, you can always start with the pretrained model, make whatever changes you need to the last few layers (at the very least the output layer, to account for your new classes) and then retrain all the layers of the model with your training data. That may be somewhat of a wasted effort, but in principle it should do no harm other than wasting some compute time. Even that depends on how much training data you have that is specific to your case, though: if your dataset is relatively small, there is some potential theoretical “harm” here, in that you could be “dumbing down” or losing resolution in the earlier layers of the network.
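To make that concrete, here is a minimal sketch of the “replace the last layers, then retrain” setup in Keras, assuming the same MobileNetV2 base as the notebook; the input shape and the single binary output are just illustrative assumptions:

```python
import tensorflow as tf

# Load the pretrained base without its ImageNet classification head.
# The (160, 160, 3) input shape matches the assignment; adjust for
# your own data.
base_model = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False, weights="imagenet")

# Replace the head: pool the feature maps and add a fresh output
# layer for the new task (here a single logit for alpaca/not-alpaca).
inputs = tf.keras.Input(shape=(160, 160, 3))
x = base_model(inputs, training=False)  # keep BatchNorm in inference mode
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(1)(x)
model = tf.keras.Model(inputs, outputs)
```

From there you can leave the base entirely frozen, unfreeze everything, or pick a freeze point somewhere in between.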

With the caveat that I’ve never really tried this and am only basing my thoughts on what I heard Prof Ng say in the lectures and on general reasoning, I think the answer is that you just have to run some experiments and see what works. You can reason about how different your data is from the data on which the original model was trained. In the particular case here, the only real difference is that we have a new animal with a new label; the fundamental types of images are pretty similar to the original dataset: RGB pictures taken outside in natural light with animals in them. So you definitely don’t need to retrain all the layers: the early layers are just learning basic shapes. The only question is how far back from the output layer it would help to go with the incremental training, and I think the only rule of thumb is that you have to try it and see what works. You could actually run that type of experiment with the notebook here: try moving the “freeze point” earlier or later and see if you can detect any difference in the results, e.g. with the sketch below.
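For concreteness, the experiment could look something like this. It’s a sketch based on the notebook’s Ex_3 pattern, where “base_model” is the pretrained MobileNetV2 and only the value of “fine_tune_at” changes between runs:

```python
fine_tune_at = 120  # try e.g. 100, 120, 140 and compare results

# Unfreeze the whole base model, then re-freeze everything before the
# chosen index so that only the later layers get retrained.
base_model.trainable = True
for layer in base_model.layers[:fine_tune_at]:
    layer.trainable = False
```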

But note the important caveat here: I don’t have any actual practical experience trying to do Transfer Learning. So the above is probably worth about what you paid for it! :laughing:

Thank you Paul for your detailed answer.

I do have a real-world use case and have been working with the NVIDIA open-source code. Transfer Learning (replacing just the final fully-connected / dense layer) did not produce great results, so I guess my datasets differ more from what the ResNet models I tried were trained on than alpacas differ from cats, dogs, etc.

I will try retraining more layers in increments and see how far back I can go while still improving the accuracy of the results, roughly as sketched below.
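In case it helps anyone else, the incremental experiment I have in mind looks roughly like this, where “build_model”, “train_ds” and “val_ds” are placeholders for my own pipeline rather than anything from the notebook:

```python
# Sweep the freeze point: unfreeze progressively more layers and
# record the best validation accuracy for each setting.
for fine_tune_at in [140, 120, 100, 80]:
    model = build_model(fine_tune_at)  # freezes layers[:fine_tune_at]
    history = model.fit(train_ds, validation_data=val_ds, epochs=5)
    print(fine_tune_at, max(history.history["val_accuracy"]))
```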

Thank you Tom. I’ll experiment.

If you get really terrible results, the other key thing to check is that you understand how the original training set was normalized and that you have prepared your specific images in the same way. As a concrete example of this issue, there’s a pretty hilarious bug in the MobileNet notebook: they feed the pretrained model raw images of alpacas, so they get really bizarre results like “television” and “window screen” before any specialization for alpacas. You’d think an alpaca would look more like a camel or a llama than a television. But it turns out (as they clearly tell you later in the notebook) that the original model was trained on normalized images. They even write the code for you to do the normalization: they just forgot to apply it when they ran the first test case. If you add that logic, the model acts more like you’d expect: the original pretrained model mostly classifies the alpaca images as similar animals like camels and llamas.
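For the record, the fix is just to apply the preprocessing before calling the pretrained model on the raw images. A sketch, where “image_batch” stands for a batch of raw RGB images with pixel values in [0, 255]:

```python
import tensorflow as tf

# The full ImageNet-pretrained model, so we get class predictions back.
model = tf.keras.applications.MobileNetV2(weights="imagenet")

# MobileNetV2 was trained on inputs scaled to [-1, 1]; this helper does
# that scaling. Skipping it is what yields "television" on raw pixels.
x = tf.keras.applications.mobilenet_v2.preprocess_input(
    tf.image.resize(image_batch, (224, 224)))
preds = model.predict(x)
print(tf.keras.applications.mobilenet_v2.decode_predictions(preds, top=2))
```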

BTW, a bug has been filed about the above schlamassel.

Please let us know what you learn from your experiments with retraining starting at various points in your particular case. It would be really enlightening to see actual real-world results!

Thanks again Paul.

I’ll get back when I have something to share.