Question About Transfer Learning

In Course 4 Week 2, Andrew Ng talks about transfer learning and how it is helpful when you have very little data, using the example of adapting a pre-trained animal detector to detect his two cats, Tigger and Misty. My question is: how close does the pre-trained model have to be to the data you want to train it on? For example, if I wanted a very specific AI, like one to detect a character in a game, how would I go about finding a suitable pre-trained model that I could use for transfer learning? I also understand that transfer learning reuses low-level features like lines and curves from the previous model, so would a somewhat different pre-trained model work on a new training set?

Any insight would be greatly appreciated, and sorry if the question is a bit confusing.

I should start with the disclaimer that I don’t have any actual practical experience with Transfer Learning: all I know is what I’ve heard Prof Ng say in these courses. With that in mind, my understanding is that if you’re trying to solve any kind of image recognition problem, the basics are pretty much the same. So the issue is more about finding a system that takes input images in the format your input data happens to be in, or one where you can convert your inputs to the format the system was trained on without losing too much information.

Beyond that, there is a very specific and situational set of decisions you need to make about which layers of the pretrained model you remove, replace, and/or retrain. Roughly speaking, you’ve got early layers that detect primitive features (edges, curves, color patterns), middle layers that detect more complex features (a cat’s ears, car wheels, propellers), and then the last few layers that integrate the complex features into the final answer (it’s a truck or a tiger). At that point, you have to make an intelligent decision about how similar your data is to the training corpus of the base system you are transferring from.

Prof Ng does spend some time discussing how to approach those decisions in the lectures, so it might be worth listening to what he says again with the above in mind. I remember that he also says there are choices to be made about whether or not you freeze the earlier layers in the cases where you are using them “as is”. There may be something to be gained by starting with the trained model and then doing incremental training even on the early layers, but that is not always necessary.
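To make that layer picture concrete, here is a minimal toy sketch in plain Python (not real framework code; the layer names, roles, and model are all made up) of the basic move: keep the pretrained feature layers, freeze the early ones, and attach a fresh, trainable output head for the new task:

```python
class Layer:
    """Stand-in for a network layer; real frameworks expose a similar
    'trainable' flag that controls whether weights get updated."""
    def __init__(self, name, role):
        self.name = name
        self.role = role          # "primitive", "complex", or "head"
        self.trainable = True     # frozen layers have trainable = False

# A hypothetical pretrained image classifier, ordered input -> output.
pretrained = [
    Layer("conv1", "primitive"),   # edges, curves, color patterns
    Layer("conv2", "primitive"),
    Layer("conv3", "complex"),     # ears, wheels, propellers
    Layer("conv4", "complex"),
    Layer("fc_out", "head"),       # final "truck or tiger" decision
]

def transfer(model, n_freeze, new_head_name):
    """Drop the old head, freeze the first n_freeze feature layers,
    and attach a freshly initialized head for the new task."""
    kept = [layer for layer in model if layer.role != "head"]
    for i, layer in enumerate(kept):
        layer.trainable = i >= n_freeze
    kept.append(Layer(new_head_name, "head"))  # new head stays trainable
    return kept

# Freeze the two primitive-feature layers, retrain the rest.
new_model = transfer(pretrained, n_freeze=2, new_head_name="game_char_out")
print([(layer.name, layer.trainable) for layer in new_model])
# -> [('conv1', False), ('conv2', False), ('conv3', True),
#     ('conv4', True), ('game_char_out', True)]
```

In a real framework the same idea is a few lines (load a pretrained base, set the frozen layers' trainable flag to False, stack a new classification head on top), but the decision of where to draw the freeze line is yours.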

Have you seen this video? For model selection how about:

  1. Talking to a domain expert.
  2. Trying / evaluating / selecting.

Thank you for the fast response; I think I understand what you are saying. So basically, as I understand it, the pre-trained model needs to take the same kind of input, like an image, and you can resize your images to fit the pre-trained model’s expected input.

As for how related the pre-trained model needs to be, I think it doesn’t really matter how closely related it is; what matters is how many layers you need to “freeze”. To my understanding, you would freeze more layers if the pre-trained model is similar, because some of the high-level features are already learned and you would want to keep them, given the similarity between the tasks. But if the pre-trained model is somewhat different, then you would freeze fewer layers, since you would want to learn new complex features while keeping the low-level ones.
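That intuition can be written down as a toy rule of thumb (the similarity score and the function itself are entirely made up, just to express the direction of the relationship, not any real formula from the course):

```python
def layers_to_freeze(n_feature_layers, similarity):
    """Toy heuristic: the more similar the new task is to the
    pretrained task, the more feature layers we can leave frozen.
    similarity is a hypothetical score in [0, 1]."""
    return round(n_feature_layers * similarity)

# Very similar task: freeze all 4 feature layers, retrain only the head.
print(layers_to_freeze(4, 1.0))   # -> 4
# Quite different task: freeze only the low-level feature layers.
print(layers_to_freeze(4, 0.25))  # -> 1
```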

Please correct me if anything I have said is wrong.

Yes, I think the way you described it sounds correct to me. The decisions about how many layers it makes sense to freeze are not “cut and dried”, meaning there is no one universal correct answer. As Prof Ng says in many cases around tuning solutions, you will need to do some experiments to figure out what works best for your particular problem. The other general point I remember him making in the Transfer Learning lectures is that if you unfreeze more layers than you really need to, or even do incremental training on all the layers, it probably will not give you worse results. It’s primarily a resource question: the more layers you train, the more CPU/GPU time and wall-clock time it costs you. Retraining everything may simply do you no incremental good in exchange for that resource cost. It depends on the situation.
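The resource point can be illustrated with a toy calculation (the per-layer parameter counts below are invented): every layer you leave unfrozen adds its parameters to what must be updated on every training step, which is roughly what drives the extra compute cost:

```python
# Hypothetical parameter counts per layer, input -> output
# (4 feature layers plus a small head).
layer_params = [500, 1000, 2000, 4000, 300]

def trainable_params(n_freeze):
    """Parameters that get updated when the first n_freeze layers are frozen."""
    return sum(layer_params[n_freeze:])

for n_freeze in range(len(layer_params)):
    print(f"freeze first {n_freeze} layers -> train {trainable_params(n_freeze)} params")
# freeze first 0 layers -> train 7800 params  (most expensive)
# ...
# freeze first 4 layers -> train 300 params   (cheapest: head only)
```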

I understand my original question now, and the trade-off: more unfrozen layers for potentially better accuracy versus fewer unfrozen layers for faster training.
So when I try to make my own AI, I will experiment with how many layers to freeze and with how a different pre-trained model affects the loss and accuracy.

Thank you very much for your time :grinning: