Question About Transfer Learning

In Course 4 Week 2, Andrew Ng talks about transfer learning and how it is helpful when you have very little data, using the example of adapting a pre-trained animal detector to detect his two cats, Tigger and Misty. My question is: how close does the pre-trained model have to be to the data you want to train it on? For example, if I wanted a very specific AI, like one to detect a character in a game, how would I go about finding a suitable pre-trained model that I could use for transfer learning? I also understand that transfer learning reuses low-level features like lines and curves from the previous model, so would a somewhat different pre-trained model work on a new training set?

Any insight would be greatly appreciated, and sorry if the question is a bit confusing.

I should start with the disclaimer that I don’t have any actual practical experience with Transfer Learning: all I know is what I’ve heard Prof Ng say in these courses. With that in mind, my understanding is that if you’re trying to solve any kind of image recognition problem, the basics are pretty much the same. So the issue is more about finding a system that takes input images in the format your input data happens to be in, or one where you can convert your inputs to the format the system was trained on without losing too much information.

Beyond that, there is a very specific and situational set of decisions you need to make about which layers of the pretrained model you remove, replace, and/or retrain. Roughly speaking, you’ve got early layers that detect primitive features (edges, curves, color patterns), middle layers that detect more complex features (a cat’s ears, car wheels, propellers), and then the last few layers that integrate the complex features into the final answer (it’s a truck or a tiger). At that point, you have to make an intelligent decision about how similar your data is to the training corpus of the base system you are transferring from.

Prof Ng does spend some time discussing how to approach those decisions in the lectures, so it might be worth listening to what he says again with the above in mind. I remember that he also says there are choices to be made about whether or not you freeze the earlier layers in the cases where you are using them “as is”. There may be something to be gained by starting with the trained model and then doing incremental training even on the early layers, but that is not always necessary.
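To make that layer picture concrete, here is a minimal toy sketch in plain Python (not real framework code; the layer names, roles, and model are all made up) of the basic move: keep the pretrained feature layers, freeze the early ones, and attach a fresh, trainable output head for the new task:

```python
class Layer:
    """Stand-in for a network layer; real frameworks expose a similar
    'trainable' flag that controls whether weights get updated."""
    def __init__(self, name, role):
        self.name = name
        self.role = role          # "primitive", "complex", or "head"
        self.trainable = True     # frozen layers have trainable = False

# A hypothetical pretrained image classifier, ordered input -> output.
pretrained = [
    Layer("conv1", "primitive"),   # edges, curves, color patterns
    Layer("conv2", "primitive"),
    Layer("conv3", "complex"),     # ears, wheels, propellers
    Layer("conv4", "complex"),
    Layer("fc_out", "head"),       # final "truck or tiger" decision
]

def transfer(model, n_freeze, new_head_name):
    """Drop the old head, freeze the first n_freeze feature layers,
    and attach a freshly initialized head for the new task."""
    kept = [layer for layer in model if layer.role != "head"]
    for i, layer in enumerate(kept):
        layer.trainable = i >= n_freeze
    kept.append(Layer(new_head_name, "head"))  # new head stays trainable
    return kept

# Freeze the two primitive-feature layers, retrain the rest.
new_model = transfer(pretrained, n_freeze=2, new_head_name="game_char_out")
print([(layer.name, layer.trainable) for layer in new_model])
# -> [('conv1', False), ('conv2', False), ('conv3', True),
#     ('conv4', True), ('game_char_out', True)]
```

In a real framework the same idea is a few lines (load a pretrained base, set the frozen layers' trainable flag to False, stack a new classification head on top), but the decision of where to draw the freeze line is yours.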

Have you seen this video? For model selection how about:

  1. Talking to a domain expert.
  2. Trying / evaluating / selecting.

Thank you for the fast response; I think I understand what you are saying. So basically, as I understand it, the pre-trained model needs to take the same kind of input, like an image, and you can resize your images to fit the pre-trained model’s expected input.

As for how related the pre-trained model needs to be, I think it doesn’t really matter how closely related it is; what matters is how many layers you need to “freeze”. To my understanding, you would freeze more layers if the pre-trained model is similar, because some of the high-level features are already learned and you would want to keep them, given the similarity between the tasks. But if the pre-trained model is somewhat different, then you would freeze fewer layers, since you would want to learn new complex features while keeping the low-level ones.
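That intuition can be written down as a toy rule of thumb (the similarity score and the function itself are entirely made up, just to express the direction of the relationship, not any real formula from the course):

```python
def layers_to_freeze(n_feature_layers, similarity):
    """Toy heuristic: the more similar the new task is to the
    pretrained task, the more feature layers we can leave frozen.
    similarity is a hypothetical score in [0, 1]."""
    return round(n_feature_layers * similarity)

# Very similar task: freeze all 4 feature layers, retrain only the head.
print(layers_to_freeze(4, 1.0))   # -> 4
# Quite different task: freeze only the low-level feature layers.
print(layers_to_freeze(4, 0.25))  # -> 1
```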

Please correct me if anything I have said is wrong.

Yes, I think the way you described it sounds correct to me. The decisions about how many layers it makes sense to freeze are not “cut and dried”, meaning there is no one universal correct answer. As Prof Ng says in many cases around tuning solutions, you will need to do some experiments to figure out what works best for your particular problem. The other general point I remember him making in the Transfer Learning lectures is that if you unfreeze more layers than you really need to, or even do incremental training on all the layers, it probably will not give you worse results. It’s primarily a resource question: the more layers you train, the more CPU/GPU time and wall-clock time it costs you. Retraining everything may simply do you no incremental good in exchange for that resource cost. It depends on the situation.
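The resource point can be illustrated with a toy calculation (the per-layer parameter counts below are invented): every layer you leave unfrozen adds its parameters to what must be updated on every training step, which is roughly what drives the extra compute cost:

```python
# Hypothetical parameter counts per layer, input -> output
# (4 feature layers plus a small head).
layer_params = [500, 1000, 2000, 4000, 300]

def trainable_params(n_freeze):
    """Parameters that get updated when the first n_freeze layers are frozen."""
    return sum(layer_params[n_freeze:])

for n_freeze in range(len(layer_params)):
    print(f"freeze first {n_freeze} layers -> train {trainable_params(n_freeze)} params")
# freeze first 0 layers -> train 7800 params  (most expensive)
# ...
# freeze first 4 layers -> train 300 params   (cheapest: head only)
```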

I understand my original question now, and the trade-off: more unfrozen layers for potentially better accuracy versus fewer unfrozen layers for faster training.
So when I try to make my own AI, I will experiment with how many layers to freeze and with how a different pre-trained model affects the loss and accuracy.

Thank you very much for your time :grinning: