Hello, Prof. Andrew Ng said that "unless you have a large dataset and a large computational margin," we shouldn't train the network from scratch. My question is: for what range of dataset sizes is transfer learning applicable? Thank you!
Hi Michelle,
Thanks for bringing up this interesting question.
I do not have an exact answer for that. My understanding of transfer learning is this:
Transfer learning is usually done for tasks where your dataset has too little data to train a full-scale model from scratch.
So instead of looking for a fixed range or number, which will differ depending on the data type and the application, I would look at the motivation in terms of performance and re-using computational resources. Of course, the baseline model you want to use as a reference needs to have been trained on a big dataset, but how big does it need to be? You have to study each use case separately: for instance, thousands of pictures might be enough for one specific type of classifier but not for another.
Please do read this article about the topic:
Best,
Rosa
Another reason for transfer learning is that there exist highly developed, well-performing models, trained by experts on massive datasets, but they may output the wrong shape. Say a model was trained on ImageNet and handles 1,000 classes, but you only want to classify images as dog or cat. You can take an existing model, such as ResNet or MobileNet, and modify only the layer producing the output shape. This vastly reduces the number of parameters to train. I think the idea is: "Google trained MobileNet for us, let's try to use as much of it as we can."
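Here is a minimal sketch of that idea in TensorFlow/Keras, assuming the usual 224×224 input; the new head and the training settings are illustrative choices, not a tuned recipe:

```python
import tensorflow as tf

# Load MobileNetV2 pretrained on ImageNet, without its 1,000-class output layer.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3),
    include_top=False,       # drop the original classification head
    weights="imagenet",
    pooling="avg",           # global average pooling in place of the head
)
base.trainable = False       # freeze the pretrained feature extractor

# Attach a new output layer shaped for the 2-class dog-vs-cat task.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(1, activation="sigmoid"),  # only this layer trains
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()  # compare trainable vs. total parameters
```

With the base frozen, only the weights of the final Dense layer are updated during training, which is exactly why a small dataset can be enough here.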