In this week's lectures, Andrew gave some general hints about when transfer learning makes sense, summarized as follows:
- Task A and B have the same input x.
- You have a lot more data for Task A than Task B.
- Low level features from A could be helpful for learning B.
In the example used, he mentioned an image classifier whose architecture some medical-image researchers found could serve well as the basis for a medical image classifier. So far (and theoretically), so good. However, when "Low level features from A could be helpful for learning B" is mentioned, does Andrew mean "having the same distribution"?
Say, for instance, the classifier is an "animal classifier" (i.e. a NN that, given an animal picture, tells me whether it's a cat, a dog, a horse, etc.) that was trained on 10M images; say also that I have 10K x-ray images of patients with/without osteoporosis, and I want to build a classifier that predicts whether a patient suffers from that illness or not. Provided the resolution and size of the images are the same, the first and second conditions would be met; however, the third would NOT be, since the distributions are not the same at all, and so my hypothesis would be true…
Is my assertion correct?
I don't exactly understand what you mean by "same distribution", but I don't think that is needed. For example, say someone has trained a system to classify different breeds of cats, and you want to classify dogs. The distributions are different, but the low-level features are the same, so you would probably benefit from transfer learning.
@jonaslalin has it exactly correct. But since picture == 10^3*words …
Here's an example of some low-level features extracted by a CNN. Did those come from cat images? Dogs? We don't know, and don't need to know. The caveat is that low-level features extracted from animal images might not be very helpful for training a CNN to predict crop yield from aerial photos of fields (for example).
Very cool! Sorry, but I can’t help myself: gotta pile O(10^2) more words onto your eloquent pictures.
Thanks for the concrete example. It demonstrates a point Prof Ng makes in a number of different places in the lectures: the "early" layers of a network learn very low-level features like edges and curves, and the later layers integrate those low-level features into detectors for higher-level features. When applying transfer learning, another decision you need to make is at what point in the network to do more training on your specific inputs, or whether to truncate the existing network and supply new final layers that will allow it to do well on your specific problem. Prof Ng discusses these issues in the lectures.
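To make that "truncate and add new final layers" idea concrete, here is a minimal NumPy sketch (not any real framework API; in practice you would freeze layers in Keras or PyTorch). The "pretrained" early layer is stood in for by a fixed random matrix, and the toy dataset is invented for illustration: only the new final logistic layer gets trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained early layer: in real transfer learning these
# weights would come from the source task (e.g. the animal classifier)
# and be kept frozen during training on the new task.
W_frozen = rng.normal(size=(2, 8))

def extract_features(X):
    """Frozen early layer: linear map + ReLU, never updated."""
    return np.maximum(X @ W_frozen, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy target-task data: two separable clusters (hypothetical, for illustration).
X = np.vstack([rng.normal(-1.0, 0.5, size=(50, 2)),
               rng.normal(+1.0, 0.5, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

# New final layer for the target task: the ONLY trainable parameters.
w = np.zeros(8)
b = 0.0

F = extract_features(X)          # features from the frozen layers
for _ in range(500):             # plain gradient descent on the new head only
    p = sigmoid(F @ w + b)
    w -= 0.5 * (F.T @ (p - y)) / len(y)
    b -= 0.5 * np.mean(p - y)

accuracy = np.mean((sigmoid(F @ w + b) > 0.5) == y)
```

The design choice being illustrated: gradients are only ever applied to `w` and `b`, while `W_frozen` stays fixed, which is exactly what "freezing" the early layers means.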
One other caveat in the specific example of applying a pretrained animal classifier (or at least some of its early layers) to a medical image classification problem is that the types of the images may differ. The animal pictures are probably in one of the standard image representations (RGB, CMYK, …), but medical images are typically greyscale, aren't they? It's possible that difference would not end up being significant (shades of grey are colors, after all), but generally speaking you need to keep in mind that neural networks are very specific about their inputs: they all need to have the same number of pixels and be of the same type. You could convert your medical images to an RGB representation and see what happens. Image libraries typically provide all sorts of transformations that can be used as "preprocessing" to get your images into the form expected by the pretrained network in question.
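That preprocessing step can be sketched in a few lines, assuming the image is a NumPy array (a real pipeline might instead use something like Pillow's `Image.convert("RGB")`). The x-ray array below is a made-up placeholder:

```python
import numpy as np

def grayscale_to_rgb(img):
    """Replicate a single-channel (H, W) image across three channels so it
    matches the (H, W, 3) input shape an RGB-pretrained network expects."""
    if img.ndim != 2:
        raise ValueError("expected a 2-D (H, W) grayscale image")
    return np.stack([img, img, img], axis=-1)

# Hypothetical 64x64 "x-ray" with intensities in [0, 1], for illustration only.
xray = np.random.default_rng(1).random((64, 64))
rgb = grayscale_to_rgb(xray)
print(rgb.shape)  # (64, 64, 3)
```

All three channels carry the same values, so no information is added, but the tensor now has the shape the pretrained network's first layer was built for.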
Hello @jonaslalin ,
Thanks a lot! The explanation was crystal clear. As the "cherry on top of the cake", @ai_curious shared an example of low-level features (thank you too!), but the first explanation was already enough to settle everything. A bit afterwards, @paulinpaloalto added yet another scenario, where even color could make the low-level features different (which, BTW, I hadn't even considered at that point!), making the foundations even stronger!
In summary (and using my own examples): if you have a cat picture classifier, you might use it to train a new dog picture classifier (provided both have the same resolution, color, etc.), since the low-level features might end up being very similar (both animals have four legs, a head, a snout, etc.); however, the cat classifier might perform very poorly when transferred to a bone-disease classifier, since the two tasks share almost no common low-level features.
I think that pretty much solves this thread.