Week 2, Transfer Learning

In the transfer learning lecture, Andrew said that we can use an open-source implementation of a neural network, along with its weights, for our own problem. Later he also mentions that a neural network that works well on one vision problem often works on other vision problems as well. So let's say I train a neural network to recognize apples and oranges. How would that still work if I want to recognize cats and dogs? The weights have been trained for apples and oranges. How can a neural network trained for one task be used for a different task?

You should listen to the lectures all the way through: I’m sure Prof Ng discusses the points you raise in the lectures. You will most likely need to do additional training when you use Transfer Learning. The idea is that a network trained to recognize cats and dogs in images has quite a few layers. The earlier layers of the network learn to recognize very low-level features like edges and curves. Then as you go through the network, the later layers learn to recognize higher-level objects composed from those low-level features, like a cat’s ear or nose or whiskers. So when you use Transfer Learning, you have some decisions to make about how to adapt the pre-trained network to your new problem.

In your example, the early layers of the fruit network, which detect low-level geometric features, are probably still good at recognizing those same low-level geometries in general. But the later layers will only be good at recognizing objects that are basically spherical, and will concentrate more on the textures and colors of the surfaces. So if you repurpose that network, you will at least need to do the following:

  1. Replace the output softmax layer with one that has your new labels (cat, dog, none of the above).
  2. Randomly initialize the weights for your output layer.
  3. Start with the existing weights for the earlier layers and do additional training using your input data with cat and dog labels. The key point is that you may be able to avoid retraining some of the early layers. You can “freeze” those and then run the training only on the layers past some particular point. Figuring out where that point is will require some experimentation. If you want, you can just continue training all the layers. Additional training on the early layers most likely won’t harm your results, but it may just be a waste of time and compute resources.
  4. You may find that your results aren’t that great and you need to add more layers to the network because recognizing cats and dogs is more difficult than apples and oranges.
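
To make steps 1 through 3 concrete, here is a minimal sketch in plain Python. It is a toy, not the Keras/MobileNet code from the assignment: the "pre-trained" weights in `PRETRAINED_W1`, the four data points, and all function names are made up for illustration. It keeps the early layer frozen, randomly initializes a new output layer, and trains only that new layer on the new labels.

```python
import math
import random

random.seed(0)

# Hypothetical "pre-trained" early layer (2 inputs -> 3 hidden features).
# In real transfer learning these weights would come from the original task.
PRETRAINED_W1 = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]


def predict(w1, w2, b2, x):
    """Early layer (linear + ReLU), then a sigmoid output unit."""
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]
    z = sum(w * hi for w, hi in zip(w2, h)) + b2
    return 1.0 / (1.0 + math.exp(-z)), h


def mean_loss(w1, w2, b2, data):
    """Average binary cross-entropy over the dataset."""
    total = 0.0
    for x, y in data:
        p, _ = predict(w1, w2, b2, x)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(data)


def train_head(w1, w2, b2, data, epochs=200, lr=0.5):
    """Train only the output layer; w1 is read but never updated (frozen)."""
    w2 = list(w2)
    for _ in range(epochs):
        for x, y in data:
            p, h = predict(w1, w2, b2, x)
            grad = p - y  # d(loss)/dz for sigmoid + cross-entropy
            w2 = [w - lr * grad * hi for w, hi in zip(w2, h)]
            b2 -= lr * grad
    return w2, b2


# Made-up data for the "new task": label 1 when the first input is larger.
data = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([2.0, 1.0], 1), ([1.0, 2.0], 0)]

w1 = [row[:] for row in PRETRAINED_W1]             # reuse the early layer
w2_init = [random.uniform(-0.1, 0.1) for _ in w1]  # new, randomly initialized head
loss_before = mean_loss(w1, w2_init, 0.0, data)
w2, b2 = train_head(w1, w2_init, 0.0, data)
loss_after = mean_loss(w1, w2, b2, data)

print("loss before training the new head:", loss_before)
print("loss after training the new head: ", loss_after)
print("early layer unchanged:", w1 == PRETRAINED_W1)
```

In a real framework you would do the same thing by loading the published weights, marking the early layers as non-trainable, and attaching a new output layer. Moving the freeze point later or earlier in the network is the experimentation mentioned in step 3.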

I suggest you listen to the lectures again, and I bet you’ll find that Prof Ng actually does say all of the above, but probably explains it better than I just did. You’ll also get to see a worked example of how to do this in the MobileNet assignment for this week.