Transfer Learning in CNNs

Suppose I train a neural network on 1 million images of vehicles. Suppose there are 1000 classes, such as Bicycles, Motorcycles, Cars, and Trucks, but Buses are not among them.
Now I need to train another model using transfer learning to recognize buses in an image. I have a dataset of 500 images containing buses.

  1. If I freeze all convolutional layers of the pretrained model, the CNN will not be able to learn features specific to a bus, and it will confuse buses with cars or trucks.
  2. If I unfreeze some of the last convolutional layers, the CNN will learn some bus-specific features and will classify buses correctly.
  3. If I unfreeze the majority of the convolutional layers at the end of my pretrained model, the CNN will overfit because the dataset is small (500 images).

Are these 3 statements correct? If not, please provide an explanation.

Hi @realarslan33,

I really like these special cases. From my understanding and experience with CNN models (I’ve developed and trained quite a few from scratch), I would venture to share my thoughts.

But before that, I have to say that this is just speculation, as there are many unknowns: the depth of the CNN, the dimensions of the layers, the dense (fully connected) layers that you define, not to mention the specifics of the dataset. At any rate, let's venture into the world of speculation:

  1. If all convolutional layers are frozen and you fine-tune the model with the 500 images of buses, the dense layers that the model may have on top of the convolutional layers can still learn to recognize buses by recombining the features the frozen layers already extract. However, the final result on the task of identifying buses may still be limited.

  2. If only some of the last convolutional layers are allowed to learn, then the model will learn some of the specific, high-level features of buses. Remember that the initial layers learn low-level features, while the last layers learn the most complex ones. So the model in this scenario may learn what makes a bus a bus, and is more likely to give better results.

  3. If you unfreeze the majority of the convolutional layers at the end (and this depends very much on the depth of your CNN), then more mid- to high-level details may be learned. But given that you only have 500 images, the CNN can learn them ‘just too well’, which causes overfitting, and it may not generalize to new bus images very well.
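
To put rough numbers on the three scenarios, here is a minimal sketch in plain Python that counts how many parameters remain trainable under each freezing strategy. The layer sizes are purely hypothetical (a small VGG-style stack I made up for illustration, not your actual network), and I am assuming a new two-class dense head (bus / not bus) that is always trained. The trainable-parameter count is only a crude proxy for overfitting risk, not a prediction of what will actually happen.

```python
# Hypothetical VGG-style convolutional stack, for illustration only:
# (name, in_channels, out_channels, kernel_size)
CONV_LAYERS = [
    ("conv1", 3, 64, 3),
    ("conv2", 64, 128, 3),
    ("conv3", 128, 256, 3),
    ("conv4", 256, 512, 3),
    ("conv5", 512, 512, 3),
]

# Assumed new classification head: 512 pooled features -> 2 classes.
HEAD_IN, HEAD_OUT = 512, 2

NUM_IMAGES = 500  # size of the bus dataset from the question


def conv_params(c_in, c_out, k):
    """Weights (k * k * c_in * c_out) plus one bias per output channel."""
    return k * k * c_in * c_out + c_out


def head_params(n_in, n_out):
    """Weights plus biases of a single dense layer."""
    return n_in * n_out + n_out


def trainable_params(n_unfrozen):
    """Trainable parameters when the last `n_unfrozen` conv layers are
    unfrozen. The dense head is always trainable."""
    unfrozen = CONV_LAYERS[len(CONV_LAYERS) - n_unfrozen:]
    conv_total = sum(conv_params(c_in, c_out, k) for _, c_in, c_out, k in unfrozen)
    return head_params(HEAD_IN, HEAD_OUT) + conv_total


if __name__ == "__main__":
    scenarios = [
        ("1. all conv layers frozen", 0),
        ("2. last conv layer unfrozen", 1),
        ("3. last four conv layers unfrozen", 4),
    ]
    for label, n_unfrozen in scenarios:
        total = trainable_params(n_unfrozen)
        per_image = total / NUM_IMAGES
        print(f"{label}: {total:,} trainable parameters "
              f"({per_image:,.1f} per training image)")
```

With these made-up sizes, the jump from scenario 1 to scenarios 2 and 3 is several orders of magnitude, which is the intuition behind being careful about how much you unfreeze with only 500 images. In a real framework you would typically freeze layers by switching off gradient updates for them rather than counting by hand.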

At the end of the day, and as indicated above, these are just ‘speculations’ on my part, and only the experiments you run will tell the truth.

Hope this helps!