C3 Week2 - Transfer Learning. Train/Dev/Test Splits

Hello,
When talking about Transfer Learning, the professor gives an example with something like 10,000 cases for the task you transfer from (image detection) and only 100 cases for the task you transfer to (X-ray diagnosis).
In a previous video, there was a somewhat similar situation with a cat detector.
He mentioned having 1M professional web images of cats and just 10k blurry cat images from the app you are working on.
In that case, he said the best use for these 10k images is splitting them between Dev and Test sets, so the model can aim for those cases.

In this case of Transfer Learning to X-ray diagnosis, you need to use the X-ray images in the Train set because you need to update the new layers’ weights.

So this small amount (as small as just 100) of X-ray images:

  • Do they all go into the training set, with the trained result becoming your final model (no Dev or Test sets), on the assumption that the hyperparameters were already tuned during pre-training?
  • Or are they split into Train/Dev/Test sets, following the “regular” procedure of hyperparameter tuning, bias-variance analysis, error analysis, etc.?

Thanks!

Hello Albert @albert_c,

I would go with your second option. I think it is the standard procedure, and it matters even more when the amount of training data is small (before or after the split). Even if we only replace the output layer, that layer can still have a very large number of trainable weights: the number of neurons in the previous layer multiplied by the number of output classes. That number of neurons is usually large in an image recognition model you would transfer from.
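To put a rough number on this, here is a minimal sketch that counts the trainable parameters of a dense output layer. The feature sizes (512, 2048, 4096) and the two output classes are illustrative assumptions, not figures from the lecture; only the 100 X-ray images come from the question.

```python
def output_layer_params(n_features: int, n_classes: int, bias: bool = True) -> int:
    """Trainable parameters in a dense output layer: weights plus optional biases."""
    weights = n_features * n_classes
    return weights + (n_classes if bias else 0)


# Assumed sizes for illustration only.
n_xray_images = 100  # from the question
n_classes = 2        # hypothetical: e.g. "finding" vs "no finding"

for n_features in (512, 2048, 4096):  # hypothetical widths of the last hidden layer
    n_params = output_layer_params(n_features, n_classes)
    ratio = n_params / n_xray_images
    print(f"{n_features} features: {n_params} trainable parameters, "
          f"about {ratio:.0f} per training image")
```

Even with only the output layer trainable, the parameter count exceeds the number of images many times over under these assumptions, which is why over-fitting is a real concern.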

The combination of too many trainable weights and too small a training set is quite vulnerable to over-fitting. By splitting the data, you at least get a sense of how large that problem is, and from there you can do the usual analysis described in the lectures.
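As a rough sketch of such a split, the snippet below shuffles example indices and divides them into train, dev, and test sets. The 60/20/20 proportions, the fixed seed, and the helper name `split_indices` are assumptions for illustration; with so few examples, the right proportions depend on your problem.

```python
import random


def split_indices(n, dev_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle indices 0..n-1 and split them into train/dev/test lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seeded for a reproducible split
    n_dev = round(n * dev_frac)
    n_test = round(n * test_frac)
    dev = idx[:n_dev]
    test = idx[n_dev:n_dev + n_test]
    train = idx[n_dev + n_test:]
    return train, dev, test


# 100 X-ray images, as in the question; fractions are assumed.
train, dev, test = split_indices(100)
print(len(train), len(dev), len(test))
```

Comparing the error on `train` with the error on `dev` then gives the over-fitting (variance) estimate mentioned above, while `test` stays untouched until the end.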

Cheers,
Raymond
