Hi there,
I’m interested in how transfer learning can help models with little data perform better: pre-training on a task that shares similar low-level features but has a lot more data.
You said that task A and task B should have the same input data X, but what about the output data y?
For example, suppose you pre-train a model for image recognition, where the output is a binary classification, and then fine-tune it for radiology images, where the output is a multi-class classifier for the different conditions detected in the images. (I’m not a radiologist, so please excuse any radiology inaccuracies!)
In this case, the training data would have 10 output classes for the different diagnoses, which is different from the single binary output of the image-recognition model.
My guess is that the radiology model would still benefit from pre-training, because the low-level features learned during image recognition, such as edge detection, would be useful to both a binary and a multi-class classifier. But I also wonder whether there would be more benefit in going from binary to binary than from binary to multi-class.
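To make sure I’m picturing the mechanics right, here’s a rough sketch of what I imagine the fine-tuning step looks like (in PyTorch, with an ImageNet-pre-trained ResNet standing in for the image-recognition model, and the 10 classes taken from my made-up example):

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from a network pre-trained on a generic image-recognition
# task (ImageNet here, standing in for task A).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pre-trained layers so the low-level features
# (edge detectors, textures, etc.) are kept as-is.
for param in model.parameters():
    param.requires_grad = False

# Swap the original output head for a new 10-way classifier,
# matching the 10 diagnoses in my example; only this layer trains.
model.fc = nn.Linear(model.fc.in_features, 10)

# Fine-tuning then optimizes just the new head, with a
# multi-class loss replacing the original binary loss.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
```

If that’s roughly right, then the output layer gets thrown away and rebuilt anyway, so maybe the shape of y in task A matters less than the shared features underneath it?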
I would love to hear your thoughts on this.
Thanks!