Using Transfer Learning to deal with Data Mismatch

Hi, I am wondering if Data Distribution Mismatch problems between the training and dev/test sets can be dealt with using Transfer Learning, and if this is done in practice.

Specifically, in the week 2 quiz, we have 900,000 images from the internet and 100,000 images from actual car cameras. The answer the quiz provides for dealing with this problem is:

“Choose the training set to be the 900,000 images from the internet along with 80,000 images from your car’s front-facing camera. The 20,000 remaining images will be split equally in dev and test sets.”

Can we instead take a model trained on the 900,000 internet images, replace the last layer (or add more layers), and then train those using the 80,000 actual car images?

On a more abstract level, transfer learning seems to hint that the sequence in which data from different distributions is fed to the model affects the model's performance. That is, if we train a model starting with the least relevant data and finishing with the most relevant data, performance might be better. Is this thinking valid? Has this been studied before? I would greatly appreciate it if you could share relevant materials on this.

Thank you!

Yes, I believe transfer learning as you described it can be a valid way to attack this problem. I don't know how to tell whether it would perform better than the model trained as described in the quiz, though. Maybe we just have to try and see.
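At toy scale, the approach you describe would look roughly like the sketch below: pretrain a tiny network on a large "internet-like" set, then freeze the hidden layer and train only a fresh output head on a smaller, distribution-shifted "car-like" set. Everything here (data, shapes, learning rate) is made up for illustration; it is not the actual quiz model.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def forward(X, W1, w2):
    h = relu(X @ W1)                          # shared hidden features
    return 1.0 / (1.0 + np.exp(-(h @ w2))), h

# --- Stage 1: "pretrain" on the large (internet-like) set ---
X_internet = rng.normal(size=(1000, 20))      # stand-in for the 900,000 images
y_internet = (X_internet.sum(axis=1) > 0).astype(float)

W1 = rng.normal(scale=0.1, size=(20, 16))     # hidden layer (frozen later)
w2 = rng.normal(scale=0.1, size=16)           # original output head

lr = 0.1
for _ in range(200):                          # gradient descent on logistic loss
    p, h = forward(X_internet, W1, w2)
    g = p - y_internet                        # dL/dlogit for sigmoid + BCE
    w2 -= lr * h.T @ g / len(X_internet)
    dh = np.outer(g, w2) * (h > 0)            # backprop through ReLU
    W1 -= lr * X_internet.T @ dh / len(X_internet)

# --- Stage 2: transfer. Freeze W1, replace the head, train on car data ---
X_car = rng.normal(size=(200, 20)) + 0.5      # shifted distribution
y_car = (X_car.sum(axis=1) > 10).astype(float)

w2_new = rng.normal(scale=0.1, size=16)       # fresh output layer
for _ in range(200):
    p, h = forward(X_car, W1, w2_new)         # W1 is reused, never updated
    g = p - y_car
    w2_new -= lr * h.T @ g / len(X_car)

p_car, _ = forward(X_car, W1, w2_new)
acc = ((p_car > 0.5) == y_car).mean()
print(f"car-data accuracy after fine-tuning the head: {acc:.2f}")
```

In a real framework you would instead load pretrained weights and set the backbone's parameters to non-trainable before fitting on the 80,000 car images; the principle is the same.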

The idea of feeding the training data to the model in an intentional sequence to improve learning sounds like the idea behind curriculum learning.
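The scheduling part of that idea can be sketched in a few lines. Here the "relevance" score per example is a hypothetical label I made up (0 for internet images, 1 for car images); a real curriculum would use a more principled ordering criterion such as example difficulty.

```python
# Hypothetical curriculum ordering: least relevant data first,
# data closest to the dev/test distribution last.
internet_examples = [(f"internet_img_{i}", 0.0) for i in range(6)]
car_examples = [(f"car_img_{i}", 1.0) for i in range(4)]

def curriculum_order(examples):
    # Sort ascending by relevance score; sorted() is stable, so ties
    # keep their original order.
    return sorted(examples, key=lambda ex: ex[1])

schedule = curriculum_order(internet_examples + car_examples)
print([name for name, _ in schedule])  # internet images first, car images last
```

The training loop would then iterate over `schedule` in order instead of drawing uniformly shuffled batches.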