Training and Testing on Different Distributions

Vladimir · April 24, 2021, 9:13am

Hello!

In the video, we consider only two options for dealing with photos from the application. However, I’ve thought of a third one: why can’t we apply dataset augmentation? What are the drawbacks of this approach for that problem?

Sabs · April 25, 2021, 6:33am

Hi Vladimir
I think data augmentation is definitely a possibility there. However, Prof. Ng probably does not talk about it in that particular video because it was about training and testing on different distributions. With data augmentation, we are trying to make the training distribution look like the test distribution, and that would be off-topic.

Vladimir · April 25, 2021, 8:16am

Yes, your point makes sense. But what would be the best choice in practice?

Sabs · April 26, 2021, 12:47am

I believe we have already augmented the dataset by using the high-quality images from the internet. But I assumed you were talking about blurring or otherwise distorting these high-quality images to make them look like the pictures users would upload. That is definitely worth doing, but it is most effective once you actually hit a data mismatch problem and have identified which “mismatches” are causing it.
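For a concrete picture of the “blurring or distorting” idea (what the course’s data mismatch lecture calls artificial data synthesis), here is a minimal NumPy sketch. It assumes images are float arrays scaled to [0, 1]; the function name `degrade_like_phone_photo`, the specific distortions (box blur, resolution loss, Gaussian noise) and their default strengths are hypothetical choices for illustration, not something from the course. In a real project you would choose the distortions based on what the app photos actually look like.

```python
import numpy as np

def degrade_like_phone_photo(img, rng, blur_size=3, noise_std=0.05, downscale=2):
    """Hypothetical sketch: make a clean image look more like a blurry, noisy,
    low-resolution phone upload. `img` is a float array in [0, 1] with shape
    (H, W) or (H, W, C); `blur_size` is assumed to be odd."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape[:2]

    # 1. Box blur (a simple stand-in for defocus or camera shake):
    #    average each pixel over its blur_size x blur_size neighborhood.
    pad = blur_size // 2
    pad_width = [(pad, pad), (pad, pad)] + [(0, 0)] * (img.ndim - 2)
    padded = np.pad(img, pad_width, mode="edge")
    blurred = np.zeros_like(img)
    for dy in range(blur_size):
        for dx in range(blur_size):
            blurred += padded[dy:dy + h, dx:dx + w]
    blurred /= blur_size * blur_size

    # 2. Resolution loss: keep every `downscale`-th pixel, then blow it
    #    back up by repetition and crop to the original size.
    small = blurred[::downscale, ::downscale]
    lowres = np.repeat(np.repeat(small, downscale, axis=0), downscale, axis=1)[:h, :w]

    # 3. Additive Gaussian sensor noise, clipped back into [0, 1].
    noisy = lowres + rng.normal(0.0, noise_std, size=lowres.shape)
    return np.clip(noisy, 0.0, 1.0)


rng = np.random.default_rng(0)
clean = rng.random((64, 64, 3))  # stand-in for a high-quality web image
augmented = degrade_like_phone_photo(clean, rng)
print(augmented.shape)  # (64, 64, 3)
```

One caution from the lecture applies here: if the synthetic distortions cover only a narrow slice of what real uploads look like, the model can overfit to that slice, even when the images look convincing to a human.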
To identify those mismatches, the first step is to build a model as fast as you can (by training on the high-quality images) and perform error analysis. Once you identify a data mismatch problem, [Week 2 Video 3](https://www.coursera.org/learn/machine-learning-projects/lecture/biLiy/addressing-data-mismatch) discusses how to address it. Prof. Ng proposes a systematic, manual error analysis procedure to identify what kind of data the model is getting wrong. Then you can augment the training data with more confidence.
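The manual error analysis step boils down to looking at a sample of misclassified dev examples, tagging each one with what seems to be wrong with it, and counting. Here is a minimal Python sketch of the counting part; the helper `tally_error_categories` and the tag names (“blurry”, “dark”) are hypothetical examples, not from the course.

```python
from collections import Counter

def tally_error_categories(tagged_errors):
    """Hypothetical helper for manual error analysis.

    `tagged_errors` holds one set of hand-assigned tags per misclassified
    dev example (an empty set means "no obvious cause").
    Returns (tag, fraction of misclassified examples) pairs, most common first.
    """
    n = len(tagged_errors)
    if n == 0:
        return []
    # An example may carry several tags, so fractions need not sum to 1.
    counts = Counter(tag for tags in tagged_errors for tag in tags)
    return [(tag, count / n) for tag, count in counts.most_common()]


# Tags you might assign after eyeballing five misclassified app photos.
errors = [{"blurry"}, {"blurry", "dark"}, {"dark"}, {"blurry"}, set()]
print(tally_error_categories(errors))  # [('blurry', 0.6), ('dark', 0.4)]
```

Categories with the largest fractions are the natural candidates for targeted augmentation, since fixing a category can only remove the share of errors that category accounts for.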
