Bird Recognition in the City of Peacetopia : Citizen Data

storm95 · June 21, 2021, 7:20am

Hi,
Regarding the following question, I don’t fully understand why we should add the citizens’ data to the training set. I only see it harming rather than improving anything. The pictures taken by citizens may be very different from those taken by the security cameras. Adding them would improve predictions if we also wanted to predict on pictures taken by citizens, which is not the case here. Also, the only way I see this improving the system is if we add some percentage of this data to each of the training, dev, and test sets, and not to only one of them.

The explanation says that:

“Sometimes we’ll need to train the model on the data that is available, and its distribution may not be the same as the data that will occur in production.” It is true that sometimes we don’t have much choice, but here we do have the choice of including the citizens’ distribution or not.
“Also, adding training data that differs from the dev set may still help the model improve performance on the dev set.” How will it help? Can you tell me some cases where it will help improve performance on the dev set?

manifest · June 22, 2021, 7:01am

Hey @storm95,

The main intuition here is that adding a considerable number of new training examples, even from a different distribution, may still help learning. We also want our dev and test sets to be as close as possible to the true data distribution, because we use these sets to evaluate our model, and we want to evaluate on the kind of examples that will arrive at inference time (e.g. pictures taken by security cameras).
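As a rough illustration of the split described above, here is a minimal sketch. The function name `build_splits`, the 10% dev/test fractions, and the placeholder data are all assumptions for the example, not something taken from the quiz. It only shows the idea: dev and test are drawn from security-camera data alone, while citizen data goes into the training set only.

```python
import random


def build_splits(camera_items, citizen_items, dev_frac=0.1, test_frac=0.1, seed=0):
    """Split target-distribution data into train/dev/test, then add the
    off-distribution data to the training set only.

    camera_items:  examples from the distribution we care about at inference
    citizen_items: extra examples from a different distribution
    (Both names and the default fractions are hypothetical.)
    """
    rng = random.Random(seed)
    camera = list(camera_items)
    rng.shuffle(camera)

    n_dev = int(len(camera) * dev_frac)
    n_test = int(len(camera) * test_frac)

    # Dev and test contain camera data only, so evaluation reflects inference time.
    dev = camera[:n_dev]
    test = camera[n_dev:n_dev + n_test]

    # Remaining camera data plus ALL citizen data goes to training.
    train = camera[n_dev + n_test:] + list(citizen_items)
    rng.shuffle(train)
    return train, dev, test


# Placeholder data: (source, index) tuples standing in for images.
camera = [("camera", i) for i in range(1000)]
citizen = [("citizen", i) for i in range(300)]
train, dev, test = build_splits(camera, citizen)
```

With this layout, any gain or loss from the citizen data shows up directly in the dev-set metric, which is measured on the distribution we actually care about.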

Please remove the screenshot with your quiz answer and the notes on the explanation of the wrong answer. Posting them is against the rules.

storm95 · June 22, 2021, 6:17pm

Thanks @manifest. I am unable to find any edit option that would let me delete the screenshot. Can you guide me on how to remove it?
