Question regarding a quiz from "Bird recognition case study"

ricky_pii · June 13, 2023, 12:03pm

According to two of the quiz questions, a difference in distribution between the training set and the dev/test sets doesn’t matter, but a difference between the dev set and the test set does.
Can someone explain why?

Also, I wonder whether adding samples from a different distribution to the training set could even improve the model’s performance, since it has a regularizing effect?

Lastly, does “distribution” mean the same thing as “domain” in machine learning?

Thank you in advance.

gent.spah · June 13, 2023, 3:17pm

“Distribution” here simply refers to how the characteristics of the data vary from one set to another, or, if you prefer, how similar the examples are within a set and across sets.

Normally the train, dev, and test sets should have the same distribution, because if the ML model learns from one set, then it should encounter similar data in the other sets too in order to be accurate.
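
To make the idea concrete, here is a minimal sketch in plain Python. It is not from the course or the quiz: the 1-D Gaussian data, the `shift` parameter, and the helper names `sample`, `fit_threshold`, and `accuracy` are all made up for illustration. It fits a very simple threshold classifier on one distribution, then scores it on a set drawn from the same distribution and on a set whose inputs have been shifted.

```python
import random


def sample(n, shift, rng):
    """Draw n labelled 1-D points; class 1 is centred 2 units above class 0.

    `shift` moves both classes together and stands in for a change in
    data distribution (a hypothetical, toy version of it).
    """
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        x = rng.gauss(shift + 2.0 * label, 0.5)
        data.append((x, label))
    return data


def fit_threshold(data):
    """Place the decision threshold midway between the two class means."""
    xs0 = [x for x, y in data if y == 0]
    xs1 = [x for x, y in data if y == 1]
    return (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2.0


def accuracy(data, threshold):
    """Fraction of points whose predicted class (x > threshold) is correct."""
    correct = sum((x > threshold) == (y == 1) for x, y in data)
    return correct / len(data)


rng = random.Random(0)  # fixed seed so the run is repeatable
train = sample(2000, 0.0, rng)
dev_same = sample(2000, 0.0, rng)      # same distribution as train
dev_shifted = sample(2000, 3.0, rng)   # inputs shifted by 3 units

threshold = fit_threshold(train)
acc_same = accuracy(dev_same, threshold)
acc_shifted = accuracy(dev_shifted, threshold)

print(f"same distribution:    {acc_same:.3f}")
print(f"shifted distribution: {acc_shifted:.3f}")
```

In this toy setup the threshold learned on the training data sits between the two training classes, so it separates the same-distribution set well but labels almost every shifted point as class 1. Real distribution differences are rarely this clean, so treat it only as an illustration of why a model can look accurate on one set and poor on another.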

As for “domain”, I am not sure, but I think it refers to the specific application of the ML model rather than to the distribution of the data the model learns from or is tested on.
