Why do we need to have the dataset from the same distribution?

mo1994 · June 2, 2024, 7:18am

In “dataset for RL training”, it’s stated at the end of the video that the dataset needs to come from the same distribution.

My question is: why? If it’s a summarization task, can’t the data come from different sources with different distributions or topics? Or did I misunderstand the phrase?

TMosh · June 2, 2024, 11:40pm

Models learn the characteristics of the training set.
So it’s important that the training data resembles the data the model will see in real use later. Otherwise, you will not get good results when using the model.
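
A toy sketch of that effect, not taken from the course: a one-dimensional threshold classifier is fit on data from one distribution, then scored on held-out data from the same distribution and on data whose inputs have shifted. The Gaussian means, the sample sizes, and the helper names (`sample`, `fit_threshold`, `accuracy`) are all assumptions chosen for illustration.

```python
import random

def sample(n, mean0, mean1, rng):
    """Draw n points per class from two unit-variance Gaussians (hypothetical data)."""
    data = [(rng.gauss(mean0, 1.0), 0) for _ in range(n)]
    data += [(rng.gauss(mean1, 1.0), 1) for _ in range(n)]
    return data

def fit_threshold(data):
    """'Train' by placing the decision threshold midway between the class means."""
    xs0 = [x for x, y in data if y == 0]
    xs1 = [x for x, y in data if y == 1]
    return (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2

def accuracy(threshold, data):
    """Fraction of points where 'x > threshold' matches the label."""
    correct = sum((x > threshold) == (y == 1) for x, y in data)
    return correct / len(data)

rng = random.Random(0)
train = sample(1000, 0.0, 4.0, rng)         # training distribution
test_same = sample(1000, 0.0, 4.0, rng)     # same distribution as training
test_shifted = sample(1000, 3.0, 7.0, rng)  # inputs shifted by +3

threshold = fit_threshold(train)
acc_same = accuracy(threshold, test_same)
acc_shifted = accuracy(threshold, test_shifted)
print(f"threshold={threshold:.2f} same={acc_same:.3f} shifted={acc_shifted:.3f}")
```

The threshold is only meaningful for inputs that look like the training data, so accuracy drops sharply on the shifted set even though the task itself has not changed.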
