Train_dev_test split doubt

Soubhagya_Ranjan_Das · September 20, 2022, 3:25pm

A question about the train/dev/test split:

Let’s say I have 2 datasets: 1 training dataset (100000 examples) and 1 test dataset (10000 examples).
How should we split the training set to get the dev set? Are 20000 records enough for the dev set?

In one of the videos, it was mentioned that the dev set and test set should come from the same distribution.
So how do we achieve that in this case, given that the dev set would come from one dataset and the test set from another?

gent.spah · September 21, 2022, 8:33am

20000 examples for the dev set should be OK, I would say.

For the sets to have the same distribution, you should merge the two datasets, shuffle them thoroughly so all the data is mixed, and then divide the result into train/dev/test.
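
A minimal sketch of that merge, shuffle, and split idea in plain Python. The function name `merge_shuffle_split`, the tagged stand-in examples, and the fixed seed are assumptions made for illustration; the 20000/10000 dev and test sizes are the ones discussed in this thread.

```python
import random


def merge_shuffle_split(dataset_a, dataset_b, n_dev, n_test, seed=0):
    """Pool two datasets, shuffle the pool, then cut train/dev/test from it.

    Hypothetical helper: each dataset is any sequence of examples.
    """
    pooled = list(dataset_a) + list(dataset_b)
    if n_dev + n_test >= len(pooled):
        raise ValueError("dev and test sizes leave no examples for training")

    # Shuffle once so every split is drawn from the same mixed pool.
    random.Random(seed).shuffle(pooled)

    test = pooled[:n_test]
    dev = pooled[n_test:n_test + n_dev]
    train = pooled[n_test + n_dev:]
    return train, dev, test


# Stand-in data: each example is (source_tag, index) so the mixing is visible.
original_train = [("A", i) for i in range(100000)]
original_test = [("B", i) for i in range(10000)]

train, dev, test = merge_shuffle_split(
    original_train, original_test, n_dev=20000, n_test=10000
)
print(len(train), len(dev), len(test))  # 80000 20000 10000
```

After the shuffle, dev and test are both slices of the same pool, so they contain examples from both original datasets. With real data you would keep each example's features and label together (for instance as tuples) before shuffling.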

Soubhagya_Ranjan_Das · September 21, 2022, 1:01pm

Thanks for the clarification @gent.spah
