Development Set Question

PirateHunerZoro · October 14, 2021, 2:06pm

Hello, everybody,
I just had a general question regarding the development set. How exactly to you pick it? I understand it must come from the same population distribution as your training set (unlike the potentially optional test set), but how exactly do you “draw” the development set out of the population that you get your training set from? Are both the development sets and training sets just random samples, and the development set is just a lot smaller? I’d appreciate any insight anyone could offer me on this.
Many thanks!

paulinpaloalto · October 14, 2021, 3:17pm

The simplest strategy is that you pool all your labelled input data into one set. Then you randomly shuffle it and select the subsets for training, dev and test. That gives you the highest chance that all three sets are statistically representative (“from the same distribution”). BTW I think you are misinterpreting what Prof Ng said if you got the impression that the test set is optional.
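A minimal sketch of the "pool, shuffle, then slice" strategy described above, assuming the labelled examples fit in memory as a Python list. The function name `shuffle_split` and its integer-percentage parameters are hypothetical, chosen only for illustration. Because every subset is cut from the same shuffled pool, each one is a random sample of the same data:

```python
import random

def shuffle_split(examples, train_pct=60, dev_pct=20, seed=0):
    """Hypothetical helper: pool all labelled examples, shuffle once,
    then slice into train/dev/test. The test set gets whatever remains."""
    if train_pct < 0 or dev_pct < 0 or train_pct + dev_pct > 100:
        raise ValueError("percentages must be non-negative and sum to at most 100")
    pooled = list(examples)        # copy so the caller's data is not reordered
    rng = random.Random(seed)      # fixed seed makes the split reproducible
    rng.shuffle(pooled)            # one shuffle over the whole pool
    n = len(pooled)
    n_train = n * train_pct // 100  # integer arithmetic avoids float rounding
    n_dev = n * dev_pct // 100
    train = pooled[:n_train]
    dev = pooled[n_train:n_train + n_dev]
    test = pooled[n_train + n_dev:]
    return train, dev, test

# Usage: 1,000 made-up (input, label) pairs split 60/20/20.
data = [(i, i % 2) for i in range(1000)]
train, dev, test = shuffle_split(data)
print(len(train), len(dev), len(test))  # 600 200 200
```

Shuffling once and slicing (rather than sampling each set independently) guarantees that no example lands in more than one set.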

In terms of how to size the various subsets, Prof Ng discusses that in the video and gives you rules of thumb. It depends on the total amount of labelled data you have. If you have a relatively small aggregate dataset (fewer than roughly 10^5 examples), then you typically use something like 60/20/20 or maybe 80/10/10 for training, dev, test. If you have a relatively large dataset (more than roughly 10^6 examples), then the dev and test sets can be smaller percentages. Please watch the lecture again for more details on the set sizes.
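To make the percentages concrete, here is a small sketch that turns a split ratio into example counts. The helper name `split_sizes` is hypothetical, and the 98/1/1 ratio for the large case is just one illustrative "smaller percentage", not a prescribed value:

```python
def split_sizes(n_total, train_pct, dev_pct):
    """Hypothetical helper: convert integer percentages into example counts.
    The test set receives the remainder so the counts always sum to n_total."""
    if train_pct < 0 or dev_pct < 0 or train_pct + dev_pct > 100:
        raise ValueError("percentages must be non-negative and sum to at most 100")
    n_train = n_total * train_pct // 100
    n_dev = n_total * dev_pct // 100
    n_test = n_total - n_train - n_dev
    return n_train, n_dev, n_test

# Smaller dataset with the 60/20/20 and 80/10/10 rules of thumb,
# then a large dataset with an illustrative 98/1/1 split.
for n, (tr, dv) in [(10_000, (60, 20)), (10_000, (80, 10)), (1_000_000, (98, 1))]:
    print(n, split_sizes(n, tr, dv))
# 10000 (6000, 2000, 2000)
# 10000 (8000, 1000, 1000)
# 1000000 (980000, 10000, 10000)
```

The last line shows why smaller percentages can work at scale: 1% of a million examples is still 10,000 examples each for dev and test.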

PirateHunerZoro · October 14, 2021, 3:27pm

Thank you for your reply! It was quite helpful. I will rewatch the lecture with this information in mind.
