Prof Ng covers these points in quite a bit of detail in Week 1 of Course 2. I’ll give just a high level summary and then you should definitely proceed through Course 2 and hear the full explanation from Prof Ng.
The idea is that the three datasets are for different purposes:
You always use the training data for the training phase, but the “dev” and “test” sets serve different purposes. You train on the training set and then use the “dev” set to evaluate whether the hyperparameters you have chosen are good. That includes everything from the network architecture (number of layers, number of neurons, activation functions …) to the number of iterations, learning rate, regularization parameters and so forth. So in this phase you train on the training data and then evaluate accuracy on the dev set.
Once you have used the training set and dev set to select what you believe are the best choices for the hyperparameters, you then finally evaluate the performance of that “final” model on the test data. The point is that you want the final test to use data that was not involved in any aspect of the training up to that point, so that you get a fair picture of the performance on general input data.
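To make the workflow concrete, here is a toy sketch in NumPy. The data is synthetic and the “hyperparameter” is just the degree of a fitted polynomial standing in for the real choices (architecture, learning rate, etc.), but the three-way discipline is the same: candidates are compared only on the dev set, and the test set is touched exactly once at the end.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D regression problem (made up here purely for illustration).
x = rng.uniform(-1, 1, 200)
y = np.sin(3 * x) + rng.normal(0, 0.1, 200)

# Three-way split: 60% train, 20% dev, 20% test.
idx = rng.permutation(200)
train, dev, test = idx[:120], idx[120:160], idx[160:]

def mse(degree, fit_idx, eval_idx):
    # Fit a polynomial on one subset, measure mean squared error on another.
    coeffs = np.polyfit(x[fit_idx], y[fit_idx], degree)
    pred = np.polyval(coeffs, x[eval_idx])
    return np.mean((pred - y[eval_idx]) ** 2)

# Hyperparameter search: pick the degree with the lowest error
# on the dev set -- never on the test set.
degrees = range(1, 10)
best_degree = min(degrees, key=lambda d: mse(d, train, dev))

# Only now touch the test set, once, to report final performance
# on data that played no role in any of the choices above.
final_test_mse = mse(best_degree, train, test)
print(best_degree, final_test_mse)
```

If you instead tuned the degree directly on the test set, the reported error would be optimistically biased, which is exactly the failure mode the separate dev set prevents.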