Module1, Setting Up your Goal: Is one test set sufficient for an adequate model performance estimation?

paulinpaloalto · March 29, 2023, 5:59pm

Thanks for the detailed response. You have described lots of great ideas. I don’t think Prof Ng mentions the concept of a stratified split anywhere, but I’m pretty sure he does discuss the ideas around balanced datasets somewhere in Course 3. By “balanced” I mean the number of samples you have for each of the possible label types in your data.

You may be working a bit too hard in the stratification case: if you have a large dataset and randomly select a non-trivial subset of it, then you would naturally expect the statistical distribution of the label classes in your selected subset to be very close to that of the total dataset. If it’s not, then doesn’t that simply mean that either your random shuffling is not really random or your subset size is too small? But it is a good point that it is worth analyzing the distribution of the label types in your various selected random subsets to make sure they are reasonable. And it may well be that, even if some classes are underrepresented in the overall dataset, you get better behavior by including more of them in the test set, as long as you can achieve that without depleting that class too severely in the training set.
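
Just to make that check concrete, here is a minimal sketch of how you might compare the label distribution of a purely random test split against the full dataset, with a simple per-class (stratified) split alongside it for reference. Everything in it is an assumption for illustration: the helper names (`class_proportions`, `random_split`, `stratified_split`, `max_proportion_gap`) are made up, the dataset is synthetic with a 90% / 9% / 1% class mix, and the 20% test fraction is arbitrary. It uses only the Python standard library.

```python
import random
from collections import Counter, defaultdict


def class_proportions(labels):
    """Return {label: fraction of samples with that label}."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def random_split(labels, test_fraction, rng):
    """Shuffle all indices and take the first test_fraction as the test set."""
    indices = list(range(len(labels)))
    rng.shuffle(indices)
    n_test = int(round(test_fraction * len(indices)))
    return indices[n_test:], indices[:n_test]


def stratified_split(labels, test_fraction, rng):
    """Split each class separately so the test set mirrors the class mix."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = int(round(test_fraction * len(idxs)))
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return train, test


def max_proportion_gap(labels, subset_indices):
    """Largest absolute difference in class share: subset vs. full dataset."""
    full = class_proportions(labels)
    sub = class_proportions([labels[i] for i in subset_indices])
    return max(abs(full[c] - sub.get(c, 0.0)) for c in full)


# Hypothetical imbalanced dataset: 90% class 0, 9% class 1, 1% class 2.
rng = random.Random(0)
labels = [0] * 9000 + [1] * 900 + [2] * 100

train_random, test_random = random_split(labels, 0.2, rng)
train_strat, test_strat = stratified_split(labels, 0.2, rng)

print("random split gap:    ", max_proportion_gap(labels, test_random))
print("stratified split gap:", max_proportion_gap(labels, test_strat))
```

With a subset this large you would expect the random split’s gap to be small, which is the point above. The place to look more carefully is the rarest class, where a small absolute gap can still be a large relative change in how many of those samples land in the test set.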

But the overall conclusion seems to be that you are well equipped to handle all these issues when it comes time to tackle a serious real world problem!
