Why we split the our data into training and testinb/dev sets?

Ahmad_Khalid1 · August 24, 2023, 4:41am

Why is ratio is large for training example?
Please someone explain to me I am new here to learn deep learning.

Muhammad_Shahrose_Kh · August 24, 2023, 5:01am

Let assume you want to teach a kid how the letter ‘A’ looks so you show him numerous times that this is how you draw ‘A’ or this is how an ‘A’ looks like. Means you have larger set to train i.e. you write ‘A’ in front of that kid (it is large training set split). Then to evaluate (is kid learned that?) you can show him only 10 or 20 drawings of ‘A’ to guess did the kid understand well…

Ahmad_Khalid1 · August 24, 2023, 8:58am

Thanks bruh you explained with good example.

elirod · August 24, 2023, 11:22am

Hi @Ahmad_Khalid1

When you’re training a machine learning or deep learning model, you need to ensure that your model is learning from a diverse range of examples and that it can generalize well to new, unseen data. This is where data splitting comes in.

The training set is the portion of your data that you use to actually train your model. It’s like a teacher showing examples to a student. The model learns from these examples by adjusting its internal parameters to fit the patterns and relationships in the data. A larger training set is often preferred because it provides more diverse examples, helping the model learn better.

The testing (or development) set is a separate portion of your data that you don’t use during training. It’s like a final exam for the student. After your model has learned from the training data, you use the testing set to evaluate how well it can generalize to new, unseen examples. This step is crucial to assess the model’s performance on data it hasn’t seen before. The testing set helps you gauge how well your model is likely to perform in real-world scenarios.

Why Split Data?

Performance Evaluation: By evaluating the model on a separate testing set, you get an unbiased estimate of its performance. This gives you a sense of how well your model is doing and helps you make improvements if needed.
Preventing Overfitting: If you evaluate your model on the same data you used for training, it might seem like it’s performing well, but it could simply be memorizing the examples. The separate testing set helps you catch overfitting, where the model doesn’t generalize well and instead memorizes the training data.

In summary, data splitting into training and testing/development sets is a foundational practice in machine learning and deep learning. It ensures that your model learns from diverse examples and can be evaluated on unseen data, helping you build models that perform well in real-world situations.

Best regards
elirod

Topic		Replies	Views
Fine Line between Training set, Dev set, and Test set Supervised ML: Regression and Classification week-module-3	14	636	June 25, 2023
Train, Dev, and Test sets Structuring Machine Learning Projects coursera-platform	8	2870	July 5, 2025
Difference Train/Dev/Test sets Structuring Machine Learning Projects coursera-platform	7	1467	September 21, 2023
Why do a training set? Structuring Machine Learning Projects coursera-platform	1	528	August 1, 2022
Confusion about Training Set vs. Dev Set Improving Deep Neural Networks: Hyperparameter tun coursera-platform	5	877	December 19, 2021

Why we split the our data into training and testinb/dev sets?

Related topics