I still have some questions about the training set, validation set, and test set after the DLS course and some hands-on experience. Here's my understanding of them, and I want to make sure I'm not misunderstanding anything! The following are my takeaways from the course and my experience; please tell me if I got anything wrong, and help me understand this topic a bit better!
Training set would change the parameters of my network when it fits the dataset, and validation set represents an unbiased evaluation of the network because it cannot change the parameters of the network. (I don’t know what cross-validation between models means though)
Test set is similar to the validation set but the NN doesn’t use it in training process. It is used for final evaluation of the performance of the network.
Sometimes not having a test set is okay because the validation set kind of plays the part of the test set, as they come from the same distribution and are only used for model evaluation.
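To make my understanding concrete, here is a small sketch (all names and fractions are just made up by me) of how I imagine the three splits being carved out of one dataset:

```python
import random

def train_val_test_split(data, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle once, then carve out validation and test slices.

    All three splits come from the same distribution because they
    are drawn from a single shuffled pool.
    """
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = shuffled[:n_test]                # held out until the final evaluation
    val = shuffled[n_test:n_test + n_val]   # used for model selection / tuning
    train = shuffled[n_test + n_val:]       # used to fit the weights
    return train, val, test

train, val, test = train_val_test_split(list(range(1000)))
print(len(train), len(val), len(test))  # → 800 100 100
```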
“Training set would change the parameters of my network when it fits the dataset”
Partly: the training set itself doesn't change anything; it is simply the data the model is fit to. What changes during training are the model weights (coefficients), and the weights are the network's learnable parameters — the optimizer updates them. The knobs that training does not touch are the hyperparameters, which you set yourself.
Notes: Hyperparameter tuning is an iterative engineering exercise. You would typically start with a benchmark model, evaluate its accuracy, and then tune different hyperparameters to reduce overfitting.
For example: setting the number of units in a dense layer, choosing a loss function, and adjusting the learning rate are all examples of hyperparameter tuning.
There are a number of strategies and software tools designed specifically to reduce overfitting. As you adjust these hyperparameters and re-evaluate, you will notice changes and can iterate toward improvements.
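For instance, a simple grid search is just a loop that keeps whatever scores best on the validation set. A toy sketch — the `val_accuracy` function here is a stand-in for actually training a model and scoring it on the validation set, and its formula is invented for illustration:

```python
def val_accuracy(learning_rate, hidden_units):
    # Toy stand-in for "train the model, score it on the validation set".
    # Pretend accuracy peaks at lr=0.01 with 64 hidden units.
    return 1.0 - abs(learning_rate - 0.01) * 10 - abs(hidden_units - 64) / 1000

best = None
for lr in [0.1, 0.01, 0.001]:      # hyperparameter: learning rate
    for units in [32, 64, 128]:    # hyperparameter: layer width
        acc = val_accuracy(lr, units)
        if best is None or acc > best[0]:
            best = (acc, lr, units)

print("best (accuracy, lr, units):", best)  # → (1.0, 0.01, 64)
```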
“validation set represents an unbiased evaluation of the network because it cannot change the parameters of the network”
Incorrect: the validation set is not fully unbiased, because it is used during training — not to update the weights directly, but to guide decisions such as hyperparameter choices and early stopping. Information from it therefore leaks into the final model, which is why a separate test set gives the cleaner estimate.
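To illustrate the difference: early stopping is one common way the validation set influences training without ever updating the weights itself — it only decides when to stop and which checkpoint to keep. A toy sketch with made-up loss numbers (not a real training run):

```python
# Made-up loss curves: validation loss starts rising after epoch 3,
# a classic sign of overfitting.
train_loss = [0.9, 0.6, 0.4, 0.30, 0.25, 0.22, 0.20]
val_loss   = [1.0, 0.7, 0.5, 0.45, 0.46, 0.48, 0.50]

best_epoch, best_val = 0, float("inf")
patience, bad_epochs = 2, 0
for epoch, v in enumerate(val_loss):
    if v < best_val:
        best_val, best_epoch, bad_epochs = v, epoch, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:  # early stopping triggered
            break

print("stop after epoch", epoch, "- keep weights from epoch", best_epoch)
# → stop after epoch 5 - keep weights from epoch 3
```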
“I don’t know what cross-validation between models means though”
Cross-validation is a resampling procedure for estimating how a model will generalize to an independent data set: the data is repeatedly re-split into training and validation folds, the model is trained on one part and evaluated on the other, and the scores are averaged. "Between models" just means you can compare candidate models by their cross-validated scores.
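A minimal sketch of k-fold cross-validation in plain Python (no ML library assumed) — each fold serves once as the held-out validation set:

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds; yield
    (train_indices, val_indices) with each fold held out once."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train_idx, val_idx

# Each of the 5 splits uses a different 20% slice for validation;
# averaging the 5 validation scores estimates generalization.
for train_idx, val_idx in k_fold_indices(10, 5):
    print(len(train_idx), len(val_idx))  # → 8 2, five times
```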
“Test set is similar to the validation set but the NN doesn’t use it in training process. It is used for final evaluation of the performance of the network.”
Correct
“Sometimes not having a test set is okay”
It depends: the test set is data the model has not seen during either training or tuning, and it is what gives you an unbiased final estimate of accuracy. You can sometimes get by with only a validation set, but because you used that set to make modeling decisions, its accuracy estimate will be optimistic; keep a separate test set whenever you need a trustworthy final number.
Thank you both for your replies! They clarify a lot of things in my head. But I still have some questions about this topic.
Like the problem in the DLS course video, I am tackling a task that requires me to perform an analysis on different subsets (or sources) of the same kind of data (kind of like the example of web-page cats (large set) and consumer-camera cats (small set)). Is it sensible for me to do the following?
Step 1. Inject some consumer-camera cat examples into my training set, which is primarily web-page cats
– so that the NN can learn from a bigger dataset to achieve higher accuracy
Step 2. Use a higher percentage of consumer-camera cats in my validation and test sets
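Something like this is what I have in mind — the counts and the 50/50 hold-out fraction below are just illustrative, not from the course:

```python
import random

def build_splits(web_cats, consumer_cats, seed=0):
    """Step 1: inject part of the small consumer set into training.
    Step 2: build val/test entirely from held-out consumer images,
    so evaluation reflects the distribution we actually care about."""
    rng = random.Random(seed)
    web, consumer = web_cats[:], consumer_cats[:]
    rng.shuffle(web)
    rng.shuffle(consumer)
    n_hold = len(consumer) // 2          # hold out half the consumer set
    held = consumer[:n_hold]
    val = held[:n_hold // 2]
    test = held[n_hold // 2:]
    train = web + consumer[n_hold:]      # mostly web cats + injected consumer cats
    rng.shuffle(train)
    return train, val, test

web = [("web", i) for i in range(10000)]
cons = [("consumer", i) for i in range(1000)]
train, val, test = build_splits(web, cons)
print(len(train), len(val), len(test))  # → 10500 250 250
```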