Cross validation sets

I did not understand why the professor used a cross validation data set instead of just using the test data set. I watched the video twice and I still did not get it.

Hi @Mohamed_Hussien1

Especially when you have limited data, cross validation is really useful: you can evaluate on several splits (not just the single split you get with one test set), so your evaluation covers more variation in the data, which usually helps to prevent overfitting to one particular split. See also this thread: Why cross validation - #4 by Christian_Simonis
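To make the "several splits" idea concrete, here is a minimal k-fold cross-validation sketch. This is my own illustration using scikit-learn (the model, dataset, and fold count are assumptions, not from the course, which uses a single train/cv/test split); the point is only that the same model is scored on several different train/validation splits and the scores are averaged.

```python
# Minimal k-fold cross-validation sketch (illustrative only, not the course code).
# The same model is evaluated on 5 different train/validation splits, so the
# estimate does not depend on one particular split of limited data.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data standing in for a small dataset (an assumption for the example)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

model = Ridge(alpha=1.0)
scores = cross_val_score(
    model, X, y,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="neg_mean_squared_error",
)

print("MSE per fold:", -scores)
print("Mean MSE across folds:", -scores.mean())
```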

Is there any specific part which is unclear?

Best regards
Christian

For more clarification: for diagnosing different models, he first divided the data into a training set and a test set, and then he claimed that choosing the model based on the J of the test set would not be good. So he divided the data into three sections instead: training, cross validation, and test. He said it would be better to select the model based on the J of the cross validation set. I did not get what the difference is, or what the impact of using the cross validation data rather than the test data is. In the end, are they not the same (just data)?

  • The training set is used to train the model.
  • The validation set is used to modify the model’s hyperparameters (such as the number of units, or the regularization value)
  • There is an iterative process between training and validation.
  • The test set is used as a final check of the model performance, using data that was not involved in creating the model.
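To show how that three-way workflow looks in practice, here is a minimal sketch (my own illustration, not the course notebook; the 60/20/20 split, the polynomial-degree search, and the synthetic data are all assumptions). The candidate models are compared using J on the cross validation set, and the test set is touched only once, at the very end, so it still gives an unbiased estimate of the chosen model's performance.

```python
# Minimal train / cross-validation / test sketch (illustrative, not the course code).
# Model selection (here, the polynomial degree) uses the cv set; the test set is
# reserved for a single final check of the chosen model.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic 1-D regression data (an assumption for the example)
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = 0.5 * X[:, 0] ** 3 - X[:, 0] + rng.normal(scale=1.0, size=300)

# 60% train, 20% cross validation, 20% test
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_cv, X_test, y_cv, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

best_degree, best_j_cv, best_model = None, float("inf"), None
for degree in range(1, 6):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)                              # fit on training set only
    j_cv = mean_squared_error(y_cv, model.predict(X_cv))     # J_cv: error on the cv set
    if j_cv < best_j_cv:
        best_degree, best_j_cv, best_model = degree, j_cv, model

# Final check on data that played no role in training or model selection
j_test = mean_squared_error(y_test, best_model.predict(X_test))
print(f"chosen degree={best_degree}, J_cv={best_j_cv:.2f}, J_test={j_test:.2f}")
```

Because the cv error was used to pick the degree, it is an optimistic estimate for the winning model; that is why the untouched test set is still needed for the final performance number.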

Hi @Mohamed_Hussien1 great question!

I just want to add that validation is a fascinating topic in machine learning, and mastering it can really make a big difference. My suggestion is to check the chapter that addresses this topic in the context of machine learning competitions: Designing Good Validation | The Kaggle Book (oreilly.com)

I hope this helps!