I did not understand why the professor used a cross validation data set instead of using the test data set … I watched the video twice and I still did not get it.
Especially when you have limited data, cross validation is really useful, since you can test several splits (not just one, as with a single test set) and therefore expose the model to more variation in the data, which usually helps to guard against overfitting. See also this thread: Why cross validation - #4 by Christian_Simonis
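To make the "several splits" idea concrete, here is a minimal k-fold sketch in Python. This is just an illustration under my own assumptions (scikit-learn, a synthetic dataset, and a Ridge model as a stand-in), not the course's exact setup:

```python
# Minimal k-fold cross validation sketch (illustrative only):
# the dataset and the Ridge model are placeholders.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Five different train/validation splits instead of a single one
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(Ridge(alpha=1.0), X, y,
                         cv=kfold, scoring="neg_mean_squared_error")

print("MSE per fold:", -scores)        # one error estimate per split
print("Average MSE :", -scores.mean()) # a more stable estimate than one split
```

Each fold gives its own error estimate, so you get a more reliable picture of performance than from a single held-out set.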
Is there any specific part which is unclear?
Best regards
Christian
For more clarification: to diagnose different models, he first divided the data into a training set and a test set, and then claimed that choosing the model based on J of the test set would not be good. So he divided the data into three sections: training, cross validation, and test, and said it would be better to select the model based on J of the cross validation set. I did not get what the difference is, or what the impact of using the cross validation data rather than the test data is … in the end, aren't they both just data?
- The training set is used to train the model.
- The validation set is used to tune the model’s hyperparameters (such as the number of units, or the regularization value).
- There is an iterative process between training and validation.
- The test set is used as a final check of the model’s performance, using data that was not involved in creating the model (a minimal sketch of this workflow follows below).
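Here is a short sketch of that three-way split and of selecting a model by its validation error. This is a hypothetical example with scikit-learn: the `Ridge` model, the synthetic dataset, and the candidate `alpha` values are placeholders, and the course expresses the same idea with J_train, J_cv, and J_test:

```python
# Sketch of train / cross validation / test and selection by validation error.
# The model, dataset, and alpha grid are illustrative placeholders.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=300, n_features=5, noise=15.0, random_state=1)

# 60% training, 20% cross validation, 20% test
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=1)
X_cv, X_test, y_cv, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=1)

# Try several candidate models (here: different regularization values)
best_model, best_cv_mse = None, float("inf")
for alpha in [0.01, 0.1, 1.0, 10.0]:
    model = Ridge(alpha=alpha).fit(X_train, y_train)        # fit on the training set only
    cv_mse = mean_squared_error(y_cv, model.predict(X_cv))  # J_cv drives model selection
    if cv_mse < best_cv_mse:
        best_model, best_cv_mse = model, cv_mse

# The test set is touched once, only to report the final generalization error
test_mse = mean_squared_error(y_test, best_model.predict(X_test))
print(f"Best alpha: {best_model.alpha}, J_cv: {best_cv_mse:.2f}, J_test: {test_mse:.2f}")
```

The key point is that the cross validation set is "used up" by the selection loop (the chosen model is biased toward it), so only the untouched test set gives an unbiased estimate of how the final model will do on new data.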
Hi @Mohamed_Hussien1, great question!
I just want to add that validation is a fascinating topic in machine learning, and mastering it can really make a big difference. My suggestion is to check this chapter, which addresses the topic in the context of machine learning competitions: Designing Good Validation | The Kaggle Book (oreilly.com)
I hope this helps!