I think model selection should be done after the 'test error', because our essential goal is to get the lowest generalization error, so the final choice can only be made after the test error is calculated. If we make the model selection based on the cross-validation set and pick the model with the lowest CV error, maybe one of the other models is better on the test data, i.e. has lower generalization error.
My other question is: what is the main difference between CV and test data? We do the same thing with both, which is only prediction on the data, not training …
The cross-validation data is used during the training phase of the model to provide an unbiased evaluation of the model's performance and to fine-tune the model's parameters.
Whereas the test data is used after the model has been fully trained, to assess the model's performance on completely unseen data.
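To make the three roles concrete, here is a minimal sketch of the usual three-way split. The 60/20/20 proportions, the synthetic data, and the variable names are my own illustrative assumptions, not from the lab:

```python
# Illustrative three-way split: training / cross-validation / test.
# The 60/20/20 split and synthetic data are assumptions for this sketch.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = rng.normal(size=100)

# First cut: keep 60% for training, hold out 40%.
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.40, random_state=1)
# Second cut: split the held-out 40% in half -> 20% CV, 20% test.
X_cv, X_test, y_cv, y_test = train_test_split(
    X_hold, y_hold, test_size=0.50, random_state=1)

print(len(X_train), len(X_cv), len(X_test))  # 60 20 20
```

The training set is the only one `fit()` ever sees; the CV set is consulted repeatedly while you compare candidate models; the test set is touched exactly once, at the very end.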
In the model evaluation and selection lab, we only train on the training data, meaning we call model.fit() on the training data alone, and then model.predict() on the training, CV, and test data. We never call fit() on the CV data, yet you say the CV data is used during the training phase.
Please help clarify this for me. Thanks!
If you have done the TensorFlow course, you will come across an assignment or notebook where the training and validation sets are defined first, then a model architecture is built and that algorithm is trained. Basically, the cross-validation dataset is used to select the best model, so that the final model can then be evaluated on the test dataset.
So cross-validation happens during the training phase of the model, to provide an unbiased evaluation of the model's performance and to fine-tune the model's parameters.
This is an optional lab.
I mean we do the same thing with the CV and test data, which is only prediction, so why do we need both? Couldn't we use just one set for selection?
That is a good question, and I know this is where most of the confusion about cross-validation and test datasets comes from. But if you think both datasets are used only to get predictions, that is an incomplete understanding.
Cross-validation data is basically used to check how good your model really is relative to the training data, whereas the test data checks how the model, already selected and tuned using the CV data, would actually perform.
CV data gives an unbiased evaluation of the model's performance and is used to fine-tune the model's parameters, whereas the test dataset is used after the model has been fully trained ("fully trained" here includes the evaluation with respect to the CV data) to assess the model's performance on completely unseen data.
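This two-step use of the sets can be sketched in code: select among candidate models by CV error, then report the chosen model's error once on the test set. The polynomial-degree candidates, the synthetic quadratic data, and the split sizes here are illustrative assumptions, not the lab's actual setup:

```python
# Sketch: model selection via CV error, final evaluation on the test set.
# The quadratic toy data and degree candidates are assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(3)
X = np.sort(rng.uniform(-3, 3, size=(120, 1)), axis=0)
y = 0.5 * X[:, 0] ** 2 + rng.normal(scale=0.3, size=120)  # quadratic + noise

X_train, X_hold, y_train, y_hold = train_test_split(X, y, test_size=0.4, random_state=0)
X_cv, X_test, y_cv, y_test = train_test_split(X_hold, y_hold, test_size=0.5, random_state=0)

cv_errors, models = {}, {}
for degree in (1, 2, 3, 4):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)          # weights come from the training data only
    cv_errors[degree] = mean_squared_error(y_cv, model.predict(X_cv))
    models[degree] = model

best_degree = min(cv_errors, key=cv_errors.get)   # the selection step uses CV error
test_mse = mean_squared_error(y_test, models[best_degree].predict(X_test))
print(best_degree, test_mse)
```

Note that `fit()` never sees the CV set, yet the CV set still shapes the final model, because it decides which candidate wins. The test set plays no role in that decision, which is why its error is an honest estimate of generalization.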
To put it in everyday terms: you would prefer buying a quality product like an iPhone from an Apple store rather than from any random shop. Why? Because you would expect the people at the Apple store to know more about what they are selling, so you would get a quality product.
The CV data plays a similar role with respect to the training dataset, compared to the test dataset. As Tom mentioned, with the CV data we optimise the model so it performs as well as it can, whereas the test dataset is applied to the model that has already been trained and checked against the CV data, to confirm we have the most robust model.
So the CV data is like a food inspector for a restaurant, and the customers are like the test dataset; both are basically trying and testing the food at the restaurant.
I have given you two examples; I hope it is clear now.
We make the same measurement, but the data sets have been handled differently. The validation set was used to optimize the model. The test set is brand-new and never touched the training or optimization process, so the test set simulates how the completed model will work on practically new data.
What's the difference between training and adjusting model parameters? Do you mean hyper-parameters that you set manually or via some GridSearch etc.?
As Tom mentioned, we are not supposed to use the CV data for training, so no, we do not use the CV data to adjust model weights or to fine-tune them.
Your second question:
Again no, as it is the training data that is used to adjust the model weights, i.e. to fine-tune your model.
We adjust the model's parameters (its weights) and fine-tune the model using the training data, while hyperparameters are chosen using the CV data.
Training data is a subset of the dataset used to build predictive models.
Validation data (CV) is a subset of the dataset used to assess the performance of the model built in the training phase.
The test dataset, or unseen examples, is a subset of the dataset used to assess the likely future performance of a model (the model created using the training data).
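The weights-vs-hyperparameters distinction above can be shown in a few lines. This is a sketch under my own assumptions (a Ridge model with a handful of candidate `alpha` values on synthetic linear data, not anything from the lab): `fit()` on the training data sets the weights, while the loop over `alpha` uses the CV error to pick the hyperparameter.

```python
# Sketch: weights are learned by fit() on training data;
# the hyperparameter alpha is chosen by comparing CV errors.
# Ridge and the synthetic data are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(7)
true_w = np.array([1.0, -2.0, 0.5])
X_train = rng.normal(size=(80, 3))
y_train = X_train @ true_w + rng.normal(scale=0.1, size=80)
X_cv = rng.normal(size=(40, 3))
y_cv = X_cv @ true_w + rng.normal(scale=0.1, size=40)

best_alpha, best_err = None, np.inf
for alpha in (0.01, 1.0, 100.0):                       # hyperparameter candidates
    model = Ridge(alpha=alpha).fit(X_train, y_train)   # fit() learns model.coef_
    err = mean_squared_error(y_cv, model.predict(X_cv))
    if err < best_err:
        best_alpha, best_err = alpha, err

print(best_alpha, best_err)
```

So predict() is indeed all we call on the CV set, but its output feeds back into the choice of hyperparameter, which is why the CV set counts as part of the model-building process while the test set does not.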