Test set and Validation set

Hi, I am confused by the usage of validation set.

In the course, the teacher said that the validation set is helpful because it helps with selecting between models. My understanding is that after we tune the hyperparameters (or parameters) with the training set, we predict the output on the validation set and compare the costs; based on the cost, we select the best model.

Since we no longer adjust the parameters on either the validation set or the test set, what is the difference between the two?

P.S. I did check out the previous posts on this topic, but I am still wondering: some of the comments suggest that the model can still be tuned during the validation procedure? If that is true, is it possible that a high-degree model overfits the validation set just like it did the training set?

Hi @James_Yu1, great question!

The training dataset is used to train your model, the validation dataset is used to tune the hyperparameters, and the testing data is used to check for overfitting.

For example, let’s say you train a neural network on your dataset with the default parameters. After the training is done, you predict on the validation dataset and evaluate the results; say you achieve 70% accuracy on the validation data. Now you might think that by changing some parameters you can improve your results, so you change the learning rate from 0.01 to 0.005 and achieve 80% accuracy on the validation data. However, if you repeat this process too many times it is possible to overfit the validation data as well, so you want to save part of the data to see whether your model is able to generalize. If the results on the testing data are similar, it suggests that your model didn’t overfit and the results are generalizable; if not, you need to start over.

In summary:

  • Training data: Only for training your model
  • Validation data: Tune the hyperparameters
  • Testing data: To evaluate your model’s performance on unseen data (you don’t fine-tune hyperparameters with this data)
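
A minimal sketch of that whole workflow, assuming scikit-learn, a synthetic placeholder dataset, and purely illustrative learning rates (not a prescription):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDClassifier

# Placeholder data; substitute your own feature matrix X and labels y.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# 60% training, 20% validation, 20% test.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

best_lr, best_acc = None, -1.0
for lr in [0.01, 0.005]:                      # candidate learning rates (hyperparameter)
    model = SGDClassifier(learning_rate="constant", eta0=lr, random_state=0)
    model.fit(X_train, y_train)               # parameters (w, b) are learned on the training data only
    acc = model.score(X_val, y_val)           # the hyperparameter is chosen on the validation data
    if acc > best_acc:
        best_lr, best_acc = lr, acc

# The test set is touched only once, at the very end, to estimate generalization.
final = SGDClassifier(learning_rate="constant", eta0=best_lr, random_state=0)
final.fit(X_train, y_train)
print(f"best lr={best_lr}, validation acc={best_acc:.3f}, test acc={final.score(X_test, y_test):.3f}")
```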

Let me know if this answers your question!

1 Like

Great, thanks!

Yet I am still confused. From the example, after we train the model with the training set, we change a hyperparameter such as the learning rate (from 0.01 to 0.005) to try to improve the performance on the validation set. Meanwhile, we still go through several epochs with the validation set in order to reach a better accuracy. Does that mean the validation step tunes the parameters (w, b) as well? Or does the validation step only tune the hyperparameters, and if so, how does changing the learning rate influence the accuracy?

Right now it seems like a method that helps us avoid running the entire training step again, and also lets us make some minor changes to boost the performance. And it works only up until the model and the hyperparameters overfit the validation set.

Yes. The iterations we make on our algorithm are meant to choose the best parameters for our predictions, so the weights and bias change when we update the hyperparameters; in the case of the learning rate, it is a measure of how big the steps are when searching for the best model.

This picture is from a project that I was recently working on:

[Figure: training and validation accuracy vs. number of epochs, one curve per learning rate]

On the y-axis is the accuracy and on the x-axis is the number of epochs (training rounds); each line represents a different learning rate. As you can see, the accuracies on the training and validation data are different. Check, for instance, the learning rate 0.1: it shows good performance on the training data, but on the validation data the accuracy is 0.5, which means the algorithm is not able to discriminate my validation examples (overfitting).

After all the iterations I choose the best hyperparameters based on the validation data and evaluate how close the results are on the testing data. Since each iteration has to perform a full training round, I always start with a small number of epochs, and at the end, once I have a model that does well on the validation data, I re-train it with more epochs.
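
One way to reproduce that kind of curve, sketched here with scikit-learn’s `SGDClassifier` and `partial_fit` so we can score after every epoch (the data and learning rates are placeholders, not the ones from the plot, and the original project may well have used a different framework):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDClassifier
from sklearn.utils import shuffle

def accuracy_curves(X_train, y_train, X_val, y_val, learning_rates, epochs=10):
    """Train one model per learning rate for a few epochs and record the
    training/validation accuracy after every epoch."""
    classes = np.unique(y_train)
    curves = {}
    for lr in learning_rates:
        model = SGDClassifier(learning_rate="constant", eta0=lr, random_state=0)
        history = []
        for epoch in range(epochs):
            Xs, ys = shuffle(X_train, y_train, random_state=epoch)
            model.partial_fit(Xs, ys, classes=classes)      # one pass over the training data
            history.append((model.score(X_train, y_train),  # training accuracy
                            model.score(X_val, y_val)))     # validation accuracy
        curves[lr] = history
    return curves

# Placeholder data just to make the sketch runnable.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)
curves = accuracy_curves(X_train, y_train, X_val, y_val, learning_rates=[0.001, 0.01, 0.1])

# Screen with few epochs, pick the learning rate with the best validation accuracy,
# then re-train that single configuration with more epochs before touching the test set.
```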

Please let me know if this clarifies your question

1 Like

Please correct my understanding if I am wrong.

In this graph, the x-axis represents the epoch, which means that for every epoch you get the accuracy on both the training set and the validation set through the model.predict method.

Within 10 epochs, we can discover that some hyperparameters lead to overfitting and some lead to underfitting; we eliminate those models and then train the ones that remain with more epochs.

After the longer training run, we finally get the well-trained model we want, and then we test it with the test set.

Are these descriptions correct?

This was only for the learning rate, keeping the rest of the hyperparameters constant, so I wanted to find the best learning rate for my data. The one I chose was 0.001, since it achieves a higher accuracy on the validation data. After this process is done I can tune other hyperparameters, but the overall idea is that I use the validation data to make decisions about the hyperparameters and the testing data only to see if my model is able to generalize to unseen data. If, after all the fine-tuning of the hyperparameters, the results on the testing data are bad, I need to start over, since I have overfit to the training and validation data.

Please let me know if this answers your question

To clarify: Every time you change a hyperparameter (such as the learning rate), you start over again and train the system anew.

You use the results from the validation set to give you guidance about what change to make to the hyperparameters for the next time you train.

Hi @pastorsoto,
I came to this forum with the exact same question as @James_Yu1 so I will just reply here and not start a new topic. Hope that is ok.

Your answer is already very helpful, thank you. Just one more clarification question:

Training dataset: Only for training the model (e.g., w and b in linear regression)

This trained model will then be checked against the validation dataset and I will also use this validation dataset to change the hyperparameters of my model.

When I am done with training (parameters as well as hyperparameters) I check against the test dataset to see the “final” performance and assess, e.g., over-/underfitting.

I think this part is clear (but still please correct if I am wrong).

My question is now:
What do I do if the performance is bad, e.g., after testing against the validation dataset or the testing dataset?

  1. If the accuracy on the validation dataset is bad, I can change the hyperparameters. Do I have any other options? Would I choose a new algorithm at this stage? If I change the hyperparameters, I have to retrain my entire model again, correct?
  2. If the accuracy is okay and in the end I test against the test dataset, and the accuracy there is low, what do I do? Start all the way back at the beginning? Or do I just go back and change the hyperparameters?

By writing these lines I realize the questions might not be very specific or are already partly answered. Sorry for that, but this train/validation/test procedure creates a small mess in my head :slight_smile:

Thank you so much.
Best,

1 Like

Hi @M_R2, good questions!

There are several scenarios involved in your question.

The first thing is that you need to define what good or bad performance is; this can be done by using domain knowledge, previous research, or the requirements of your project.

Let’s explore the case in which your performance on the validation data is bad in the previously defined terms. In this case you have several options (a small sketch of option 2 follows the list):

  1. Change the hyperparameters

  2. Change the model architecture (instead of logistic regression, use random forest)

  3. Collect more data

  4. Prepare your data with feature engineering

  5. Error analysis to identify where your model is struggling to learn
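
For option 2, a minimal sketch of comparing two architectures on the same validation split; the models, their settings, and the synthetic data are illustrative assumptions, not a recommendation:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# Synthetic placeholder data; substitute your own train/validation split.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

for name, model in candidates.items():
    model.fit(X_train, y_train)    # parameters learned on the training data
    print(f"{name}: validation accuracy = {model.score(X_val, y_val):.3f}")

# Choose the architecture with the better validation score; the test set stays
# untouched until every one of these decisions is final.
```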

In the scenario where the performance of your model is bad on the validation data, it means that the model is not able to identify patterns that allow it to make useful predictions on your data.

On the other hand, if the performance is good on the validation data but bad on the testing data, it means something different: in this case you have overfit, meaning that your model learned too much about the training and validation data but is not able to generalize. Here you need to start over, check your data and your model to see if you can identify why, and possibly use a different model.

In the second course of the specialization you’ll explore how to debug a learning algorithm and some steps that you can take for this problem. In summary, it goes something like this:

If the loss on the training set is high compared to a benchmark (e.g., human-level performance) and the loss on the validation data is also high, but both are similar, you have a high-bias model (underfitting).

If your loss on the training set is low compared to a benchmark and the loss on the validation data is high, you have a high-variance model (overfitting).
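
A rough way to put that rule of thumb in code; the tolerance and the benchmark values below are arbitrary assumptions, only to make the comparison concrete:

```python
def diagnose(train_loss, val_loss, baseline_loss, tol=0.05):
    """Crude bias/variance check against a benchmark loss (e.g. human-level error).
    The tolerance and the benchmark values are arbitrary placeholders."""
    high_train = train_loss > baseline_loss + tol
    gap = val_loss - train_loss
    if high_train and gap <= tol:
        return "high bias (underfitting): both losses are high and close together"
    if not high_train and gap > tol:
        return "high variance (overfitting): low training loss, high validation loss"
    return "no obvious bias/variance problem relative to the benchmark"

# Made-up numbers for illustration:
print(diagnose(train_loss=0.30, val_loss=0.32, baseline_loss=0.05))  # -> high bias
print(diagnose(train_loss=0.04, val_loss=0.25, baseline_loss=0.05))  # -> high variance
```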

The videos show some examples of how to approach each issue (a small sketch of tuning λ follows the list):

  • Get more training examples → For High variance (Overfitting)

  • Try smaller sets of features → For High variance (Overfitting)

  • Try getting additional features → High Bias (Underfitting)

  • Try additional polynomial features → High Bias (Underfitting)

  • Try decreasing λ → High Bias (Underfitting)

  • Try increasing λ → High variance (Overfitting)
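
For the two λ items, a small sketch of sweeping the regularization strength and choosing it on the validation set, assuming λ corresponds to the `alpha` parameter of scikit-learn’s `Ridge` (the data and the candidate values are placeholders):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Placeholder regression data; swap in your own features and targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=500)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

for lam in [0.001, 0.01, 0.1, 1.0, 10.0]:            # candidate λ values
    model = Ridge(alpha=lam).fit(X_train, y_train)   # re-trained from scratch for each λ
    val_mse = mean_squared_error(y_val, model.predict(X_val))
    print(f"lambda={lam}: validation MSE = {val_mse:.4f}")

# Increasing λ shrinks the weights (a remedy for high variance); decreasing it lets the
# model fit the training data more closely (a remedy for high bias). Choose λ using the
# validation error only, and keep the test set for the final check.
```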

Let me know if you have more questions

1 Like

Thank you so much! Very helpful :slight_smile:

I think this clears most of the questions up I had.

Comparing the content of the 2nd course to your reply, I get the feeling, though, that with the validation set and the test set I am trying to accomplish the same thing?

I train the model on the training dataset and will use the validation dataset to test the model on unseen data and adjust the hyperparameters (based on bias/variance).
When I am done, I use the test dataset to (again) check my adjusted model on unseen data.
Is that correct?

Yes. The only difference is that you don’t adjust the hyperparameters with the test set; it’s just to see how the adjusted model would behave in production.

1 Like