- Can someone tell me what he was trying to explain on this slide? I never understood what the problem was with using test data for choosing the model. I also didn’t understand what he was trying to explain to us with the term optimistic error.
Thank you…
The concept here is that you can make the model more complex by using additional polynomial terms. The degree of the polynomial is ‘d’.
However, this can lead to overfitting the training set.
To decide what polynomial degree is best, you try a lot of different values, train using each one, and choose the one that gives the lowest cost when evaluated using a test set.
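A rough sketch of that procedure in Python (using numpy and scikit-learn; the synthetic data and the range of degrees are just illustrative assumptions, not from the lecture):

```python
# Sketch of picking the polynomial degree d using a held-out set.
# The data and the degree range 1..10 are made up for illustration.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(x).ravel() + 0.3 * rng.standard_normal(200)

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.4, random_state=0)

test_errors = {}
for d in range(1, 11):
    model = make_pipeline(PolynomialFeatures(degree=d), LinearRegression())
    model.fit(x_train, y_train)        # parameters w, b are fit on the training set
    test_errors[d] = mean_squared_error(y_test, model.predict(x_test))

# d itself is chosen with the test set -- this is the step the lecture calls flawed
best_d = min(test_errors, key=test_errors.get)
print(best_d, test_errors[best_d])
```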
Yes, I understand this part, but he said the lowest value among these test errors is likely to be lower than the actual generalization error, and thus this method is flawed. I don’t quite understand this.
Please give the video title and time mark where he says this.
week3 - Model selection and training/cross validation/test sets - 3:45, thank you
I know there’s some statistics involved… I haven’t learned cross-validation yet, so if you have any book recommendations, I would appreciate them.
The validation set can be used to adjust the model, then a test set is used to verify the performance of the completed model.
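For example, a rough sketch of how the three subsets can be used (the 60/20/20 split and the synthetic data below are just illustrative assumptions, not from the lecture):

```python
# Sketch of a train / cross-validation / test split for picking the degree d.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(x).ravel() + 0.3 * rng.standard_normal(300)

# 60% train, 20% cross-validation, 20% test (illustrative proportions)
x_train, x_rest, y_train, y_rest = train_test_split(x, y, test_size=0.4, random_state=1)
x_cv, x_test, y_cv, y_test = train_test_split(x_rest, y_rest, test_size=0.5, random_state=1)

cv_errors, models = {}, {}
for d in range(1, 11):
    m = make_pipeline(PolynomialFeatures(degree=d), LinearRegression())
    m.fit(x_train, y_train)                                    # parameters fit on the training set
    cv_errors[d] = mean_squared_error(y_cv, m.predict(x_cv))   # d chosen with the CV set
    models[d] = m

best_d = min(cv_errors, key=cv_errors.get)
# the test set is used only once, to report the final model's performance
test_mse = mean_squared_error(y_test, models[best_d].predict(x_test))
print(best_d, test_mse)
```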
I don’t know of any books - the video lectures cover this topic.
And I didn’t get the part where he assumed that the model with polynomial degree 5 gives the best result on the test set. So why doesn’t that mean it’s the best model? Why would he say that the “lowest value among these test errors is likely to be lower than the actual generalization error and thus this method is flawed”? Can someone please explain this to me.
Thank you
Hello @Bibek_Joshi
Those models are all evaluated against the same test set, so you can say that is the best model with respect to the test set.
We pick the training parameters with the training set, so the training set error is likely lower than the actual generalization error.
Similarly,
We pick the hyper-parameter d with the test set, so that test set error is likely lower than the actual generalization error.
The idea is, whenever a dataset is used to determine any part of the model, that dataset’s error is more likely to be advantageously biased toward the model and thus less likely to be a good estimate of the generalization error.
The best would be a dataset that is not used to pick anything for the model.
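One way to see the bias numerically (a toy simulation I made up, not from the course): suppose ten candidate models all have the same true error rate, each is evaluated on the same finite test set, and we keep the model with the smallest measured error. Averaged over many repetitions, that smallest measurement sits below the true error.

```python
# Toy simulation (my own illustration): ten models with identical true error 0.3,
# each evaluated on a 200-example test set; keeping the minimum measured error
# gives an optimistic estimate of the true error.
import numpy as np

rng = np.random.default_rng(0)
true_error = 0.3
n_models, n_test, n_trials = 10, 200, 5000

min_estimates = []
for _ in range(n_trials):
    # each model's measured test error = fraction of mistakes on the test set
    estimates = rng.binomial(n_test, true_error, size=n_models) / n_test
    min_estimates.append(estimates.min())   # the "best" model's test error

print("true error:", true_error)
print("average of the selected (minimum) test error:", np.mean(min_estimates))
# prints a value noticeably below 0.3, i.e. optimistically biased
```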
Cheers,
Raymond