- Can someone tell me what he was trying to explain on this slide? I never understood what the problem was with using test data for choosing the model. I also didn’t understand what he was trying to explain to us with the term optimistic error.
Thank you…
The concept here is that you can make the model more complex by using additional polynomial terms. The degree of the polynomial is ‘d’.
However, this can lead to overfitting the training set.
To decide what polynomial degree is best, you try a lot of different values, train using each one, and choose the one that gives the lowest cost when evaluated using a test set.
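A rough sketch of that procedure in Python (using numpy and scikit-learn; the synthetic data and the range of degrees are just illustrative assumptions, not from the lecture):

```python
# Sketch of picking the polynomial degree d using a held-out set.
# The data and the degree range 1..10 are made up for illustration.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(x).ravel() + 0.3 * rng.standard_normal(200)

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.4, random_state=0)

test_errors = {}
for d in range(1, 11):
    model = make_pipeline(PolynomialFeatures(degree=d), LinearRegression())
    model.fit(x_train, y_train)        # parameters w, b are fit on the training set
    test_errors[d] = mean_squared_error(y_test, model.predict(x_test))

# d itself is chosen with the test set -- this is the step the lecture calls flawed
best_d = min(test_errors, key=test_errors.get)
print(best_d, test_errors[best_d])
```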
Yes, I understand this part, but he said the lowest value among these test errors is likely to be lower than the actual generalization error, and thus this method is flawed. I don’t quite understand this.
Please give the video title and time mark where he says this.
week3 - Model selection and training/cross validation/test sets - 3:45, thank you
I know there’s some statistics involved… I haven’t learned cross-validation yet, so if you have any book recommendations, I would appreciate them.
The validation set can be used to adjust the model, then a test set is used to verify the performance of the completed model.
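For example, a rough sketch of how the three subsets can be used (the 60/20/20 split and the synthetic data below are just illustrative assumptions, not from the lecture):

```python
# Sketch of a train / cross-validation / test split for picking the degree d.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(x).ravel() + 0.3 * rng.standard_normal(300)

# 60% train, 20% cross-validation, 20% test (illustrative proportions)
x_train, x_rest, y_train, y_rest = train_test_split(x, y, test_size=0.4, random_state=1)
x_cv, x_test, y_cv, y_test = train_test_split(x_rest, y_rest, test_size=0.5, random_state=1)

cv_errors, models = {}, {}
for d in range(1, 11):
    m = make_pipeline(PolynomialFeatures(degree=d), LinearRegression())
    m.fit(x_train, y_train)                                    # parameters fit on the training set
    cv_errors[d] = mean_squared_error(y_cv, m.predict(x_cv))   # d chosen with the CV set
    models[d] = m

best_d = min(cv_errors, key=cv_errors.get)
# the test set is used only once, to report the final model's performance
test_mse = mean_squared_error(y_test, models[best_d].predict(x_test))
print(best_d, test_mse)
```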
I don’t know of any books - the video lectures cover this topic.
And I didn’t get the part where he assumed that the model with polynomial degree 5 gives the best result on the test set. So why doesn’t that mean it’s the best model? Why would he say that the “lowest value among these test errors is likely to be lower than the actual generalization error and thus this method is flawed”? Can someone please explain this to me.
Thank you
Hello @Bibek_Joshi
Those models are all evaluated against the same test set, so you can say that is the best model with respect to the test set.
We pick the training parameters with the training set, so the training set error is likely lower than the actual generalization error.
Similarly,
We pick the hyper-parameter d with the test set, so that test set error is likely lower than the actual generalization error.
The idea is, whenever a dataset is used to determine any part of the model, that dataset’s error is more likely to be advantageously biased toward the model and thus less likely to be a good estimate of the generalization error.
The best would be a dataset that is not used to pick anything for the model.
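One way to see the bias numerically (a toy simulation I made up, not from the course): suppose ten candidate models all have the same true error rate, each is evaluated on the same finite test set, and we keep the model with the smallest measured error. Averaged over many repetitions, that smallest measurement sits below the true error.

```python
# Toy simulation (my own illustration): ten models with identical true error 0.3,
# each evaluated on a 200-example test set; keeping the minimum measured error
# gives an optimistic estimate of the true error.
import numpy as np

rng = np.random.default_rng(0)
true_error = 0.3
n_models, n_test, n_trials = 10, 200, 5000

min_estimates = []
for _ in range(n_trials):
    # each model's measured test error = fraction of mistakes on the test set
    estimates = rng.binomial(n_test, true_error, size=n_models) / n_test
    min_estimates.append(estimates.min())   # the "best" model's test error

print("true error:", true_error)
print("average of the selected (minimum) test error:", np.mean(min_estimates))
# prints a value noticeably below 0.3, i.e. optimistically biased
```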
Cheers,
Raymond