Trying out Model selection

rmwkwok · June 29, 2022, 12:17pm

This goes back to the topic - model selection, and K-Fold CV.

A more general definition of model training includes (1) model selection and (2) literally fitting your model candidates to data. (2) uses the training set, whereas (1) relies on the cv set. In this sense, both the training and cv sets can be seen as your training data, because you use both to inform the decisions that lead to your final, trained model.

The test set, however, represents data in the production stage, so it’s used to assess your final, trained model. Before the assessment, the test set is forgotten - we don’t use it in model selection or model fitting. If the assessment result is bad, we forget it again - we don’t change the test set, so that an improvement in the next assessment cannot come from a change in the test set. We hope to improve the assessment result by training a better model.

Let’s say you have a fixed cv set (which is the case in your code) and N model candidates, and you evaluate each candidate with that one cv set. You can think of that as 1-fold cross validation. A K-Fold CV then means you have K different cv sets. Here’s a way you can generate them:

1. From your whole dataset, leave out 20% as the test set, and keep the remaining 80% as training data.
2. For 5-fold CV, split your training data into 5 slices. Each time, pick one slice as the cv set and use the rest as the training set. Train one of your model candidates with the training set and evaluate it with the cv set. Repeat this until every slice has served as the cv set. You then have 5 evaluation scores for this candidate, and you may average them to get one final score for the candidate.
3. Repeat step 2 for all candidates.
4. Pick the candidate with the best final score.
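
The steps above can be sketched in plain Python, working on sample indices only. This is a minimal illustration rather than a definitive implementation: the function names (`split_test`, `k_fold_slices`, `cv_score`, `select`) and the `fit_and_score` callback are hypothetical, and the sketch assumes the score is an error, so lower is better.

```python
import random
from statistics import mean

def split_test(n_samples, test_fraction=0.2, seed=0):
    """Step 1: shuffle the indices and hold out a test set."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_test = int(round(n_samples * test_fraction))
    return idx[n_test:], idx[:n_test]  # (training data, test set)

def k_fold_slices(train_data_idx, k=5):
    """Step 2: yield one (training set, cv set) index pair per slice."""
    folds = [train_data_idx[i::k] for i in range(k)]
    for i in range(k):
        cv = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, cv

def cv_score(fit_and_score, train_data_idx, k=5):
    """Average the k evaluation scores of one candidate.

    fit_and_score(train_idx, cv_idx) is a hypothetical callback that fits the
    candidate on the training set and returns its error on the cv set.
    """
    return mean(fit_and_score(tr, cv) for tr, cv in k_fold_slices(train_data_idx, k))

def select(candidates, train_data_idx, k=5):
    """Steps 3 and 4: score every candidate, return the one with the lowest error."""
    scores = {name: cv_score(f, train_data_idx, k) for name, f in candidates.items()}
    return min(scores, key=scores.get), scores
```

Note that the test set indices returned by `split_test` never enter `select`; they are only used once, to assess the final model.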

Generating polynomial features is how you create your model candidates: degree 1 is one candidate, degree 2 is another, and so on.
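
Here is a rough sketch of how that could look with scikit-learn, treating each polynomial degree as one candidate and scoring it with 5-fold CV. The synthetic data, the degree range 1 to 5, the use of mean squared error, and the final refit on all training data are my assumptions for illustration, not something taken from your code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (assumed for illustration): a noisy quadratic.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.5, size=200)

# Hold out 20% as the test set; it is not touched during selection.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

kfold = KFold(n_splits=5, shuffle=True, random_state=0)
cv_mse = {}
for degree in range(1, 6):  # each degree is one model candidate
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    # scikit-learn reports negated MSE so that higher is always better.
    scores = cross_val_score(
        model, X_train, y_train, cv=kfold, scoring="neg_mean_squared_error"
    )
    cv_mse[degree] = -scores.mean()  # average of the 5 fold scores

# Pick the candidate with the lowest average cv error.
best_degree = min(cv_mse, key=cv_mse.get)

# Refit the chosen candidate on all training data, then assess it once on the test set.
final_model = make_pipeline(PolynomialFeatures(best_degree), LinearRegression())
final_model.fit(X_train, y_train)
test_mse = np.mean((final_model.predict(X_test) - y_test) ** 2)
print(best_degree, cv_mse, test_mse)
```

Putting `PolynomialFeatures` inside the pipeline means the features are regenerated within each fold, so the candidate is defined only by its degree.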

Cheers!

Raymond
