Results evaluation UNQ_C3

Hi,
I just wanted to better understand the results that I got in this case.

I ended up getting a better result on the test set than on the validation set. Is that okay? In an exercise I was doing on my own, using F1_score as the metric, in one case I got better results on the validation set than on the training set. What does that say about my model?

Thank you

Can you give information about the ratio between the train, validation, and test datasets?

The train and validation sets were split with a ratio of 75/25. X_train shape: (5147, 18), and the test set comes from another CSV file with shape (1716,). This is in the particular case of the assignment.
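For context, here is a minimal sketch of how such a 75/25 split might be produced with scikit-learn. The data below is synthetic and the variable names are placeholders, not the assignment's actual code:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the assignment's features and labels
# (6863 rows split 75/25 gives roughly 5147 train and 1716 validation).
X = np.random.rand(6863, 18)
y = np.random.randint(0, 2, size=6863)

# 75% train / 25% validation, matching the ratio mentioned above.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=10
)
print(X_train.shape, X_val.shape)  # (5147, 18) (1716, 18)
```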

Do your validation and test dataset ratios match?

Hello @Sebastian_Miranda

This assignment is not from the course, right?

Regards
DP

Yes, it is from the course: C2_W2_Assignment.
The validation and test datasets match. Test shape: (1716,) & validation shape: (1716,).

Sorry @Sebastian_Miranda, the above statement confused me.

Okay, now coming to your query: based on your results, your training C-index was 85%, whereas the test set gave 70%.

If you check Exercise 3 - random_forest_grid_search, see the highlighted statement there.

Based on how the hyperparameters were set, you got those results for the training and validation sets.

Now for the test set too, the same condition is stated.

Overfitting could have been an issue if the test set had given a result similar to the training set, but your validation and test sets do not vary much from each other, and the result you got is actually perfectly right :+1: based on the assignment instructions.
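To make the comparison concrete, here is a rough sketch of the kind of check being discussed. It uses synthetic data, and roc_auc_score stands in for the course's cindex helper (for a binary outcome the C-index equals the ROC AUC); it is an illustration, not the assignment's code:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic features/labels standing in for the assignment's data.
rng = np.random.default_rng(0)
X = rng.normal(size=(6863, 18))
y = (X[:, 0] + 0.5 * rng.normal(size=6863) > 0).astype(int)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=10
)

rf = RandomForestClassifier(n_estimators=100, random_state=10)
rf.fit(X_train, y_train)

# For a binary outcome the C-index equals the area under the ROC curve,
# so roc_auc_score is used here in place of the course's cindex function.
print("train C-index:", roc_auc_score(y_train, rf.predict_proba(X_train)[:, 1]))
print("val   C-index:", roc_auc_score(y_val, rf.predict_proba(X_val)[:, 1]))
# In the assignment you would score the held-out test set the same way;
# a large train/validation (or train/test) gap is the overfitting signal.
```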

But I don't know if you noticed that in the "4. Random Forest" section there is a statement:

Training a random forest with the default hyperparameters results in a model that has better predictive performance than individual decision trees as in the previous section, but this model is overfitting.

So it is actually true that the model is overfitting, and the task here is to learn to minimise overfitting. Check the statement below from the assignment.

We therefore need to tune (or optimize) the hyperparameters, to find a model that both has good predictive performance and minimizes overfitting.
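A simplified sketch of the idea behind a grid search like the one in random_forest_grid_search is shown below: try a few hyperparameter combinations and keep the one with the best validation C-index rather than the best training score. The grid and data here are made up for illustration; the assignment's function has its own signature and grid:

```python
import itertools
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 18))
y = (X[:, 0] + 0.5 * rng.normal(size=2000) > 0).astype(int)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=10
)

# Hypothetical hyperparameter grid; the assignment defines its own values.
grid = {
    "n_estimators": [100, 200],
    "max_depth": [3, 5, None],
    "min_samples_leaf": [1, 10],
}

best_score, best_params = -np.inf, None
for values in itertools.product(*grid.values()):
    params = dict(zip(grid.keys(), values))
    rf = RandomForestClassifier(random_state=10, **params)
    rf.fit(X_train, y_train)
    # Select by *validation* score, not training score; this is what keeps
    # the tuned model from simply memorising the training set.
    val_score = roc_auc_score(y_val, rf.predict_proba(X_val)[:, 1])
    if val_score > best_score:
        best_score, best_params = val_score, params

print("best validation score:", best_score)
print("best hyperparameters:", best_params)
```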

Regards
DP

@Deepti_Prasad
Thanks for your answer.
Leaving aside the objective of the assignment, what I would like to know is what conclusions I can draw about the model if I obtain the following results.

Case 1: The model's performance on the test set, using a metric such as f1_score, is higher than on the validation set.
Case 2: The model's performance on the validation set is higher than on the training set.

Thank you very much.

Unless the differences in performance are statistically significant, personally I would not worry about it too much.
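One rough way to check whether such a gap is meaningful, assuming you already have predictions for both sets, is to bootstrap the metric on each set and compare the confidence intervals. This is just a sketch with toy data, not anything from the assignment:

```python
import numpy as np
from sklearn.metrics import f1_score

def bootstrap_f1(y_true, y_pred, n_boot=1000, seed=0):
    """Bootstrap distribution of the F1 score for one dataset."""
    rng = np.random.default_rng(seed)
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))  # resample with replacement
        scores.append(f1_score(y_true[idx], y_pred[idx]))
    return np.array(scores)

# Toy labels/predictions standing in for real validation and test outputs.
rng = np.random.default_rng(1)
y_val = rng.integers(0, 2, 500)
pred_val = np.where(rng.random(500) < 0.80, y_val, 1 - y_val)    # ~80% correct
y_test = rng.integers(0, 2, 500)
pred_test = np.where(rng.random(500) < 0.85, y_test, 1 - y_test)  # ~85% correct

val_ci = np.percentile(bootstrap_f1(y_val, pred_val), [2.5, 97.5])
test_ci = np.percentile(bootstrap_f1(y_test, pred_test), [2.5, 97.5])
print("validation F1 95% CI:", val_ci)
print("test       F1 95% CI:", test_ci)
# If the two intervals overlap substantially, the validation/test gap is
# probably within noise, which is the point made above.
```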