I ended up getting a better result on the test set than on the validation set. Is that okay? During an exercise I was doing on my own, using f1_score as the metric, in one case I got better results on the validation set than on the training set. What does that say about my model?
The train and validation sets were split with a 75/25 ratio. X_train has shape (5147, 18), and the test set comes from another CSV file with shape (1716,). This is for the particular case of the assignment.
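Roughly, the setup looks like this (a minimal sketch; the file names, target column, and classifier are placeholders, not necessarily the ones from the assignment):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Placeholder file and column names; the assignment's actual names may differ.
data = pd.read_csv("train.csv")
X, y = data.drop(columns="target"), data["target"]

# 75/25 train/validation split, as described above.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The test set comes from a separate CSV file.
test_data = pd.read_csv("test.csv")
X_test, y_test = test_data.drop(columns="target"), test_data["target"]

model = RandomForestClassifier(random_state=0)
model.fit(X_train, y_train)

print("validation F1:", f1_score(y_val, model.predict(X_val)))
print("test F1:      ", f1_score(y_test, model.predict(X_test)))
```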
Overfitting could have been an issue if the test set had given a result similar to the training set, but your validation and test results do not vary much, and the result you got is actually perfectly right based on the assignment instructions.
But I don't know if you noticed that in the "4. Random Forest" section there is a statement:
Training a random forest with the default hyperparameters results in a model that has better predictive performance than individual decision trees as in the previous section, but this model is overfitting.
So it is actually true that the model is overfitting, and the task here is to learn to minimise overfitting. See the statement below from the assignment.
We therefore need to tune (or optimize) the hyperparameters, to find a model that both has good predictive performance and minimizes overfitting.
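For example, the tuning could look something like this (a minimal sketch using RandomizedSearchCV on toy data; the search space and settings are only an illustration, not the ones from the assignment):

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Toy data standing in for the assignment's training set.
X_train, y_train = make_classification(n_samples=5147, n_features=18, random_state=0)

# Illustrative search space; the assignment may ask for different hyperparameters.
param_distributions = {
    "n_estimators": randint(10, 500),
    "max_depth": randint(2, 20),
    "min_samples_leaf": randint(1, 20),
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=20,
    scoring="f1",
    cv=5,
    n_jobs=-1,
    random_state=0,
)
search.fit(X_train, y_train)
print("best parameters:", search.best_params_)
print("best cross-validated F1:", search.best_score_)
```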
Deepti_Prasad
Thanks for your answer.
Leaving aside the objective of the assignment, what I would like to know is what conclusions I can draw about the model if I obtain the following results.
Case 1: The model's performance on the test set, measured with a metric such as f1_score, is higher than on the validation set.
Case 2: The model's performance on the validation set is higher than on the training set.
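Concretely, I mean a comparison like this (a sketch on toy data, not my actual numbers):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Toy data standing in for the real problem.
X, y = make_classification(n_samples=7000, n_features=18, random_state=0)
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Comparing the same metric across the three sets is what Case 1 and Case 2 refer to.
for name, (X_part, y_part) in {
    "train": (X_train, y_train),
    "validation": (X_val, y_val),
    "test": (X_test, y_test),
}.items():
    print(name, f1_score(y_part, model.predict(X_part)))
```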