To my knowledge, in regression we first calculate the training score, then cross_val_score to get an idea of whether the model is overfitting or underfitting, then we do hyperparameter tuning, and finally we calculate the test score.
But do we follow this same sequence in classification, given that classification has various metrics (precision_score, confusion_matrix, etc.)? I've seen in some projects that the model is evaluated directly on X_test; in that case, how do we know whether the model is overfitting or underfitting? Are there fixed steps, or does this depend on the size of the data?
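Just to illustrate what I mean, here is a rough sketch of the regression-style workflow I am used to (the logistic regression and synthetic data are only placeholders):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Step 1: training score
print("train accuracy:", model.score(X_train, y_train))
# Step 2: cross-validation score on the training set
print("cv accuracy:", cross_val_score(model, X_train, y_train, cv=5).mean())
# Step 3: hyperparameter tuning would go here
# Step 4: final test score
print("test accuracy:", model.score(X_test, y_test))
```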
And how do we increase or decrease the recall or precision?
Thank you
Yes, we follow the same approach in classification as well. This is a generic workflow that we follow throughout supervised learning, and not something restricted to either classification or regression in particular.
Yes, it does indeed happen in some cases, but it is not recommended as such. For instance, if you don't have a lot of training data, then splitting it into 3 subsets, namely train, dev and test, might not be possible. Similarly, let's say you have a lot of training data (which might not be well suited to your task), but very few test samples (which were created specifically for your task). Now, as Prof Andrew discussed in the lecture videos, the dev set should resemble the test set, but we don't have enough such samples, so once again, we might split the data into 2 sets only, train and test.
Now, you might wonder why this is not recommended. Let's take a hypothetical scenario in which we don't have any hyper-parameters to tune. Even in this case, using the dev set, you can find out whether your algorithm is over-fitting and/or under-fitting, and after that (although you are unable to modify the algorithm), you can definitely modify the dataset: add new samples, add/remove some features, and so on. So, if you have enough data, always prefer to have a dev set!
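As a rough sketch of that dev-set check (the classifier and synthetic data below are just placeholders, assuming scikit-learn):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, random_state=0)

# Split into train / dev / test (e.g., 60 / 20 / 20)
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_dev, X_test, y_dev, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

clf = RandomForestClassifier(random_state=0)
clf.fit(X_train, y_train)

train_acc = clf.score(X_train, y_train)
dev_acc = clf.score(X_dev, y_dev)

# High train accuracy with much lower dev accuracy -> over-fitting
# Low train accuracy (and low dev accuracy)        -> under-fitting
print(f"train = {train_acc:.3f}, dev = {dev_acc:.3f}")
```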
Now, coming to the next query, i.e., what to do if we don't have a dev set.
You can compare the performance on the test set with the performance on the training set and try to evaluate whether your algorithm is over-fitting or under-fitting. But once you adjust the hyper-parameters of your model to perform better on the test set, you are essentially exposing your test set to the model, and from then on, the test set can't give you a reliable estimate of your model's performance on unseen data. So, essentially, you need a new test set, and this brings us back to square one, in which we can simply call the older test set the dev set.
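To make that concrete, here is a minimal sketch (the classifier and hyper-parameter grid are only placeholders) in which all tuning decisions are made against the dev set and the test set is touched exactly once at the end:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, random_state=0)
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_dev, X_test, y_dev, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

best_C, best_dev_acc = None, -1.0
for C in [0.01, 0.1, 1.0, 10.0]:  # tune against the dev set only
    clf = LogisticRegression(C=C, max_iter=1000).fit(X_train, y_train)
    dev_acc = clf.score(X_dev, y_dev)
    if dev_acc > best_dev_acc:
        best_C, best_dev_acc = C, dev_acc

# The test set is used exactly once, after all decisions are made
final_clf = LogisticRegression(C=best_C, max_iter=1000).fit(X_train, y_train)
print("test accuracy:", final_clf.score(X_test, y_test))
```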
And as for your last question, it depends on your model's performance. As your model's performance improves, your precision and recall scores will automatically increase (though not always together, they will tend to increase). I hope this helps.
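If it helps, these scores can be computed directly with the metrics you already mentioned (a minimal sketch; the labels below are just placeholders for the true labels and the predictions of any fitted classifier):

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

y_test = [0, 0, 1, 1, 1, 0, 1, 0]  # true labels
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]  # predictions from some fitted classifier

print(confusion_matrix(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred))
print("recall:", recall_score(y_test, y_pred))
```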
Thank you so much for the explanation. I totally agree with you that if there are only two sets, training and test, then we are essentially exposing our test set to the model.
Is cross-validation a solution to it, i.e., first training on the train set, comparing the training-set score with the score from cross-validation done on the training set, tuning some hyperparameters, and finally comparing to the test score?
If I talk about classification, to my knowledge evaluating on the training data might give us full (100%) accuracy, so I guess it's better to do cross-validation on the training data (again) and compare with the test set. But how do we find out whether the model is underfitting or overfitting in the case of classification?
My doubts are mostly related to classification, the reason being that in all of the examples I have come across, training was first done on the whole training set via cross-validation and evaluation was done on the test set, which is very confusing, as you might have gathered after reading my doubt.
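For reference, the pattern I keep seeing in those examples looks roughly like this (the model choice and data here are just placeholders):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

clf = DecisionTreeClassifier(random_state=1)

# k-fold cross-validation on the training set only
cv_scores = cross_val_score(clf, X_train, y_train, cv=5)
print("mean cv accuracy:", cv_scores.mean())

# ... then the model is fit on the whole training set and evaluated on the test set
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```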
Hey @GauravMalik,
Can you do me a favour? I am not really sure whether by "cross-validation" you mean K-fold cross-validation or the cross-validation set (also referred to as the dev set). These are two different things, so could you please update your previous post with "dev" wherever you are referring to the latter, and with "K-fold cross-validation" wherever you are referring to the former?
Ok, so by cross-validation I mean performing cross-validation on the training set; I didn't mean a separate set kept aside only for validation, which we call the dev set.
Hey @GauravMalik,
Please allow me some time; I will definitely reply to this query by tomorrow. I am just a little busy with some other work tonight.
Hey @GauravMalik,
So, you are essentially referring to performing K-fold cross-validation on the training set, and you have no dev set whatsoever.
Now, coming to the query. First, it's not true at all that you will get 100% accuracy on the training set just because your model was trained on it. You can use K-fold cross-validation to obtain the training scores and cross-validation scores K times, but aggregating them to find the extent of over-fitting and/or under-fitting is not only non-trivial; at the same time, your model has seen and trained on the data of the K-th fold as well, so whether it gives you a correct estimate of over-fitting and/or under-fitting is doubtful.
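If you do go the K-fold route, a rough sketch of what I mean (assuming scikit-learn's cross_validate; the classifier and data are just placeholders) would be:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

X_train, y_train = make_classification(n_samples=1000, random_state=0)

results = cross_validate(
    RandomForestClassifier(random_state=0),
    X_train, y_train,
    cv=5,
    return_train_score=True,  # reports both train and validation scores per fold
)

# A large gap between the two aggregates hints at over-fitting,
# but keep in mind the caveat above about aggregating across folds.
print("mean train score:", results["train_score"].mean())
print("mean validation score:", results["test_score"].mean())
```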
So, when you have enough samples, having a separate dev set is always better. You can use K-fold cross-validation as well, but the existence of a dev set will give you a fairly accurate estimate of over-fitting and/or under-fitting. Let me know if this helps.