If I have used GridSearchCV to find the best hyperparameters, and then I add a new hyperparameter that I had not included before, or I tweak some parameters a little more, e.g. GridSearchCV was searching the values 3, 5, 7 and now I test the value 6 by hand and find out it is a better fit than, e.g., value 5, do I then have to test all other parameters again against this new value of 6, because with this value there might be other hyperparameter values (combinations) that fit better?
Another example: if I have the parameter "max_depth": [3, 5, 7], and the optimum result was 7, is it safe in future iterations to maybe only look at 5, 7, 9, or could other parameters have such a big influence in combination that 3 might also be possible?
How sensitive are the best hyperparameters found to changes in the other hyperparameters? Do I always have to reconsider every single value, or is there a point where I can more or less safely assume a value to be the best?
What is the best practice process? Or should I just put all hyperparameters into grid search with a huge range and wait a day for it to finish and give me the right parameters for the given feature set?
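To make the scenario concrete, here is roughly what I mean (a minimal sketch assuming scikit-learn, a DecisionTreeClassifier, and placeholder X_train / y_train for my actual data):

```python
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# X_train, y_train are placeholders for my actual training data.
param_grid = {
    "max_depth": [3, 5, 7],        # the original grid
    "min_samples_split": [2, 10],  # another hyperparameter in the search
}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)

# Later I try max_depth=6 by hand, keeping the other "best" values fixed:
manual = DecisionTreeClassifier(
    random_state=0,
    max_depth=6,
    min_samples_split=search.best_params_["min_samples_split"],
)
print(cross_val_score(manual, X_train, y_train, cv=5).mean())
```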
To begin with, most of the time a model's hyperparameters are coupled with each other. For example, in a decision tree model, a max_depth of 1 will obviously invalidate any max_leaf larger than 2, because we simply can't have more than one split.
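As a quick illustration of that coupling (a hedged sketch using scikit-learn, where the parameter is called max_leaf_nodes): with max_depth=1 the tree can make only one split, so raising max_leaf_nodes beyond 2 changes nothing.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# The same depth-1 stump with two very different max_leaf_nodes settings.
small = DecisionTreeClassifier(max_depth=1, max_leaf_nodes=2, random_state=0).fit(X, y)
large = DecisionTreeClassifier(max_depth=1, max_leaf_nodes=50, random_state=0).fit(X, y)

# Both trees end up with 2 leaves: the larger limit is never reached.
print(small.get_n_leaves(), large.get_n_leaves())
```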
Chances are that this new value of 6 will also have some coupling effect with other hyperparameters, so you might want to test whether it makes a difference in the existing combinations. I always think it is simply not realistic to attempt to find the ultimate best set of hyperparameters, so if your new value of 6 doesn't make a big difference in performance and you don't have too much time, then I probably wouldn't do many more tests. It really just depends on what you want, and it's sometimes very personal.
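If you do decide to check, the cheapest way is usually to fold the new value back into the grid, so it is compared against every existing combination rather than only against the previous best. A minimal sketch, under the same assumed scikit-learn setup and placeholder X_train / y_train as in your post:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

param_grid_refined = {
    "max_depth": [5, 6, 7],        # new candidate 6 added around the old optimum
    "min_samples_split": [2, 10],  # kept in the grid so the coupling with 6 is tested too
}
refined = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid_refined, cv=5)
refined.fit(X_train, y_train)
print(refined.best_params_, refined.best_score_)
```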
As I said in the beginning, the hyperparameters are coupled together, so we really can't just say yes or no without inspecting how they couple. This is actually what we need to learn in order to say we "know decision trees". What makes the situation more complicated is that the data itself can influence the coupling, so there is really no rule of thumb for this matter.
As for whether it is safe to only look at 5, 7, 9: again, there is no rule of thumb. If you do so, however, you are assuming that there is one and only one "global optimal hyperparameter configuration" and that it must lie in the range 5-9 along the max_depth dimension. This assumption is quite questionable.
I think we are going back to what I stated at the beginning. It is our duty to learn how each hyperparameter works. They are like a group of people: there are dynamics between them, and you really need to learn them well. Some of them might be more sensitive to the others on some datasets, while others might be less so.
It is a luxury to try out every configuration, because of the time and computational power it takes. We usually don't have unlimited resources for that, not to mention when the dataset is too large to even wait for one configuration to finish.
I think the best practice is: try a couple of configurations at the beginning to get a sense of the modeling results, and from there start to tune the hyperparameters by hand. You are going to have to pick one hyperparameter to start tuning with, and this process is very path-dependent, meaning that you might reach a very similar result whether you start from this or that hyperparameter, but you may also miss the sweet configuration if you happen not to start from the right one. Nobody can tell you which to start with; it's all about your experience with that dataset and possibly similar datasets.

Once you choose your starting configuration (model), do check the training and validation scores to see whether it is overfitting or underfitting, then tune the hyperparameters that can mitigate that overfitting / underfitting to make improvements. It is all by hand, so we really walk to the destination on foot instead of randomly picking by grid search.
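For what it's worth, here is a minimal sketch of that manual loop in scikit-learn (placeholder X_train / y_train again): compare the training and validation scores of a starting configuration, then adjust by hand.

```python
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier(max_depth=7, random_state=0)  # some starting configuration
scores = cross_validate(model, X_train, y_train, cv=5, return_train_score=True)

train = scores["train_score"].mean()
valid = scores["test_score"].mean()
print(f"train={train:.3f}  validation={valid:.3f}")

# train >> validation  -> overfitting: e.g. lower max_depth, raise min_samples_leaf.
# both scores low      -> underfitting: e.g. raise max_depth or add features.
# Re-fit, re-check, and repeat until the scores stop improving meaningfully.
```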
I remember you asked a question about MLS Course 3, so I suppose you have finished Course 2, but anyway, you can always watch the videos from MLS Course 2 Week 3, which discuss how we iteratively improve our model. The core idea is about tackling overfitting and underfitting. You can use hyperparameter tuning for that purpose, or any other method described in the lectures.
Thank you @rmwkwok. Your answers are very helpful for understanding the process better and building better intuition. Also thanks for reminding me of MLS Course 2 Week 3. It is one thing to hear/read about it but another thing to apply it in practice. With my current little project I am practicing exactly this, and I will gladly go back to the mentioned videos to apply them to a real problem.
Best,