Evaluation of models


I have now finished the section on model evaluation and bias/variance, and I have some questions.

What I took away from the section are the following steps:

  1. Build the models and evaluate training/cross-validation errors
  2. Adjust the model based on bias/variance (e.g. adjust the polynomial degree, amount of data, etc.)
  3. Tune regularisation
  4. (Precision/recall trade-off)

Now I wonder - in the labs we always first chose a model based on our results in step 1 and then compared differently adjusted and regularised models. But could it not be that a worse-performing model would end up outperforming the chosen model once it had been adjusted for bias/variance and regularised?

Or in other words - would best practice be to go through every single step with every single model and only then decide which model to choose? If so, is that feasible? We learned that adjusting for high bias and high variance involves a lot of steps (e.g. collecting more training data), which might take a lot of time. As every model may have different problem areas, it would take an enormous amount of time to optimise every single model and then choose the best one. That could well be the way to go, but I want to make sure that this is what top-notch ML practitioners actually do.
Or is there a different way to approach this? Or is the way the labs approach it the “correct” way and it is common practice to first choose a model based on the training/cross-validation errors and only then start to adjust the model on the other parameters?

Your three steps are correct, but they are an iterative process. The goal is to get the best combination of model complexity and regularization.

If you don’t get good enough results, then you may need an entirely different model.

The key is to stop when you get “good enough” results for the problem you’re solving - not necessarily the globally “best” solution.
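
As a minimal sketch of that loop (the synthetic data, the polynomial degrees, the lambda values and the 0.02 target are all assumptions for illustration, not taken from the course labs), the idea of walking through complexity and regularization settings and stopping at the first "good enough" cross-validation error might look like this:

```python
import numpy as np

# Synthetic data (an assumption for illustration): a noisy cubic curve.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 200)
y = 1.0 + 2.0 * x - 3.0 * x**3 + rng.normal(0.0, 0.1, 200)
x_train, y_train = x[:120], y[:120]
x_cv, y_cv = x[120:], y[120:]

def poly_features(x, degree):
    # Columns x^0, x^1, ..., x^degree (the x^0 column acts as the bias term).
    return np.vander(x, degree + 1, increasing=True)

def fit_ridge(X, y, lam):
    # Closed-form L2-regularised least squares; the bias column is not penalised.
    penalty = lam * np.eye(X.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

def mse(X, y, w):
    return float(np.mean((X @ w - y) ** 2))

def iterate_until_good_enough(target_cv_mse, degrees=range(1, 9),
                              lambdas=(0.0, 1e-3, 1e-1, 1.0)):
    """Try (degree, lambda) pairs from simple to complex and stop at the
    first one whose cross-validation error meets the target."""
    history = []
    for degree in degrees:
        X_train = poly_features(x_train, degree)
        X_cv = poly_features(x_cv, degree)
        for lam in lambdas:
            w = fit_ridge(X_train, y_train, lam)
            record = (degree, lam, mse(X_train, y_train, w), mse(X_cv, y_cv, w))
            history.append(record)
            if record[3] <= target_cv_mse:
                return record, history  # good enough: stop here
    # Target never met: fall back to the lowest CV error seen.
    return min(history, key=lambda r: r[3]), history

chosen, history = iterate_until_good_enough(target_cv_mse=0.02)
print("chosen (degree, lambda, train MSE, CV MSE):", chosen)
print("configurations tried:", len(history))
```

The loop stops as soon as the target is met rather than exhausting the grid, which is the point: the target comes from the problem you are solving, not from the search itself.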

Hello @Niclas_B,

While Tom has explained the idea behind this iterative process, I want to address your concern from another angle.

I agree with your concern, because I also think the process is very path-dependent. We do not know how many paths will lead us to a more ideal solution, and we cannot guarantee that we are always on the right path. However, one thing is certain: we have decisions to make, and we can make them either informed or uninformed.

Your steps 1 and 2 give us an informed approach. An “uninformed” approach could be to randomly test a manageable number of hyperparameter configurations and use the results as the starting point for the informed iterative process. On top of these, your own experience in hyperparameter tuning can help you rank the possible “paths” in front of you.
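
To make the "uninformed" option concrete, here is a rough sketch of a random search over a small number of configurations. Everything in it is an assumption for illustration: the synthetic sine data, the two hyperparameters (polynomial degree and ridge lambda), their ranges, and the budget of 20 trials.

```python
import numpy as np

# Synthetic data (an assumption for illustration): a noisy sine curve.
data_rng = np.random.default_rng(42)
x = data_rng.uniform(-1.0, 1.0, 200)
y = np.sin(3.0 * x) + data_rng.normal(0.0, 0.1, 200)
x_train, y_train = x[:120], y[:120]
x_cv, y_cv = x[120:], y[120:]

def cv_error(degree, lam):
    # Fit a ridge-regularised polynomial on the training split and
    # return its mean squared error on the cross-validation split.
    X_train = np.vander(x_train, degree + 1, increasing=True)
    X_cv = np.vander(x_cv, degree + 1, increasing=True)
    penalty = lam * np.eye(degree + 1)
    penalty[0, 0] = 0.0  # do not penalise the bias term
    w = np.linalg.solve(X_train.T @ X_train + penalty, X_train.T @ y_train)
    return float(np.mean((X_cv @ w - y_cv) ** 2))

def random_search(n_trials, rng):
    # Sample configurations at random and rank them by CV error.
    trials = []
    for _ in range(n_trials):
        degree = int(rng.integers(1, 11))      # integer in 1..10
        lam = float(10 ** rng.uniform(-4, 1))  # log-uniform in [1e-4, 10]
        trials.append((cv_error(degree, lam), degree, lam))
    return sorted(trials)                      # lowest CV error first

trials = random_search(20, np.random.default_rng(0))
best_err, best_degree, best_lam = trials[0]
print("best random configuration:", best_degree, best_lam, best_err)
```

The top few entries of `trials` are then candidate starting points for the informed bias/variance iteration, rather than final answers.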

Moreover, it is not just your experience that can help; more advanced regularization techniques can too. Let me quote from Andrew’s The Batch:

When supervised deep learning was at an earlier stage of development, experienced hyperparameter tuners could get much better results than less-experienced ones. We had to pick the neural network architecture, regularization method, learning rate, schedule for decreasing the learning rate, mini-batch size, momentum, random weight initialization method, and so on. Picking well made a huge difference in the algorithm’s convergence speed and final performance.
Thanks to research progress over the past decade, we now have more robust optimization algorithms like Adam, better neural network architectures, and more systematic guidance for default choices of many other hyperparameters, making it easier to get good results. I suspect that scaling up neural networks — these days, I don’t hesitate to train a 20 million-plus parameter network (like ResNet-50) even if I have only 100 training examples — has also made them more robust. In contrast, if you’re training a 1,000-parameter network on 100 examples, every parameter matters much more, so tuning needs to be done much more carefully.

Of course, these research results might not address exactly the kind of problem you are facing - e.g. maybe you are not doing image recognition. However, the message is that if you want a more robust way to tune hyperparameters, then besides carrying out the 3 steps carefully, consider advanced optimization and regularization techniques. The MLS is introductory, so it only covers the basics.