Tree Ensembles optional lab - choosing hyperparameters

Hi all,

In the tree ensembles optional lab, graphs are plotted to show how accuracy changes as the min_samples_split and max_depth hyperparameters are varied:

[graphs omitted: training and test accuracy vs. min_samples_split, and vs. max_depth]

The lab says that the optimal values are:

  • min_samples_split = 10
  • max_depth = 8

However, this is confusing to me. I thought the point of the bias/variance tradeoff is to choose values that optimise both training and test performance. So wouldn’t the optimal values here actually be:

  • min_samples_split = 300, because any value beyond 300 causes both the training and test accuracy to go down.
  • max_depth = 16, because any value below 16 is suboptimal for maximising training and test accuracy; both accuracies are lower there. The way I see it, any value from 16 onwards is at least as good as any value before 16. (See the sketch below for roughly how I’m reading these curves.)
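
For reference, here’s roughly the kind of curve I mean; a minimal sketch rather than the lab’s exact code, using stand-in data and an illustrative parameter grid:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import validation_curve

# Stand-in data; the lab uses its own train/test split instead.
X_train, y_train = make_classification(n_samples=1000, random_state=0)

# Illustrative grid, not the lab's exact values.
param_range = [2, 10, 30, 50, 100, 200, 300, 700]

# Mean accuracy on the training folds and on the held-out folds
# for each candidate value of min_samples_split.
train_scores, val_scores = validation_curve(
    RandomForestClassifier(random_state=0),
    X_train,
    y_train,
    param_name="min_samples_split",
    param_range=param_range,
    cv=5,
    scoring="accuracy",
)

plt.plot(param_range, train_scores.mean(axis=1), label="train")
plt.plot(param_range, val_scores.mean(axis=1), label="validation")
plt.xlabel("min_samples_split")
plt.ylabel("accuracy")
plt.legend()
plt.show()
```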

Thanks for the help.

The lab’s suggested values are conservative: they aim to give reasonable performance across a wide range of datasets while guarding against overfitting. Treat them as a starting point; in practice it’s common to experiment with these values on a specific dataset to find the true optimal settings, as you are doing.

I see. Are there any uniform criteria for finding the “true optimal settings”, or does it just depend on specific needs? I thought finding the true optimal settings meant balancing the loss between the training and test measures.

Finding the “true optimal settings” is iterative and depends on the specific needs of the problem, such as interpretability, computational constraints, or how much overfitting can be tolerated. Observing both training and test curves to balance the loss between the two is fundamentally sound and consistent with best practices for achieving an optimal balance between bias and variance.

While there’s no hard and fast rule, several commonly used criteria and strategies can guide this process. These include validation curve analysis, cross-validation, hyperparameter optimization techniques, regularization and complexity penalties, and balancing bias and variance (generalization error).
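
As a concrete illustration of the cross-validation route, a grid search over both hyperparameters at once looks roughly like this (a minimal sketch with stand-in data and an illustrative grid, not a prescribed recipe):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Stand-in data; in practice, use your own training split.
X_train, y_train = make_classification(n_samples=1000, random_state=0)

# Illustrative grid over the two hyperparameters discussed in this thread.
param_grid = {
    "min_samples_split": [10, 50, 100, 300],
    "max_depth": [4, 8, 16, None],  # None lets the trees grow fully
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,                # 5-fold cross-validation
    scoring="accuracy",
)
search.fit(X_train, y_train)

# The combination with the best mean cross-validated accuracy.
print(search.best_params_, search.best_score_)
```

Whichever combination wins here still reflects one dataset and one metric, so it is worth sanity-checking the chosen values on a held-out test set before settling on them.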

Thanks, much appreciated.
