Hi,
Why don't we explore the entire list of features and stop only when there are none left to make an additional split?
We only have to explore each feature once, so why not stop when there are none left?
Thanks,
Laurent.
Hey @ljb1706,
Exploring the entire list of features and stopping only when there are none left to make an additional split is a valid approach, but it's not always the most efficient or effective one. Here are some reasons why:
Overfitting: Decision trees can grow into complex, deep trees that perfectly fit the training data, but this often leads to poor generalization to unseen data (overfitting). Letting the tree grow until there are no features left can produce excessively deep trees that capture noise in the data and perform poorly on new data.
Computational Efficiency: Exploring all features and growing the tree until none are left can be computationally expensive, especially when there are many features or a large dataset. By setting stopping criteria, you can build smaller, more efficient trees.
Interpretability: Deep decision trees can become very complex and difficult to interpret, which is a problem when you want to understand the model's decision-making process. The sketch below shows how such stopping criteria are typically set in practice.
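To make this concrete, here is a minimal sketch using scikit-learn (an illustration, not the course's own code); `max_depth`, `min_samples_split`, and `min_impurity_decrease` are examples of the stopping criteria being discussed, and the dataset is synthetic:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# A small synthetic dataset just for illustration
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Without stopping criteria: the tree grows until every leaf is pure
# (or no split improves impurity), which tends to overfit.
unrestricted = DecisionTreeClassifier(random_state=0).fit(X, y)

# With stopping criteria: growth halts early, giving a smaller,
# more interpretable tree that usually generalizes better.
restricted = DecisionTreeClassifier(
    max_depth=4,                 # stop once the tree is 4 levels deep
    min_samples_split=20,        # don't split nodes with fewer than 20 samples
    min_impurity_decrease=0.01,  # only split if impurity drops enough
    random_state=0,
).fit(X, y)

print("Unrestricted depth:", unrestricted.get_depth())
print("Restricted depth:  ", restricted.get_depth())
```

In practice these thresholds are usually tuned with cross-validation rather than fixed by hand.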
I hope it's clearer now why using stopping criteria in decision trees is important.
Regards,
Jamal
Using an exhaustive search can be problematic if there are lots of features and you need to do this calculation very often.
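As one hedged illustration of that cost, scikit-learn's trees expose a `max_features` parameter so each split considers only a random subset of candidate features instead of all of them (the timing numbers here are just whatever your machine produces, not results from the course):

```python
from time import perf_counter

from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=5000, n_features=200, random_state=0)

for max_features in (None, "sqrt"):
    # None  -> exhaustive search over all 200 features at every node
    # "sqrt" -> only ~14 randomly chosen candidate features per split
    tree = DecisionTreeClassifier(max_features=max_features, random_state=0)
    start = perf_counter()
    tree.fit(X, y)
    print(f"max_features={max_features!r}: fit in {perf_counter() - start:.2f}s")
```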
Thanks for the detailed answer.