Overfit in boosted trees

In the case of boosted trees, if we create a new tree to address the misclassifications made by the previous trees, won’t we run into the problem of overfitting?


Hey @Thala,

Indeed, if you keep creating new decision trees to address the samples misclassified by the previously trained trees, the ensemble will eventually overfit, but that’s where the large number of hyperparameters comes in. You can tune these parameters to address the issue of overfitting.

For instance, take a look at the documentation of the XGBRegressor offered by the XGBoost library. You will find more than 20 hyperparameters. Some are already set to default values that were found to work well in most scenarios after hundreds of experiments, and the rest are left for you to tune in accordance with the task at hand. I hope this helps.
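To make this concrete, here is a minimal sketch of regularizing a boosted-tree model through its hyperparameters. It uses scikit-learn's `GradientBoostingRegressor` as an assumed stand-in for `XGBRegressor` (the parameter names differ across libraries, but the ideas carry over):

```python
# Sketch: fighting overfitting in boosted trees via hyperparameters.
# GradientBoostingRegressor is used here as a stand-in for XGBRegressor;
# XGBoost exposes analogous knobs (n_estimators, learning_rate, max_depth,
# subsample) under similar names.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingRegressor(
    n_estimators=200,    # cap on the number of trees
    learning_rate=0.05,  # shrink each tree's contribution
    max_depth=3,         # keep individual trees shallow
    subsample=0.8,       # fit each tree on a random 80% of the rows
    random_state=0,
)
model.fit(X_train, y_train)
print(f"train R^2: {model.score(X_train, y_train):.3f}")
print(f"test  R^2: {model.score(X_test, y_test):.3f}")
```

A large gap between the train and test scores is the usual symptom that these knobs need further tuning.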


I think the boosted-trees concept discussed in the course was merely at an introductory level, so concepts such as hyperparameters were not covered.
It would be very helpful if you could share some other materials/videos that discuss boosted trees in greater depth :sweat_smile:

Hello @Thala,

Generally speaking, to deal with overfitting we can either (1) limit the growth of any single tree, or (2) limit the number of trees. Certainly we can do both.

To achieve (1), for example, we can (a) limit the data visible to a tree (this idea is covered in the video); (b) require a minimum gain before allowing a split (we calculated gain in the C2 W4 assignment); or (c) hard-code a maximum number of splits and/or a maximum tree depth.
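The growth-limiting options above can be sketched on a single tree. This uses scikit-learn's `DecisionTreeClassifier` parameter names as an assumption; boosting libraries expose similar controls per tree:

```python
# Sketch: limiting the growth of a single tree.
#   (b) min_impurity_decrease: require a minimum gain before splitting
#   (c) max_depth / max_leaf_nodes: hard caps on depth and split count
# (Option (a), limiting the data each tree sees, is a property of the
# boosting loop, e.g. a `subsample` parameter, rather than of one tree.)
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

tree = DecisionTreeClassifier(
    max_depth=4,                 # (c) cap the tree depth
    max_leaf_nodes=10,           # (c) cap the number of leaves/splits
    min_impurity_decrease=0.01,  # (b) minimum gain to allow a split
    random_state=0,
)
tree.fit(X, y)
print("depth:", tree.get_depth(), "leaves:", tree.get_n_leaves())
```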

To achieve (2), for example, we can again hard-code the maximum number of trees, or we can use early stopping, which stops adding new trees once a certain condition is met.
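Both ways of limiting the number of trees can be shown in one short sketch. The parameter names below are scikit-learn's (`n_iter_no_change`, `validation_fraction`); XGBoost offers an equivalent `early_stopping_rounds` mechanism:

```python
# Sketch: limiting the number of trees.
#   - n_estimators is the hard cap;
#   - early stopping halts training when the score on a held-out
#     validation split stops improving for n_iter_no_change rounds.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=400, n_features=8, noise=5.0, random_state=0)

model = GradientBoostingRegressor(
    n_estimators=500,         # upper bound on the number of trees
    validation_fraction=0.2,  # hold out 20% of the data for monitoring
    n_iter_no_change=10,      # stop after 10 rounds with no improvement
    random_state=0,
)
model.fit(X, y)
# n_estimators_ is the number of trees actually built, often < 500
print("trees actually built:", model.n_estimators_)
```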

I hope the general idea above isn’t difficult to follow, but for details such as which hyperparameters to tune, you can start reading from here, which will give you some hyperparameters’ names so that you can further read about their definitions here. I also reread that page from time to time to refresh my memory about what options I have.

Lastly, if you want to know more about a specific hyperparameter, try googling its name or “xgboost {the name}”. If you want to read others’ write-ups on how to tune hyperparameters, you can find plenty of articles online.



Similar to random forests, we can use a subset of features to split at a node, right?

Yes, we can do that.
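As a quick sketch of that idea: in scikit-learn's gradient boosting this is the `max_features` parameter (XGBoost has a similar `colsample_bynode`/`colsample_bytree` family); these names are library-specific assumptions, not something from the course itself:

```python
# Sketch: considering only a random subset of features at each split,
# as random forests do, but inside a gradient-boosting ensemble.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

model = GradientBoostingClassifier(
    n_estimators=50,
    max_features="sqrt",  # roughly sqrt(20) ~ 4 features tried per split
    random_state=0,
)
model.fit(X, y)
print(f"training accuracy: {model.score(X, y):.3f}")
```

Restricting the features per split decorrelates the trees, which is another way of keeping the ensemble from overfitting.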

