Unlike a single decision tree, random forest and XGBoost build many small trees and produce their final estimate by combining the information gained from every individual tree. For a decision tree you can plot the final model with plot_tree, from the root all the way down to the leaves. In ensemble tree models, however, you can only visualize a single tree at a time, e.g. plot_tree(X, num_trees=n). I was wondering if there is a method to construct this final estimated tree for ensemble models such as random forest and XGBoost. If no code is available for the task, plausible ideas are welcome.
Hi @zeno3175, I have not built either a random forest or XGBoost from scratch, but here is a blog in which the author builds a random forest from scratch in Python.
If you would like to visualize your decision trees, whether from a random forest or a boosting model, you can use sklearn.tree. You can see this post here
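As a minimal sketch of what sklearn.tree offers (assuming the iris dataset just as an example): plot_tree draws the tree with matplotlib, and export_text prints the same root-to-leaf structure as plain text, which is handy when you have no display.

```python
# Minimal sketch: inspect a single sklearn decision tree.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Text dump of the fitted tree: one "|---" line per split/leaf.
rules = export_text(clf, feature_names=load_iris().feature_names)
print(rules)
```

The same call works on any fitted sklearn tree, including a single member of an ensemble.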
- that method can only be applied to sklearn's built-in tree or ensemble classifiers.
- the blog post above is quite old, so some of the function parameters might have changed.
Thanks for the code @Moaz_Elesawey, but take a close look: the poster extracts a single tree with # Extract single tree: estimator = model.estimators_ before plotting it with export_graphviz. So it is still a plot of a single tree, not of the final estimate.
You're right, but you can simply loop over each tree in the ensemble and visualize it. By the way, visualizing the trees is not very helpful: if your data contains a lot of features and the trees are deep, interpreting the results from the graph becomes quite hard. It is very helpful, though, when you are trying to explain the model to non-technical users.
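The loop over the ensemble can be sketched like this (a minimal example, assuming a small random forest on the iris data; export_text is used instead of a graphical plot so it runs without a display):

```python
# Sketch: a random forest is just a list of fitted decision trees
# (model.estimators_), so each member tree can be exported on its own.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import export_text

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=5, max_depth=3,
                               random_state=0).fit(X, y)

for i, estimator in enumerate(model.estimators_):
    print(f"--- tree {i} ---")
    print(export_text(estimator, feature_names=load_iris().feature_names))
```

Swap export_text for sklearn.tree.plot_tree (or export_graphviz) in the loop body if you want one figure per tree instead.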
Though I have a lot of features in training, in the end the feature importance is positive for only a limited number of features (< 35) after the random forest fit. So I was trying to link the feature importances (model.feature_importances_) to the tree visualization, but failed to do so. Is there any way to start from the feature importances and build the final estimated tree?
I see now what you want to do.
What I would do is run the ensemble on all the features, use feature_importances_ to select the most important ones, and then fit a new ensemble on those features only. I believe it might decrease the model's performance (e.g. accuracy score), although it will increase its speed.
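A minimal sketch of that idea (the dataset, the cutoff k, and the forest settings are just placeholders for illustration):

```python
# Sketch: rank features by importance, then refit on the top k only.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
full = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Indices of the k most important features, highest importance first.
k = 2
top = np.argsort(full.feature_importances_)[::-1][:k]

# Refit a new forest on the reduced feature matrix.
reduced = RandomForestClassifier(n_estimators=50,
                                 random_state=0).fit(X[:, top], y)
print("kept feature indices:", top)
```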
@Moaz_Elesawey But how does that help with visualizing how the model splits the data on those important features?
I don’t think we can plot one tree diagram to completely describe how a tree ensemble works.
@Moaz_Elesawey I found something in R (plot.multi.trees) that may help with visualizing the ensemble in one figure. I will test it out first and hope it succeeds. One question: if the same parameters are fed in, will R and Python generate the same results?
As Raymond said, you cannot visualize the ensemble as a whole, but you can visualize each estimator in the ensemble individually using the method shown above. It does not work the way I thought it would, though.
This example uses DecisionTreeRegressor on the iris dataset.
They should; the only differences come from the random_state and how each language generates random numbers. Both of them use cblas in the backend, so they should give the same results.
Please do share your findings with us. I think it is not for random forest; it looks like it can do something for gradient boosted trees.
For ensemble-based methods, instance-based explanation of the decisions made is the way to go; this falls under explainable AI.
For random forest see: View article
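One minimal instance-based starting point, assuming sklearn's random forest (dedicated tools like SHAP or treeinterpreter go further, but this uses only sklearn): decision_path reports exactly which nodes a given sample visits in every tree of the forest.

```python
# Sketch: instance-based inspection of a random forest's decision
# with sklearn's decision_path.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# Explain the first sample: indicator is a sparse matrix with one row
# per sample and one column per node across all trees; n_nodes_ptr
# marks where each tree's columns start.
indicator, n_nodes_ptr = model.decision_path(X[:1])
print("nodes visited across the forest:", indicator.sum())
```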