Unlike a single decision tree, random forest and XGBoost build many small trees and then combine the information gained from every individual tree to produce the final estimate. For a single decision tree you can plot that final estimate with plot_tree, following it from the root all the way to the leaves. With ensemble tree models, however, you can only visualize one tree at a time, e.g. plot_tree(X, num_trees=n). I was wondering whether there is a method to construct this final estimated tree for ensemble models such as random forest and XGBoost. If no code is available for the task, plausible ideas are welcome.
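To make it concrete, this is the single-tree plotting I mean (a minimal sketch on a toy dataset; it assumes xgboost and the graphviz package are installed):

```python
import matplotlib.pyplot as plt
import xgboost as xgb
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
model = xgb.XGBClassifier(n_estimators=10).fit(X, y)

# num_trees picks which single booster to draw; there is no built-in way
# to draw the combined estimate of all trees at once.
xgb.plot_tree(model, num_trees=3)
plt.show()
```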
Hi @zeno3175, I have not built either random forest or xgboost from scratch. Here is a blog post in which the author builds a random forest from scratch in Python.
Thanks for the code @Moaz_Elesawey, but take a close look: the poster extracts a single tree (# Extract single tree: estimator = model.estimators_[5]) before plotting it with export_graphviz. So it is still a plot of a single tree rather than the final estimate.
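For reference, the pattern I am referring to looks roughly like this (a sketch, not the blog's exact code):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import export_graphviz

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# Extract single tree -- only estimator #5 is exported, not the whole ensemble
estimator = model.estimators_[5]
dot_data = export_graphviz(estimator, out_file=None, filled=True)
print(dot_data[:300])  # DOT source for that one tree only
```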
@zeno3175
You're right, but you can simply loop over each tree in the ensemble and visualize it (see the sketch below). By the way, visualizing the trees is not very helpful, especially when your data contains a lot of features and the trees are deep; interpreting the results from the graph becomes quite hard. It is very helpful, though, if you are trying to explain the tree to non-technical users.
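Roughly like this (a small sketch on a toy dataset; keep max_depth small, otherwise the figures quickly become unreadable):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import plot_tree

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=5, max_depth=3, random_state=0).fit(X, y)

# One figure per member tree of the forest
for i, estimator in enumerate(model.estimators_):
    plt.figure(figsize=(8, 5))
    plot_tree(estimator, filled=True)
    plt.title(f"Tree {i}")
    plt.savefig(f"tree_{i}.png")
    plt.close()
```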
Although I have a lot of features in my training set, in the end the feature importance is positive for only a limited number of features (< 35) after the random forest fit. So I was trying to link the feature importances (model.feature_importances_) to the tree visualization, but failed to do so. Is there any way to start from the feature importances and build the final estimated tree?
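For context, inspecting which features end up with positive importance looks roughly like this (simplified to a toy dataset, not my real data):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(data.data, data.target)

# Indices of features whose importance is strictly positive
positive = np.flatnonzero(model.feature_importances_ > 0)
for i in positive:
    print(data.feature_names[i], round(model.feature_importances_[i], 4))
```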
@zeno3175
I see now what you want to do.
What I would do is run the ensemble on all the features, then use feature_importances_ to select the most important ones, and then train a new ensemble using only those features (see the sketch below). I believe it might decrease the model's performance (e.g. accuracy score), although it will increase its speed.
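A minimal sketch of that workflow (toy dataset and a simple "importance > 0" cutoff as placeholders; you would tune the threshold on your own data):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# 1) Fit on all features
full = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# 2) Keep only the features with nonzero importance
keep = np.flatnonzero(full.feature_importances_ > 0)

# 3) Refit on the reduced feature set and compare accuracy
reduced = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr[:, keep], y_tr)
print("all features:     ", accuracy_score(y_te, full.predict(X_te)))
print("selected features:", accuracy_score(y_te, reduced.predict(X_te[:, keep])))
```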
@Moaz_Elesawey I found something in R (plot.multi.trees) that may help with visualizing them as one. I will test it out first and hope it succeeds. One question: if the same parameters are fed in, will R and Python generate the same results?
@zeno3175
As Raymond said, you cannot visualize the ensemble all at once, but you can visualize each estimator in the ensemble individually using the method shown above. It does not work the way I thought it would, though.
This example uses DecisionTreeRegressor on the iris dataset.
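Something along these lines (my reconstruction of the setup; the original details may differ):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeRegressor, plot_tree

iris = load_iris()
# Regress petal width on the other three measurements, just for illustration
X, y = iris.data[:, :3], iris.data[:, 3]

reg = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
plot_tree(reg, feature_names=iris.feature_names[:3], filled=True)
plt.show()
```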