Hi mentors,
I have a possibly silly question, but I can’t figure out why we need tree ensembles rather than a single decision tree. According to Andrew, a single decision tree is not robust, since a small change in the training data can significantly change the model’s structure. But suppose, ideally, that both the old model and the new one perform similarly on the test set — why would the difference in model structure matter?
Thanks for your assistance in advance 
And the course link, in case it helps:
Using multiple decision trees | Coursera
With a single tree you only have one path from root to leaf to produce an output.
In an ensemble you have many trees, each trained on a different sample of the data, and the final output is a majority vote across the trees (for classification) or the average of their outputs (for regression).
Having many trees instead of one therefore gives more robustness (better generalization) and usually better accuracy as well.
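A minimal sketch of the two combination rules described above. The per-tree predictions here are made-up values for illustration, not anything from the course:

```python
from collections import Counter

# Hypothetical predictions from five trees in an ensemble
# (the values are illustrative, not from the course).
tree_predictions = ["cat", "dog", "cat", "cat", "dog"]

# Classification: the ensemble output is the majority vote.
majority_class, _ = Counter(tree_predictions).most_common(1)[0]
print(majority_class)  # cat

# Regression: the ensemble output is the average of the tree outputs.
tree_outputs = [2.0, 2.5, 1.5, 2.0, 3.0]
ensemble_output = sum(tree_outputs) / len(tree_outputs)
print(ensemble_output)  # 2.2
```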
Thanks for the explanation! I’m now clear on how a tree ensemble outperforms a single decision tree by combining trees trained on different samples of the data.
But still, why can’t we just feed all the data to a single decision tree in practice and reach the same level of performance as a tree ensemble? Is it that doing so would give the single tree high variance, making generalization much harder?
Thanks for the time & support 
You have to train, validate, and test (on data the model has never seen), so you have to split the dataset into at least three portions before you can validate and test. It also makes sense to feed the model as much diverse training data as possible so it can learn more, and that is what the forest is trying to do by giving different trees different randomly sampled training data.
Then the time will come to test your model on data that even you haven’t seen yourself.
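A rough sketch of those two ideas together — splitting the data into three portions, then drawing a per-tree bootstrap sample from the training portion only. The 60/20/20 ratio and sampling with replacement are my assumptions for illustration, not details from this thread:

```python
import random

random.seed(0)  # for reproducibility of the shuffle

# A toy dataset of 10 labelled examples (values are placeholders).
dataset = list(range(10))

# Split into train / validation / test (60 / 20 / 20 here;
# the exact ratio is an assumption, not from the thread).
random.shuffle(dataset)
train, val, test = dataset[:6], dataset[6:8], dataset[8:]

# Each tree in the ensemble sees a bootstrap sample of the *training*
# portion only: drawn with replacement, same size as the original.
bootstrap_sample = [random.choice(train) for _ in train]

print(len(train), len(val), len(test))  # 6 2 2
print(len(bootstrap_sample))            # 6
```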
This is also true. A single tree fed all the data will overfit — it ends up fitting the noise in the training set, which is exactly the high variance you mentioned. Averaging many trees trained on different samples cancels out much of that variance.