When to use decision trees?

Hi Sir/Ma'am,

We need a couple of clarifications on the lecture video "When to use decision trees":

  1. How could it be easier to string together multiple neural networks? What is the concept behind it?

  2. Prof. Andrew said that only one decision tree can be trained at a time. In what sense did he mean that? Random Forest trains multiple decision trees in parallel, right? If so, how does that fit?

Hello @Anbu,

Would you mind sharing at which time in which video from which course we discussed:

  • string together multiple neural networks
  • only one decision tree can be trained at a time

I want to make sure we start off on the same page. If you provide them, I am going to watch those parts before getting back to you.

Thank you.
Raymond

Thanks, Mentor. Please find the requested details below.

https://in.coursera.org/learn/advanced-learning-algorithms/lecture/vh1V7/when-to-use-decision-trees

Machine Learning Specialization, Course 2 → Advanced Learning Algorithms → Week 4 → "When to use decision trees" lecture video

Hello @Anbu,

Thanks for sharing the source. It’s important for me to understand your question in context. I am quoting from that video the part that is relevant to your first post:

“if you’re building a system of multiple machine learning models working together, it might be easier to string together and train multiple neural networks than multiple decision trees. The reasons for this are quite technical and you don’t need to worry about it for the purpose of this course. But it relates to that even when you string together multiple neural networks you can train them all together using gradient descent. Whereas for decision trees you can only train one decision tree at a time. That’s it.”

What follows is my own understanding and does not necessarily represent Andrew’s idea. Still, the focus here is pretty technical, so you may find it difficult to fully understand if you have not built many neural networks before. It is natural to google more when we are interested in something, so I will provide some googling keywords at the end. If you find what I am going to share hard to believe, then perhaps some researching, learning, and practicing will help.


For neural networks, some people find it useful to stack two types of neural networks together for one task. Let’s say you have some economic time series data which you want to use to make predictions about next year’s economy. We can build a model by stacking an “autoencoder” neural network and an “RNN” neural network.

“Autoencoder” and “RNN” are not covered in the MLS. I will very, very briefly describe what each can do just so we have some idea. I don’t mean to provide the full picture, but I hope it will be sufficient to show why they are wanted in my example.

An “autoencoder” NN can compress features. If the economic data has 500 features (1. GDP over time, 2. CPI over time, 3. population over time, …), then we can use an autoencoder to compress them into 30 information-rich features.
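To make this a bit more concrete, here is a minimal autoencoder sketch in TensorFlow/Keras (the framework used in the course). All the layer sizes here are made-up numbers for illustration, not recommendations:

```python
import tensorflow as tf

# Encoder: squeezes the 500 raw features down to 30 information-rich ones.
encoder = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(500,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(30, activation="relu"),
])

# Decoder: tries to rebuild the original 500 features from the 30 codes.
decoder = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(30,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(500),
])

# The autoencoder is trained to reproduce its own input, which forces the
# 30-number bottleneck to keep only the most useful information.
autoencoder = tf.keras.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(X, X, epochs=...)  # note: the input is also the target
```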

An “RNN” is designed for learning from temporal or sequential data.

When combined, we can use the autoencoder to transform a 500 (features) × 3650 (10 years × 365 days) dataset into a 30 (information-rich features) × 3650 (days) dataset, which is then fed into the RNN to predict the final desired outcome.
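Continuing my sketch above, the “stringing together” really is just one line of code: wrap the encoder so it runs on each of the 3650 days, then hand the compressed sequence to an RNN layer (I use an LSTM here, and the sizes are again made up):

```python
# Reusing `encoder` from the autoencoder sketch above.
seq_in = tf.keras.layers.Input(shape=(3650, 500))          # 10 years x 500 features
codes = tf.keras.layers.TimeDistributed(encoder)(seq_in)   # -> 3650 x 30 per example
h = tf.keras.layers.LSTM(64)(codes)                        # the RNN reads the 3650 days
pred = tf.keras.layers.Dense(1)(h)                         # next year's economy
model = tf.keras.Model(seq_in, pred)
```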

You can train them separately, or together. If you want to train them together, two things are needed when writing code (sketched right after this list):

  1. connect the output of the encoder to the input of the RNN layer. This “connection” only takes one line and is very simple.
  2. modify the cost function so that the total cost is a combination of the cost of the autoencoder and the cost of the RNN. This, again, is not difficult.
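In code, continuing my sketches above, the two points might look like this (the loss weights are made-up numbers):

```python
# Point 1 is the single TimeDistributed(encoder) line in the sketch above.
# Point 2: give the joint model two outputs, one per cost term, so the
# total cost combines the autoencoder's cost and the RNN's cost.
recon = tf.keras.layers.TimeDistributed(decoder)(codes)   # reconstruction branch
joint = tf.keras.Model(seq_in, [pred, recon])
joint.compile(
    optimizer="adam",
    loss=["mse", "mse"],      # RNN prediction cost + autoencoder reconstruction cost
    loss_weights=[1.0, 0.2],  # a made-up mix of the two costs
)
# joint.fit(X_seq, [y_next_year, X_seq], ...)  # one gradient descent trains both
```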

Training them together could help direct the autoencoder toward what we finally want to predict (next year’s economy).


For decision trees, you are absolutely right that we can train many decision trees at a time when we are talking about the Random Forest. However, it is not the Random Forest that we are considering in this lecture. Andrew talked about stringing models together in the sense that, when we have two models strung together, one model depends on the output of the other. Therefore, the Random Forest is not what we have in mind.

A closer example would be the gradient-boosted decision tree (GBDT) model, which, like the random forest, is also an ensemble of decision trees. The key difference is that GBDT builds the second tree based on the outcome of the first tree. It is therefore necessary to finish one tree before the next tree starts to grow.
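To make the sequential nature concrete, here is a tiny sketch of the GBDT idea built from scikit-learn’s plain decision trees (the data and every hyperparameter are made up for illustration). Each tree is fit to the residual errors of the trees before it, so one tree must be finished before the next can start to grow:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Made-up toy data: 200 examples, 5 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 0] ** 2 + rng.normal(scale=0.1, size=200)

lr = 0.1                       # learning rate (shrinkage), a made-up value
pred = np.zeros_like(y)
trees = []
for _ in range(50):            # trees are grown strictly one after another
    residual = y - pred        # what the ensemble so far still gets wrong
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    pred += lr * tree.predict(X)
    trees.append(tree)
```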


Again, don’t worry if you can’t follow everything, because after all these topics are not covered in the MLS; but since you asked, these are the examples I can think of to share.

Cheers,
Raymond

Some googling keywords:

“stacking neural networks”, “autoencoder”, “RNN”, “Decision tree: gradient-boosted decision tree vs random forest”, “stacking ensemble”


Thanks for the detailed explanation… so basically we cannot string together decision trees, because although a random forest trains multiple decision trees, they are all independent of each other and so cannot be strung together.
Correct, sir?

That’s right. We should not just think about whether we are using many decision trees together; we need to think about HOW they are used together.

A random forest uses multiple decision trees’ outputs to make the final decision; GBDT, in addition, trains each decision tree based on the previous tree. This is the difference.
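If it helps, here is a matching sketch of the random-forest side on the same kind of made-up data (simplified: a real random forest also randomizes the features considered at each split). Every tree is fit on its own bootstrap sample, independently of the others, so this loop could run in parallel, unlike the strictly sequential GBDT loop in my earlier sketch:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Same style of made-up toy data as in the GBDT sketch.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 0] ** 2 + rng.normal(scale=0.1, size=200)

forest = []
for _ in range(50):                             # order of this loop does not matter
    idx = rng.integers(0, len(y), size=len(y))  # each tree gets its own bootstrap sample
    forest.append(DecisionTreeRegressor(max_depth=2).fit(X[idx], y[idx]))
pred = np.mean([t.predict(X) for t in forest], axis=0)  # average the independent trees
```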
