Hello @Anbu,
Thanks for sharing the source. It is important for me to understand your question in context. I am quoting the part of that video that is relevant to your first post:
> if you’re building a system of multiple machine learning models working together, it might be easier to string together and train multiple neural networks than multiple decision trees. The reasons for this are quite technical and you don’t need to worry about it for the purpose of this course. But it relates to that even when you string together multiple neural networks you can train them all together using gradient descent. Whereas for decision trees you can only train one decision tree at a time. That’s it.
What follows is how I understand it and does not represent Andrew’s idea, but still, the focus here is pretty technical. Therefore, you may find it difficult to fully understand if you have not yet built many neural networks before. It is natural to want to google more when we are interested in something, so I will provide some googling keywords at the end. If you find what I am going to share hard to believe, then perhaps some researching, learning, and practicing will help.
For neural networks, some people find it useful to stack two types of neural networks together for one task. Let’s say you have some economic time series data which you want to use to make predictions about next year’s economy. We can build a model by stacking an “autoencoder” neural network and an “RNN” neural network.
“Autoencoder” and “RNN” are not covered in the MLS. I will very, very briefly talk about what they can do just so we have some idea of them. I don’t mean to provide the full picture, but I hope it is sufficient to explain why they are wanted in my example.
An “autoencoder” NN can compress features. If the economic data has 500 features (GDP over time, CPI over time, population over time, …), then we can use an autoencoder to compress them into 30 information-rich features.
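To make that a bit more concrete, here is a minimal Keras sketch of an autoencoder with a 30-unit bottleneck. The layer sizes, activation, and names are my own made-up choices for illustration, not anything from the course:

```python
from tensorflow import keras

n_features = 500      # hypothetical number of raw economic indicators
n_code = 30           # hypothetical size of the information-rich code

inputs = keras.Input(shape=(n_features,))
# Encoder: squeeze the 500 features down to a 30-number code
code = keras.layers.Dense(n_code, activation="relu", name="code")(inputs)
# Decoder: try to reconstruct the original 500 features from the code alone
reconstructed = keras.layers.Dense(n_features, name="reconstruction")(code)

autoencoder = keras.Model(inputs, reconstructed)
# The autoencoder is trained to reproduce its own input; the narrow
# bottleneck is what forces it to learn a compressed representation.
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(X, X, epochs=10)   # X has shape (num_examples, 500)
```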
An “RNN” (recurrent neural network) is designed for learning temporal or sequential data.
When combined, we can use the autoencoder to transform a 500 (features) x 3650 (10 years * 365 days) dataset into a 30 (information-rich features) x 3650 (days) dataset, which is then fed into the RNN to predict the final desired outcome.
You can train them separately, or together. If you want to train them together, two things are needed when writing code:
- connect the output of the encoder layer to the input of the RNN layer. This “connection” takes only one line of code and is very simple.
- modify the cost function so that the total cost is a combination of the cost of the autoencoder and the cost of the RNN. This, again, is not difficult.
Training them together can help steer the autoencoder in favor of what we finally want to predict (next year’s economy). A small sketch of both points is below.
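Here is a minimal sketch, assuming Keras, of connecting the encoder’s output to an RNN and combining the two costs into one total cost. The shapes, the choice of LSTM, the layer sizes, and the loss weights are all hypothetical numbers for illustration:

```python
from tensorflow import keras

timesteps, n_features, n_code = 3650, 500, 30   # hypothetical shapes

inputs = keras.Input(shape=(timesteps, n_features))

# Autoencoder part: compress each day's 500 features into a 30-number code
code = keras.layers.Dense(n_code, activation="relu", name="code")(inputs)
reconstruction = keras.layers.Dense(n_features, name="reconstruction")(code)

# The one-line "connection": feed the encoded sequence straight into the RNN
rnn_out = keras.layers.LSTM(64)(code)
prediction = keras.layers.Dense(1, name="prediction")(rnn_out)

model = keras.Model(inputs, [reconstruction, prediction])

# Combined cost: reconstruction error plus prediction error, weighted
model.compile(
    optimizer="adam",
    loss={"reconstruction": "mse", "prediction": "mse"},
    loss_weights={"reconstruction": 0.2, "prediction": 1.0},
)
# model.fit(X, {"reconstruction": X, "prediction": y}, epochs=10)
# X: (num_examples, 3650, 500), y: (num_examples, 1)
```

Because everything is one computational graph with one total cost, gradient descent updates the autoencoder and the RNN at the same time, which is the point Andrew was making about stringing neural networks together.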
For decision trees, you are absolutely right that we can train many decision trees at a time when we are talking about the Random Forest. However, it is not the Random Forest that we are considering in this lecture. Andrew talked about stringing models together in a way that, when we have 2 models to string together, one model depends on the output of the other model. Random Forest is therefore not what we have in mind.
A closer example would be the Gradient-Boosted Decision Trees (GBDT) model, which, like the random forest, is also an ensemble of decision trees. The key difference is that the GBDT builds the second tree based on the outcome of the first tree. It is therefore necessary to finish one tree before the next tree starts to grow.
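If you want to see that sequential dependence in code, below is a rough sketch of the core boosting loop, written with scikit-learn’s DecisionTreeRegressor on made-up data. This is my simplification, not the full GBDT algorithm:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))               # made-up data for illustration
y = 2 * X[:, 0] + rng.normal(size=200)

learning_rate = 0.1
prediction = np.zeros_like(y)
trees = []

for _ in range(50):
    residual = y - prediction                       # what is still unexplained
    tree = DecisionTreeRegressor(max_depth=2)
    tree.fit(X, residual)                           # depends on all previous trees
    prediction += learning_rate * tree.predict(X)   # update the running prediction
    trees.append(tree)

# In a random forest, by contrast, every tree is fit on (a bootstrap sample of)
# the same y, so all the trees could in principle be trained at the same time.
```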
Again, don’t worry if you can’t follow everything, because after all these topics are not covered in the MLS, but since you asked, these are the examples I can think of to share.
Cheers,
Raymond
Some googling keywords:
“stacking neural networks”, “autoencoder”, “RNN”, “Decision tree: gradient-boosted decision tree vs random forest”, “stacking ensemble”