Quiz question refers to random forest but answers don't include the main idea

The question in a quiz is

  1. For the random forest, how do you build each individual tree so that they are not all identical to each other?

The answer choices seem to be about “bagging” rather than “random forest”. I just read Random forest - Wikipedia which, like Andrew’s lecture, refers to selecting subsets of features, but none of the answer choices mentions this.

Am I confused or is the question misphrased?

Thanks @toontalk for the question! Both random selection of features and random sampling of examples when building a new tree are incorporated in today’s popular random forest packages.

sklearn implementation: see the parameters max_features and max_samples.
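
For reference, here is a minimal sketch of where those two parameters go in scikit-learn. The parameter names are real; the dataset and values are made up for illustration.

```python
# Minimal sketch: both kinds of randomness in scikit-learn's RandomForestClassifier.
# The parameter names are real; the data and values are made up for illustration.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

rf = RandomForestClassifier(
    n_estimators=100,
    max_features="sqrt",  # random subset of features considered at each split
    max_samples=0.8,      # each tree sees a bootstrap sample of 80% of the rows
    bootstrap=True,       # max_samples only takes effect when bootstrap=True
    random_state=0,
)
rf.fit(X, y)
```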

xgboost parameters: see subsample, colsample_bytree, colsample_bylevel, and colsample_bynode.
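
And a hedged sketch of the corresponding knobs in XGBoost (toy data and illustrative values only):

```python
# Toy example showing where XGBoost's row/column subsampling parameters go.
# The parameter names are real; the data and values are made up.
import numpy as np
import xgboost as xgb

X = np.random.rand(100, 5)
y = np.random.randint(0, 2, size=100)
dtrain = xgb.DMatrix(X, label=y)

params = {
    "objective": "binary:logistic",
    "subsample": 0.8,          # fraction of rows sampled for each tree
    "colsample_bytree": 0.8,   # fraction of features sampled once per tree
    "colsample_bylevel": 1.0,  # ...re-sampled at each tree depth level
    "colsample_bynode": 1.0,   # ...re-sampled at each split (closest to a classic random forest)
}
booster = xgb.train(params, dtrain, num_boost_round=20)
```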

Both are now part of the random forest algorithm. As for the quiz’s choices, they only cover the random sampling of data, not the random sampling of features.

Regarding the wiki link, I think the following sentences also imply that both are part of the random forest:

The above procedure describes the original bagging algorithm for trees. Random forests also include another type of bagging scheme

Hi. Thanks. But isn’t it the case that bagging (or, if I remember right, Andrew referred to them as ensembles) has one way of generating diversity, while random forests add a second one (which is not mentioned in the quiz)? So the correct quiz answer is only part of the answer and not what distinguishes random forests from ensembles. This would have been much clearer if the question had been about ensembles (or bagging) instead of random forests.

To me the quiz is like asking “How do electric cars move?” where the only good answer choice is “by turning the wheels”. Not incorrect, but confusing because it mentions electric cars instead of just cars. One would expect the answer to be something true of electric cars but not of all cars in general.

I want to point out that bagging and random forest fall into two different categories. I am borrowing the name “machine learning ensemble meta-algorithm” from this wiki page.

So bagging is not a standalone ML algorithm but rather a technique that can be applied to an ML algorithm.

For example, we could (though I have never seen it done) create an ensemble of linear regression models incorporating bagging techniques, such that when training each linear regression model we only use a subset of features and samples, and at the end the final prediction is produced by averaging the outputs of all the linear regression models in the ensemble.

Random forest is an ensemble method: it is an ensemble of decision trees. If you replace every “linear regression model” with “decision tree” in my example above, that is how random forest works.
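
Here is a rough sketch of that analogy using scikit-learn’s BaggingRegressor (dataset, fractions, and estimator counts are made up for illustration):

```python
# Sketch of the "bagged linear regression" idea, and the same wrapper with trees.
# Data and hyperparameter values are made up; only the structure matters here.
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=10, random_state=0)

# Ensemble of linear regressions: each member sees a random subset of rows and
# features, and the final prediction is the average over all members.
bagged_lr = BaggingRegressor(
    estimator=LinearRegression(),  # called base_estimator in older scikit-learn versions
    n_estimators=50,
    max_samples=0.8,
    max_features=0.6,
    random_state=0,
).fit(X, y)

# Swap the base model for a decision tree and you essentially have a random forest.
bagged_trees = BaggingRegressor(
    estimator=DecisionTreeRegressor(),
    n_estimators=50,
    max_samples=0.8,
    max_features=0.6,
    random_state=0,
).fit(X, y)
```

One caveat: BaggingRegressor draws the feature subset once per member, whereas a true random forest re-draws it at every split, but the spirit is the same.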

Right. Ensembles are the general category, and I should have discussed only bagging and random forests to avoid confusion. So I went to the transcript where Andrew talks about this. He says:

This specific instance creation of tree ensemble is sometimes also called a bagged decision tree.
And that refers to putting your training examples in that virtual bag.

There’s one modification to this algorithm that will actually make it work even much better, and that changes this algorithm, the bagged decision tree, into the random forest algorithm.

Since that modification has nothing to do with the quiz question, I still feel the question should have been:

  1. For an ensemble of bagged decision trees, how do you build each individual tree so that they are not all identical to each other?

OK. I see your point.

Decision tree: no ensemble technique
Bagged decision tree: sample bagging
Random forest: sample bagging + feature bagging
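
To make that summary concrete, here is a minimal from-scratch sketch (using scikit-learn’s DecisionTreeClassifier as the per-tree learner, with made-up data) showing where each kind of randomness enters. One simplification: it draws the feature subset once per tree, whereas a canonical random forest re-draws it at each split.

```python
# From-scratch sketch of "sample bagging" vs "sample + feature bagging".
# Data, tree counts, and subset sizes are made up; no voting/averaging step shown.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=16, random_state=0)
rng = np.random.default_rng(0)

def build_trees(X, y, n_trees, feature_bagging):
    n_samples, n_features = X.shape
    trees = []
    for _ in range(n_trees):
        # Sample bagging: bootstrap sample (rows drawn with replacement).
        rows = rng.integers(0, n_samples, size=n_samples)
        # Feature bagging: a random subset of roughly sqrt(n) features.
        if feature_bagging:
            cols = rng.choice(n_features, size=int(np.sqrt(n_features)), replace=False)
        else:
            cols = np.arange(n_features)
        tree = DecisionTreeClassifier().fit(X[rows][:, cols], y[rows])
        trees.append((tree, cols))  # remember which features this tree saw
    return trees

bagged_trees = build_trees(X, y, n_trees=10, feature_bagging=False)  # bagged decision trees
forest_trees = build_trees(X, y, n_trees=10, feature_bagging=True)   # random-forest style
```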

Thank you for clarifying.

Good. Do you know if the authors of the quizzes and labs see these discussions and take them into account when considering whether to edit them?


I will bring that up to them.

I just sent a message about our discussion to the team, and shared the link of this thread to them as well. Thank you very much for your perspective! :slight_smile:

Great, thanks. Perhaps also send the thread about argmax on softmax and the lab error messages.

Sure. I planned to write about the error message later because I found another problem while debugging your code.

As for argmax, which discussion are we talking about? Would you mind sharing the link?

A minor point but may as well fix this: https://community.deeplearning.ai/t/why-call-argmax-on-softmax/143280/3

I see.

Sorry I did not address your question well in that reply, but I think @paulinpaloalto did - he is really a super mentor. :slight_smile:

Yes, though they produce the same results, it is not a problem or a bug that needs to be fixed, in my opinion. I think keeping the softmax function there is a good way to emphasize (especially to first-time learners) that the softmax is not included in the NN, and to retain the logic of getting the classes with the highest probabilities.

Getting rid of unnecessary computation is also part of my own style, but I hope you can understand my opinion here.

Thank you!

Good point. I would have preferred that, earlier in the lab, the softmax of the predictions was used and the results displayed and discussed. Or at least, if they leave it as you suggest, add a comment about how the softmax isn’t really needed.
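
For anyone reading along, here is the small numeric point behind that suggestion (the logits below are made up): applying argmax to the raw logits and to softmax(logits) picks the same class, because softmax preserves the ordering of the values.

```python
# Tiny check: argmax over raw logits and over softmax(logits) agree.
# The logit values are made up purely for illustration.
import numpy as np

logits = np.array([[2.0, -1.0, 0.5],
                   [0.1,  3.2, 1.0]])

# Numerically stable softmax along the class axis.
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = exp / exp.sum(axis=1, keepdims=True)

print(np.argmax(logits, axis=1))  # [0 1]
print(np.argmax(probs, axis=1))   # [0 1] -- identical class choices
```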


I would disagree at least a little with your reasoning here. I agree that it would be good for them to have shown the ex post facto application of softmax earlier in the notebook as a way to explain how things work in from_logits = True mode and to explain why things are done that way. That’s actually an important practical point that it would be good to discuss. Mind you, I have not watched the MLS lectures or done any of the MLS assignments yet, so maybe they did discuss this somewhere.

But I would disagree about the value of making the point about the monotonicity of softmax and that using argmax on the logits gives the same results. That is a relatively subtle mathematical point and the whole point of MLS is that it is even less “math oriented” than DLS. Sure it’s a tiny bit more code, but the application of the output activation in “predict” mode is a pretty minor cost in the grand scheme of things.

Thank you @toontalk and @paulinpaloalto!

@toontalk, although the content of this specialization is less maths oriented, I think learners can continue to benefit from your posts. I personally found simplifying the code in my own projects quite satisfying, and it helped build up my understanding.

Thank you again!

How did the course improve your work?