I couldn’t understand this slide.
A detailed explanation with some examples would be helpful!
Hey @Thala,
I am assuming we are clear on why we are doing this, but let me put it out here anyway, since it will help you understand this more easily.
In the video lecture entitled “Random forest algorithm”, up to this slide, Prof. Andrew has discussed Bagged Decision Trees, and just before this slide he mentioned that even after sampling the training examples with replacement, we often end up performing the same splits at the same nodes across the different decision trees. So, we introduce another level of randomization: randomizing the feature choice. This helps make our decision trees even more different from each other, and the algorithm that uses it is known as the random forest algorithm.
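Just to make the sampling-with-replacement step concrete, here is a minimal sketch using numpy (the toy arrays and sizes are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# toy training set: 6 examples, 3 features each (illustrative values only)
X = np.arange(18).reshape(6, 3)
y = np.array([0, 1, 0, 1, 1, 0])

# sampling with replacement: draw 6 indices, duplicates allowed,
# so some examples appear more than once and others are left out
idx = rng.integers(0, len(X), size=len(X))
X_bag, y_bag = X[idx], y[idx]
print(idx)  # e.g. an index may repeat -- that's the bootstrap sample for one tree
```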
Now that we are clear on “Why do we use this strategy?”, let’s get to “How?”, which is pretty straightforward. Until now, at each node, when we performed a split, we considered all the features; under this strategy, we instead choose a random subset of K features out of all n features, and we perform the split based on those only, where a typical choice is K = \sqrt{n}. Now, I have stated pretty much what is written on the slide, but let me present an example to make it clearer.
I guess the diagram below will make it clear to you. In it, you will find that I haven’t always selected 4 = \sqrt{16} features at each node, because that’s just a typical value; theoretically, you can select a subset of any size, like I have done. But in practice, using K = \sqrt{n} gives good performance. I hope this helps.
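In code, the same per-node feature randomization looks roughly like this (a minimal sketch; `n_features` and `k` are names I’ve made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

n_features = 16               # total number of features, n
k = int(np.sqrt(n_features))  # typical choice, K = sqrt(n) = 4 here

# at each node, draw a fresh random subset of k features (no duplicates)
# and search for the best split among those features only
features_at_node = rng.choice(n_features, size=k, replace=False)
print(features_at_node)  # e.g. [ 8  9  1 11] -- differs from node to node, tree to tree
```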
Regards,
Elemento
Thanks!! That really helped!
The Professor has taught about using decision trees for regression problems in the optional lecture.
Can Random Forest also be used for a regression problem?
If yes, would it be a good choice over decision trees?
Hey @Thala,
Indeed, they can be used for regression. You can find sklearn’s implementation of the Random Forest Regressor here. It’s pretty much the same as the Random Forest Classifier, the only difference being that, instead of taking a majority vote of the predicted class labels from the different decision trees, it returns the mean of the predicted values from the different decision trees.
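For instance, here is a minimal usage sketch (the synthetic dataset and the specific parameter values are just for illustration):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# synthetic regression data, purely for illustration
X, y = make_regression(n_samples=500, n_features=16, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# max_features="sqrt" is the K = sqrt(n) feature subsampling discussed above
model = RandomForestRegressor(n_estimators=100, max_features="sqrt", random_state=0)
model.fit(X_train, y_train)

# the forest's prediction is the mean of the individual trees' predicted values
print(model.score(X_test, y_test))  # R^2 on the held-out split
```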
Random forest has pretty much the same pros and cons relative to decision trees irrespective of the task, be it classification or regression: compared to a single decision tree, it is typically more accurate and less prone to overfitting, at the cost of being slower and less interpretable. I hope this helps.
Regards,
Elemento