I have two important questions about this lab in Module 4, “C2_W4_Lab_02_Tree_Ensemble”:
Selected in the red box, we have a parameter n_estimators. From the description, it sounds like the value of B (from b = 1 to B) in the lecture slides. Just confirming: is n_estimators the value that decides how many times the sampling-with-replacement procedure will be performed?
Selected in the blue box, it says that the Random Forest algorithm chooses a subset of the training examples to train each individual tree. From the description, it sounds like a subset of the “m” training examples is used to train each individual tree “b”.
Just to confirm: in the lecture videos we learned that the Random Forest algorithm uses a subset of the “n” features to train each individual tree. However, the lab mentions something we didn’t learn in the videos: along with choosing a subset of the features “n”, the algorithm also chooses a subset of the training examples “m” to train each individual tree. Am I following correctly?
Yes. As long as sampling of the data is enabled, one round of sampling happens before each tree is built. If B trees are built, the data is sampled B times.
m is the total number of samples, right? If so, then we first need to ask ourselves how many samples we are going to draw when building each tree. If we set it to m/2, then m/2 samples are drawn from the full dataset to train the first tree (b=1), then another m/2 samples are drawn to train the second tree (b=2), and so on and so forth.
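As a minimal sketch of that loop in plain numpy (not the lab’s actual code; B = 100 is just an assumed value):

```python
import numpy as np

rng = np.random.default_rng(0)
m, B = 734, 100  # m training examples (from this lab); B trees is an assumed value

for b in range(B):
    # one round of sampling with replacement before building tree b
    idx = rng.choice(m, size=m // 2, replace=True)
    # X_train[idx], y_train[idx] would then be used to fit tree b
```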
In building a tree, we can choose to do none, one, or both of the following:
sample a subset of data
sample a subset of features
If we choose to do both, then when building the first tree (b=1), a subset of the data is sampled, and then a subset of the features is sampled. If the full dataset has m samples and n features, then after sampling, the dataset used for that tree may have only m/2 samples and n/2 features. The actual number of samples and features depends on your settings, as in the sketch below.
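For example, in scikit-learn’s RandomForestClassifier both kinds of sampling map to constructor arguments. A hedged sketch, with the specific fractions being purely illustrative (note that in scikit-learn, max_features applies per split rather than per tree):

```python
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(
    n_estimators=100,  # B: number of trees, one sampling round each
    bootstrap=True,    # enable sampling of the data (with replacement)
    max_samples=0.5,   # draw m/2 of the training examples per tree
    max_features=0.5,  # consider n/2 of the features at each split
    random_state=0,
)
# model.fit(X_train, y_train) would build all B trees
```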
Sampling is redone before building every tree.
There are actually even more ways of sampling features if you dive into xgboost, and they are not covered by the lecture. Note that the lecture does not teach 100% of xgboost; it only has time to give us some core understanding.
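If you are curious, here is a hedged sketch of those xgboost knobs (the parameter values are illustrative only, not recommendations):

```python
from xgboost import XGBClassifier

model = XGBClassifier(
    n_estimators=100,       # B: number of trees / boosting rounds
    subsample=0.5,          # fraction of training examples sampled per tree
    colsample_bytree=0.8,   # fraction of features sampled once per tree
    colsample_bylevel=0.8,  # resampled again at each depth level
    colsample_bynode=0.8,   # resampled again at each split
)
# the colsample_* fractions apply cumulatively
```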
Thanks for explaining that; my first question is answered. I just want to make one clarification: when I say “m”, I mean the total number of training examples.
For instance, in this lab:
“m” for training data is 734
“m” for validation data is 184
Just like in neural networks, where we used “m” to denote the total number of training examples.
Perfect… It makes sense now. Thanks for answering that in detail.
Just one last thing. The subset of samples taken from the “m” examples is a random subset, right, just like the random subset taken from the “n” features?
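In case it helps to see the two kinds of randomness side by side, a minimal numpy sketch (m = 734 comes from this lab; n = 20 and k ≈ √n for features are assumptions following the lecture’s suggestion):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 734, 20  # m from this lab's training set; n features is an assumed value

# examples: random subset drawn with replacement
row_idx = rng.choice(m, size=m // 2, replace=True)
# features: random subset drawn without replacement (lecture suggests k = sqrt(n))
col_idx = rng.choice(n, size=int(np.sqrt(n)), replace=False)
```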