Advanced learning algorithms -> Decision tree learning -> Continuous valued features

Can someone explain in a different way what Prof. Ng is describing here in his video lesson at 4:09?

“In the more general case, we’ll actually try not just three values, but multiple values along the X axis. And one convention would be to sort all of the examples according to the weight, or according to the value of this feature, and take all the values that are midpoints between the sorted list of training examples as the values for consideration for this threshold over here. This way, if you have 10 training examples, you will test nine different possible values for this threshold and then try to pick the one that gives you the highest information gain.”

Thanks.

Hey @anon88576143 ,
Here is another explanation:
In the video, the example only tried thresholds at weight = 8, 9, and 13, but in the general case we want to consider more candidates. The convention is: sort all of the training examples by their weight (the feature on the x-axis), then take the midpoint of each pair of adjacent values as a candidate threshold. If there are n distinct values, this gives n − 1 candidate thresholds. Just as we did for the 3 values, we compute the information gain for each of these n − 1 thresholds and pick the one with the highest information gain as the split for that node.
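Here is a small sketch of that procedure in Python. The weights and labels below are made-up illustrative data, not the course's actual dataset; `entropy`, `candidate_thresholds`, and `best_threshold` are just names I chose for this example.

```python
import math

def entropy(labels):
    """Shannon entropy of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def candidate_thresholds(values):
    """Midpoints between consecutive sorted distinct values:
    n distinct values -> n - 1 candidate thresholds."""
    s = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(s, s[1:])]

def best_threshold(values, labels):
    """Return (threshold, information_gain) for the best midpoint split."""
    root = entropy(labels)
    n = len(labels)
    best_gain, best_t = -1.0, None
    for t in candidate_thresholds(values):
        # Split the labels by whether the feature is <= threshold.
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        gain = root - (len(left) / n) * entropy(left) \
                    - (len(right) / n) * entropy(right)
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t, best_gain

# Hypothetical data: weights in lbs, label 1 = cat, 0 = not cat.
weights = [7, 8, 9, 10, 13, 15]
is_cat  = [1, 1, 1, 0, 0, 0]
print(best_threshold(weights, is_cat))  # -> (9.5, 1.0): splits the classes perfectly
```

Note that the 6 values give 5 candidate thresholds (7.5, 8.5, 9.5, 11.5, 14.0), matching the n − 1 pattern from the lecture.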
Hope that clears up your confusion.
Feel free to ask if you still have questions.
Riya

Thanks I get it now :slight_smile: