I have a question related to the practice quiz. The question is “For a continuous valued feature (such as weight of the animal), there are 10 animals in the dataset. According to the lecture, what is the recommended way to find the best split for that feature?”

The correct answer is “Choose the 9 mid-points between the 10 examples as possible splits, and find the split that gives the highest information gain.”
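To make the quiz answer concrete, here is a minimal sketch of the midpoint method in Python. The weights and labels are made-up illustration data, and the helper names (`entropy`, `information_gain`, `candidate_midpoints`, `best_midpoint_split`) are hypothetical, not from the lecture.

```python
import math

def entropy(labels):
    """Entropy in bits of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def information_gain(values, labels, threshold):
    """Entropy reduction from splitting on value <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - weighted

def candidate_midpoints(values):
    """Midpoints between consecutive distinct sorted values."""
    xs = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(xs, xs[1:])]

def best_midpoint_split(values, labels):
    """Try every midpoint and keep the one with the highest gain."""
    return max(candidate_midpoints(values),
               key=lambda t: information_gain(values, labels, t))

# Hypothetical data: 10 animal weights, label 1 = cat, 0 = not cat.
weights = [7.2, 8.4, 9.2, 10.2, 11.0, 15.0, 16.0, 17.0, 18.0, 20.0]
is_cat = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

best = best_midpoint_split(weights, is_cat)
print(len(candidate_midpoints(weights)), best)  # 9 13.0
```

With 10 distinct weights there are 9 candidate thresholds, and each one is scored by recomputing the gain over the whole dataset.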

I think this method may be good for a small dataset. If there are millions of different weights, would this method still be the recommended one? I may not fully understand the question. Can someone enlighten me?

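Yes, I think you are right that this gets expensive on large datasets if you recompute the information gain from scratch for every midpoint, since that is roughly O(n²) work for n examples. Binary search is not a reliable fix, though: information gain is not monotonic in the threshold, so it can have several local peaks and a binary search may skip the best one. The usual approach is to sort the feature values once and then sweep through the midpoints while updating the class counts on each side, which evaluates every candidate split in O(n log n) overall.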
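For larger datasets, one common way to keep the search over midpoints cheap is to sort the values once and update running class counts while sweeping left to right. The sketch below assumes binary 0/1 labels; the function names and the example data are hypothetical.

```python
import math

def entropy_from_counts(pos, total):
    """Entropy in bits given the number of positives and the group size."""
    if total == 0 or pos == 0 or pos == total:
        return 0.0
    p = pos / total
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def best_split_sweep(values, labels):
    """Return (threshold, gain) for the best midpoint split.

    Sorting costs O(n log n); the sweep itself is O(n) because the
    class counts are updated incrementally instead of recomputed.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    total_pos = sum(y for _, y in pairs)
    parent = entropy_from_counts(total_pos, n)

    best_t, best_gain = None, -1.0
    left_pos = 0
    for i in range(n - 1):
        left_pos += pairs[i][1]  # example i moves to the left side
        if pairs[i][0] == pairs[i + 1][0]:
            continue  # no threshold fits between two equal values
        left_n, right_n = i + 1, n - (i + 1)
        weighted = (left_n / n) * entropy_from_counts(left_pos, left_n) \
            + (right_n / n) * entropy_from_counts(total_pos - left_pos, right_n)
        gain = parent - weighted
        if gain > best_gain:
            best_t = (pairs[i][0] + pairs[i + 1][0]) / 2
            best_gain = gain
    return best_t, best_gain

# Hypothetical data: 10 animal weights, label 1 = cat, 0 = not cat.
weights = [7.2, 8.4, 9.2, 10.2, 11.0, 15.0, 16.0, 17.0, 18.0, 20.0]
is_cat = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
threshold, gain = best_split_sweep(weights, is_cat)
```

This still checks every midpoint, so it returns the same split as the brute-force version, just without rescanning the data for each candidate.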

Alternatively, you could use a decision tree implementation that considers fewer candidate splits per feature, for example by grouping the values into a fixed number of bins (such as quantiles) and testing only the bin boundaries. The tree is still built in a top-down, greedy manner, choosing the split that gives the highest information gain at each step. This is much faster than testing every midpoint on large datasets, but the chosen threshold is only approximate, so it may not be the exact best split.
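One way to cut down the number of candidate splits, sketched here as an assumption rather than something from the lecture, is to test only quantile boundaries of the feature. The example uses `statistics.quantiles` from the Python standard library; `best_binned_split`, `n_bins`, and the synthetic data are hypothetical. Real libraries typically build per-bin histograms instead of rescanning the data for each candidate, so treat this as a simplified illustration.

```python
import math
import random
import statistics

def entropy_from_counts(pos, total):
    """Entropy in bits given the number of positives and the group size."""
    if total == 0 or pos == 0 or pos == total:
        return 0.0
    p = pos / total
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def gain_at(values, labels, t):
    """Information gain of the split value <= t (one pass over the data)."""
    n = len(values)
    total_pos = sum(labels)
    left_n = left_pos = 0
    for x, y in zip(values, labels):
        if x <= t:
            left_n += 1
            left_pos += y
    right_n = n - left_n
    weighted = (left_n / n) * entropy_from_counts(left_pos, left_n) \
        + (right_n / n) * entropy_from_counts(total_pos - left_pos, right_n)
    return entropy_from_counts(total_pos, n) - weighted

def best_binned_split(values, labels, n_bins=32):
    """Evaluate at most n_bins - 1 quantile cut points instead of n - 1 midpoints."""
    candidates = sorted(set(statistics.quantiles(values, n=n_bins)))
    return max(candidates, key=lambda t: gain_at(values, labels, t))

# Synthetic data (an assumption for illustration): animals lighter than 12 are cats.
random.seed(0)
weights = [random.uniform(0.0, 30.0) for _ in range(10_000)]
is_cat = [1 if w < 12.0 else 0 for w in weights]

approx_threshold = best_binned_split(weights, is_cat)
```

The returned threshold lands near the true boundary of 12 but not necessarily on it, which is the trade-off for evaluating about 31 candidates instead of 9,999.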