If there’s a node with a lot of examples in it with high entropy that seems worse than if there was a node with just a few examples in it with high entropy. Because entropy, as a measure of impurity, is worse if you have a very large and impure dataset compared to just a few examples
and a branch of the tree that is very impure.
actually I never get weightage comparison as metrics never
This response relies 90% on intuition. It doesn’t fully explain the root cause mathematically, but if you want to go to the maths, “Maximum Likelihood Estimation” (MLE) is the keyword to start from.
Let’s consider this split where 10 out of 15 samples are “T” and the rest “F”.
The above reply is for explaining how we calculate the entropy before and after split. “Weighted sum” is a consequence of it. Also, we have always been calculating Weighted sums, and it isn’t like suddenly showing up when we split.
Here why are you taking natural log, not base 2. Also I often get stuck on deciding which base of log to take while calculation. How do you determine this?
What I think is importance to that node should be given more which has a lot of splits because down the tree it will decide more splits on the basis of the features, and the node with less split should be given less importance in learning because it wont further split that much, that the previous node will do.
It’s just my practice to use natural log. Please use base 2 instead to be consistent with the lecture.
The choice is not important, because they all should deliver the same decision tree, because changing from one base to another is only differed by a constant.
What is this explaining for?
Also, I think we can’t determine the importance of a node by the number of splits down it. Imagine I have a node under which there is only one split, but both left and right of the split is 100% pure and they contain 95% of all data. This node with only one split is VERY important. Agree?