Why do we need to take a weightage comparison in the entropy function?

What Prof. Ng said is not clear to me:

If there’s a node with a lot of examples in it with high entropy, that seems worse than if there was a node with just a few examples in it with high entropy, because entropy, as a measure of impurity, is worse if you have a very large and impure dataset compared to just a few examples and a branch of the tree that is very impure.

Actually, I have never understood the weightage comparison as a metric.

Hello @tbhaxor,

This response relies 90% on intuition. It doesn’t fully explain the root cause mathematically, but if you want to go to the maths, “Maximum Likelihood Estimation” (MLE) is the keyword to start from.

Let’s consider this split where 10 out of 15 samples are “T” and the rest “F”.
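(The picture of the split isn’t shown here; from the counts used in the rest of this reply, it presumably was:

  * Parent node: 10 “T” + 5 “F” (15 samples)
  * Left leaf: 8 “T” + 1 “F” (9 samples)
  * Right leaf: 2 “T” + 4 “F” (6 samples))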

We know how to calculate the entropy before splitting:
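(The formula image is missing here; based on the three observations below, it was presumably the entropy of the 15 samples with the factor 1/15 pulled out, written with the natural log — see the discussion about bases further down:

$$H_{\text{before}} = -\frac{1}{15}\Big[\,10\ln\tfrac{10}{15} + 5\ln\tfrac{5}{15}\,\Big]$$

)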

There are 3 points to observe:

  1. I took the 15 outside
  2. It’s already “weightage comparison”, or in my words, “weighted sum of log probability”
  3. It can be read as: 10 samples have a 10/15 chance of being classified as “T”, and 5 samples have a 5/15 chance as “F”

Agreed?

Now, after splitting:
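(Again the formula image is missing; following the same pattern and the leaf counts — 8 “T” + 1 “F” on the left, 2 “T” + 4 “F” on the right — it was presumably:

$$H_{\text{after}} = -\frac{1}{15}\Big[\,8\ln\tfrac{8}{9} + 1\ln\tfrac{1}{9} + 2\ln\tfrac{2}{6} + 4\ln\tfrac{4}{6}\,\Big]$$

)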

  1. I still took the 15 outside. This point doesn’t change.
  2. It’s still “weighted sum of log probability”. This point doesn’t change.
  3. It can be read as: 8 samples have an 8/9 chance as “T” and 1 sample a 1/9 chance as “F”; in the other leaf, 2 samples have a 2/6 chance as “T” and 4 samples a 4/6 chance as “F”.

If you make some small changes to the after-splitting formula, you get what we have learnt:
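(Presumably the “small changes” are to multiply and divide each leaf’s terms by that leaf’s sample count, which turns the expression into the weighted sum of the two leaf entropies from the lecture:

$$H_{\text{after}} = \frac{9}{15}\Big(-\tfrac{8}{9}\ln\tfrac{8}{9} - \tfrac{1}{9}\ln\tfrac{1}{9}\Big) + \frac{6}{15}\Big(-\tfrac{2}{6}\ln\tfrac{2}{6} - \tfrac{4}{6}\ln\tfrac{4}{6}\Big) = \frac{9}{15}H_{\text{left}} + \frac{6}{15}H_{\text{right}}$$

)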

As my point number 2 said, it has always been some weighted sum.

Again, this response is 90% intuition, 10% MLE.

Cheers,
Raymond

The above reply explains how we calculate the entropy before and after a split. The “weighted sum” is a consequence of that calculation. Also, we have always been computing weighted sums; it isn’t something that suddenly shows up when we split.

Here, why are you taking the natural log and not base 2? Also, I often get stuck deciding which base of log to take in calculations. How do you determine this?

What I think is that more importance should be given to the node that has a lot of splits below it, because further down the tree it will decide more splits on the basis of the features, and the node with fewer splits should be given less importance in learning because it won’t split much further the way the previous node will.

It’s just my practice to use the natural log. Please use base 2 instead to be consistent with the lecture.
The choice is not important, because either base should deliver the same decision tree: changing from one base to another only rescales every entropy by a constant factor.
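As a quick sketch of this point (the entropy helper below is just for illustration, not from the course code; it assumes NumPy): scoring the same split with the natural log and with base 2 gives gains that differ only by the constant factor 1/ln 2, so the ranking of candidate splits, and hence the chosen tree, is unchanged.

```python
import numpy as np

def entropy(counts, base=np.e):
    # Entropy of a label distribution given class counts, in the chosen log base.
    p = np.array(counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]                          # skip empty classes to avoid log(0)
    return -(p * np.log(p)).sum() / np.log(base)

# The same T/F split from above, scored in two bases.
for base in (np.e, 2):
    h_before = entropy([10, 5], base)
    h_after = 9/15 * entropy([8, 1], base) + 6/15 * entropy([2, 4], base)
    print(f"base={base:.3f}  information gain={h_before - h_after:.4f}")
```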

What is this trying to explain?

Also, I think we can’t determine the importance of a node by the number of splits down it. Imagine I have a node under which there is only one split, but both the left and the right of the split are 100% pure and they contain 95% of all data. This node with only one split is VERY important. Agree?
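To put made-up numbers on that (purely hypothetical, just to illustrate the point): suppose the node holds 950 of 1000 training samples, 475 of each class, and a single split separates them perfectly.

```python
import numpy as np

def entropy_bits(counts):
    # Entropy in bits from class counts.
    p = np.array(counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

h_node = entropy_bits([475, 475])          # 1.0 bit: maximally impure
h_children = 0.5 * entropy_bits([475, 0]) + 0.5 * entropy_bits([0, 475])  # 0.0: both leaves pure
gain_within_node = h_node - h_children     # 1.0 bit of impurity removed by a single split
print(gain_within_node, "bit of impurity removed for 95% of the data")
```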


Okay, tell me this: why do we need to give more importance to the left node than to the right node (based on your example of T/F)? This might help clear my doubts.

Left leaf is 9/15. It is 9 because it has 9 samples in total (8T + 1F). It is 15 because the total number of samples involved in this split is 15.

Right leaf is 6/15. It is 6 because it has 6 samples in total (2T + 4F). It is 15 because the total number of samples involved in this split is 15.

It’s a very boring meaning: just counting.