Confusion about the concepts of entropy and information gain in Decision Tree!

Course-2/Week-4/Decision Tree

Unable to understand why -p log(p) - (1-p) log(1-p) is being taken as entropy. What really is entropy, btw?

Thanks in advance!!

Entropy, in the context of machine learning and decision trees, is a measure of uncertainty or disorder in a dataset. The concept comes from information theory, where entropy quantifies the amount of unpredictability or randomness in a system. The formula H(p) = -p\log(p) - (1-p)\log(1-p) is the entropy for a binary classification problem, where p is the probability of one class (say, the positive class) and 1-p is the probability of the other class (the negative class). The logarithm here is base 2 (the binary logarithm), which is Shannon's convention and the usual choice in this context.
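
If it helps to see the formula concretely, here is a minimal sketch of it in Python (the function name binary_entropy and the use of NumPy are my own choices for illustration, not something from the course):

```python
import numpy as np

def binary_entropy(p):
    """Entropy H(p) = -p*log2(p) - (1-p)*log2(1-p) for a binary problem.

    By convention, 0 * log2(0) is treated as 0, so H(0) = H(1) = 0.
    """
    if p == 0 or p == 1:
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)
```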

Entropy is high (close to 1) when the classes are evenly distributed: at p = 0.5 there is maximum uncertainty about which class a randomly chosen sample belongs to, and H reaches its maximum value of 1. Entropy is low (close to 0) when one class dominates (e.g., p = 0.99 or p = 0.001), and it reaches its minimum value of 0 at p = 0 and p = 1, where the outcome is completely predictable. The entropy is low because there is little uncertainty; almost all of the data belongs to one class, so there is less "disorder".
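
To see this behaviour numerically, you can evaluate the sketch above at a few values of p (the outputs in the comments are approximate):

```python
print(binary_entropy(0.5))    # 1.0   -> classes evenly split, maximum uncertainty
print(binary_entropy(0.99))   # ~0.08 -> one class dominates, little uncertainty
print(binary_entropy(1.0))    # 0.0   -> outcome completely predictable
```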

When building a decision tree, we want to split the data in a way that maximizes information gain, which is the reduction in entropy after a split: the entropy of the parent node minus the weighted average of the entropies of the child nodes. Information gain helps the decision tree determine which features best separate the classes, reducing uncertainty and improving classification accuracy. In short, entropy measures the amount of uncertainty in the data for a binary classification problem, and information gain is used to find the splits that reduce that uncertainty as much as possible.
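
Here is a small sketch of that computation, assuming binary labels stored in NumPy arrays; the function name information_gain and the toy split are illustrative, not the course's exact code:

```python
import numpy as np

def entropy(y):
    """Entropy of a 1-D array of 0/1 labels."""
    p = np.mean(y)                      # fraction of positive labels
    if p == 0 or p == 1:
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def information_gain(y_parent, y_left, y_right):
    """Reduction in entropy from splitting y_parent into y_left and y_right.

    The children's entropies are weighted by the fraction of samples
    that each child receives.
    """
    w_left = len(y_left) / len(y_parent)
    w_right = len(y_right) / len(y_parent)
    return entropy(y_parent) - (w_left * entropy(y_left) + w_right * entropy(y_right))

# Toy example: a split that mostly separates the classes has positive gain.
parent = np.array([1, 1, 1, 1, 0, 0, 0, 0])
left   = np.array([1, 1, 1, 0])             # mostly positive
right  = np.array([1, 0, 0, 0])             # mostly negative
print(information_gain(parent, left, right))  # ~0.19
```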
