Explanation of the formula for Information Gain in the decision nodes

Hi there,

I would like to ask the following.

When computing the reduction in entropy to choose how to split, I understand that when splitting the root node the formula is:

1 - Weighted Average Entropy

We subtract from 1 because the entropy is originally 1 at the root node.

My question is: what would the formula be when splitting the other decision nodes of the tree?

I understand how to compute the weighted average entropy, but I am not sure whether it should be subtracted from 1 as we did for the root node, and if so, why. That is, is the entropy at the parent node always 1?


Perhaps your question is covered in the “Choosing a split: Information Gain” lecture? Or perhaps in the “Putting it together” lecture.

The splitting process is identical at every node. The general formula is:

Information Gain = H(parent) - Weighted Average Entropy of the children

The parent's entropy is not always 1; it is computed from the class proportions at that particular node. In the root-node example you describe, the classes happen to be split 50/50, so H(root) = 1 and the formula reduces to 1 - Weighted Average Entropy. At deeper decision nodes you subtract the weighted average entropy from that node's own entropy, not from 1.
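To make this concrete, here is a minimal sketch in Python (the function names and example label lists are my own, not from the lecture) showing that the same information-gain computation applies at any node, with the parent's entropy computed from its own labels:

```python
import math

def entropy(labels):
    """Binary entropy H(p) in bits, where p is the fraction of 1-labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    if p in (0.0, 1.0):
        return 0.0  # a pure node has zero entropy
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def information_gain(parent, left, right):
    """IG = H(parent) - weighted average entropy of the two children."""
    n = len(parent)
    w_left, w_right = len(left) / n, len(right) / n
    return entropy(parent) - (w_left * entropy(left) + w_right * entropy(right))

# Root node with a 50/50 class mix: H(root) = 1, so the gain
# reduces to 1 - weighted average entropy, as in the question.
root = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(information_gain(root, [1, 1, 1, 1, 0], [1, 0, 0, 0, 0]))

# A deeper node whose class mix is 4/1: its entropy is not 1,
# so you subtract from this value instead of from 1.
node = [1, 1, 1, 1, 0]
print(entropy(node))  # ≈ 0.722, not 1
```

The only thing that changes from node to node is which labels land in the parent and children; the formula itself is the same everywhere.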