Explanation of the formula for Information Gain in the decision nodes

Hi there,

I would like to ask the following.

When computing the reduction in entropy so that you can choose how to split, I understand that when splitting the root node the formula is:

1 - Weighted Average Entropy

We subtract from 1 because the entropy is originally 1 at the root node.

My question is: what would the formula be when splitting the other decision nodes of the tree?

I understand how to compute the weighted average entropy, but I am not sure whether it should be subtracted from 1 as we did for the root node, and if so, why. That is, is the entropy at the parent node always 1?

Thanks.

Perhaps your question is covered in the “Choosing a split: Information Gain” lecture, or in the “Putting it together” lecture.

I believe the splitting process is identical at each node: the information gain is the entropy of the node being split minus the weighted average entropy of its children. The parent's entropy happens to be 1 at the root only because the classes there are split 50/50; in general you subtract the parent node's actual entropy, not 1.
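
As a rough sketch (my own Python/NumPy, not code from the course; `entropy` and `information_gain` are names I made up for illustration), here is what that looks like for a binary classification node:

```python
import numpy as np

def entropy(p):
    """Binary entropy (base 2) for a node whose positive-class fraction is p."""
    if p == 0 or p == 1:
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def information_gain(y_parent, y_left, y_right):
    """H(parent) minus the weighted average entropy of the two children."""
    h_parent = entropy(y_parent.mean())
    w_left = len(y_left) / len(y_parent)
    w_right = len(y_right) / len(y_parent)
    h_children = w_left * entropy(y_left.mean()) + w_right * entropy(y_right.mean())
    return h_parent - h_children

# Root node: 5 positives, 5 negatives -> entropy is exactly 1,
# so here the gain reduces to "1 - weighted average entropy".
root = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
left = np.array([1, 1, 1, 1, 0])    # a hypothetical split
right = np.array([1, 0, 0, 0, 0])
print(information_gain(root, left, right))          # ~0.278

# A deeper node: 4 positives, 1 negative -> entropy(0.8) ~ 0.722, not 1,
# so when splitting this node you would subtract 0.722, not 1.
print(entropy(np.array([1, 1, 1, 1, 0]).mean()))    # ~0.722
```

So the same `information_gain` computation applies at every node; only the parent entropy you plug in changes.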