Is it possible that for some feature ‘x’, the split causes the Child Node to have a higher Entropy than the Parent Node, which means that we have a negative value for the Information Gain?
Hi @Ammar_Jawed,
I had never thought of this, and after about 15 minutes of thinking I couldn't come up with any trivial case that gives a negative information gain (IG). At the very least, we cannot have a split where both children's entropies are larger than the parent's.
I know this is incomplete, so you may want to search for a more formal or definitive proof. Alternatively, try to find one example of negative information gain (I don't believe one exists, but feel free to prove me wrong).
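In the meantime, here is a quick numeric sketch (the function names are mine, not from the course). It shows the distinction that matters: a single child *can* have higher entropy than the parent, but because IG subtracts the size-weighted average of the children's entropies, the gain still comes out non-negative. A small random stress test at the end fails to find any negative-IG split.

```python
import math
import random
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """IG = H(parent) minus the size-weighted average of the children's entropies."""
    n = len(parent)
    weighted = sum(len(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted

parent = [1, 1, 1, 1, 0, 0]
left, right = [1, 0], [1, 1, 1, 0]   # left child is a 50/50 mix

print(entropy(parent))   # ~0.918
print(entropy(left))     # 1.0 -- higher than the parent!
print(information_gain(parent, [left, right]))  # ~0.044, still positive

# Random stress test: IG never goes negative for any binary split we try.
random.seed(0)
for _ in range(1000):
    labels = [random.randint(0, 1) for _ in range(20)]
    k = random.randint(1, 19)
    assert information_gain(labels, [labels[:k], labels[k:]]) >= -1e-12
```

This is consistent with the general fact that entropy is a concave function of the class proportions, so the weighted average of the children's entropies can never exceed the parent's entropy, but the sketch above is just an empirical check, not a proof.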
Cheers,
Raymond

