Compute information gain question

Hello, I was completing the compute_information_gain function in the Decision tree lab. I was using the y values instead of the y_node values to compute the entropy, and I got the same output as the expected one. However, the test did not pass until I checked and found that I was supposed to use y_node. Why do we need to use y_node instead of just y?

Hi @jleanezv,

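I think you are talking about exercise 3 of the practice lab. In it, y holds the labels of all examples, and y_node = y[node_indices] is the subset of y selected by node_indices. When node_indices contains every index of y, as it would at the root node, y_node and y are identical, which would explain why your output matched the expected one. When node_indices does not contain all indices of y, y_node and y are different.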

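Since they can be different, we need to use the right one. And y_node is the right one, because it contains only the labels of the examples at the node being split, as defined by node_indices. The entropy of a node should be computed from that node's examples only, not from the whole dataset.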
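
To make the difference concrete, here is a minimal sketch. The entropy helper and the toy labels below are my own assumptions for illustration, not the lab's code. It shows that y and y[node_indices] give the same entropy at the root but can disagree at a deeper node.

```python
import numpy as np

def entropy(labels):
    """Binary entropy (in bits) of an array of 0/1 labels.
    Hypothetical helper for illustration, not the lab's implementation."""
    if len(labels) == 0:
        return 0.0
    p1 = np.mean(labels)          # fraction of positive labels
    if p1 == 0 or p1 == 1:        # a pure node has zero entropy
        return 0.0
    return float(-p1 * np.log2(p1) - (1 - p1) * np.log2(1 - p1))

# Toy labels (made up): five 1s and five 0s.
y = np.array([1, 1, 0, 0, 1, 0, 0, 1, 1, 0])

# At the root, node_indices covers every example, so y_node equals y.
root_indices = list(range(len(y)))
h_root = entropy(y[root_indices])

# At a deeper node, node_indices is only a subset of the examples.
sub_indices = [0, 1, 4, 7, 8]     # these examples all have label 1
h_sub = entropy(y[sub_indices])   # entropy of the node itself
h_wrong = entropy(y)              # entropy of the full dataset instead

print("root:", h_root)
print("sub-node with y_node:", h_sub)
print("sub-node with y:", h_wrong)
```

With these toy labels, using y at the sub-node reports the entropy of the whole dataset, even though the node itself is pure. A test that only calls the function with root indices would not catch this, but one that passes a subset of indices would.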