Hello, I was completing the `compute_information_gain` function in the decision tree lab, and I used the `y` values instead of the `y_node` values to compute the entropy. I got the same output as the expected one, but the test did not pass until I checked and saw that I should use `y_node`. Why do we need to use `y_node` instead of just `y`?

Hi @jleanezv,

I think you are talking about exercise 3 of the practice lab. In it, `y` holds the labels, and `y_node = y[node_indices]` is a subset of `y`. Therefore, when `node_indices` does not contain all the indices of `y`, `y_node` and `y` are different.

Since they can be different, we need to use the right one. And `y_node` is the right one, because it contains only the labels of the examples at the node, as defined by `node_indices`.
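
As a rough illustration, here is a small sketch with made-up labels. The `entropy` helper and the index lists below are my own stand-ins, not the lab's code. At the root node, where `node_indices` covers every example, `y` and `y_node` agree, which may be why your output matched the expected one; at a child node they do not.

```python
import numpy as np

def entropy(labels):
    """Binary entropy of a 0/1 label array (0 for an empty or pure node)."""
    if len(labels) == 0:
        return 0.0
    p1 = np.mean(labels)  # fraction of positive labels
    if p1 == 0 or p1 == 1:
        return 0.0
    return float(-p1 * np.log2(p1) - (1 - p1) * np.log2(1 - p1))

# Made-up labels for 10 examples (assumption, not the lab's data)
y = np.array([1, 1, 0, 0, 1, 0, 0, 1, 1, 0])

root_indices = list(range(len(y)))  # root node: every example
child_indices = [0, 1, 4, 7, 8]     # hypothetical node after a split

h_all = entropy(y)                  # entropy of the whole label array
h_root = entropy(y[root_indices])   # same as h_all, since y_node == y here
h_child = entropy(y[child_indices]) # differs, since y_node is a strict subset

print(h_all, h_root, h_child)
```

With these labels, using `y` at the child node would report the entropy of the whole dataset instead of the entropy of the examples actually at that node.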

Cheers!

Raymond