Hello, I was completing the compute_information_gain function in the Decision Tree lab. I used the y values instead of the y_node values to compute the entropy and got the same output as the expected one. However, the test did not pass until I checked and saw that I was supposed to use y_node. Why do we need to use y_node instead of just y?
Hi @jleanezv,
I think you are talking about Exercise 3 of the practice lab. In it, y holds the labels for all examples, and y_node = y[node_indices] is the subset of y selected by node_indices. Therefore, when node_indices does not contain all the indices of y, y_node and y are different. When node_indices does cover every index, the two hold the same labels, which would explain why your output matched the expected one.
Since they can be different, we need to use the right one. And y_node is the right one, because it contains only the labels of the examples at the node in question, as defined by node_indices.
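Here is a small sketch to make the difference concrete. The data and the entropy helper below are made up for illustration and are not the lab's code; only the names y, node_indices and y_node follow the exercise.

```python
import numpy as np


def entropy(labels):
    """Binary entropy of a 0/1 label array (illustrative helper, not the lab's)."""
    if len(labels) == 0:
        return 0.0
    p = np.mean(labels == 1)  # fraction of positive labels
    if p == 0 or p == 1:
        return 0.0  # a pure node has zero entropy
    return float(-p * np.log2(p) - (1 - p) * np.log2(1 - p))


# Toy labels for 10 examples (assumed data).
y = np.array([1, 1, 0, 0, 1, 1, 0, 1, 0, 0])

# Case 1: the node holds every example, so y_node and y are the same.
all_indices = list(range(len(y)))
y_node_all = y[all_indices]

# Case 2: the node holds only some examples, so y_node is a strict subset.
node_indices = [0, 1, 2, 3, 4]
y_node = y[node_indices]

print(entropy(y), entropy(y_node_all))  # identical values
print(entropy(y), entropy(y_node))      # different values
```

In the first case both calls give the same entropy, so using y by mistake goes unnoticed. In the second case y has 5 positives out of 10 while y_node has 3 out of 5, so the two entropies differ, and only the one computed from y_node describes the node being split.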
Cheers!
Raymond