CW2_W4_Decision_Tree_with_Markdown information_gain fails?

I would appreciate help with this lab! As far as I can tell I have computed the information gain for all the splits correctly, but the unit test still fails and I can't see why.

UNIT TESTS

compute_information_gain_test(compute_information_gain)

When I print the variables to check what is going on, I get:

node indices = [0, 1, 2, 3]
w_left and w_right = 0.75 and 0.25
len(y_left) and len(y_right) = 3 and 1
y = [[0]
[1]
[0]
[1]
[0]]

y_left = [[0]
[1]
[0]]

y_right = [[1]]

shape of y and y_right (5, 1) (1, 1)
information gain = 0.2822287189138014

What I don't understand is why node_indices has four elements while y has five. Is this the reason the test fails?

Here is the assertion error I get:

AssertionError                            Traceback (most recent call last)
<ipython-input> in <module>
      9
     10 # UNIT TESTS
---> 11 compute_information_gain_test(compute_information_gain)

~/work/public_tests.py in compute_information_gain_test(target)
    110     node_indexes = list(range(4))
    111     result = target(X, y, node_indexes, 0)
--> 112     assert np.isclose(result, 0.311278, atol=1e-6), f"Wrong information gain. Expected {0.311278} got: {result}"
    113
    114     result = target(X, y, node_indexes, 1)

AssertionError: Wrong information gain. Expected 0.311278 got: 0.2822287189138014

Hi @margrethe

node_indices is not supposed to match the length of y. It holds the indices of the samples that reached the current node, and the unit test deliberately passes node_indexes = list(range(4)) even though y has five rows (see line 110 in your traceback). That mismatch points straight at the bug: every term in the gain, including the entropy of the node itself, must be computed over y[node_indices] (four samples here), not over the full y array (five samples). Using the wrong array there produces exactly the kind of incorrect information gain you are seeing.
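You can confirm this with a quick back-of-the-envelope check. The snippet below is self-contained NumPy, using only the values from your printout (y is written as 1-D here for readability; the left/right index lists are read off from your y_left and y_right):

```python
import numpy as np

def entropy(y):
    # Binary entropy of a 0/1 label array; 0 for a pure node.
    p = np.mean(y)
    if p == 0 or p == 1:
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

y = np.array([0, 1, 0, 1, 0])   # all five labels (1-D here; (5, 1) in the lab)
node_indices = [0, 1, 2, 3]     # the four samples at this node
left, right = [0, 1, 2], [3]    # read off from your y_left / y_right

weighted = (len(left) / len(node_indices)) * entropy(y[left]) \
         + (len(right) / len(node_indices)) * entropy(y[right])

print(entropy(y[node_indices]) - weighted)  # 0.3112781..., the expected value
print(entropy(y) - weighted)                # 0.2822287..., your result
```

The second print reproduces your 0.2822287189138014 exactly, which is what you get when the node entropy is taken over all five labels instead of the four at this node.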

To pin it down, print the shape of the array you pass to the node-entropy calculation: it should be (4, 1), not the (5, 1) you printed for y. The weights (0.75 and 0.25) and the child entropies in your printout already look consistent with the expected split, so the fix should be localized to that one term.
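For reference, here is the usual overall shape of the function. Treat it as a minimal sketch, not the official solution: compute_entropy and split_dataset are assumed helper names (and I am guessing a split that sends feature value 1 left), so adapt the names and signatures to whatever your notebook actually defines:

```python
import numpy as np

def compute_entropy(y):
    # Binary entropy; returns 0 for an empty or pure node.
    if len(y) == 0:
        return 0.0
    p = np.mean(y)
    if p == 0 or p == 1:
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def split_dataset(X, node_indices, feature):
    # Assumed behaviour: samples with feature value 1 go left, the rest right.
    left = [i for i in node_indices if X[i, feature] == 1]
    right = [i for i in node_indices if X[i, feature] == 0]
    return left, right

def compute_information_gain(X, y, node_indices, feature):
    left_indices, right_indices = split_dataset(X, node_indices, feature)

    # The key point: every term, including the node's own entropy,
    # is computed over the samples at this node, never over the full y.
    y_node = y[node_indices]
    y_left, y_right = y[left_indices], y[right_indices]

    w_left = len(y_left) / len(y_node)
    w_right = len(y_right) / len(y_node)

    weighted_entropy = w_left * compute_entropy(y_left) + w_right * compute_entropy(y_right)
    return compute_entropy(y_node) - weighted_entropy
```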

Hope it helps! Feel free to ask if you need further assistance.

Thanks, I figured it out in the end.