C2_W4_Decision_Tree_with_Markdown. Error in get_best_split tests

Hi,

In C2_W4_Decision_Tree_with_Markdown, I get the following error. My code matches the hints exactly for this cell as well as all the previous cells.

Any hints on which cells I need to debug? (All tests until here have passed.)

Appreciate the help.


AssertionError                            Traceback (most recent call last)
<ipython-input> in <module>
      3
      4 # UNIT TESTS
----> 5 get_best_split_test(get_best_split)

~/work/public_tests.py in get_best_split_test(target)
    129     result = target(X, y, node_indexes)
    130
--> 131     assert result == -1, f"When the target variable is pure, there is no best split to do. Expected -1, got {result}"
    132
    133     y = X[:,0]

=================================================

Here is my debug output so far:
i: 0
len(X_node): 10 len(X_left): 7 len(X_right): 3
w_left: 0.7 w_right: 0.3
H_p1_left: 0.9852281360342515 H_p1_right: 0.9182958340544896 H_p1_node: 1.0
information_gain: 0.034851554559677034
info_gain: 0.034851554559677034 , max: 0
i: 1
len(X_node): 10 len(X_left): 4 len(X_right): 6
w_left: 0.4 w_right: 0.6
H_p1_left: 0.8112781244591328 H_p1_right: 0.9182958340544896 H_p1_node: 1.0
information_gain: 0.12451124978365313
info_gain: 0.12451124978365313 , max: 0.034851554559677034
i: 2
len(X_node): 10 len(X_left): 5 len(X_right): 5
w_left: 0.5 w_right: 0.5
H_p1_left: 0.7219280948873623 H_p1_right: 0.7219280948873623 H_p1_node: 1.0
information_gain: 0.2780719051126377
info_gain: 0.2780719051126377 , max: 0.12451124978365313
best_feature: 2
Best feature to split on: 2
i: 0
len(X_node): 5 len(X_left): 5 len(X_right): 0
w_left: 1.0 w_right: 0.0
H_p1_left: 0.9709505944546686 H_p1_right: 0 H_p1_node: 0.9709505944546686
information_gain: 0.0
info_gain: 0.0 , max: 0
i: 1
len(X_node): 5 len(X_left): 2 len(X_right): 3
w_left: 0.4 w_right: 0.6
H_p1_left: 0 H_p1_right: 0.9182958340544896 H_p1_node: 0.9709505944546686
information_gain: 0.4199730940219749
info_gain: 0.4199730940219749 , max: 0
best_feature: 1

You can open the public_tests.py file (using the File -> Open menu) and look at the get_best_split_test() function. Then you can see for which condition your code does not give the correct results.

Hi, thanks for the response.

I looked into the .py file.

The target variable, y, is all zeros for the example where my code fails. How do I define “pure” in my code? Is it node_entropy == 0 before we split?
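(For reference: a node is “pure” when every label in it is identical, which makes its entropy 0 before any split. A quick sanity check with a generic binary-entropy helper, which is a sketch and not the notebook's exact function. Note that the mixed case below reproduces the H_p1_node value 0.9709… seen in the debug output above.)

```python
import numpy as np

def entropy(y):
    # Binary entropy in bits.
    p1 = np.mean(y)
    if p1 == 0 or p1 == 1:    # pure node: no uncertainty
        return 0.0
    return -p1 * np.log2(p1) - (1 - p1) * np.log2(1 - p1)

print(entropy(np.array([0, 0, 0, 0, 0])))  # 0.0 (pure: nothing to split)
print(entropy(np.array([1, 0, 1, 1, 0])))  # 0.9709505944546686 (mixed)
```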

If you look at the template code given for the get_best_split() function, it starts by initializing “best_feature = -1”; to that, you add “max_info_gain” initialized to 0.

You should only modify the best_feature value if the current info_gain is greater than “max_info_gain”.

This means that if the info gain is 0 for every feature (i.e. the target variable is ‘pure’), your code will return the initial best_feature value of -1.
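A minimal sketch of that selection logic, assuming binary (0/1) features split by value. The entropy helper and loop details are illustrative, not the notebook's exact template:

```python
import numpy as np

def entropy(y):
    # Binary entropy in bits; 0 for an empty or pure set of labels.
    if len(y) == 0:
        return 0.0
    p1 = np.mean(y)
    if p1 == 0 or p1 == 1:
        return 0.0
    return -p1 * np.log2(p1) - (1 - p1) * np.log2(1 - p1)

def get_best_split(X, y, node_indexes):
    best_feature = -1      # "no useful split" sentinel from the template
    max_info_gain = 0
    for feature in range(X.shape[1]):
        left = [i for i in node_indexes if X[i, feature] == 1]
        right = [i for i in node_indexes if X[i, feature] == 0]
        w_left = len(left) / len(node_indexes)
        w_right = len(right) / len(node_indexes)
        info_gain = entropy(y[node_indexes]) - (
            w_left * entropy(y[left]) + w_right * entropy(y[right]))
        # Strict '>' means a zero gain (pure node) never overwrites -1.
        if info_gain > max_info_gain:
            max_info_gain = info_gain
            best_feature = feature
    return best_feature

X = np.array([[1, 0], [0, 1], [1, 1], [0, 0], [1, 0]])
print(get_best_split(X, np.zeros(5, dtype=int), list(range(5))))  # -1: pure node
```

With a pure y, every child entropy is 0, so info_gain is 0 on every feature, the strict comparison never fires, and -1 falls through to the return.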

This is all fine, and it is what I have in my code.

Here is my output for the last test where I fail:

info_gain:  0 , max:  0 , best_feature:  -1
i:  0 , num_features:  2
len(X_node):  5 len(X_left):  5 len(X_right):  0
H_p1_left:  0.9709505944546686 H_p1_right:  0 H_p1_node:  0.9709505944546686
w_left:  1.0 w_right:  0.0
information_gain:  0.0
info_gain:  0.0 , max:  0

i:  1 , num_features:  2
len(X_node):  5 len(X_left):  2 len(X_right):  3
H_p1_left:  0 H_p1_right:  0.9182958340544896 H_p1_node:  0.9709505944546686
w_left:  0.4 w_right:  0.6
information_gain:  0.4199730940219749
info_gain:  0.4199730940219749 , max:  0.4199730940219749
best_feature:  1

AssertionError: When the target variable is pure, there is no best split to do. Expected -1, got 1

Are you saying I should stop splitting once I get info_gain: 0.0 for the first feature?

I can share my code privately with you if that is okay.

Thanks,
Senthil.

For others who find this thread:

Inside the get_best_split() function:

  1. Do not use ‘max’ as a variable name. That shadows Python’s built-in max() function. Use the recommended variable name from the template code.

  2. Do not use global variables X_train and y_train. Use the variables ‘X’ and ‘y’ which were passed as function arguments.
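A short demonstration of both pitfalls (the variable names y_train, node_p1_buggy, and node_p1_fixed below are illustrative, not from the notebook). Pitfall 2 also explains the symptom in this thread: the test passes in a pure y, but a function that reads the notebook-level training labels still sees mixed data, so the node entropy is nonzero and a “best” split gets chosen.

```python
import numpy as np

# Pitfall 1: naming a variable 'max' shadows the built-in function.
max = 0.419
shadowed = False
try:
    max(3, 7)                 # fails: a float is not callable
except TypeError:
    shadowed = True
print("max() broken after shadowing:", shadowed)  # True
del max                       # restore access to the built-in
print(max(3, 7))              # 7: the built-in works again

# Pitfall 2: reading a global instead of the function argument.
y_train = np.array([1, 0, 1, 1, 0])   # notebook-level mixed labels

def node_p1_buggy(y):
    return np.mean(y_train)   # bug: ignores the 'y' argument

def node_p1_fixed(y):
    return np.mean(y)         # uses the labels actually passed in

y_pure = np.zeros(5)
print(node_p1_buggy(y_pure))  # 0.6 -> node wrongly looks impure
print(node_p1_fixed(y_pure))  # 0.0 -> node correctly seen as pure
```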
