Need help in Decision Trees Labs

I am getting an error in the unit tester for get_best_split(). I see two issues:

  1. The tester code passes y as y = X[:,1]; however, when I print X and y inside def get_best_split(X, y, node_indices):, y is actually X[:,0] (the first column)
  2. The tester gets -1, but I printed best_feature before returning, and here is the output…
    bestgain= 0.9709505944546686
    feature= 0, best_feature=0

Somewhere there is a mistake in parameter passing between the tester and the method. I have added several debug statements and restarted the kernel several times.
I need help!

def get_best_split(X, y, node_indices):

Passed X =
[[1 0]
[1 0]
[1 0]
[0 0]
[0 1]]
Passed y = [1 1 1 0 0]

----> 5 get_best_split_test(get_best_split)

~/work/ in get_best_split_test(target)
136 y = X[:,1]
137 result = target(X, y, node_indexes)
→ 138 assert result == 1, f"If the target is fully correlated with other feature, that feature must be the best split. Expected 1, got {result}"
140 y = 1 - X[:,0]

AssertionError: If the target is fully correlated with other feature, that feature must be the best split. Expected 1, got -1

@Venkat_Subramani, I’m guessing the print results you’re seeing may be coming from the previous test case in get_best_split_test(). In any case, to simplify the debugging, I suggest you temporarily add a cell in the lab that contains just the one failing test case and debug it there. Something like this:

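import numpy as np

# Reproduce only the failing test case from get_best_split_test()
X = np.array([[1, 0],
              [1, 0],
              [1, 0],
              [0, 0],
              [0, 1]])
node_indexes = list(range(5))
y = X[:,1]
result = get_best_split(X, y, node_indexes)
print(result)  # the tester expects 1 for this case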

Hi Wendy,
Thank you so much! That helped me narrow it down.
I also found that even when the tests pass and you submit, the Auto Grader does not get the latest version of the code.

I have another follow-up: I wrote the check below to look for purity in the target variable. Is there a better way?

if  ( sum(yy) ==0 ) | ( sum(y)/len(y) == 1):
    return -1

Thank You

The grader always uses the notebook with the original name. If you change the name of a notebook, you cannot have it graded.

This is extremely unlikely.

@Venkat_Subramani, I’m assuming the yy in sum(yy) is just a typo and this is your code to check if y is either all 0’s or all 1’s. There are lots of ways you might do this. What you have is perfectly fine. Another option would be to use all, like this:

if all(y == y[0]):
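As a minimal sketch of that kind of purity check (is_pure is a hypothetical helper name, not something from the lab, and it assumes y is a NumPy array or a list of labels), note that the comparison against the first label goes inside all(), so that every element is compared:

```python
import numpy as np

def is_pure(y):
    """Return True if every label in y is identical (hypothetical helper, not part of the lab)."""
    y = np.asarray(y)
    # An empty node has no mixed labels, so this sketch treats it as pure.
    if y.size == 0:
        return True
    # Compare each label to the first one; the comparison goes inside all().
    return bool(np.all(y == y[0]))

print(is_pure(np.array([1, 1, 1])))  # True
print(is_pure(np.array([0, 0, 0])))  # True
print(is_pure(np.array([0, 1, 0])))  # False
```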

Another choice would be to not do this check at all and just rely on the fact that compute_information_gain() will return 0 if a split doesn’t provide any benefit. If you only update best_feature when the gain returned by compute_information_gain() is greater than 0, then there’s no need for a separate check to see if the target is all 0’s or all 1’s. There are pros and cons to this approach. The code is arguably a bit cleaner, but it is not as efficient in the case where the target is all 0’s or all 1’s.
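To illustrate the "only update when the gain is greater than 0" idea, here is a rough self-contained sketch. It is not the lab's code: entropy, information_gain, and best_split_sketch are hypothetical stand-ins written for this example, and they assume binary 0/1 features and labels stored in NumPy arrays.

```python
import numpy as np

def entropy(y):
    """Binary entropy of a 0/1 label array; 0 for an empty or pure node."""
    if len(y) == 0:
        return 0.0
    p1 = np.mean(y)
    if p1 == 0 or p1 == 1:
        return 0.0
    return float(-p1 * np.log2(p1) - (1 - p1) * np.log2(1 - p1))

def information_gain(X, y, node_indices, feature):
    """Gain from splitting the node's examples on one binary feature."""
    idx = np.asarray(node_indices)
    left = idx[X[idx, feature] == 1]    # examples where the feature is 1
    right = idx[X[idx, feature] == 0]   # examples where the feature is 0
    w_left = len(left) / len(idx)
    w_right = len(right) / len(idx)
    return entropy(y[idx]) - (w_left * entropy(y[left]) + w_right * entropy(y[right]))

def best_split_sketch(X, y, node_indices):
    """Return the feature with the largest positive gain, or -1 if there is none."""
    if len(node_indices) == 0:
        return -1
    best_feature, best_gain = -1, 0.0
    for feature in range(X.shape[1]):
        gain = information_gain(X, y, node_indices, feature)
        # Strictly greater than the best so far (initially 0), so a pure
        # target, where every gain is 0, leaves best_feature at -1.
        if gain > best_gain:
            best_feature, best_gain = feature, gain
    return best_feature

X = np.array([[1, 0], [1, 0], [1, 0], [0, 0], [0, 1]])
node_indexes = list(range(5))
print(best_split_sketch(X, X[:, 1], node_indexes))                # 1
print(best_split_sketch(X, np.zeros(5, dtype=int), node_indexes)) # -1
```

With the failing test case from this thread (y = X[:,1]) the sketch picks feature 1, and with an all-zero target it returns -1 without any separate purity check, at the cost of still computing the gain for every feature.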

TMosh, the file name was the same. I suspect the notebook was not saved, or a previous version was being submitted. I closed it, restarted, and submitted again, and it worked.


Thanks Wendy
