Exercise 2 split_dataset()

My code below always gets the right answer, but after confirming that my answers are correct, the routine used by the course throws a series of errors. I don’t have access to the course’s checking code, so I can’t troubleshoot. Here is my code:

def split_dataset(X, node_indices, feature):
    """
    Splits the data at the given node into
    left and right branches
    
    Args:
        X (ndarray):             Data matrix of shape(n_samples, n_features)
        node_indices (list):  List containing the active indices, i.e., the samples being considered at this step.
        feature (int):           Index of feature to split on
    
    Returns:
        left_indices (list): Indices with feature value == 1
        right_indices (list): Indices with feature value == 0
    """
    
    # You need to return the following variables correctly
    left_indices = []
    right_indices = []
    
    ### START CODE HERE ###
   # {mentor edit: code removed}
    ### END CODE HERE ###
        
    return left_indices, right_indices

Hi!

split_dataset() is used at every node in the decision tree to split the data that remains at that node.

Let’s say we have a data matrix with indices [1-5] and we split it at the root node into left [1, 3, 5] and right [2, 4]. We then take the left node [1, 3, 5] and split it again, then do the same for the right node, etc., until the stopping conditions are satisfied.

Currently this code works for the root node, and only the root node, because it always uses the entire data matrix to make the split instead of the indices that are active at the current node.

The parameter node_indices contains the list of current active indices to split.
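To make the point about active indices concrete, here is a minimal sketch, not the lab's reference solution. The toy matrix X_toy and the helper name split_active are made up for illustration, and a nested list stands in for the ndarray. It splits the root node, then splits the left child again by passing the left indices in as node_indices.

```python
# Toy binary data: 6 samples x 2 features (made up for illustration).
X_toy = [
    [1, 0],
    [0, 1],
    [1, 1],
    [0, 0],
    [1, 0],
    [1, 1],
]

def split_active(X, node_indices, feature):
    """Hypothetical helper: split only the samples listed in node_indices."""
    left = [i for i in node_indices if X[i][feature] == 1]
    right = [i for i in node_indices if X[i][feature] != 1]
    return left, right

# Root node: every sample is active.
root = list(range(len(X_toy)))
left, right = split_active(X_toy, root, feature=0)
print(left, right)              # [0, 2, 4, 5] [1, 3]

# Child node: only the samples that went left are still active.
left_left, left_right = split_active(X_toy, left, feature=1)
print(left_left, left_right)    # [2, 5] [0, 4]
```

Note that the second call never looks at samples 1 and 3, because they are no longer in node_indices.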

Hope this helps

Edited to remove the code from the OP, as we prefer students not share their code on the forum.

Hi Sam,

Well, I’m totally stuck and can go no further.

I modified my code to use node_indices and:

With node_indices =  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] and feature = 0, I get the correct answer:

Left indices:  [0, 1, 2, 3, 4, 7, 9]

Right indices: [5, 6, 8]

If I then set node_indices = [0, 1, 2, 3, 4] and feature = 2, which is the same as your test routine, I get:

Left indices:  [0, 1, 4]
Right indices:  [2, 3]

Which looks right to me.

As far as I can see, the code yields the right result regardless of the values in node_indices.

Isn’t that what I’m expected to do?

I really need some help here – am I not understanding what you want?

Kind regards,

Carl

Dr. Carl Aylen, BSc, PhD, MA (Cantab)

CarlAylen@TalentChaser.com

Hi, so if I can’t share my code here, how do I get help with a lab problem?

Hi,

Can you send me a direct message with the code?

Also, try splitting on the left or right one more time.
I suspect that things are working for the root split, but failing when it tries to split further in the tree, using Left or Right indices as the input to node_indices.
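One way to try this outside the grader is a small sanity check, sketched below under assumptions: check_split and split_whole_matrix are hypothetical names, and the data is made up. The check verifies that the two returned lists together contain exactly the indices passed in as node_indices. The deliberately wrong function ignores node_indices, so it passes at the root and fails one level down.

```python
# Toy binary data: 6 samples x 2 features (made up for illustration).
X_toy = [[1, 0], [0, 1], [1, 1], [0, 0], [1, 0], [1, 1]]

def check_split(split_fn, X, node_indices, feature):
    """Return True if left and right together are exactly node_indices."""
    left, right = split_fn(X, node_indices, feature)
    no_overlap = not set(left) & set(right)
    same_members = sorted(left + right) == sorted(node_indices)
    return no_overlap and same_members

def split_whole_matrix(X, node_indices, feature):
    """Deliberately wrong: loops over every row and ignores node_indices."""
    left = [i for i in range(len(X)) if X[i][feature] == 1]
    right = [i for i in range(len(X)) if X[i][feature] != 1]
    return left, right

# At the root, every sample is active, so the mistake is invisible.
root = list(range(len(X_toy)))
root_ok = check_split(split_whole_matrix, X_toy, root, 0)
print(root_ok)     # True

# One level down, samples outside the node leak into the result.
left, _ = split_whole_matrix(X_toy, root, 0)
child_ok = check_split(split_whole_matrix, X_toy, left, 1)
print(child_ok)    # False
```

The same check can be pointed at any split function with this signature, as long as it returns plain lists.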

Generally, if you post a description of the issue, along with a screen capture image showing any error messages or the assert stack, a mentor will be able to diagnose the issue.

In difficult cases, a mentor may ask you to send your code via a PM.

Hi,

Here is my code:

def split_dataset(X, node_indices, feature):
    """..."""

    left_indices = []
    right_indices = []

    ### START CODE HERE ###
    for i in range(len(node_indices)):
        if X[node_indices[i], feature] == 1:
            left_indices.append(i)
        else:
            right_indices.append(i)
    ### END CODE HERE ###

    return left_indices, right_indices
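A possible reason this version passes the two tests quoted earlier but fails deeper in the tree (a reading of the code as posted, not confirmed by the grader output): i is a position within node_indices, not a sample index. When node_indices is [0, 1, 2, 3, 4], position and index coincide, so the problem stays hidden. The sketch below uses made-up data and hypothetical function names to show the difference once the indices are not contiguous from 0.

```python
# Toy binary data: 6 samples x 2 features (made up for illustration).
X_toy = [[1, 0], [0, 1], [1, 1], [0, 0], [1, 0], [1, 1]]

def split_appending_position(X, node_indices, feature):
    """Mirrors the posted loop: appends the loop counter i."""
    left, right = [], []
    for i in range(len(node_indices)):
        if X[node_indices[i]][feature] == 1:
            left.append(i)
        else:
            right.append(i)
    return left, right

def split_appending_index(X, node_indices, feature):
    """Variant that appends the sample index itself."""
    left, right = [], []
    for idx in node_indices:
        if X[idx][feature] == 1:
            left.append(idx)
        else:
            right.append(idx)
    return left, right

node_indices = [0, 2, 4, 5]   # e.g. a left branch from an earlier split
by_position = split_appending_position(X_toy, node_indices, 1)
by_index = split_appending_index(X_toy, node_indices, 1)
print(by_position)   # ([1, 3], [0, 2])
print(by_index)      # ([2, 5], [0, 4])
```

The first result contains 1 and 3, which are not in node_indices at all, so any later split that uses those values would be working on the wrong samples.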

Kind regards,

Carl


Hi

Thanks for the explanation – sorry for putting up the code.

Kind regards,

Carl
