Exercise 2 split_dataset()

Carl_Aylen · July 21, 2022, 9:01pm

My code below always gets the right answer but, after confirming my answers are correct, the routine used by the course then throws a series of errors. Of course, I don’t have access to the course checking code, so I can’t troubleshoot. Here is my code:

def split_dataset(X, node_indices, feature):
    """
    Splits the data at the given node into
    left and right branches
    
    Args:
        X (ndarray):             Data matrix of shape(n_samples, n_features)
        node_indices (list):  List containing the active indices. I.e, the samples being considered at this step.
        feature (int):           Index of feature to split on
    
    Returns:
        left_indices (list): Indices with feature value == 1
        right_indices (list): Indices with feature value == 0
    """
    
    # You need to return the following variables correctly
    left_indices = []
    right_indices = []
    
    ### START CODE HERE ###
    # {mentor edit: code removed}
    ### END CODE HERE ###
        
    return left_indices, right_indices

SamReiswig · July 21, 2022, 9:59pm

Hi!

split_dataset() is going to be used at every node in the decision tree to split the data remaining at that node.

Let’s say we have a data matrix with indices [1, 2, 3, 4, 5] and we split it at the root node into left [1, 3, 5] and right [2, 4]. Then we take the left node [1, 3, 5] and split it again, then the right node [2, 4], and so on, until we’ve satisfied our stopping conditions.

Currently this code will work for the root node and only the root node because it is always using the entire data matrix to make the split instead of the current active indices for this split.

The parameter node_indices contains the list of current active indices to split.
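To make the idea concrete, here is a minimal sketch of a two-level split. It is not the assignment solution or the course's data: X_toy and split_by_feature are hypothetical names made up for this illustration. The only point is that the second call receives the indices returned by the first call, and those indices always refer to rows of the full matrix.

```python
import numpy as np

# Hypothetical toy data: 5 samples, 2 binary features (not the course dataset).
X_toy = np.array([
    [1, 0],
    [0, 1],
    [1, 1],
    [0, 0],
    [1, 1],
])

def split_by_feature(X, node_indices, feature):
    """Illustrative splitter: only looks at the rows listed in node_indices."""
    left, right = [], []
    for idx in node_indices:  # idx is a row index into X, not a loop position
        if X[idx, feature] == 1:
            left.append(idx)
        else:
            right.append(idx)
    return left, right

# Root node: all samples are active.
root = [0, 1, 2, 3, 4]
root_left, root_right = split_by_feature(X_toy, root, feature=0)

# Child node: pass only the indices that reached the left branch.
child_left, child_right = split_by_feature(X_toy, root_left, feature=1)

print(root_left, root_right)    # [0, 2, 4] [1, 3]
print(child_left, child_right)  # [2, 4] [0]
```

Note that the child split still returns row indices of X_toy (2, 4 and 0), not positions within the list it was given.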

Hope this helps

TMosh · July 22, 2022, 3:43am

Edited to remove the code from the OP, as we prefer students not share their code on the forum.

Carl_Aylen · July 22, 2022, 3:25pm

Hi Sam,

Well, I’m totally stuck and can go no further.

I modified my code to use node_indices and:

With node_indices =  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] and feature = 0, I get the correct answer:

Left indices:  [0, 1, 2, 3, 4, 7, 9]

Right indices: [5, 6, 8]

If I then set node_indices = [0, 1, 2, 3, 4] and feature = 2, which is the same as your test routine, I get:

Left indices:  [0, 1, 4]

Right indices:  [2, 3]

Which looks right to me.

As far as I can see, the code yields the right result regardless of the values in node_indices.

Isn’t that what I’m expected to do?

I really need some help here – am I not understanding what you want?

Kind regards,

Carl

Dr. Carl Aylen, BSc, PhD, MA (Cantab)

CarlAylen@TalentChaser.com

Carl_Aylen · July 22, 2022, 3:56pm

Hi, so if I can’t share my code here, how do I get help with a lab problem?

SamReiswig · July 22, 2022, 4:08pm

Hi,

Can you send me a direct message with the code?

Also, try splitting on the left or right one more time.
I suspect that things are working for the root split, but failing when it tries to split further in the tree, using Left or Right indices as the input to node_indices.
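One way to test this suspicion yourself is a small consistency check, sketched below under some assumptions: check_split, positional_split and X_toy are hypothetical names invented for this example, and positional_split is only a guess at one kind of mistake that passes at the root and fails deeper in the tree. The check confirms that the returned lists together contain exactly the indices that were passed in, and that each side matches the feature value.

```python
import numpy as np

def check_split(split_fn, X, node_indices, feature):
    """Return True if split_fn partitions node_indices by X[:, feature]."""
    left, right = split_fn(X, node_indices, feature)
    # Every returned index must come from node_indices, with none lost or added.
    same_members = sorted(left + right) == sorted(node_indices)
    left_ok = all(X[i, feature] == 1 for i in left)
    right_ok = all(X[i, feature] != 1 for i in right)
    return same_members and left_ok and right_ok

def positional_split(X, node_indices, feature):
    """Hypothetical buggy splitter: stores loop positions, not row indices."""
    left, right = [], []
    for pos, idx in enumerate(node_indices):
        if X[idx, feature] == 1:
            left.append(pos)
        else:
            right.append(pos)
    return left, right

# Hypothetical toy data: 5 samples, 2 binary features (not the course dataset).
X_toy = np.array([[1, 0], [0, 1], [1, 1], [0, 0], [1, 1]])

# At the root, positions and row indices coincide, so the bug stays hidden.
root_ok = check_split(positional_split, X_toy, [0, 1, 2, 3, 4], feature=0)
# At a child node the indices are no longer 0..n-1, so the bug shows up.
child_ok = check_split(positional_split, X_toy, [0, 2, 4], feature=1)

print(root_ok, child_ok)  # True False
```

Testing with node_indices such as [0, 1, 2, 3, 4] cannot expose this kind of problem, because each index equals its position in the list. A list with gaps, such as [0, 2, 4], can.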

TMosh · July 22, 2022, 4:35pm

Generally, if you post a description of the issue, along with a screen capture image showing any error messages or the assert stack, a mentor will be able to diagnose the issue.

In difficult cases, a mentor may ask you to send your code via a PM.

Carl_Aylen · July 22, 2022, 4:48pm

Hi,

Here is my code:

def split_dataset(X, node_indices, feature):
    left_indices = []
    right_indices = []

    ### START CODE HERE ###
    for i in range(len(node_indices)):
        if X[node_indices[i], feature] == 1:
            left_indices.append(i)
        else:
            right_indices.append(i)
    ### END CODE HERE ###

    return left_indices, right_indices

Kind regards,

Carl


Carl_Aylen · July 22, 2022, 4:49pm

Hi

Thanks for the explanation – sorry for putting up the code.

Kind regards,

Carl

