Hi @Christina_Fan,
Have you figured this out? If not, judging from the thread above, it may help to step back and make sure you have a clear picture of all the parameters passed to split_dataset(). That should get you back on track.
Here are the comments in the code that explain the input parameters:
    def split_dataset(X, node_indices, feature):
        """
        Splits the data at the given node into
        left and right branches

        Args:
            X (ndarray): Data matrix of shape(n_samples, n_features)
            node_indices (list): List containing the active indices. I.e, the samples being considered at this step.
            feature (int): Index of feature to split on
X has shape (n_samples, n_features). Think of it as rows of features: one row per sample, one column per feature.
In the assignment, the picture of the first few rows of X shows this layout: the 0, 1, 2, 3, … down the side are the indices for each row (sample), and the values in each row are the values for the features.
My guess is that the node_indices (list) parameter is what may be throwing you off. It is just a list of the indices (row numbers) of the samples at the current node. It tells us which rows of X to look at when deciding which samples go into the left branch and which go into the right branch.
So, you’ll use each value in node_indices to tell you which rows of X you want to look at, and then the “feature” parameter to tell you the index of which feature in that row you want to look at.
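To make the indexing concrete, here is a small sketch. It is not the assignment solution: it only shows how node_indices and feature together pick out values from X. The array X_toy, its values, and the helper feature_values_at_node are all made up for illustration and do not come from the lab.

```python
import numpy as np

# Hypothetical toy data (not the lab's data): 5 samples (rows), 3 features (columns).
X_toy = np.array([
    [1, 1, 1],
    [1, 0, 1],
    [0, 0, 0],
    [0, 1, 0],
    [1, 1, 0],
])

node_indices = [0, 2, 4]  # only rows 0, 2 and 4 belong to the current node
feature = 1               # look at column 1 in each of those rows


def feature_values_at_node(X, node_indices, feature):
    """Illustrative helper: return (row index, feature value) pairs for the node's rows."""
    return [(i, int(X[i][feature])) for i in node_indices]


pairs = feature_values_at_node(X_toy, node_indices, feature)
for i, value in pairs:
    print(f"row {i}: X[{i}][{feature}] = {value}")
```

Rows 1 and 3 are never touched because they are not in node_indices. Once you can see which value belongs to each active row, deciding which branch that row goes to is the part left for you to write.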
Hopefully this explanation helps make things click. You can also go back and re-read the explanations in the lab to make sure you understand what each parameter represents, or click the green “Click for hints” link after the code cell to see what the code should look like. If you do use the hints, go back once your code is working and make sure you understand why it works.