MLS 2 week 4, last lab confusion

Hi, I was wondering if somebody could help me understand the logic of this part of the lab.

def split_dataset(X, node_indices, feature):
    """
    Splits the data at the given node into
    left and right branches

    Args:
        X (ndarray):          Data matrix of shape(n_samples, n_features)
        node_indices (list):  List containing the active indices. I.e., the samples being considered at this step.
        feature (int):        Index of feature to split on

    Returns:
        left_indices (list):  Indices with feature value == 1
        right_indices (list): Indices with feature value == 0
    """
    # You need to return the following variables correctly
    left_indices = []
    right_indices = []
    ### START CODE HERE ###
    for i in node_indices:
        if X[i][feature] == 1:

I don't quite understand this part: "if X[i][feature] == 1:"
I get that I am selecting features from the X array, but I don't understand why I have to write both [i] and [feature].
I thought [i] would go over all the features in the array. I know I am missing something, but I can't put my finger on it.

Thank you for helping this young fool


Hello @Felix_De_Balanzo_Oli! Thank you for the question! I find it critical to understand the shape of my dataset. I always run this check on both my input datasets and my transformed output datasets so that I know I am doing the work correctly, especially with time-series data, where a transformation can produce 3-, 4-, or even 5-dimensional arrays.

As the docstring says:

X (ndarray):             Data matrix of shape(n_samples, n_features)

X is a 2D array where the 0th dimension iterates over samples and the 1st dimension over features. So X[i] alone picks out the i-th sample (a whole row with all of its features), and X[i][feature] then picks the feature-th feature of that sample. The if statement checks whether that single number is 1.
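
To make the two indices concrete, here is a small sketch using a made-up array with 3 samples and 2 features. The values and the choices of i and feature are only for illustration and are not the lab's data:

```python
import numpy as np

# Hypothetical toy data: 3 samples (rows), 2 features (columns).
X = np.array([[1, 0],
              [0, 1],
              [1, 1]])

i = 2        # which sample (row) to look at
feature = 0  # which feature (column) to look at

row = X[i]                  # the whole i-th sample: all of its features
value = X[i][feature]       # a single number: feature `feature` of sample i
same_value = X[i, feature]  # equivalent NumPy indexing done in one step

print(row)         # [1 1]
print(value)       # 1
print(same_value)  # 1
```

In the lab's loop, i comes from node_indices, so [i] only chooses which sample to inspect; [feature] is what narrows that sample down to the one column being split on.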

There are two things you can do whenever you are in doubt about the shape or the content of an array X:

  1. print(X.shape) - this will tell you the shape. If X has 3 samples and 2 features, the shape should be (3, 2).
  2. print(X) - this will show you the content. For an X with shape (3, 2), it should look like this: [[a, b], [c, d], [e, f]].

Usually (1) is more useful because X can be just too big to be printed by (2).
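
As a rough sketch of both checks, again on a made-up (3, 2) array rather than the lab's data:

```python
import numpy as np

# Hypothetical toy data: 3 samples (rows), 2 features (columns).
X = np.array([[1, 0],
              [0, 1],
              [1, 1]])

# Check (1): the shape, unpacked into named dimensions.
n_samples, n_features = X.shape
print(X.shape)  # (3, 2)

# Check (2): the full content, which is fine for a small array.
print(X)

# For a large array, slicing off the first few rows keeps the output short.
first_rows = X[:2]
print(first_rows)
```

Slicing with X[:2] is just one way to peek at a big array without printing all of it.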


Got it! Thank you Raymond!


You are welcome Felix!