MLS 2 week 4, last lab confusion

Felix_De_Balanzo_Oli · August 7, 2022, 12:07am

Hi, I was wondering if somebody could help me understand the logic of this part of the lab.

def split_dataset(X, node_indices, feature):
    """
    Splits the data at the given node into
    left and right branches
    
    Args:
        X (ndarray):             Data matrix of shape(n_samples, n_features)
        node_indices (list):  List containing the active indices. I.e, the samples being considered at this step.
        feature (int):           Index of feature to split on
    
    Returns:
        left_indices (list): Indices with feature value == 1
        right_indices (list): Indices with feature value == 0
    """
    
    # You need to return the following variables correctly
    left_indices = []
    right_indices = []
    
    ### START CODE HERE ###
    for i in node_indices:
        if X[i][feature] == 1:
            left_indices.append(i)
        else:
            right_indices.append(i)

    return left_indices, right_indices

I don't quite understand this part: "if X[i][feature] == 1:"
I get that I am selecting features from the X array, but I don't understand why I have to write both [i] and [feature].
I thought [i] would go over all the features in the array. I know I am missing something, but I can't put my finger on it.

Thank you for helping this young fool

rmwkwok · August 7, 2022, 1:08am

Hello @Felix_De_Balanzo_Oli! Thank you for the question! I find it critical to understand the shape of my dataset. I always run this check on my input datasets and on my transformed output datasets so that I know I am doing the work correctly, especially with time-series data, where a transformation can produce 3-, 4- or even 5-dimensional arrays.

As the docstring says:

X (ndarray):             Data matrix of shape(n_samples, n_features)

X is a 2D array where the 0th dimension indexes samples and the 1st dimension indexes features. X[i] alone gives you the whole i-th sample (all of its features), and X[i][feature] then picks the single value of the feature-th feature from that sample. The comparison == 1 checks whether that value is 1 or not.
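
To make that concrete, here is a small sketch with a made-up array of 3 samples and 2 features (the values are only an illustration, not the lab's data). The first index picks one whole sample, and the second index picks one feature from that sample:

```python
import numpy as np

# Hypothetical toy data: 3 samples (rows), 2 features (columns)
X = np.array([[1, 0],
              [0, 1],
              [1, 1]])

i, feature = 2, 0

row = X[i]              # the whole i-th sample: all of its features
value = X[i][feature]   # one number: feature `feature` of sample i
same = X[i, feature]    # equivalent NumPy shorthand for the line above

print(row)    # [1 1]
print(value)  # 1
print(same)   # 1
```

So in the lab's loop, i walks over samples (rows), not over features. The feature index stays fixed for the whole loop.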

There are 2 things you can do whenever you have doubt about the shape or the content of an array X:

(1) print(X.shape) - this will tell you the shape. If it has 3 samples and 2 features, the shape should be (3, 2).
(2) print(X) - this will show you the content, and for an X that has the shape (3, 2), it should look like this: [[a, b], [c, d], [e, f]].

Usually (1) is more useful because X can be just too big to be printed by (2).
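
As a rough sketch of both checks, assuming a made-up array of 3 samples and 2 features (not the lab's data), it could look like this. The last few lines only illustrate how the sample index and the feature index work together, in the same spirit as the lab function:

```python
import numpy as np

# Hypothetical toy data: 3 samples (rows), 2 features (columns)
X = np.array([[1, 0],
              [0, 1],
              [1, 1]])

print(X.shape)  # (3, 2) -> 3 samples, 2 features
print(X)        # full content; fine here, too long for a big dataset

# Split the samples on feature 0: i runs over rows, `feature` stays fixed
node_indices = [0, 1, 2]
feature = 0
left_indices = [i for i in node_indices if X[i][feature] == 1]
right_indices = [i for i in node_indices if X[i][feature] != 1]

print(left_indices, right_indices)  # [0, 2] [1]
```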

Raymond

Felix_De_Balanzo_Oli · August 8, 2022, 6:16pm

Got it! Thank you Raymond!

rmwkwok · August 8, 2022, 8:34pm

You are welcome Felix!
