def split_dataset(X, node_indices, feature):

For exercise 2 this week, do X and node_indices necessarily need to be of the same length? I keep getting a ‘List index out of range’ error when I try to assign the node indices to the left and right nodes.

No, node_indices does not have to be the same length as X, but every element of node_indices must be smaller than len(X), since those elements are used to index into X, e.g. X[a_node_index].
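To make the constraint concrete, here is a minimal sketch using plain Python lists. The toy X and the index lists are made up for illustration and are not taken from the assignment.

```python
# Hypothetical toy data: 5 examples, 3 binary features each.
X = [
    [1, 1, 1],
    [0, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
]

# A node that holds only three of the five examples.
node_indices = [0, 3, 4]

# Being shorter than X is fine ...
assert len(node_indices) <= len(X)
# ... as long as every element is a valid row index into X.
assert all(0 <= i < len(X) for i in node_indices)

# Look up the rows that belong to this node.
rows_at_node = [X[i] for i in node_indices]

# An index equal to len(X) is out of range and raises IndexError.
bad_indices = [0, 5]
try:
    [X[i] for i in bad_indices]
    raised = False
except IndexError:
    raised = True
```

Here node_indices has length 3 while X has length 5, and the lookup works because 0, 3 and 4 are all valid rows of X.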

Raymond

Why would you need the whole design matrix, X, when splitting on nodes that don’t contain all the training examples?

Aside from that, I keep getting a ‘list index out of range’ error which I think is coming from the test case that passes 3 as an argument for feature when X has as its columns 0,1,2. (line 68 of public_tests.py)

There is always a trade-off in the decisions we make. Passing the whole X both preserves the integrity of the data and avoids creating many copies. In other words, we only ever need the minimum amount of memory to store it. We cannot throw away part of X just because it is not needed in one place, since it may be needed in another. Indexing into a shared array is a very common practice.
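As a rough illustration of the idea (not the assignment's reference code), the sketch below keeps one shared X and only passes lists of row indices down the tree. The helper name split_by_feature, the toy data, and the convention that a feature value of 1 goes left are all assumptions made for this example.

```python
def split_by_feature(X, node_indices, feature):
    """Hypothetical helper: partition node_indices by a binary feature.

    X is never copied or sliced; only the index lists change.
    """
    left_indices, right_indices = [], []
    for i in node_indices:
        # i is a row number in the full X, so it stays valid at any depth.
        if X[i][feature] == 1:
            left_indices.append(i)
        else:
            right_indices.append(i)
    return left_indices, right_indices


# Hypothetical toy data: 5 examples, 3 binary features each.
X = [
    [1, 1, 1],
    [0, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
]

root_indices = list(range(len(X)))

# First split at the root on feature 0.
left, right = split_by_feature(X, root_indices, feature=0)

# Second split on the left child, still using the same full X.
left_left, left_right = split_by_feature(X, left, feature=1)
```

Because the index lists always hold row numbers of the original X, a child node can be split again without building a smaller matrix for it.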

We would need the whole error traceback to say more, but I suggest you try to figure it out yourself. If it says “index out of range”, then it has already explained what the trouble is and where it occurs. All the test cases have been verified to pass if the code is implemented as instructed.
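One possible way to narrow it down is to check the inputs just before the line named in the traceback. This is only a sketch: check_split_inputs is a hypothetical helper, not part of the course code, and it assumes X is a list of equal-length rows.

```python
def check_split_inputs(X, node_indices, feature):
    """Hypothetical helper: list the reasons an indexing error could occur."""
    problems = []
    n_rows = len(X)
    n_cols = len(X[0]) if n_rows else 0

    # The feature must be a valid column number.
    if not 0 <= feature < n_cols:
        problems.append(
            f"feature={feature} but valid columns are 0..{n_cols - 1}"
        )

    # Every node index must be a valid row number.
    bad_rows = [i for i in node_indices if not 0 <= i < n_rows]
    if bad_rows:
        problems.append(
            f"row indices {bad_rows} but valid rows are 0..{n_rows - 1}"
        )
    return problems


# Hypothetical toy data: 3 examples, 3 features (columns 0, 1, 2).
X = [[1, 1, 1], [0, 0, 1], [0, 1, 0]]

ok = check_split_inputs(X, [0, 2], feature=2)
bad_feature = check_split_inputs(X, [0, 2], feature=3)
bad_row = check_split_inputs(X, [0, 3], feature=1)
```

An empty list means both the rows and the column are in range, so the error would have to come from somewhere else in the code.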

Raymond

Thanks for your replies Raymond. They have been helpful.

You are welcome, @David_Stulman!