C2_W4_Decision_Tree_with_Markdown split_dataset error


My split works and produces the “Expected output”.
But my function fails the asserts on the lengths of the left and right splits:

AssertionError Traceback (most recent call last)
---> 33 split_dataset_test(split_dataset)

~/work/public_tests.py in split_dataset_test(target)
44 assert type(right[0]) == int, f"Wrong type for elements in the right list. Expected: number got: {type(right[0])}"
---> 46 assert len(left) == 2, f"left must have 2 elements but got: {len(left)}"
47 assert len(right) == 3, f"right must have 3 elements but got: {len(right)}"

AssertionError: left must have 2 elements but got: 3

Help would be appreciated
Best regards

The Hint code for this function gives you most of the implementation.

The only bit you have to implement is an if-statement that does the part I’ve indicated with an arrow:
“check if the value of X at index [i] and [feature] == 1”.
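
For readers without the notebook open, here is a minimal sketch of the shape such a function might take. The signature split_dataset(X, node_indices, feature) and the variable names are assumptions based on the names mentioned in this thread, and X_demo is made-up data, not anything from the course.

```python
import numpy as np

def split_dataset(X, node_indices, feature):
    """Split node_indices into left (feature == 1) and right (everything else)."""
    left_indices = []
    right_indices = []
    for i in node_indices:
        # The if-statement from the hint: value of X at row i, column `feature`
        if X[i][feature] == 1:
            left_indices.append(i)
        else:
            right_indices.append(i)
    return left_indices, right_indices

# Made-up data for illustration only (not from the course notebook)
X_demo = np.array([[1, 0], [0, 1], [1, 1]])
left, right = split_dataset(X_demo, [0, 1, 2], feature=0)
print(left, right)  # [0, 2] [1]
```

Note that the body only reads from the parameter X, never from a variable defined elsewhere in the notebook.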

Thanks for the quick reply.
I saw the hint and the split works, see the figure below. The actual splits match the “Expected output” (red boxes in the figure).

But the unit test fails at line 46, where it checks “assert len(left) == 2”, whereas from the “Expected output” it seems that len(left) is 3…

Best regards

Sorry, I’m a little bit stuck on investigating this issue, because I’ve temporarily lost access to the MLS course notebooks.

Hopefully DLAI can fix it soon.

What’s happening here is confusion about how the tests are numbered.

There are two cases in the notebook, labeled Case 1 and Case 2. Your code passes both of those. Those tests are run first.

Then the split_dataset_test() function has three other test cases. They’re numbered 1, 2, and 3.
Your code is failing its Case 1 test.
Here are the values for that test. Maybe you can work it out by hand and see where your code goes wrong. The test case code is in the public_tests.py file.

I think this is the test that fails (printing X, node_indices, and feature):

Case 1 there has five examples and tests feature 2 (the last column). Only two examples have a 1 in that position, so left should have 2 elements and right should have 3.
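
Working a case like this out by hand can be sketched as below. The array is a hypothetical stand-in with the same shape as described (five examples, feature 2 tested, two rows with a 1 in that column); the real values are in public_tests.py.

```python
import numpy as np

# Hypothetical stand-in for the test data: 5 examples, 3 features,
# with exactly two rows having a 1 in the last column (feature 2).
X = np.array([
    [1, 0, 0],
    [1, 0, 1],
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
])
node_indices = [0, 1, 2, 3, 4]
feature = 2

# Values of the tested feature for the examples at this node
column = X[node_indices, feature]
expected_left = [i for i, v in zip(node_indices, column) if v == 1]
expected_right = [i for i, v in zip(node_indices, column) if v == 0]
print(len(expected_left), len(expected_right))  # 2 3
```

Comparing these expected lists with what your own function returns for the same inputs should show which rows end up on the wrong side.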

It was a silly bug on my side, of course. I used the global variable X_train in the function instead of the local variable X, which worked in many cases, but not for the test 3 data.
Thanks a lot for the support!
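
For anyone who hits the same thing, here is a small sketch of why this kind of bug can pass the notebook checks and still fail the unit tests. The function names split_buggy and split_fixed and both arrays are invented for illustration; only the names X and X_train come from the post above.

```python
import numpy as np

X_train = np.array([[1, 1], [1, 0], [0, 1]])  # global, illustrative values only

def split_buggy(X, node_indices, feature):
    # Bug: reads the global X_train and silently ignores the argument X
    left = [i for i in node_indices if X_train[i][feature] == 1]
    right = [i for i in node_indices if X_train[i][feature] != 1]
    return left, right

def split_fixed(X, node_indices, feature):
    # Uses only the data that was passed in
    left = [i for i in node_indices if X[i][feature] == 1]
    right = [i for i in node_indices if X[i][feature] != 1]
    return left, right

idx = [0, 1, 2]

# When the caller passes X_train itself, both versions agree.
print(split_buggy(X_train, idx, 0) == split_fixed(X_train, idx, 0))  # True

# When the caller passes different data, the buggy version still answers for X_train.
X_other = np.array([[0, 0], [1, 0], [0, 0]])
print(split_buggy(X_other, idx, 0))  # ([0, 1], [2])
print(split_fixed(X_other, idx, 0))  # ([1], [0, 2])
```

The notebook cells call the function with X_train, so the bug is invisible there. It only shows up when a test passes in its own array.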

Thanks for your report!