I am having a problem with the last test for “get_best_split” for Lab week 4: C2_W4_Decision_Tree_with_Markdown. I got the “2” right. But you are running some other test that I am failing. I tried to create a “pure” sample (all 0 or all 1 ) to debug. I am not making any progress. Can you please give me some insight into the test case that I am failing?
If I can reproduce the test case, I can fix the problem.
The best feature to split on: 2

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input> in <module>
      3
      4 # UNIT TESTS
----> 5 get_best_split_test(get_best_split)

~/work/public_tests.py in get_best_split_test(target)
    129     result = target(X, y, node_indexes)
    130
--> 131     assert result == -1, f"When the target variable is pure, there is no best split to do. Expected -1, got {result}"
    132
    133     y = X[:,0]

AssertionError: When the target variable is pure, there is no best split to do. Expected -1, got 1
The above part of the traceback tells us that get_best_split_test(...) comes from a file called public_tests.py, which can be found by clicking “File” > “Open” on the menu bar of the Jupyter notebook opened on Coursera.
It also gives the line number of the code that triggered the error. If the line number is not convenient, you can also copy part of that code and search the file for it. Once you have found the triggering line, you should be able to locate the test case that is complaining.
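To follow the mentor's advice and reproduce the failing case locally, you can run your function on a node whose labels are all the same. The get_best_split below is my own minimal entropy-based sketch, not the assignment's hidden solution, and the X values are made up; the point is only that a pure node leaves every information gain at zero, so the function must fall through to the initial value of -1.

```python
import numpy as np

def entropy(y):
    """Entropy of a binary label array (0 for an empty or pure node)."""
    if len(y) == 0:
        return 0.0
    p = np.mean(y)
    if p == 0 or p == 1:
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def get_best_split(X, y, node_indices):
    """Return the feature index with the largest information gain, or -1."""
    num_features = X.shape[1]
    best_feature = -1
    max_info_gain = 0.0
    H_node = entropy(y[node_indices])
    for feature in range(num_features):
        left = [i for i in node_indices if X[i, feature] == 1]
        right = [i for i in node_indices if X[i, feature] == 0]
        w_left = len(left) / len(node_indices)
        w_right = len(right) / len(node_indices)
        info_gain = H_node - (w_left * entropy(y[left]) + w_right * entropy(y[right]))
        if info_gain > max_info_gain:
            max_info_gain = info_gain
            best_feature = feature
    return best_feature

# A pure node: every label is 1, so no split can reduce the entropy.
X = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
y = np.ones(4, dtype=int)
print(get_best_split(X, y, list(range(4))))  # -1
```

If this prints -1 but your notebook version returns a feature index for the same input, the bug is in how your loop updates (or fails to skip) features with zero gain.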
Hi, I have a conceptual understanding of the nature of the error.
Unfortunately, this is not something the professor taught.
The error says, “If the target is fully correlated with other feature, that feature must be the best split”
What is happening is that the test script takes a column of “X” as “y”. Hence the correlation comment.
There is nothing in the class on how to treat the correlation between target y and features X.
This is uncharted territory for me. I can force my code to give the right results, but I still wouldn’t understand what is going on. I don’t have a conceptual framework.
By painstakingly looking at the test cases, running them, and putting print statements everywhere, I was able to solve the issues mentioned above. I am willing to teach the approach to anyone stuck.
Basic Solution
You have two variables, X and y. Specific test cases create y instances that are equal to columns of X, or are linear combinations of a column of X (hence the text in the error about correlated data). As a pre-processing step, check whether any column of X is correlated with y (see Pearson correlation). If a perfect correlation exists (a coefficient of 1 or -1), return that column’s index and exit the function.
If there is no strong correlation, proceed as usual, using what you’ve learned.
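As a sketch of the pre-processing step described above (and only as an illustration of this poster's workaround, not the approach the course intends), one could scan the columns of X with np.corrcoef. The function name correlated_feature and the data values are my own invention:

```python
import numpy as np

def correlated_feature(X, y):
    """Return the index of a feature perfectly (anti-)correlated with y, or -1."""
    for feature in range(X.shape[1]):
        col = X[:, feature]
        if np.std(col) == 0 or np.std(y) == 0:
            continue  # the correlation coefficient is undefined for constant arrays
        r = np.corrcoef(col, y)[0, 1]
        if np.isclose(abs(r), 1.0):
            return feature
    return -1

X = np.array([[1, 0, 1],
              [0, 1, 1],
              [1, 1, 0],
              [0, 0, 0]])
y = X[:, 1]          # y is an exact copy of feature 1
print(correlated_feature(X, y))  # 1
```

Note the guard against constant arrays: a pure y has zero variance, so the coefficient is undefined and the function correctly falls through to -1.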
Maybe you’re right. It’s not that complex, actually.
My code is written from the hints. Thus my code is very small and simple in general.
I just added an extra loop of ~7 lines to check for correlation. If I find a correlation, I do exactly what the error prescribes. I always listen to the code.
Anyway, the proof of the pudding is in the eating. Also, there are thousands of roads that lead to Rome.
I just wanted to reply to your earlier question about correlation here.
The key thing is, even though we have used “correlation” to describe the situation, we didn’t mean to introduce a new way of determining which feature to split on. At all times, as Tom said, we stick to the approach taught in the lecture: computing information gain.
All roads lead to Rome. There are many ways to describe that test case verbally, including calling it “perfectly correlated”. Since that description isn’t wrong, it’s fine to implement your code that way. But I wanted you to know that we don’t compute the correlation coefficient to check for the special case of perfect correlation; computing the information gain itself is sufficient to discover that special case, because a perfectly correlated feature has the maximum gain.
In computing, we prefer the simplest road. If computing the information gain alone is enough, we don’t compute anything else.
It would be great if you could check your code once again to see whether it can pass all test cases without computing correlation coefficients. I would be happy to help you through it if needed. Moreover, I appreciate your willingness to share and to try out different approaches; I believe it is already a useful experience.
Forgive my brain; it is old and maybe rotten. Of course I would like the simple road, but I do see connections everywhere. So when the test case told me, “If the target is fully correlated with other features…”, I took it at its word, started looking for correlations, and singled them out.
Just out of curiosity, how in the world did this logic happen to work? It is such a fluke. The test designers were not really thinking about Pearson correlations.
My code is straightforward. I am more than glad to share it. I follow the hints as if I were a compiler/interpreter. The only thing that made the code look ugly was all the print statements I had put all over.
How do you want me to post it? After all, it is an exam. You probably have access to it on the testing server, or I can post it here. Let me know.
I like elegance too. But I have to see it to appreciate it.
@ajeancharles:
If a mentor needs to see your code, we’ll contact you via a private message with instructions.
We do not have access to the grading server.
After reading your replies, I can see how much the test designer’s use of the word “correlation” affected your approach to coding the solution. Perhaps you have already got what I wanted to say, but just to reiterate:
there is no need to compute the correlation coefficient, even though doing so is valid, and
without the coefficient, computing only the information gain is enough to pass all the tests.
To get started, we can look at the following piece of code, which is actually provided to you in the assignment as a hint (in a collapsed code block underneath the exercise’s code cell):
def get_best_split(X, y, node_indices):
    # Some useful variables
    num_features = X.shape[1]

    # You need to return the following variables correctly
    best_feature = -1

    ### START CODE HERE ###
    max_info_gain = 0

    # Iterate through all features
    for feature in range(num_features):
        # Your code here to compute the information gain from splitting on this feature
        info_gain =

        # If the information gain is larger than the max seen so far
        if info_gain > max_info_gain:
            # Your code here to set the max_info_gain and best_feature
            max_info_gain =
            best_feature =
    ### END CODE HERE ###

    return best_feature
If we examine the above skeleton closely, we find that for each feature we compute the gain, and then, through the if statement, we keep updating the best gain value and the associated best feature.
We know that for a perfectly correlated feature, the gain has to be the maximal one (can you figure out why?). Therefore, once that perfect feature is considered, it will satisfy the condition in the if block, be remembered in the variable best_feature, and set max_info_gain to a value that no other feature can exceed!
If you just compare what I have said with the code skeleton above, do you have any disagreement? If no, then perhaps you can start from the skeleton and see if you can complete it?
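To make the "maximal gain" claim concrete, here is a toy computation (my own example with made-up data, not the course's hidden test). When y is an exact copy of a feature, splitting on that feature produces two pure children, so the weighted child entropy is zero and the gain equals the full node entropy, which is the largest value any feature can achieve:

```python
import numpy as np

def entropy(y):
    """Entropy of a binary label array (0 for an empty or pure node)."""
    if len(y) == 0:
        return 0.0
    p = np.mean(y)
    if p == 0 or p == 1:
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def information_gain(X, y, node_indices, feature):
    """Gain from splitting the node on one binary feature."""
    left = [i for i in node_indices if X[i, feature] == 1]
    right = [i for i in node_indices if X[i, feature] == 0]
    w_left = len(left) / len(node_indices)
    w_right = len(right) / len(node_indices)
    return entropy(y[node_indices]) - (w_left * entropy(y[left]) + w_right * entropy(y[right]))

X = np.array([[1, 0, 1],
              [0, 1, 1],
              [1, 1, 0],
              [0, 0, 0]])
y = X[:, 1]                      # y perfectly correlated with feature 1
idx = np.arange(len(y))
gains = [information_gain(X, y, idx, f) for f in range(X.shape[1])]
print(gains)
# Feature 1 splits the node into two pure children, so its gain equals
# the parent entropy -- the theoretical maximum for this node.
```

So the plain gain comparison in the skeleton already handles the "perfectly correlated" test case; no correlation coefficient is needed.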
I filled in the template you showed above using the hints in this exercise and previous exercises.
All other tests were successful except for this “correlated” stuff. I looked over all my notes from the professor’s videos; I could not find anything about correlation.
Then I decided to pay attention to the wording of the test case response.
“Perfectly correlated” implies a maximal amount of information: no entropy, no surprise. You could use regression theory to approximate a curve.
The problem must be somewhere else in my code, since we are using the same template.
Maybe it is where I calculate the information gain or split the data.
for feature in range(num_features):
    # Your code here to compute the information gain from splitting on this feature
    info_gain = <Here I compute the information gain>

    # If the information gain is larger than the max seen so far
    if info_gain > max_info_gain:
        max_info_gain = <I put the new info gain, using other hints>
        best_feature = <I update the feature, using other hints>
This is basically a repeat of the template you showed before, with my own comments and lots of print statements.
Thanks @ajeancharles for the update. Then I think one natural thing to check is whether the gains you computed are correct. For you to compare against, I am sharing my results:
the feature with the best gain is chosen as the best feature.
Note that test cases 2, 3, and 4 use perfectly correlated features.
I am sharing the three printing lines I have added, and maybe you can do the same and see if you spot anything different in the printed results?
def get_best_split(X, y, node_indices):
    # Some useful variables
    num_features = X.shape[1]

    # You need to return the following variables correctly
    best_feature = -1

    ### START CODE HERE ###
    max_info_gain = 0

    print('\nBegin to loop through the features')  # ADDED, please remove this line before submission.

    # Iterate through all features
    for feature in range(num_features):
        # Your code here to compute the information gain from splitting on this feature
        info_gain =

        print(feature, info_gain)  # ADDED, please remove this line before submission.

        # If the information gain is larger than the max seen so far
        if info_gain > max_info_gain:
            # Your code here to set the max_info_gain and best_feature
            max_info_gain =
            best_feature =
    ### END CODE HERE ###

    print(f'Best feature to split on: {best_feature}')  # ADDED, please remove this line before submission.

    return best_feature
Please feel free to share your printed results for follow-up discussion.
By the way, @ajeancharles, once you are able to pass all the public tests and are ready to submit your work, please remember to remove all the print lines you added, as they will interfere with the autograder and cause your submission to fail.
I will do it over the weekend. I am doing this for fun; I have been taking classes for a few days (knowledge binging) and need to catch up on some sleep.
You will hear from me in a few days.