Requesting help for C2_W4 Practice Lab Exercise 3

Hello,

I’ve been getting a weird result from the 3rd exercise. I get the error below despite my cell producing the correct output.

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-42-2a50df79e115> in <module>
      9 
     10 # UNIT TESTS
---> 11 compute_information_gain_test(compute_information_gain)

~/work/public_tests.py in compute_information_gain_test(target)
    104 
    105     result = target(X, y, node_indexes, 1)
--> 106     assert np.isclose(result, 0, atol=1e-6), f"Wrong information gain. Expected {0.0} got: {result}"
    107 
    108     print("\033[92m All tests passed.")

AssertionError: Wrong information gain. Expected 0.0 got: nan

I tried using the hint code, but the exact same error appeared. Then I copied the code snippet from public_tests.py to try to reproduce the error:

import numpy as np

X = np.array([[1, 0],
              [1, 0],
              [1, 0],
              [0, 0],
              [0, 1]])

y = np.array([[0, 0, 0, 0, 0]]).T
node_indexes = list(range(5))
result = compute_information_gain(X, y, node_indexes, 1)
print(result)
assert np.isclose(result, 0, atol=1e-6), f"Wrong information gain. Expected {0.0} got: {result}"

But this code snippet passed and didn’t raise the assertion error, so I’m not sure what’s actually going on. Can someone help me diagnose the issue?
Thank you very much for your help.

Yours sincerely,
LUIY0004

Hi @luiy0004

Please make sure you follow the steps of the implementation; you can also follow the hint steps provided in the notebook.

Thanks!
Abdelrahman

Hey @luiy0004,
This kind of behaviour is mostly seen when your implementation uses one or more global variables, which you are not supposed to use, unless stated otherwise. Please ensure that you haven’t done the same. Let us know if this helps.
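For illustration, here is a minimal sketch (not from the lab itself) of how an accidental global reference causes exactly this pass-in-your-cell/fail-in-the-unit-test split; the names X_train and fraction_positive are made up for the example:

import numpy as np

# Global notebook variable, e.g. data defined in an earlier cell
X_train = np.array([1.0, 2.0, 3.0])

def fraction_positive(X):
    # BUG: uses the global X_train instead of the parameter X, so the
    # result silently depends on notebook state rather than the input
    return len(X_train[X_train > 0]) / len(X_train)

print(fraction_positive(X_train))                 # 1.0 — looks correct
print(fraction_positive(np.array([-1.0, -2.0])))  # still 1.0 — wrong

A unit test passes different data as the argument, so any hidden dependence on notebook state shows up only there.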

Cheers,
Elemento


Dear Abdelrahman,

I tried following the hints and got the exact same error.

LUIY0004

Dear Elemento,

I checked the hint variables against the other sections. The hint variables do not seem to be global variables. This is the hint code that produced the error:

# UNQ_C3
# GRADED FUNCTION: compute_information_gain

def compute_information_gain(X, y, node_indices, feature):
    
    """
    Compute the information of splitting the node on a given feature
    
    Args:
        X (ndarray):            Data matrix of shape(n_samples, n_features)
        y (array like):         list or ndarray with n_samples containing the target variable
        node_indices (ndarray): List containing the active indices. I.e, the samples being considered in this step.
   
    Returns:
        cost (float):        Cost computed
    
    """    
    # Split dataset
    left_indices, right_indices = split_dataset(X, node_indices, feature)
    
    # Some useful variables
    X_node, y_node = X[node_indices], y[node_indices]
    X_left, y_left = X[left_indices], y[left_indices]
    X_right, y_right = X[right_indices], y[right_indices]
    
    # You need to return the following variables correctly
    information_gain = 0
    
    ### START CODE HERE ###
    # Your code here to compute the entropy at the node using compute_entropy()
    node_entropy = compute_entropy(y_node)
    # Your code here to compute the entropy at the left branch
    left_entropy = compute_entropy(y_left)
    # Your code here to compute the entropy at the right branch
    right_entropy = compute_entropy(y_right)

    # Your code here to compute the proportion of examples at the left branch
    w_left = len(X_left) / len(X_node)

    # Your code here to compute the proportion of examples at the right branch
    w_right = len(X_right) / len(X_node)

    # Your code here to compute weighted entropy from the split using 
    # w_left, w_right, left_entropy and right_entropy
    weighted_entropy = w_left * left_entropy + w_right * right_entropy

    # Your code here to compute the information gain as the entropy at the node
    # minus the weighted entropy
    information_gain = node_entropy - weighted_entropy
    ### END CODE HERE ###  
    
    return information_gain

Is there something wrong with how I followed the hint code?

LUIY0004

Hi @luiy0004

If we look at the failed test, we can see that the function is producing a nan.
What is nan, and how is it produced?

One way is division by 0.
Once a nan has been produced, any arithmetic with it yields nan.

I would check everywhere division is performed in compute_information_gain() and compute_entropy() for places where division by 0 can occur, and try to prevent it.
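For example, a minimal sketch of compute_entropy with those divisions guarded might look like the following. Treat this as an assumption about the lab’s binary-classification setup, not the official solution:

import numpy as np

def compute_entropy(y):
    # Guard: an empty node (all samples went to the other branch)
    # would otherwise cause a division by zero here
    if len(y) == 0:
        return 0.0

    # Fraction of positive examples at this node
    p1 = np.sum(y == 1) / len(y)

    # Guard: a pure node would otherwise evaluate 0 * log2(0),
    # which is nan
    if p1 == 0 or p1 == 1:
        return 0.0

    return -p1 * np.log2(p1) - (1 - p1) * np.log2(1 - p1)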


Dear SamReiswig,

Checking for where the NaN could have been produced worked. Division by 0 was occurring in the compute_entropy function. Strangely, that function passed all its tests before I corrected it.
Thank you very much for your help.

Yours sincerely,
LUIY0004

I have the same issue as luiy0004. I went to check compute_entropy(y); the only ‘/’ happens at p1 = len(y[y==1])/len(y). I added a condition on top: if len(y) > 0, but it still doesn’t work. Please help, thanks!

Hi @Hui_Chen2 ,

I can’t tell what’s wrong from the description.
Can you send me your code in a direct message and I’ll take a look.

I am facing the same issue. Was it resolved?

Hi @SiddharthPandey ,

Can you send me your code in a direct message and I’ll take a look.

There’s another thread with the same issue. I fixed it with the hint (covering all the edge cases in the previous function).

For anyone finding themselves here.
This happens when all the samples end up in one branch and none in the other, IF you have not accounted for there being no items in y in compute_entropy().

The unit tests for compute_entropy() should really test for that but, as of the time of writing, they don’t.
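For example, here is a quick edge-case check you can run in the notebook (it assumes compute_information_gain and split_dataset are already defined there; the data below is made up so that every sample lands in the left branch):

import numpy as np

# Every sample has feature 0 == 1, so splitting on feature 0 sends
# all samples to the left branch and leaves the right branch empty
X = np.array([[1, 0],
              [1, 1],
              [1, 0]])
y = np.array([0, 1, 0])
node_indices = list(range(3))

# With an unguarded compute_entropy this produces nan (or an error);
# with the empty-branch case handled it returns a plain 0
gain = compute_information_gain(X, y, node_indices, 0)
assert not np.isnan(gain), "information gain should never be nan"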


Thank you! Your answer helped me a lot. I think this is exactly what is going on.

Hey @SandunMeesara,
Welcome, and we are glad that you could become a part of our community :partying_face: Thanks a lot for letting us know that your issue has been resolved.

Cheers,
Elemento