Can't get past UNC_C2: train_naive_Bayes

I’m clearly missing something in this function. The unit test gives:

Wrong values for loglikelihood dictionary. Please check your implementation for the loglikelihood dictionary.
Wrong values for loglikelihood dictionary. Please check your implementation for the loglikelihood dictionary.
Wrong values for loglikelihood dictionary. Please check your implementation for the loglikelihood dictionary.
 12  Tests passed
 3  Tests failed

Looking back at previous threads and suggestions by paulinpaloalto for the word ‘smile’, I tested my outputs and get:

N_pos =  27547
N_neg =  27152
freqs_pos =  47
freqs_neg =  9
loglikelihood =  1.5584316115593073


where the freqs and loglikelihood are for ‘smile’. All of these are correct except for the loglikelihood, which is off by a small margin.

I don’t understand how the loglikelihood can be wrong when the freq_pos, freq_neg, N_pos, N_neg, and V are correct and I’m using the Laplacian smoothing as instructed.

Any suggestions would be appreciated. Thanks!

I’ll add that while I can calculate V_pos and V_neg (5804 and 5632 respectively, as in prior posts), I don’t use them for anything. Are they supposed to be used? If so, can you point me to where?

Here are my outputs from the cell that tests train_naive_bayes:

V = 9165, len(wordlist) 11436
V: 9165, V_pos: 5804, V_neg: 5632, D: 8000, D_pos: 4000, D_neg: 4000, N_pos: 27547, N_neg: 27152
freq_pos for smile = 47
freq_neg for smile = 9
loglikelihood for smile = 1.5577981920239676

Notice that everything is the same except the actual loglikelihood value. A difference in the 3rd decimal place is not a rounding error: we’re doing 64-bit floating point here, so rounding errors are on the order of 10^{-16} or smaller.

I would start by checking your “order of operations” in that computation and comparing your code carefully to the mathematical formula that we are trying to implement. If the inputs are all correct, then there must be something wrong with that implementation.

Also you’re right that the V_pos and V_neg are not really used for anything.
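For reference, the computation can be checked by hand with the numbers posted in this thread. The sketch below assumes the usual Laplace-smoothed formula, log((freq_pos + 1) / (N_pos + V)) - log((freq_neg + 1) / (N_neg + V)). The function name smoothed_loglikelihood is made up for illustration and is not part of the assignment.

```python
import math

def smoothed_loglikelihood(freq_pos, freq_neg, n_pos, n_neg, v):
    """Log ratio of Laplace-smoothed word probabilities (positive vs. negative)."""
    p_w_pos = (freq_pos + 1) / (n_pos + v)
    p_w_neg = (freq_neg + 1) / (n_neg + v)
    return math.log(p_w_pos / p_w_neg)

# Values for 'smile' as posted above.
correct = smoothed_loglikelihood(47, 9, 27547, 27152, 9165)

# Same inputs, but with 11436 (the len(wordlist) figure above) used as V.
# That this explains the mismatch is only a guess from the arithmetic.
suspect = smoothed_loglikelihood(47, 9, 27547, 27152, 11436)

print(correct)
print(suspect)
```

With these inputs the first value agrees with 1.5577981920239676 to at least six decimal places, and the second lands within about 10^{-6} of the 1.5584316115593073 from the original question, so it may be worth checking which number is being used for V.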


I was having the same problem until I reread other posts on this subject by @paulinpaloalto. I’m going to restate Paul’s comment more directly because I was too dense to pick up on it at first. Check that you are incrementing N_pos and N_neg by the value in the dictionary, not just by 1. For example, originally I used:

N_pos +=1

whereas it should actually be:

N_pos += freqs.get(pair, 0)

Cheers, and thanks to Paul for sharing his knowledge.
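A tiny illustration of the difference, using a made-up freqs dictionary keyed by (word, label) pairs as in the assignment. The words and counts are hypothetical, as are the pairs_pos and pairs_neg counters, which only exist here to show what incrementing by 1 would produce.

```python
# Hypothetical toy data: (word, label) -> number of occurrences.
freqs = {("happy", 1.0): 3, ("happy", 0.0): 1, ("sad", 0.0): 4, ("smile", 1.0): 2}

N_pos = N_neg = 0
pairs_pos = pairs_neg = 0
for pair in freqs:
    if pair[1] > 0:
        N_pos += freqs.get(pair, 0)  # add the word's count
        pairs_pos += 1               # what "+= 1" would have counted
    else:
        N_neg += freqs.get(pair, 0)
        pairs_neg += 1

print(N_pos, N_neg)          # 5 5
print(pairs_pos, pairs_neg)  # 2 2
```

Incrementing by 1 counts how many distinct pairs carry each label, not how many word occurrences there are, so the denominators in the loglikelihood come out too small.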



I’m having the same problem as @benballintyn (12 Tests passed, 3 Tests failed). But my ‘smile’ test code gives me the same numbers that @paulinpaloalto gets.

N_neg = 27152
N_pos = 27547
D = 8000
D_pos = 4000.0
logprior = 0.0
freq_pos, freq_neg, loglikelihood[‘smile’] = 47 9 1.5577981920239676

I’m not sure where my problem is. I double-checked ‘vocab’ and think that is all right. I’m thinking that the problem is here:

for word in vocab:
    # get the positive and negative frequency of th
    if freqs.get((word, 1.0), 0.) != 0:
        freq_pos = freqs.get((word, 1.0))
    if freqs.get((word, 0.0), 0.) != 0:
        freq_neg = freqs.get((word, 0.0))

I played around with these lines but I still cannot find the error. If somebody can help, I’d be very grateful.

Well, it seems that the magic occurs as soon as you post your problem. I removed the if statements and now it is working all right. Thanks and good luck to everybody.

Yes, that was going to be my suggestion. In the code you actually show, you are working way too hard. The whole point of the get() method on a dictionary is that it handles the case that the key is not found for you.
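To see why the if statements can cause trouble, here is a small sketch with a hypothetical freqs dictionary and vocab. It assumes freq_pos and freq_neg are not reset inside the loop: when a key is missing, the guarded version leaves the variable at whatever the previous word set, whereas get() with a default yields 0.

```python
# Hypothetical toy data: "smile" never appears with the negative label.
freqs = {("sad", 1.0): 1, ("sad", 0.0): 4, ("smile", 1.0): 2}
vocab = ["sad", "smile"]

guarded = {}
freq_pos = freq_neg = 0
for word in vocab:
    # Guarded lookups: a missing key leaves the old value in place.
    if freqs.get((word, 1.0), 0) != 0:
        freq_pos = freqs.get((word, 1.0))
    if freqs.get((word, 0.0), 0) != 0:
        freq_neg = freqs.get((word, 0.0))
    guarded[word] = (freq_pos, freq_neg)

# Plain lookups with a default: a missing key becomes 0.
plain = {w: (freqs.get((w, 1.0), 0), freqs.get((w, 0.0), 0)) for w in vocab}

print(guarded["smile"])  # (2, 4): the 4 is left over from "sad"
print(plain["smile"])    # (2, 0)
```

A spot check on a single word can still pass under the guarded version if that word happens to appear with both labels, which may be why the ‘smile’ numbers looked right while the unit test failed.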

Hi, I’m hitting some issues with the unit test. I get the expected output of 9165, but then hit the following:

TypeError Traceback (most recent call last)
1 # Test your function
----> 2 w2_unittest.test_train_naive_bayes(train_naive_bayes, freqs, train_x, train_y)

~/work/ in test_train_naive_bayes(target, freqs, train_x, train_y)
369 for key, value in test_case["expected"]["loglikelihood"].items():
--> 371 if np.isclose(result2[key], value):
372 count_good += 1

<array_function internals> in isclose(*args, **kwargs)

/opt/conda/lib/python3.7/site-packages/numpy/core/ in isclose(a, b, rtol, atol, equal_nan)
2285 y = array(y, dtype=dt, copy=False, subok=True)
--> 2287 xfin = isfinite(x)
2288 yfin = isfinite(y)
2289 if all(xfin) and all(yfin):

TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

You can examine the unit test code by clicking “File → Open” and then opening the unit test file. The error message means that a value returned by your function is the wrong type. You’ll need to understand the test logic to work out which value is causing the failure.

Actually, you can see enough just in the exception trace that you show: the failure is raised inside the np.isclose(result2[key], value) call.

It looks like at least one of the entries in the loglikelihood dictionary that you return is not a numpy scalar floating point value. Maybe it is NaN.
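One way to track down the offending entry is to scan the returned dictionary before handing it to the test. The helper below, find_bad_entries, is a hypothetical name and only a sketch: it flags any value that is not a real number, plus any that is NaN or infinite. As far as I can tell, a NaN is still a float and would make np.isclose return False rather than raise, so a None, string, or list seems a more likely cause of this particular TypeError.

```python
import math
import numbers

def find_bad_entries(loglikelihood):
    """Return (key, value, type name) for entries that are not finite real numbers."""
    bad = []
    for key, value in loglikelihood.items():
        is_real = isinstance(value, numbers.Real)
        if not is_real or not math.isfinite(value):
            bad.append((key, value, type(value).__name__))
    return bad

# Hypothetical dictionary with two suspicious entries.
example = {"smile": 1.5577981920239676, "oops": None, "weird": float("nan")}
for entry in find_bad_entries(example):
    print(entry)
```

Running it on the loglikelihood returned by train_naive_bayes should print nothing if every value is an ordinary finite number.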
