This is my output for the test cell for UNQ_C3 in that notebook with a few added print statements to help see what is happening:
type(wordlist) <class 'list'>
V = 9165, len(wordlist) 11436
V: 9165, V_pos: 5804, V_neg: 5632, D: 8000, D_pos: 4000, D_neg: 4000, N_pos: 27547, N_neg: 27152
freq_pos for smile = 47
freq_neg for smile = 9
loglikelihood for smile = 1.5577981920239676
0.0
9165
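For reference, the loglikelihood value printed above can be reproduced from the other printed counts, assuming the Laplacian-smoothed formula used in this assignment, log((freq_pos + 1)/(N_pos + V)) - log((freq_neg + 1)/(N_neg + V)). A quick sketch:

```python
import math

# Counts taken from the printout above
V = 9165          # vocabulary size
N_pos = 27547     # total count of words in positive tweets
N_neg = 27152     # total count of words in negative tweets
freq_pos = 47     # count of "smile" in positive tweets
freq_neg = 9      # count of "smile" in negative tweets

# Laplacian-smoothed conditional probabilities
p_w_pos = (freq_pos + 1) / (N_pos + V)
p_w_neg = (freq_neg + 1) / (N_neg + V)

loglikelihood = math.log(p_w_pos / p_w_neg)
print(loglikelihood)  # ≈ 1.5578, matching the printout above
```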
The number V is the number of unique words in the vocabulary. The instructions recommend using the python function set() to get the unique words from the keys of the freqs dictionary that was constructed by the count_tweets function. Are you sure that earlier function passed the tests for it?
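To make that concrete, here is a minimal sketch with a made-up three-entry freqs dictionary (the real one comes from count_tweets) showing how set() pulls the unique words out of the (word, sentiment) keys:

```python
# Hypothetical miniature freqs: keys are (word, sentiment) pairs
freqs = {("smile", 1.0): 47, ("smile", 0.0): 9, ("happy", 1.0): 12}

# set() collapses the duplicate "smile" entries into one vocabulary word
vocab = set(word for word, sentiment in freqs.keys())
V = len(vocab)
print(sorted(vocab), V)  # ['happy', 'smile'] 2
```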
Please check my code, I don't know what I am doing wrong.
I tried doing it your way and I still get 9165 words in the vocab.
Maybe there is something wrong with your count_tweets function, which is what creates the freqs dictionary that is the input there. But that seems to pass its tests also.
Maybe time to look at your code. Please check your DMs for a message from me.
Hello Paul. I'm having a problem, can you help me please?
{moderator edit - solution code removed}
My bet is that the problem is that your V value is incorrect. You've taken the length of the freqs dictionary, but remember that a lot of words have both a negative and a positive frequency. That means that they appear twice in freqs, so that number is not the size of the vocabulary.
The rest of it looks correct at first glance, although you're working too hard in how you compute D_pos and D_neg. You don't need a python enumeration to do that. I'm not supposed to just write the code for you, but let's do an example. Suppose I have a vector z full of real numbers and I want to know how many of them are greater than 0.5. Here's a nice clear way to compute that:
numTrue = np.sum(z > 0.5)
It should be pretty easy to apply that idea to computing D_pos and D_neg. Your code looks correct, but it's more complicated than it needs to be. I'm not tuned in on how the python interpreter would translate your code into compiled executable code, but my guess is it would run slower than the technique I showed above. It's the classic difference between a loop and a vectorized expression of the same computation. We could try both and measure the performance.
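To illustrate, here is a sketch of that measurement using timeit and a random stand-in for z (not the assignment data), comparing a plain Python loop against the vectorized np.sum expression:

```python
import timeit
import numpy as np

z = np.random.default_rng(0).random(8000)  # hypothetical data, same size as D

def loop_count(z):
    # explicit Python loop over the array
    count = 0
    for value in z:
        if value > 0.5:
            count += 1
    return count

def vectorized_count(z):
    # single vectorized comparison, summed
    return int(np.sum(z > 0.5))

# both approaches agree on the answer
assert loop_count(z) == vectorized_count(z)

print("loop:      ", timeit.timeit(lambda: loop_count(z), number=100))
print("vectorized:", timeit.timeit(lambda: vectorized_count(z), number=100))
```

On typical hardware the vectorized version comes out one to two orders of magnitude faster, though as noted below, the grader only checks correctness.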
Speaking of inefficiency, the computation of V_pos and V_neg is just a waste, right? Those values are not used anywhere in the code. Of course the grader here does not care about how fast your code runs, it only checks the correctness of the answers. But in the larger scheme of things, efficiency does still matter.
Hi Paul,
we meet again!!!
I am very sorry, but my freq_pos and freq_neg for smile are 282 and 54.
And when I print print(freqs[('smile', 1.0)]) and print(freqs[('smile', 0.0)]), I also get 282 and 54. Am I wrong in my understanding of freq_pos and freq_neg? Because you quote 47 and 9.
Thank you.
DS
Eh, this is super-funny. I reran the whole session from scratch and it all works. And agrees with your numbers…
I guess I re-ran some cells and things were adding up… ??? For example, 282/47 = 54/9 = 6
Hmmm, not sure I can explain the ×6 phenomenon, but I stand by the 47 and 9 numbers.
Yeah, I guess as I ran this and that cell, some back and forth more than once, the counts in freqs kept adding up (running the counting cell six times without resetting would multiply every frequency by 6). So to really check I had to rerun the whole thing from the top. But I only did it because I compared the ratios between my result and your post, so I guess your old post helped me debug.
Hi @paulinpaloalto - I am getting the error:
Wrong values for loglikelihood dictionary. Please check your implementation for the loglikelihood dictionary.
Wrong values for loglikelihood dictionary. Please check your implementation for the loglikelihood dictionary.
Wrong values for loglikelihood dictionary. Please check your implementation for the loglikelihood dictionary.
12 Tests passed
3 Tests failed
this is my current code:
{moderator edit - solution code removed}
I can't understand where it went wrong.
How many entries are there in your vocab list? Here are the numbers I get with some added prints to see what is going on:
type(wordlist) <class 'list'>
V = 9165, len(wordlist) 11436
V: 9165, V_pos: 5804, V_neg: 5632, D: 8000, D_pos: 4000, D_neg: 4000, N_pos: 27547, N_neg: 27152
freq_pos for smile = 47
freq_neg for smile = 9
loglikelihood for smile = 1.5577981920239676
0.0
9165
I'll bet that your V value is 11436. So why would that happen? The point is that you've just taken the word from each key in the freqs dictionary, but note that quite a few words have both a positive and a negative frequency, so you get duplicate words in the list. Please have another careful look at the instructions: they tell you how to fix that problem.
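As a toy illustration of what is likely happening, here is a hypothetical five-key freqs standing in for the real 11436-key one:

```python
# Hypothetical freqs; the real keys are (word, sentiment) pairs from count_tweets
freqs = {
    ("smile", 1.0): 47, ("smile", 0.0): 9,
    ("sad", 1.0): 1, ("sad", 0.0): 5,
    ("great", 1.0): 7,
}

wordlist = [word for word, sentiment in freqs.keys()]
print(len(wordlist))       # 5 - duplicates included (your 11436)
print(len(set(wordlist)))  # 3 - unique words (the expected 9165)
```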