Wrong values for loglikelihood dictionary

This is my output for the test cell for UNQ_C3 in that notebook with a few added print statements to help see what is happening:

type(wordlist) <class 'list'>
V = 9165, len(wordlist) 11436
V: 9165, V_pos: 5804, V_neg: 5632, D: 8000, D_pos: 4000, D_neg: 4000, N_pos: 27547, N_neg: 27152
freq_pos for smile = 47
freq_neg for smile = 9
loglikelihood for smile = 1.5577981920239676
0.0
9165
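As a sanity check, the loglikelihood value for "smile" follows from those printed numbers via the Laplacian smoothing formula described in the assignment. Here is a minimal sketch that just plugs them in (not the graded code, which has to compute this for every word in the vocabulary):

import numpy as np

V, N_pos, N_neg = 9165, 27547, 27152    # from the printout above
freq_pos, freq_neg = 47, 9              # counts for 'smile'

p_w_pos = (freq_pos + 1) / (N_pos + V)  # Laplacian-smoothed P(word|pos)
p_w_neg = (freq_neg + 1) / (N_neg + V)  # Laplacian-smoothed P(word|neg)

print(np.log(p_w_pos) - np.log(p_w_neg))  # 1.5577981920239676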

The number V is the number of unique words in the vocabulary. The instructions recommend using the Python function "set()" to get the unique words from the keys of the freqs dictionary that was constructed by the count_tweets function. Are you sure that earlier function passed its tests?
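In case it helps to see that concretely, here is a toy example of the set() technique. All it assumes is that freqs is keyed by (word, label) tuples, which is how count_tweets builds it:

freqs_toy = {('happy', 1.0): 12, ('happy', 0.0): 5, ('sad', 0.0): 7}
vocab = set(word for word, label in freqs_toy.keys())
print(len(freqs_toy), len(vocab))  # 3 2

Note that len(freqs_toy) is 3, but the vocabulary has only 2 words, because 'happy' appears under both labels.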


Please check my code. I don't know what I am doing wrong.

I tried doing it your way and I still get 9165 words in the vocab. Maybe there is something wrong with your count_tweets function, which is what creates the freqs dictionary that is the input here. But that seems to pass its tests as well.

Maybe time to look at your code. Please check your DMs for a message from me.

Hello Paul. I'm having a problem, can you help me please?

{moderator edit - solution code removed}


My bet is that your V value is incorrect. You've taken the length of the freqs dictionary, but remember that a lot of words have both a negative and a positive frequency. That means they appear twice in freqs, so len(freqs) is not the size of the vocabulary.

The rest of it looks correct at first glance, although you're working too hard in how you compute D_pos and D_neg. You don't need a Python enumeration to do that. I'm not supposed to just write the code for you, but let's do an example. Suppose I have a vector z full of real numbers and I want to know how many of them are greater than 0.5. Here's a nice clear way to compute that:

numTrue = np.sum(z > 0.5)

It should be pretty easy to apply that idea to computing D_pos and D_neg. Your code looks correct, but it's more complicated than it needs to be. I'm not tuned in to how the Python interpreter translates your code into executable form, but my guess is that it would run slower than the technique I showed above. It's the classic difference between a loop and a vectorized expression of the same computation. We could try both and measure the performance. :nerd_face:
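Here is a quick way to run that measurement, using a stand-in label vector in place of the assignment's train_y (the timing numbers will vary by machine, but the gap should be clear):

import time
import numpy as np

y = np.concatenate([np.ones(4000), np.zeros(4000)])  # stand-in for train_y

start = time.perf_counter()
count_loop = 0
for label in y:             # explicit loop over the labels
    if label > 0.5:
        count_loop += 1
t_loop = time.perf_counter() - start

start = time.perf_counter()
count_vec = int(np.sum(y > 0.5))  # vectorized, as shown above
t_vec = time.perf_counter() - start

print(count_loop, count_vec)  # 4000 4000
print(t_loop, t_vec)          # the vectorized version should be much faster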

Speaking of inefficiency, the computation of V_pos and V_neg is just a waste, right? Those values are not used anywhere in the code. Of course the grader here does not care about how fast your code runs, it only checks the correctness of the answers. But in the larger scheme of things, efficiency does still matter. :grinning:

Hi Paul,
we meet again!!!
I am very sorry, but my freq_pos and freq_neg for smile are 282 and 54.
And when I print print(freqs[('smile', 1.0)]) and print(freqs[('smile', 0.0)]), I also get 282 and 54. Am I wrong in my understanding of freq_pos and freq_neg? Because you quote 47 and 9.
Thank you.
DS

Eh, this is super-funny. I reran the whole session from scratch and it all works. And agrees with your numbers...
I guess I re-ran some cells and things were adding up... ??? For example, 282/47 = 54/9 = 6 :smile:

Hmmm, not sure I can explain the *6 phenomenon, but I stand by the 47 and 9 numbers. :grinning:

Yeah, I guess as I ran this cell and that one, some back and forth more than once, some internal variables were adding up. So to really check, I had to rerun the whole thing from the top. But I only did that because I compared the ratios between my results and your post, so I guess your old post helped me debug :grin:
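That "adding up" theory is easy to demonstrate. Here is a hypothetical counter in the spirit of count_tweets, one that mutates the dictionary it is handed; if the cell that builds freqs gets re-run six times without resetting the dictionary, every count inflates by a factor of 6:

from collections import defaultdict

def add_counts(result, words, label):  # hypothetical stand-in for count_tweets
    for word in words:
        result[(word, label)] += 1
    return result

freqs = defaultdict(int)
for _ in range(6):  # six re-runs of the same cell, no reset in between
    add_counts(freqs, ['smile'] * 47, 1.0)
    add_counts(freqs, ['smile'] * 9, 0.0)

print(freqs[('smile', 1.0)], freqs[('smile', 0.0)])  # 282 54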

Hi @paulinpaloalto - I am getting this error:
Wrong values for loglikelihood dictionary. Please check your implementation for the loglikelihood dictionary.
Wrong values for loglikelihood dictionary. Please check your implementation for the loglikelihood dictionary.
Wrong values for loglikelihood dictionary. Please check your implementation for the loglikelihood dictionary.
12 Tests passed
3 Tests failed

this is my current code:

{moderator edit - solution code removed}

I can't understand where it went wrong.

How many entries are there in your vocab list? Here are the numbers I get with some added prints to see what is going on:

type(wordlist) <class 'list'>
V = 9165, len(wordlist) 11436
V: 9165, V_pos: 5804, V_neg: 5632, D: 8000, D_pos: 4000, D_neg: 4000, N_pos: 27547, N_neg: 27152
freq_pos for smile = 47
freq_neg for smile = 9
loglikelihood for smile = 1.5577981920239676
0.0
9165

I'll bet that your V value is 11436. So why would that happen? The point is that you've just taken the word from each key in the freqs dictionary, but note that quite a few words have both a positive and a negative frequency, so you get duplicate words in the list. Please have another careful look at the instructions: they tell you how to fix that problem.
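A quick way to confirm that diagnosis in your own notebook (freqs here is the dictionary your count_tweets produced):

wordlist = [word for word, label in freqs.keys()]
print(len(wordlist), len(set(wordlist)))  # 11436 vs 9165 with the assignment data

If those two numbers differ, you have duplicates, and set() is the fix the instructions are pointing you to.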

thank you!