Wrong values for loglikelihood dictionary

This is my output for the test cell for UNQ_C3 in that notebook with a few added print statements to help see what is happening:

type(wordlist) <class 'list'>
V = 9165, len(wordlist) 11436
V: 9165, V_pos: 5804, V_neg: 5632, D: 8000, D_pos: 4000, D_neg: 4000, N_pos: 27547, N_neg: 27152
freq_pos for smile = 47
freq_neg for smile = 9
loglikelihood for smile = 1.5577981920239676
0.0
9165
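As a sanity check, the loglikelihood value for "smile" follows from those printed numbers via the Laplacian smoothing formula described in the assignment. Here is a minimal sketch that just plugs them in (not the graded code, which has to compute this for every word in the vocabulary):

import numpy as np

V, N_pos, N_neg = 9165, 27547, 27152    # from the printout above
freq_pos, freq_neg = 47, 9              # counts for 'smile'

p_w_pos = (freq_pos + 1) / (N_pos + V)  # Laplacian-smoothed P(word|pos)
p_w_neg = (freq_neg + 1) / (N_neg + V)  # Laplacian-smoothed P(word|neg)

print(np.log(p_w_pos) - np.log(p_w_neg))  # 1.5577981920239676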

The number V is the number of unique words in the vocabulary. The instructions recommend using the Python function "set()" to get the unique words from the keys of the freqs dictionary that was constructed by the count_tweets function. Are you sure that earlier function passed its tests?
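In case it helps to see that concretely, here is a toy example of the set() technique. All it assumes is that freqs is keyed by (word, label) tuples, which is how count_tweets builds it:

freqs_toy = {('happy', 1.0): 12, ('happy', 0.0): 5, ('sad', 0.0): 7}
vocab = set(word for word, label in freqs_toy.keys())
print(len(freqs_toy), len(vocab))  # 3 2

Note that len(freqs_toy) is 3, but the vocabulary has only 2 words, because 'happy' appears under both labels.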


Please check my code. I don't know what I am doing wrong.

I tried doing it your way and I still get 9165 words in the vocab. Maybe there is something wrong with your count_tweets function, which is what creates the freqs dictionary that is the input here. But that seems to pass its tests as well.

Maybe time to look at your code. Please check your DMs for a message from me.

Hello Paul. I'm having a problem, can you help me please?

{moderator edit - solution code removed}


My bet is that your V value is incorrect. You've taken the length of the freqs dictionary, but remember that a lot of words have both a negative and a positive frequency. That means they appear twice in freqs, so len(freqs) is not the size of the vocabulary.

The rest of it looks correct at first glance, although you're working too hard in how you compute D_pos and D_neg. You don't need a Python enumeration to do that. I'm not supposed to just write the code for you, but let's do an example. Suppose I have a vector z full of real numbers and I want to know how many of them are greater than 0.5. Here's a nice clear way to compute that:

numTrue = np.sum(z > 0.5)

It should be pretty easy to apply that idea to computing D_pos and D_neg. Your code looks correct, but it's more complicated than it needs to be. I'm not tuned in to how the Python interpreter translates your code into executable form, but my guess is that it would run slower than the technique I showed above. It's the classic difference between a loop and a vectorized expression of the same computation. We could try both and measure the performance. :nerd_face:
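Here is a quick way to run that measurement, using a stand-in label vector in place of the assignment's train_y (the timing numbers will vary by machine, but the gap should be clear):

import time
import numpy as np

y = np.concatenate([np.ones(4000), np.zeros(4000)])  # stand-in for train_y

start = time.perf_counter()
count_loop = 0
for label in y:             # explicit loop over the labels
    if label > 0.5:
        count_loop += 1
t_loop = time.perf_counter() - start

start = time.perf_counter()
count_vec = int(np.sum(y > 0.5))  # vectorized, as shown above
t_vec = time.perf_counter() - start

print(count_loop, count_vec)  # 4000 4000
print(t_loop, t_vec)          # the vectorized version should be much faster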

Speaking of inefficiency, the computation of V_pos and V_neg is just a waste, right? Those values are not used anywhere in the code. Of course the grader here does not care about how fast your code runs, it only checks the correctness of the answers. But in the larger scheme of things, efficiency does still matter. :grinning:

Hi Paul,
we meet again!!!
I am very sorry, but my freq_pos and freq_neg for smile are 282 and 54.
And when I print print(freqs[('smile', 1.0)]) and print(freqs[('smile', 0.0)]), I also get 282 and 54. Am I wrong in my understanding of freq_pos and freq_neg? Because you quote 47 and 9.
Thank you.
DS

Eh, this is super-funny. I reran the whole session from scratch and it all works. And agrees with your numbers...
I guess I re-ran some cells and things were adding up... ??? For example, 282/47 = 54/9 = 6 :smile:

Hmmm, not sure I can explain the *6 phenomenon, but I stand by the 47 and 9 numbers. :grinning:

Yeah, I guess as I ran this cell and that one, some back and forth more than once, some internal variables were adding up. So to really check, I had to rerun the whole thing from the top. But I only did that because I compared the ratios between my results and your post, so I guess your old post helped me debug :grin:
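That "adding up" theory is easy to demonstrate. Here is a hypothetical counter in the spirit of count_tweets, one that mutates the dictionary it is handed; if the cell that builds freqs gets re-run six times without resetting the dictionary, every count inflates by a factor of 6:

from collections import defaultdict

def add_counts(result, words, label):  # hypothetical stand-in for count_tweets
    for word in words:
        result[(word, label)] += 1
    return result

freqs = defaultdict(int)
for _ in range(6):  # six re-runs of the same cell, no reset in between
    add_counts(freqs, ['smile'] * 47, 1.0)
    add_counts(freqs, ['smile'] * 9, 0.0)

print(freqs[('smile', 1.0)], freqs[('smile', 0.0)])  # 282 54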

Hi @paulinpaloalto - I am getting this error:
Wrong values for loglikelihood dictionary. Please check your implementation for the loglikelihood dictionary.
Wrong values for loglikelihood dictionary. Please check your implementation for the loglikelihood dictionary.
Wrong values for loglikelihood dictionary. Please check your implementation for the loglikelihood dictionary.
12 Tests passed
3 Tests failed

this is my current code:

{moderator edit - solution code removed}

I can't understand where it went wrong.

How many entries are there in your vocab list? Here are the numbers I get with some added prints to see what is going on:

type(wordlist) <class 'list'>
V = 9165, len(wordlist) 11436
V: 9165, V_pos: 5804, V_neg: 5632, D: 8000, D_pos: 4000, D_neg: 4000, N_pos: 27547, N_neg: 27152
freq_pos for smile = 47
freq_neg for smile = 9
loglikelihood for smile = 1.5577981920239676
0.0
9165

I'll bet that your V value is 11436. So why would that happen? The point is that you've just taken the word from each key in the freqs dictionary, but note that quite a few words have both a positive and a negative frequency, so you get duplicate words in the list. Please have another careful look at the instructions: they tell you how to fix that problem.
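A quick way to confirm that diagnosis in your own notebook (freqs here is the dictionary your count_tweets produced):

wordlist = [word for word, label in freqs.keys()]
print(len(wordlist), len(set(wordlist)))  # 11436 vs 9165 with the assignment data

If those two numbers differ, you have duplicates, and set() is the fix the instructions are pointing you to.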

thank you!