Wrong number of keys in loglikelihood dictionary

Hi there,

I’m currently facing a weird error that seems to contradict itself, and I suspect it might be a defect in the grading system. Has anyone faced a similar problem, or does anyone know the solution?

Specifically, my train_naive_bayes() function returns a loglikelihood dictionary with the correct number of keys when I check it myself, but when it runs inside w2_unittest.test_train_naive_bayes(), it produces a different, incorrect number of keys. How could this happen?

Thanks in advance!

Yuki

Btw, I doubt it’s a problem with the loglikelihood implementation itself, since that is just one simple equation to implement. I assume something is wrong with freq_pos/freq_neg or vocab.
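For context, this is roughly the equation I mean (a quick sketch with made-up numbers, using the usual Laplace smoothing; not my actual assignment code):

```python
import numpy as np

# Toy numbers only; in the assignment these come from freqs and vocab.
freq_pos, freq_neg = 3, 1   # counts of one word in positive / negative tweets
N_pos, N_neg = 10, 12       # total word counts in each class
V = 8                       # number of unique words in vocab

# Laplace-smoothed word probabilities and the per-word loglikelihood
p_w_pos = (freq_pos + 1) / (N_pos + V)
p_w_neg = (freq_neg + 1) / (N_neg + V)
loglikelihood = np.log(p_w_pos / p_w_neg)
print(loglikelihood)
```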

Hey @Cossy,
Can you please check this thread out once? Let us know if this resolves your query.

Cheers,
Elemento

Hi @Elemento ,

Thank you for the reference! I solved it myself: I had been generating vocab from train_x, but when I generated it from freqs instead, it worked!
I’m not sure why the original approach failed, though, since I expected both to produce the same vocab, but I’m glad it passes now.
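Here is roughly the change I made (just a sketch with a toy freqs dictionary, assuming freqs maps (word, sentiment) pairs to counts):

```python
# Toy freqs: (word, sentiment) -> count
freqs = {("happi", 1.0): 2, ("happi", 0.0): 1, ("sad", 0.0): 3}

# Before (did not pass): rebuild vocab by pre-processing train_x again.
# After (passed): take the unique words already stored in freqs.
vocab = {word for (word, sentiment) in freqs.keys()}
V = len(vocab)
print(sorted(vocab), V)  # ['happi', 'sad'] 2
```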
Thank you for your quick comment!

Yuki Cossy

Hey @Cossy,
Thanks for letting us know that you were able to pass the assignment. I will be doing this assignment today, and will let you know if I come across anything relevant.

Cheers,
Elemento

Hey @Cossy,
If you print the train_x that is passed into train_naive_bayes, you will find that it is a list of raw tweets (not yet pre-processed). So, to build vocab from train_x, you would have to pre-process every tweet a second time, even though this was already done once while building the freqs dictionary.

Thus, it only makes sense to build vocab from the freqs dictionary, as the instructions also (implicitly) suggest. You may well have pre-processed the tweets correctly, and that can work for the public test cases, but in the hidden test cases train_x might be something different, say only a subset of the entire training set. In that case your vocab gets built from the wrong word set, which leads to further errors down the function, including a mismatched number of loglikelihood keys.
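To illustrate the key-count symptom concretely (a hypothetical sketch, not the actual grader code): if vocab comes from freqs, loglikelihood always gets one key per unique word in freqs, whereas a vocab rebuilt from a different train_x can end up a different size.

```python
# Hypothetical freqs built by the test from its own (hidden) training subset
freqs = {("happi", 1.0): 2, ("happi", 0.0): 1, ("sad", 0.0): 3}

# vocab from freqs: the key count is guaranteed to match what the test expects
vocab_from_freqs = {word for (word, sentiment) in freqs}
print(len(vocab_from_freqs))  # 2 -> loglikelihood gets 2 keys

# vocab rebuilt from a train_x that does not line up with freqs
# (e.g. a larger set of raw tweets pre-processed again inside the function)
train_x_words = ["happi", "sad", "sunni", "gloomi"]
vocab_from_train_x = set(train_x_words)
print(len(vocab_from_train_x))  # 4 -> a different number of loglikelihood keys
```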

I hope this helps.

Cheers,
Elemento