Wrong number of keys in loglikelihood dictionary

Hi there,

I’m currently facing a weird error that seems to contradict itself, and I suspect it might be a defect in the grading system. Has anyone faced a similar problem, or does anyone know the solution?

Specifically, my train_naive_bayes() function returns a loglikelihood dictionary with the correct number of keys when I check it myself, but when it runs inside w2_unittest.test_train_naive_bayes(), it produces a different, incorrect number of keys. How could this happen?

Thanks in advance!

Yuki

Btw, I doubt it’s a problem with the loglikelihood implementation itself, since that is just one simple equation to implement. I assume something is wrong with freq_pos/freq_neg or vocab.
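For context, this is roughly the equation I mean (a quick sketch with made-up numbers, using the usual Laplace smoothing; not my actual assignment code):

```python
import numpy as np

# Toy numbers only; in the assignment these come from freqs and vocab.
freq_pos, freq_neg = 3, 1   # counts of one word in positive / negative tweets
N_pos, N_neg = 10, 12       # total word counts in each class
V = 8                       # number of unique words in vocab

# Laplace-smoothed word probabilities and the per-word loglikelihood
p_w_pos = (freq_pos + 1) / (N_pos + V)
p_w_neg = (freq_neg + 1) / (N_neg + V)
loglikelihood = np.log(p_w_pos / p_w_neg)
print(loglikelihood)
```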

Hey @Cossy,
Can you please check this thread out once? Let us know if this resolves your query.

Cheers,
Elemento

Hi @Elemento ,

Thank you for the reference! I solved it myself: I had been generating vocab from train_x, but when I generated it from freqs instead, it worked!
I’m not sure why the original approach failed, though, since I expected both to produce the same vocab, but I’m glad it passes now.
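Here is roughly the change I made (just a sketch with a toy freqs dictionary, assuming freqs maps (word, sentiment) pairs to counts):

```python
# Toy freqs: (word, sentiment) -> count
freqs = {("happi", 1.0): 2, ("happi", 0.0): 1, ("sad", 0.0): 3}

# Before (did not pass): rebuild vocab by pre-processing train_x again.
# After (passed): take the unique words already stored in freqs.
vocab = {word for (word, sentiment) in freqs.keys()}
V = len(vocab)
print(sorted(vocab), V)  # ['happi', 'sad'] 2
```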
Thank you for your quick comment!

Yuki Cossy

Hey @Cossy,
Thanks for letting us know that you were able to pass the assignment. I will be doing this assignment today, and will let you know if I come across anything relevant.

Cheers,
Elemento

Hey @Cossy,
If you print the train_x that is passed into train_naive_bayes, you will find that it is a list of raw tweets (not yet pre-processed). So, to build vocab from train_x, you would have to pre-process every tweet a second time, even though this was already done once while building the freqs dictionary.

Thus, it only makes sense to build vocab from the freqs dictionary, as the instructions also (implicitly) suggest. You may well have pre-processed the tweets correctly, and that can work for the public test cases, but in the hidden test cases train_x might be something different, say only a subset of the entire training set. In that case your vocab gets built from the wrong word set, which leads to further errors down the function, including a mismatched number of loglikelihood keys.
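To illustrate the key-count symptom concretely (a hypothetical sketch, not the actual grader code): if vocab comes from freqs, loglikelihood always gets one key per unique word in freqs, whereas a vocab rebuilt from a different train_x can end up a different size.

```python
# Hypothetical freqs built by the test from its own (hidden) training subset
freqs = {("happi", 1.0): 2, ("happi", 0.0): 1, ("sad", 0.0): 3}

# vocab from freqs: the key count is guaranteed to match what the test expects
vocab_from_freqs = {word for (word, sentiment) in freqs}
print(len(vocab_from_freqs))  # 2 -> loglikelihood gets 2 keys

# vocab rebuilt from a train_x that does not line up with freqs
# (e.g. a larger set of raw tweets pre-processed again inside the function)
train_x_words = ["happi", "sad", "sunni", "gloomi"]
vocab_from_train_x = set(train_x_words)
print(len(vocab_from_train_x))  # 4 -> a different number of loglikelihood keys
```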

I hope this helps.

Cheers,
Elemento