Issue: Wrong number of keys in loglikelihood dictionary

Hi,

I ran into an issue while trying to finish the train_naive_bayes function in C1_W2_Assignment. The error information is shown below. It says that I have the wrong number of keys in the dictionary.

I suspect the error is caused by the wrong number of elements in vocab. In my answer, I built vocab with vocab = set([pair[0] for pair in freqs.keys()]), which collects all unique words in the freqs dictionary.

I also ran vocab = set([pair[0] for pair in freqs.keys()]) in a new code cell, but the length of its output was still 9162 instead of 9165.
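
For anyone checking the same thing, here is a minimal sketch of what that expression computes. The freqs dictionary below is made up for illustration (the real one is built in the notebook), and the only assumption is that its keys are (word, label) pairs, as the pair[0] indexing suggests.

```python
# Toy stand-in for the assignment's freqs dictionary (hypothetical data):
# keys are (word, label) pairs, values are counts.
freqs = {
    ("happy", 1): 12,
    ("happy", 0): 3,
    ("sad", 0): 9,
    (":)", 1): 20,
}

# Same idea as the expression in the post: keep the word from each key.
# The set drops duplicates, so a word seen under both labels counts once.
vocab = set(pair[0] for pair in freqs.keys())

print(len(freqs))     # number of (word, label) keys
print(len(vocab))     # number of unique words
print(sorted(vocab))  # the unique words themselves
```

Since vocab is derived entirely from the keys of freqs, a length that is off by a few most likely means freqs itself was built from slightly different tokens, not that this line is wrong.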

I wonder how this issue happened and how I can fix it.

Thanks!

When I took this class a year ago, I experimented with running it locally. I had a different version of nltk, and my output for the number of keys was also 9162. My notes at the time say ‘issue is lower casing of some emojis as of NLTK 3.4.5’.
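
To illustrate why casing can change the key count, here is a toy sketch. It does not reproduce NLTK's tokenizer or claim which tokens are actually affected; count_vocab and the token list are hypothetical, and the point is only that lowercasing tokens that differ by case merges them into one vocabulary entry.

```python
def count_vocab(tokens, lowercase_everything):
    """Count unique tokens, optionally lowercasing every token first.

    Hypothetical helper for illustration only; not NLTK code.
    """
    if lowercase_everything:
        tokens = [t.lower() for t in tokens]
    return len(set(tokens))


# Made-up tokens: two emoticon pairs that differ only by case, plus one word.
tokens = [":D", ":d", "XD", "xd", "great"]

print(count_vocab(tokens, lowercase_everything=False))  # case preserved
print(count_vocab(tokens, lowercase_everything=True))   # case-variant pairs merge
```

A handful of merged tokens like these would be enough to explain a difference of three keys between two environments.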

If you search the forum, you will find at least one other related thread. If you are running locally, you can try to match the NLTK version running in the Coursera (or Google?) cloud. If you are using the course-provided environment, this discrepancy might be caused by a package update that is breaking the unit test. HTH
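
One way to compare versions is to run something like the sketch below in both the local and the course-provided environment. It uses only the standard library (importlib.metadata, Python 3.8+); installed_version is a helper name made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Run this in each environment and compare the two outputs.
print("nltk:", installed_version("nltk"))
```

If the two outputs differ, installing the matching release locally (for example with pip install nltk==<version>, using whatever version the course environment reports) should make the comparison fair.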

Thanks for your reply! I tried my code in the course-provided environment and the issue was fixed. :smiling_face_with_three_hearts:

Thanks for the feedback and glad you resolved it. For others reading this thread in the future, here is a related one with some more details and examples of what causes these numerical discrepancies…