I keep running into a KeyError when running the train_naive_bayes() function.
I don’t know if you can see the image, but it seems it cannot find the key
(‘noo’, 1), even though (‘noo’, 0) is in the freq dictionary. It seems to me that the count_tweets() function, which creates the dictionary, should define both (word, 1) and (word, 0) whenever a new word is found (in this case ‘noo’), even if one of them is set to zero. But when I try that, the unit tests for that part of the assignment fail.
My workaround was to add a try/except block that sets freq_pos/freq_neg to zero when this KeyError is thrown, but then the unit tests fail, reporting a wrong value for the log likelihood dictionary.
Not every word will have both sentiment values. Your code needs to handle that. The “get()” method on a dictionary is a nice clean way to deal with potentially missing keys. Or you can use “if” clauses.
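A minimal sketch of the get() approach, assuming the dictionary is keyed by (word, label) tuples as in the question. The counts and the helper name lookup_counts are made up for illustration and are not part of the assignment:

```python
# Hypothetical frequency dictionary keyed by (word, label); the counts are invented.
freqs = {("noo", 0): 3, ("happy", 1): 5}


def lookup_counts(freqs, word):
    """Return (freq_pos, freq_neg) for a word, using 0 for any missing key."""
    # dict.get(key, default) returns the default instead of raising KeyError.
    freq_pos = freqs.get((word, 1), 0)
    freq_neg = freqs.get((word, 0), 0)
    return freq_pos, freq_neg


print(lookup_counts(freqs, "noo"))  # ("noo", 1) is missing, so freq_pos is 0
```

This leaves the dictionary itself unchanged, so count_tweets() does not need to store zero-count entries.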
It looks like that’s not the only issue in your code, e.g. why are you dividing the frequencies by D_pos and D_neg? Where does it say to do that?
Ah yes, I was not aware of the get() method for Python dictionaries. That is very useful. And yes, I was treating D_pos and D_neg as probabilities when they are just counts. But it seems the same unit tests are still failing even after I corrected for that.
Okay, I figured it out. I was using D_pos and D_neg in the log likelihoods instead of N_pos and N_neg. I need to pay attention to the differences between those counts. Thank you for your quick response.
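For anyone who hits the same mix-up, here is a rough sketch of where each count goes. It assumes the usual Naive Bayes setup with Laplace (add-one) smoothing, 0/1 labels, and D_* meaning document counts while N_* means total word counts. The function name sketch_naive_bayes and the example data are hypothetical, not the assignment's code:

```python
import math


def sketch_naive_bayes(freqs, labels):
    """Illustrative only: D_* feed the log prior, N_* feed the log likelihoods."""
    vocab = {word for (word, _label) in freqs}
    V = len(vocab)  # number of unique words

    # N_pos / N_neg: total word occurrences in each class.
    N_pos = sum(count for (_w, label), count in freqs.items() if label == 1)
    N_neg = sum(count for (_w, label), count in freqs.items() if label == 0)

    # D_pos / D_neg: number of documents (tweets) in each class.
    D_pos = sum(1 for y in labels if y == 1)
    D_neg = len(labels) - D_pos
    logprior = math.log(D_pos) - math.log(D_neg)

    loglikelihood = {}
    for word in vocab:
        freq_pos = freqs.get((word, 1), 0)
        freq_neg = freqs.get((word, 0), 0)
        # Smoothed word probabilities are normalized by N_*, not D_*.
        p_w_pos = (freq_pos + 1) / (N_pos + V)
        p_w_neg = (freq_neg + 1) / (N_neg + V)
        loglikelihood[word] = math.log(p_w_pos / p_w_neg)
    return logprior, loglikelihood


# Made-up data: three tweets (two positive, one negative) and invented counts.
example_freqs = {("noo", 0): 3, ("happy", 1): 5, ("day", 1): 1, ("day", 0): 1}
example_labels = [1, 1, 0]
logprior, loglikelihood = sketch_naive_bayes(example_freqs, example_labels)
```

The document counts only ever appear in the prior; the per-word ratios are normalized by the word totals plus the vocabulary size.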