Here are my outputs from the cell that tests train_naive_bayes:
V = 9165, len(wordlist) 11436
V: 9165, V_pos: 5804, V_neg: 5632, D: 8000, D_pos: 4000, D_neg: 4000, N_pos: 27547, N_neg: 27152
freq_pos for smile = 47
freq_neg for smile = 9
loglikelihood for smile = 1.5577981920239676
0.0
9165
Notice that everything is the same except the actual loglikelihood value. A difference in the 3rd decimal place is not a rounding error: we're doing 64-bit floating point here, so rounding errors are on the order of 10^{-16} or smaller.
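You can verify that claim about float64 precision directly; the machine epsilon for a 64-bit float is 2^{-52}, roughly 2.2e-16, far too small to account for a third-decimal-place discrepancy:

```python
import sys

# Machine epsilon for Python's 64-bit floats: the smallest gap such
# that 1.0 + eps != 1.0. Pure rounding error is on this scale.
eps = sys.float_info.epsilon
print(eps)  # 2.220446049250313e-16, i.e. 2**-52
```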
I would start by checking the "order of operations" in that computation and comparing your code carefully to the mathematical formula we are trying to implement. If the inputs are all correct, then there must be something wrong with that implementation.
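For reference, here is a minimal sketch of the standard add-1 (Laplace) smoothed loglikelihood computation, plugging in the counts printed above for "smile". This is the textbook Naive Bayes formulation, not necessarily line-for-line identical to your starter code, so treat it as a check on the math rather than on variable names:

```python
import math

# Counts taken from the test output above
V = 9165                        # vocabulary size
N_pos, N_neg = 27547, 27152     # total word tokens per class
freq_pos, freq_neg = 47, 9      # counts of "smile" per class

# Laplace (add-1) smoothed conditional probabilities
p_w_pos = (freq_pos + 1) / (N_pos + V)
p_w_neg = (freq_neg + 1) / (N_neg + V)

# loglikelihood is the log of the ratio of the smoothed probabilities
loglikelihood = math.log(p_w_pos / p_w_neg)
print(loglikelihood)  # ≈ 1.5578, matching the expected value for "smile"
```

If your value differs in the 3rd decimal place, a likely culprit is smoothing applied in the wrong spot (e.g. adding 1 to the denominator counts, or adding V in the wrong place).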
Also, you're right that V_pos and V_neg are not really used for anything.