I’m currently facing a weird error that seems to contradict itself, and I suspect it’s a defect in the grading system. Has anyone faced a similar problem, or does anyone know the solution?
Specifically, my train_naive_bayes() function returns the correct number of keys when I check it myself, but inside w2_unittest.test_train_naive_bayes() it returns a different, incorrect number of keys. How could this happen?
Btw, I doubt the problem is in the loglikelihood implementation, since that is just one simple equation. I assume something is wrong with freq_pos/neg or vocab.
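For reference, this is the single equation I mean, as a minimal sketch in the usual smoothed log-ratio form (the (word, 1.0)/(word, 0.0) key convention for freqs, and freqs itself already being built, are my assumptions about the starter code):

```python
import numpy as np

# Assumed convention: freqs maps (word, label) -> count,
# with label 1.0 for positive and 0.0 for negative tweets.
vocab = {word for word, _ in freqs.keys()}
V = len(vocab)

# Total word counts in each class, used as the denominators.
N_pos = sum(n for (_, label), n in freqs.items() if label == 1.0)
N_neg = sum(n for (_, label), n in freqs.items() if label == 0.0)

loglikelihood = {}
for word in vocab:
    freq_pos = freqs.get((word, 1.0), 0)
    freq_neg = freqs.get((word, 0.0), 0)
    # Laplacian smoothing so unseen (word, class) pairs don't zero out.
    p_w_pos = (freq_pos + 1) / (N_pos + V)
    p_w_neg = (freq_neg + 1) / (N_neg + V)
    loglikelihood[word] = np.log(p_w_pos / p_w_neg)
```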
Thank you for the reference! I solved it myself: I was generating vocab from train_x, but when I tried generating it from freqs instead, it worked!
I’m not sure why the first approach didn’t work, though, since both should produce the same vocab, but I’m glad it passes now anyway.
Thank you for your quick comment!
Hey @Cossy,
Thanks for letting us know that you were able to pass the assignment. I will be doing this assignment today and will let you know if I come across anything relevant.
Hey @Cossy,
If you print the train_x that is passed into train_naive_bayes, you will find that it is a list of raw tweets (not yet pre-processed). So, to build vocab from train_x, you would have to pre-process every tweet a second time, work that was already done once when the freqs dictionary was built.
Thus, it only makes sense to build vocab from the freqs dictionary, as the instructions also (implicitly) suggest. Re-processing the tweets might have worked for the public test cases, but in the hidden test cases, train_x may be something different, say only a subset of the full training set. In that case your vocab gets built incompletely, which causes further errors down the function. The sketch below contrasts the two approaches.
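A minimal sketch of the difference (process_tweet is the course-provided helper; its exact name, and the (word, 1.0)/(word, 0.0) key convention for freqs, are assumptions here):

```python
# Robust: vocab comes straight from the freqs keys, so it always
# matches whatever training data freqs was actually built from.
vocab_from_freqs = {word for word, _ in freqs.keys()}

# Fragile: re-deriving vocab from train_x repeats the pre-processing
# and silently diverges whenever the grader passes a different
# train_x than the one used to build freqs.
vocab_from_train_x = {word
                      for tweet in train_x
                      for word in process_tweet(tweet)}
```

On the full public training set the two sets coincide, which is why the first approach looked fine locally; they only drift apart when freqs and train_x stop describing the same data.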