Naive Bayes: Test data seems to be incorrect

Natural Language Processing with Classification and Vector Spaces - Sentiment Analysis with Naïve Bayes - Week 2 | Coursera
The unit test cases:

  1. w2_unittest.test_naive_bayes_predict(naive_bayes_predict)
  2. w2_unittest.test_train_naive_bayes(train_naive_bayes, freqs, train_x, train_y)
    Both refer to the same loglikelihood_test.pkl file in the ./support_files folder,
    but the tests don't pass because of the data in loglikelihood_test.pkl. I believe test case #2 above compares the loglikelihood numbers on their own, without adding the log prior value, and passes only if your implementation of train_naive_bayes doesn't add logprior to loglikelihood.

But the unit test cases for test_naive_bayes_predict fail.

In summary: both unit test cases look for the expected loglikelihood values in the same loglikelihood_test.pkl file, but only one passes.

Has anyone else faced the same issue?
Note: I have confirmed this hypothesis by actually changing the loglikelihood_test.pkl file.
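For readers following the hypothesis above: in standard Naive Bayes the log prior and the per-word log likelihoods are separate quantities, and they are only combined when a tweet is scored. The sketch below illustrates that under stated assumptions; the values in `logprior` and `loglikelihood` are made up, and `score` is a hypothetical helper, not the assignment's graded function.

```python
import math

# Hypothetical trained values (not taken from the assignment's data).
# logprior: log ratio of positive to negative training documents.
logprior = math.log(4 / 6)
# loglikelihood: one entry per vocabulary word, with no prior folded in.
loglikelihood = {"happy": 1.0, "sad": -1.5}


def score(words, logprior, loglikelihood):
    """Sum the prior once, then add each known word's log likelihood."""
    total = logprior
    for word in words:
        # Words missing from the dictionary contribute nothing.
        total += loglikelihood.get(word, 0.0)
    return total


# A positive score leans positive, a negative score leans negative.
example = score(["happy", "sad", "unknown"], logprior, loglikelihood)
```

Under this view, a stored dictionary of expected per-word values would not be expected to contain the prior, so both tests can legitimately read the same file.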

Hi @rchandrasekhar,

First of all, make sure that your code is implemented correctly, as even small mistakes can lead to test case failures after submission.

I also checked other threads in the forum and couldn’t find anyone else reporting this specific issue.

If you’re confident that your implementation is correct and the problem persists, it might be a good idea to tag the course instructors or moderators so they can take a closer look.

Hi @Alireza_Saei, thanks for the suggestion. I can't find Younes or Lukasz to tag here, and I doubt they would have any time to look at this. There are no other contacts/names on the course.
Anyway, thanks for looking. I will continue to look for the source of the issue.
Cheers,


@rchandrasekhar Hi, a few things to note:

  1. In the future, please be very accurate about which course/assignment you are speaking of. You posted this under the NLP with Probabilistic Models category, when it belongs in Classification and Vector Spaces.

  2. I know there is no issue with the test data, as I passed the Specialization (including this assignment).

  3. Within an assignment, specify which exercise you are having trouble with.

Reason for the first two: I had to spend about 10 minutes hunting down the assignment to find what you are referring to before I could even begin to help. Your kindness/accuracy in this regard is much appreciated, as it saves us volunteers a great deal of time :slight_smile:.

So, with that in mind and now that I’ve found the assignment, again, what exercise is it you are having trouble with?

There is no problem with the tests. If the tests fail, the solution is not to fix the tests: it is to figure out why your code does not pass the tests.

Please show us your output for one of the sections in which the tests fail. There are many common mistakes that people make in the train_naive_bayes function. The most common one is adding 1 to the frequency for each word, instead of using the result from the freqs dictionary.

I also have issues with w2_unittest.test_train_naive_bayes.
Is it possible that the nltk dataset has changed or some library version has changed?

Hi @lethm

Kindly create a fresh post even if you have a similar issue; chances are your issue is not the same.

And yes, the nltk dataset had an updated version, and there was a version issue.

But you are requested to create a separate thread for your issue. You can link any existing thread that you feel is similar to your problem.

Make sure you share a screenshot of your error, or of the output that differs from the expected one, without sharing any graded cell code.

Regards
DP

Hi @Nevermnd (Anthony), thanks for looking into it. Apologies if the issue was not clearly framed/identified.
@Deepti_Prasad mentioned that there was a version issue with the nltk dataset. Maybe that issue is occurring now.
Also, I am attaching images, and hopefully they are aligned with my text.
The specific exercise is train_naive_bayes. From my lab:

{moderator edit - solution code removed}

Note that all other unit tests passed, and the code for test_naive_bayes, predict, get_ratio, get_words_by_threshold was graded 10/10.
And all these methods depend on the loglikelihood generated in the train method!

image - the score image
The w2_unittest.test_train_naive_bayes(train_naive_bayes, freqs, train_x, train_y) fails 3 test cases.
The log from the unit test run is:

Hi @paulinpaloalto, thanks for looking into this. I responded with more details to @Nevermnd and saw your post afterwards. If you get a chance, please review the code for train_naive_bayes that I posted in response to @Nevermnd.

You are making the most common mistake here of just adding 1 when you see the word. Please have a more careful look at the instructions. What is intended is that you add the actual frequency of that word in the appropriate type of tweet (positive or negative), which is expressed by the value in the freqs dictionary.
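To make the difference concrete without touching the graded cell, here is a minimal sketch on made-up data. It assumes `freqs` maps `(word, label)` pairs to counts, with label 1 for positive and 0 for negative, and that the mistake shows up when accumulating the per-class totals; the helper names, the toy counts, and the Laplace-smoothed formula are illustrative assumptions, not the assignment's solution.

```python
import math

# Hypothetical toy counts: (word, label) -> number of occurrences.
freqs = {("happy", 1): 3, ("happy", 0): 1, ("sad", 0): 4}


def class_totals(freqs, use_counts=True):
    """Total words per class; use_counts=False reproduces the '+1' mistake."""
    n_pos = n_neg = 0
    for (word, label), count in freqs.items():
        step = count if use_counts else 1  # the mistake adds 1 per pair
        if label == 1:
            n_pos += step
        else:
            n_neg += step
    return n_pos, n_neg


def word_loglikelihood(word, freqs, n_pos, n_neg, vocab_size):
    """Laplace-smoothed log ratio of P(word | pos) to P(word | neg)."""
    freq_pos = freqs.get((word, 1), 0)
    freq_neg = freqs.get((word, 0), 0)
    p_pos = (freq_pos + 1) / (n_pos + vocab_size)
    p_neg = (freq_neg + 1) / (n_neg + vocab_size)
    return math.log(p_pos / p_neg)


vocab_size = len({word for word, _ in freqs})
correct_totals = class_totals(freqs, use_counts=True)    # (3, 5)
mistaken_totals = class_totals(freqs, use_counts=False)  # (1, 2)
```

With the mistaken totals the smoothed "probabilities" can even exceed 1, so every loglikelihood value drifts away from the expected one, which is consistent with a test that compares those numbers failing while other cells still pass.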


Yes, I totally missed that. Mortified!!
Thanks for your time and patience.


Whew, thanks Paul, I am in the middle of something.

Also, @rchandrasekhar you don’t have to call me Anthony. I mean I guess that’s my name, but that is also why I have had a handle for 20 years (and well, my age-- ‘Nirvana’).

But I was not trying to be hard on you. Maybe the other guys do have the entire course mapped in their brains and memorized, but I have to rack my brain a little to find just the right section you might be talking about if it is not clear.

It just helps out a lot if we just know exactly where to look.

To be honest, the answer to most questions is probably provided in less time than it takes us to ‘hunt’ for what the asker is inquiring about.
