Grader system error for “Train naive bayes”?

Hello there,

I’ve been getting illogical grader output and haven’t been able to figure out why for a long time. For the “Train naive bayes” assessment in the Programming Assignment: Naive Bayes, I keep getting 0/10. However, the grader output itself suggests that my answer is correct, as shown below: it says “Failed default_check. Expected: 9165, but got: 9165…”, which doesn’t make any sense.

I already posted and commented on the Discussion Forum, but none of the suggestions worked. I also contacted the Coursera support team, and they said this was not an issue with the Java Console Error. They advised me to contact the lecturer directly on this Discourse.

Therefore, I’m wondering if anyone could solve this grader system error. Thank you for your generous support.




The mentors do not have any visibility into how the grader actually works. My guess is that it uses something pretty similar to the unit tests in the notebook. Are you sure that your code passes the unit tests in the notebook for the train_naive_bayes function?

The other general thing that can cause this type of problem is if your code has “side effects” even if it generates the correct answer. E.g. it modifies some global state that later tests depend on. You are passed the freqs dictionary for example and that is a global object. If you modify the contents of that dictionary during your train_naive_bayes function, that may cause this type of error.
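To make the side-effect point concrete, here is a minimal sketch on a made-up freqs dictionary. Both helper functions are hypothetical and are not taken from the assignment; they only show how two functions can return the same answer while one of them quietly empties the dictionary that later tests would still need.

```python
# Toy data (an assumption for illustration): keys are (word, label) pairs,
# values are counts. This is not the assignment's data.
freqs = {("happy", 1): 3, ("happy", 0): 1, ("sad", 0): 2}


def count_words_with_side_effect(freqs):
    # Hypothetical helper: pop() removes each entry from the caller's dictionary.
    total = 0
    for key in list(freqs):
        total += freqs.pop(key)
    return total


def count_words_safely(freqs):
    # Hypothetical helper: only reads the dictionary, never changes it.
    return sum(freqs.values())


safe_total = count_words_safely(freqs)
size_after_safe = len(freqs)

mutating_total = count_words_with_side_effect(freqs)
size_after_mutation = len(freqs)

print(safe_total, size_after_safe)          # same total, dictionary intact
print(mutating_total, size_after_mutation)  # same total, dictionary now empty
```

Both calls return the same total, but after the second one the shared dictionary is empty, so anything that runs afterwards sees different input.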

See also this thread, which has the same 9165 != 9165 symptom

Have to say I think the practice of mentors ‘fixing’ individual students’ code but not helping the community learn from the errors is sub-optimal. It doesn’t scale and dooms us to reading duplicate threads. Not that everyone uses search (obviously), but why not make it clearly beneficial if they did?

It’s a good point, but note that Mubsi is way more than a mentor: he’s actually on the DLAI staff, so not only can he fix students’ code for them right in their personal notebooks, but he could actually fix the grader to make things more transparent in cases like this. Although from what I’ve heard from other course staff folks, that is not always so easy to do given the limitations of the Coursera frameworks. But at the very least, the grader test cases (which look like they are based on the same framework as the unit tests in the notebook) could perhaps be modified to give an error message that carries at least a tiny bit of information about what the problem actually is. As it stands, one is left completely in the dark …

The only clue from the linked thread is that the OP wrote in response, “I see the difference was the use of set. I did the same via other means.” Since it failed the grader, that probably means it wasn’t truly the same, but I digress.

We’ve had this discussion before, ‘why give one person a fish when we could teach the whole community to fish with one example?’ A public explanation is there for posterity and doesn’t require modifying and regression testing any code in the grader frameworks.

@Cossy based on the comment in the linked thread, you should look in your code for whether you are inadvertently creating a collection that allows duplicates or somehow producing a collection that has different behavior or characteristics than if it had been created using Python set(). Let us know what you find?

Absolutely! Please believe me: I am totally on board with giving Lao Tzu “fishing lessons”. It’s just that I didn’t have anything useful to share in this particular instance other than the generalities that I gave above.

My grievance is so not with you @paulinpaloalto, who for years have set the bar for thorough exposition both syntactic and semantic. :bowing_man:

Here is an example of what I mean. NLP Assignment 4 Grading Problem - #13 by Mubsi

The learner actually writes, “It finally worked. Got full grades. thanks a ton. can you tell me what was the issue?” Was the learner scored on submitted code they didn’t write? Was it a temporary system issue? An unhandled edge case in the grader code? Only Mubsi knows. A missed opportunity for digital exhaust that would benefit the broader community of learners and mentors alike.

It’s a salient example for sure. Mubsi, when operating in Deus ex Machina mode, misses the opportunity to create a legacy here. That’s the only way this whole operation “scales”. The evidence is pretty strong that having a working search engine here in Discourse gives real value to the “digital exhaust” of previous answers. The contrast with the Coursera forums, where the search engine is only ε better than utterly useless, is pretty stark.

Hello Paul,

Thank you so much for your prompt response! It was actually my fault: it turned out that I didn’t pass the unit test.
While I was trying to get the unit test to pass, I came up with another question about train_naive_bayes, if you don’t mind answering. What does V actually mean? Is it the number of unique words in the entire frequency set, or only the ones in pos or in neg? In other words, can V change depending on whether you’re calculating for the positive vocab or the negative one?
Also, if it’s the number of unique words in the entire dataset, how do you calculate that? The instructions say I had better use the set function, but I don’t know how.

Thank you so much for being patient with me asking such a basic question.


To be more specific, the unit test says I should review the log likelihood function, and I suspect the problem is caused by how I calculate V. Here, V is “the number of unique words that appear in the freqs dictionary”.

A set object is an unordered collection of distinct hashable objects. Common uses include membership testing, removing duplicates from a sequence

Sets can be created by several means:

  • Use the type constructor: set() , set('foobar')

Like other collections, sets support… len(set)

len(s): Return the number of elements in set s (cardinality of s ).


Yes, the meaning of V is the total number of unique words in the entire vocabulary. You start by using a “list comprehension” to extract the first element of all the keys in the freqs dictionary. Of course many (but not all) of the words appear twice there, once with a positive count and once with a negative count, so you need to eliminate any duplicates. I’m sure there’s a laborious way that you could write a loop to do that, but the simplest way is the python “set()” function which ai_curious has pointed out: that is a single function call that magically gives you exactly what you need. Then you compute the number of elements in the resulting set (the return value of set) in order to get V, as ai_curious has also shown how to do.
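As a rough sketch of those steps on a toy dictionary (the data and the variable names words and vocab are made up for illustration, and this is not the assignment's solution code):

```python
# Toy freqs dictionary (an assumption): keys are (word, label) pairs.
freqs = {
    ("happy", 1): 3,
    ("happy", 0): 1,
    ("sad", 0): 2,
    ("smile", 1): 5,
}

# First element of every key. A word seen with both labels appears twice.
words = [word for (word, label) in freqs.keys()]

# set() drops the duplicates; len() counts the distinct words that remain.
vocab = set(words)
V = len(vocab)

# The dictionary keys are already unique as (word, label) pairs, so wrapping
# the dictionary itself in set() removes nothing.
pairs = len(set(freqs))

print(len(freqs), len(words), V, pairs)
```

Here there are 4 (word, label) pairs but only 3 distinct words, because "happy" occurs under both labels. Calling len() on the dictionary, or on a set built directly from it, counts pairs rather than words.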


Hi @ai_curious and @paulinpaloalto,

Thank you so much for such a kind reply. I think I’m just misunderstanding how to use the set() function, because I can’t get a correct result for the len(loglikelihood) check when I use set() to get V.

Therefore, I’m using another way to calculate it. My code seems correct to me but the unit test still returns an error with the loglikelihood function.

Is it possible to get a suggestion on how to improve my code, or on how to use the set() function? For your reference, I first tried V = len(freqs), then len(set(freqs)) and len(freqs.keys()), but none of them worked.

I’m sorry to keep asking so many questions. I hope you could lend me a hand with this, too.


{moderator edit - solution code removed}

And what length value do you get from that code on the test case in question? Of course there is a lot of other logic in that function to examine as well. Far and away the most common mistake is to add one to N_pos and N_neg for each occurrence of the given word, rather than using the actual frequencies from the given freqs dictionary.
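To illustrate the difference on a toy dictionary (made-up data; the names pairs_pos and pairs_neg are hypothetical and exist only to show the mistaken count next to the intended one):

```python
# Toy freqs dictionary (an assumption): keys are (word, label) pairs.
freqs = {("happy", 1): 3, ("happy", 0): 1, ("sad", 0): 2, ("smile", 1): 5}

N_pos = N_neg = 0          # total word occurrences per class
pairs_pos = pairs_neg = 0  # what adding one per entry would give instead

for (word, label), count in freqs.items():
    if label > 0:
        N_pos += count   # add the frequency stored in freqs
        pairs_pos += 1   # the mistaken version: one per entry
    else:
        N_neg += count
        pairs_neg += 1

print(N_pos, N_neg)          # totals of the stored counts
print(pairs_pos, pairs_neg)  # number of distinct words per class
```

On this data the totals are 8 and 3, while adding one per entry gives 2 and 2, which is the number of distinct words per class and not the number of word occurrences.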

Do you mean the length of loglikelihood? If so, that check passes, as shown below.

Ugh NVM just realized I was writing about the same hint @paulinpaloalto provided above, so +1 to his suggestion to check how you implemented the inline comment

# increment the number of pos/neg words by the count for this word,label pair

That length is right for both vocab and the loglikelihood dictionary, since you get one entry per word in the vocabulary. I added some instrumentation to my code, including showing the computation for one particular word. Check if your values agree:

V = 9165, len(wordlist) 11436
V: 9165, V_pos: 5804, V_neg: 5632, D: 8000, D_pos: 4000, D_neg: 4000, N_pos: 27547, N_neg: 27152
freq_pos for smile = 47
freq_neg for smile = 9
loglikelihood for smile = 1.5577981920239676
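The value for smile can be checked by hand from the numbers above. This sketch assumes the usual add-one (Laplace) smoothing, which the thread does not spell out, and the names p_w_pos and p_w_neg are illustrative only.

```python
import math

# Values copied from the instrumentation output above.
V = 9165
N_pos, N_neg = 27547, 27152
freq_pos, freq_neg = 47, 9

# Assumed add-one smoothing: (freq + 1) / (N_class + V).
p_w_pos = (freq_pos + 1) / (N_pos + V)
p_w_neg = (freq_neg + 1) / (N_neg + V)

loglikelihood_smile = math.log(p_w_pos / p_w_neg)
print(loglikelihood_smile)
```

Under that assumption the result comes out at about 1.5578, in line with the posted value. The denominators use the occurrence totals N_pos and N_neg, not the per-class vocabulary sizes V_pos and V_neg.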

Hi @paulinpaloalto and @ai_curious,

Thank you very much for your generous help. After all these days, I finally passed all the tests!!
It turned out I wasn’t defining N_pos and N_neg correctly. I was confusing them with V_pos and V_neg: I had defined them as the number of unique words in each class, which is really the definition of V_pos and V_neg.
I am so fulfilled and feel like I can finally sleep well for the first time in months. I truly appreciate your genuinely helpful advice!!!

