C3W1 Assignment Error

I need some help with the C3W1 Naive Bayes assignment.

I implemented get_word_frequency as described, but both the test output and the unit tests fail.

{CODES REMOVED BY MODERATOR}-AGAINST COMMUNITY GUIDELINES, REFER CODE OF CONDUCT FAQ

How can 'river' be expected to have 2 spam and 3 ham in the test output when there are only 3 input emails?

In the unit tests, how can the count for the word 'processed' (for example) be expected to be 2 spam and 1 ham when it only appears once in the test-case emails?

I am losing my mind. This doesn’t make any sense. Either I’ve been at this too long and am missing something, or the expected outputs and tests are wrong.

week-1
Assignment: C3_W1 Assignment
Math for Machine Learning
Probability & Statistics for Machine Learning &...

(ASSIGNMENT LINK TO THE LAB REMOVED BY THE MODERATOR)-AGAINST COMMUNITY GUIDELINES, REFER CODE OF CONDUCT FAQ

These should both be 1’s, not 0’s.

Please don’t post your code on the forum. That’s not allowed by the Code of Conduct.

I see. If the count is actually 0, what is the reasoning for initializing it to 1?

The comment on the line above that bit of code says why.

I am also struggling with this part.

From my point of view, initializing it with 0 is correct. As we loop over the emails, each one can only be either spam or not spam.
Moreover, the next statement checks for ham and adds 1 to word_dict[word]['ham'] if true, and does the same for spam, adding 1 to word_dict[word]['spam'] if true.

The first test has only one spam mail ([1, 0, 0]), so it should not be possible for a word to appear more than once in the spam category. Also, the sum of the ham and spam counts per word cannot be larger than the number of mails.

##### Expected Output (the output order may vary, what is important is the value for each word).

{'going': {'spam': 2, 'ham': 1}, 'river': {'spam': 2, 'ham': 3}, 'like': {'spam': 2, 'ham': 1}, 'deep': {'spam': 1, 'ham': 2}, 'love': {'spam': 1, 'ham': 2}, 'hate': {'spam': 1, 'ham': 2}}

Sorry, but I do not have access to the notebook at this time, so I cannot check the details.

The reason for initializing the counts to 1 rather than 0 is explained in detail in the instructions for that section. We are computing the product of the probabilities of each word in a given email appearing in a spam (or ham, in the other case) email. But what if there is a single word in the email that literally appears in 0 spam emails in the particular training corpus that you are working with? Then the whole product gets killed by the zero probability for that one word. Starting the counts at 1 is a strategy to avoid that scenario.

This may seem like a bit of a “hack”, but the point is that the training corpus is limited, and there is (at least in principle) no single word that can guarantee an email is not spam in every possible scenario. Remember that we’re trying to train a model that will work well on arbitrary input data that it has not actually been trained on. What word can you think of such that it is literally impossible to construct a spam email that contains it?
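A toy illustration of the zero-probability problem described above (this is not the assignment's code; the function name, counts, and denominators here are all made up for the example):

```python
# Naive-Bayes-style score: multiply P(word | spam) over the email's words.
# One word with a zero count wipes out the whole product.
def spam_score(words, freqs, n_spam):
    score = 1.0
    for word in words:
        score *= freqs[word]["spam"] / n_spam
    return score

# With raw counts, "lottery" was never seen in a spam email, so its count is 0
raw = {"free": {"spam": 3}, "lottery": {"spam": 0}}
print(spam_score(["free", "lottery"], raw, 4))      # 0.0: the product is killed

# Starting every count at 1 keeps every factor positive
smoothed = {"free": {"spam": 4}, "lottery": {"spam": 1}}
print(spam_score(["free", "lottery"], smoothed, 4)) # 0.25: nonzero score survives
```

This is the same idea as Laplace (add-one) smoothing, though the assignment applies it simply by initializing the counts.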

Remember that we’re computing the total frequencies of each word appearing in spam and ham emails in the training corpus.

Maybe I’m not getting your point here, but you’re right that any given mail is one or the other. But remember that some words in the corpus will appear in both spam and ham emails. Some may appear only in one type, but will have a count of 1 for the missing type given the way the instructions told us to compute the counts.
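A minimal sketch of the counting logic being discussed, assuming an interface like the lab's (the function name and data shapes here are assumptions, not the actual assignment code):

```python
def get_word_frequency(emails, labels):
    """Count, per word, the number of spam and ham emails containing it.
    Both counts start at 1 (not 0), per the instructions, so no word ever
    ends up with a zero frequency in either category."""
    word_dict = {}
    for email, label in zip(emails, labels):
        for word in set(email):  # count each word once per email
            if word not in word_dict:
                word_dict[word] = {"spam": 1, "ham": 1}  # initial 1s
            if label == 1:
                word_dict[word]["spam"] += 1
            else:
                word_dict[word]["ham"] += 1
    return word_dict

emails = [["deep", "river"], ["love", "river"], ["river", "hate"]]
labels = [1, 0, 0]  # one spam mail, two ham mails
freqs = get_word_frequency(emails, labels)
# "river" appears in 1 spam and 2 ham emails; with the initial 1s its
# counts come out as spam=2, ham=3, even though there are only 3 emails
```

This is why a word's spam + ham counts can exceed the number of emails: each count carries a built-in offset of 1.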


Thank you for the explanation.