C3_W1_Assignment_Exercise 10

Kindly help me understand what is wrong with my code below.
It fails one of the tests, so points are deducted from my grade.

def naive_bayes_classifier(text, word_freq=word_freq, class_freq=class_freq):
    """
    Implements a naive Bayes classifier to determine the probability of an email being spam.

    Args:
        text (str): The input email text to classify.
        
        word_freq (dict): A dictionary containing word frequencies in the training corpus. 
        The keys are words, and the values are dictionaries containing frequencies for 'spam' and 'ham' classes.

        class_freq (dict): A dictionary containing class frequencies in the training corpus. 
        The keys are class labels ('spam' and 'ham'), and the values are the respective frequencies.

    Returns:
        float: The probability of the email being spam.

    """

    # Convert the text to lowercase-only
    text = text.lower()

    # This is a set of the unique words in the email
    words = set(text.split())
    
    cumulative_product_spam = 1.0
    cumulative_product_ham = 1.0
    
    ### START CODE HERE ###
    
    # Iterate over the words in the email
    for word in words:
        # You should only include words that exist in the corpus in your calculations
        if word in word_freq:
            # Calculate the probability of the word appearing in a spam email
            cumulative_product_spam *= word_freq[word]["spam"]/class_freq["spam"]
            
            # Calculate the probability of the word appearing in a ham email
            cumulative_product_ham *= word_freq[word]["ham"]/class_freq["ham"]
    
    # Calculate the likelihood of the words appearing in the email given that it is spam
    likelihood_word_given_spam = 1 * cumulative_product_spam
    
    # Calculate the likelihood of the words appearing in the email given that it is ham
    likelihood_word_given_ham = 1 * cumulative_product_ham
    
    # Calculate the posterior probability of the email being spam given that the words appear in the email (the probability of being a spam given the email content)
    prob_spam = likelihood_word_given_spam / (likelihood_word_given_spam + likelihood_word_given_ham)
    
    ### END CODE HERE ###
    
    return prob_spam

My output:

Probability of spam for email 'enter the lottery to win three million dollars': 100.00%

Probability of spam for email 'meet me at the lobby of the hotel at nine am': 0.00%

Probability of spam for email '9898 asjfkjfdj': 50.00%

Expected output:
Probability of spam for email 'enter the lottery to win three million dollars': 100.00%

Probability of spam for email 'meet me at the lobby of the hotel at nine am': 0.00%

Probability of spam for email '9898 asjfkjfdj': 23.88%
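The third email shows the problem most clearly. Neither 9898 nor asjfkjfdj appears in word_freq, so the posted code reduces to the arithmetic below:

```python
# '9898' and 'asjfkjfdj' are not in word_freq, so the loop body never
# runs and both products keep their initial value.
cumulative_product_spam = 1.0
cumulative_product_ham = 1.0

# The posted code divides these directly, with no class weighting
prob_spam = cumulative_product_spam / (cumulative_product_spam + cumulative_product_ham)
print(f"{prob_spam:.2%}")  # 50.00%
```

Every email with no known words gets exactly 50% this way, regardless of how many spam and ham emails the training set contains, whereas the expected output is 23.88%.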

`likelihood_word_given_spam` is defined incorrectly; think about it a little more. The email is already known to be spam… what is the likelihood that it would contain the word?
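For reference, here is Bayes' rule under the naive independence assumption. Let $w_1, \dots, w_n$ be the email's words that appear in the corpus, and write $N_{\text{spam}}$ and $N_{\text{ham}}$ for class_freq['spam'] and class_freq['ham'] (notation introduced here for illustration, not taken from the assignment):

$$
P(\text{spam} \mid w_1, \dots, w_n) = \frac{P(\text{spam}) \prod_{i=1}^{n} P(w_i \mid \text{spam})}{P(\text{spam}) \prod_{i=1}^{n} P(w_i \mid \text{spam}) + P(\text{ham}) \prod_{i=1}^{n} P(w_i \mid \text{ham})}
$$

$$
P(\text{spam}) = \frac{N_{\text{spam}}}{N_{\text{spam}} + N_{\text{ham}}}, \qquad P(w \mid \text{spam}) = \frac{\mathtt{word\_freq}[w][\text{spam}]}{N_{\text{spam}}}
$$

and likewise for ham. The posted code builds the products correctly but leaves out the priors $P(\text{spam})$ and $P(\text{ham})$. Their shared denominator $N_{\text{spam}} + N_{\text{ham}}$ cancels between numerator and denominator, so multiplying by the raw counts is enough. When $n = 0$, both empty products equal 1 and the posterior reduces to the prior $P(\text{spam})$ rather than $1/2$; presumably that is where the expected 23.88% comes from.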


It should be `class_freq['spam'] * cumulative_product_spam` for the spam likelihood, and likewise `class_freq['ham'] * cumulative_product_ham` for the ham likelihood.
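Applied to the posted function, the suggestion gives something like the sketch below. Unlike the assignment version, it takes word_freq and class_freq as required arguments so it runs on its own, and the corpus at the bottom is a tiny hypothetical one invented for illustration, not the assignment's data. The grader may check details this sketch does not reproduce.

```python
def naive_bayes_classifier(text, word_freq, class_freq):
    """Return P(spam | words) for `text` under a naive Bayes model."""
    words = set(text.lower().split())

    cumulative_product_spam = 1.0
    cumulative_product_ham = 1.0

    for word in words:
        # Skip words that never appear in the training corpus
        if word in word_freq:
            # P(word | spam) and P(word | ham), estimated from counts
            cumulative_product_spam *= word_freq[word]["spam"] / class_freq["spam"]
            cumulative_product_ham *= word_freq[word]["ham"] / class_freq["ham"]

    # Weight each class by its count; the priors' shared 1/(n_spam + n_ham) cancels
    likelihood_word_given_spam = class_freq["spam"] * cumulative_product_spam
    likelihood_word_given_ham = class_freq["ham"] * cumulative_product_ham

    return likelihood_word_given_spam / (likelihood_word_given_spam + likelihood_word_given_ham)


# Hypothetical toy corpus: 2 spam emails, 6 ham emails
class_freq = {"spam": 2, "ham": 6}
word_freq = {
    "lottery": {"spam": 2, "ham": 0},
    "meet": {"spam": 0, "ham": 3},
    "win": {"spam": 1, "ham": 1},
}

print(f"{naive_bayes_classifier('win the lottery', word_freq, class_freq):.2%}")  # 100.00%
print(f"{naive_bayes_classifier('meet me', word_freq, class_freq):.2%}")          # 0.00%
print(f"{naive_bayes_classifier('9898 asjfkjfdj', word_freq, class_freq):.2%}")   # 25.00%
```

On this toy corpus the email with no known words falls back to the spam share, 2 / (2 + 6) = 25%, rather than 50%. Like the posted code, the sketch raises ZeroDivisionError if an email mixes a word seen only in spam with one seen only in ham, since both scores are then 0.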
