Could you help me understand what is wrong with my code below?
It fails one of the tests, so I am losing marks.
def naive_bayes_classifier(text, word_freq=word_freq, class_freq=class_freq):
    """
    Implements a naive Bayes classifier to determine the probability of an email being spam.

    Args:
        text (str): The input email text to classify.
        word_freq (dict): A dictionary containing word frequencies in the training corpus.
            The keys are words, and the values are dictionaries containing frequencies for 'spam' and 'ham' classes.
        class_freq (dict): A dictionary containing class frequencies in the training corpus.
            The keys are class labels ('spam' and 'ham'), and the values are the respective frequencies.

    Returns:
        float: The probability of the email being spam.
    """
    # Convert the text to lowercase-only
    text = text.lower()
    # This is a set containing the unique words of the email
    words = set(text.split())
    cumulative_product_spam = 1.0
    cumulative_product_ham = 1.0
    ### START CODE HERE ###
    # Iterate over the words in the email
    for word in words:
        # You should only include words that exist in the corpus in your calculations
        if word in word_freq:
            # Calculate the probability of the word appearing in a spam email
            cumulative_product_spam *= word_freq[word]["spam"] / class_freq["spam"]
            # Calculate the probability of the word appearing in a ham email
            cumulative_product_ham *= word_freq[word]["ham"] / class_freq["ham"]
    # Calculate the likelihood of the words appearing in the email given that it is spam
    likelihood_word_given_spam = 1 * cumulative_product_spam
    # Calculate the likelihood of the words appearing in the email given that it is ham
    likelihood_word_given_ham = 1 * cumulative_product_ham
    # Calculate the posterior probability of the email being spam given that the words appear in the email
    # (the probability of being spam given the email content)
    prob_spam = likelihood_word_given_spam / (likelihood_word_given_spam + likelihood_word_given_ham)
    ### END CODE HERE ###
    return prob_spam
Actual output:
Probability of spam for email 'enter the lottery to win three million dollars': 100.00%
Probability of spam for email 'meet me at the lobby of the hotel at nine am': 0.00%
Probability of spam for email '9898 asjfkjfdj': 50.00%
Expected output:
Probability of spam for email 'enter the lottery to win three million dollars': 100.00%
Probability of spam for email 'meet me at the lobby of the hotel at nine am': 0.00%
Probability of spam for email '9898 asjfkjfdj': 23.88%
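
A possible explanation, offered as a sketch rather than a confirmed diagnosis: the third email contains no words from the corpus, so both cumulative products stay at 1.0 and the result is 1 / (1 + 1) = 50%. An expected value of 23.88% would be consistent with each likelihood being scaled by its class prior, for example P(spam) = class_freq["spam"] / (class_freq["spam"] + class_freq["ham"]), instead of by the constant 1. This assumes class_freq holds the number of emails per class. The function name naive_bayes_with_prior and the toy counts below are hypothetical and are not the course's data.

```python
def naive_bayes_with_prior(text, word_freq, class_freq):
    """Sketch: same word loop as in the question, but seeded with the class priors."""
    words = set(text.lower().split())

    # Assumption: class_freq stores email counts per class, so the prior is count / total.
    total = class_freq["spam"] + class_freq["ham"]
    likelihood_spam = class_freq["spam"] / total  # P(spam)
    likelihood_ham = class_freq["ham"] / total    # P(ham)

    for word in words:
        # Only words seen in the training corpus contribute to the products.
        if word in word_freq:
            likelihood_spam *= word_freq[word]["spam"] / class_freq["spam"]
            likelihood_ham *= word_freq[word]["ham"] / class_freq["ham"]

    # Normalise so the two posteriors sum to 1.
    return likelihood_spam / (likelihood_spam + likelihood_ham)


# Hypothetical toy counts, chosen only to illustrate the behaviour.
toy_word_freq = {
    "lottery": {"spam": 8, "ham": 1},
    "hotel": {"spam": 1, "ham": 6},
}
toy_class_freq = {"spam": 25, "ham": 75}

# No known words: the result falls back to the prior P(spam) = 25 / 100.
unknown_only = naive_bayes_with_prior("9898 asjfkjfdj", toy_word_freq, toy_class_freq)
print(f"{unknown_only:.2%}")  # prints 25.00%
```

With these toy counts, an email made only of unseen words returns the spam prior rather than 50%, which is the same pattern as the expected 23.88% in the third test case.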