Why do we multiply by the odds of positive tweets in the formula of Bayes' classifier?

Why do we multiply by the prior ratio of positive to negative tweets? The reason I’m asking is that this multiplication is supposed to account for imbalanced classes, yet the likelihood of each word given a class already reflects this effect. For instance, suppose we had 99 negative reviews and only one positive review. The word ‘happy’ will most likely appear in the negative reviews (following a ‘not’, of course). Say it appeared 10 times in the negative reviews but only once in the positive review. Then the probability of the word given a negative review is 10 divided by the total number of negative words, i.e. 10 / (99 * the average number of words per review). On the other hand, the probability of the word given a positive review is 1 over the number of words in that single review, which makes its likelihood much higher. Is the idea here to lower the likelihood of the word given a positive tweet? But we multiply by the prior only once, whereas the effect appears for every word in the positive review. If we were talking about joint probability, shouldn’t the prior be raised to the power of the number of words in the text? Yet this is not the case.

The prior adjustment can be thought of as a probability at the corpus level, similar to topic modeling, where we account for the probability of words within sentences and the probability of sentences within the whole document.

Another view is that we are conditioning on the overall probability of a tweet being positive or negative.
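To make the second view concrete, here is a minimal sketch (with hypothetical toy counts matching the 99-vs-1 example from the question) of how the prior ratio enters the Naive Bayes score exactly once, while each word contributes its own likelihood ratio:

```python
from math import log

# Hypothetical corpus statistics: 1 positive review, 99 negative reviews,
# each 10 words long; "happy" appears 1 time in pos, 10 times in neg.
n_pos, n_neg = 1, 99
word_counts = {"happy": {"pos": 1, "neg": 10}}
total_words = {"pos": 10, "neg": 990}

def log_score(tweet_words):
    # The prior ratio is added exactly once, regardless of tweet length.
    score = log(n_pos / n_neg)
    for w in tweet_words:
        counts = word_counts.get(w)
        if counts is None:
            continue  # ignore unseen words in this toy sketch
        p_w_pos = counts["pos"] / total_words["pos"]
        p_w_neg = counts["neg"] / total_words["neg"]
        # Each word contributes its own log likelihood ratio.
        score += log(p_w_pos) - log(p_w_neg)
    return score

# score > 0 -> predict positive, score < 0 -> predict negative
print(log_score(["happy"]))  # ~ -2.30, i.e. negative despite the 9.9 likelihood ratio
```

This also answers the exponent question from the original post: in the joint probability P(class) * Π P(word|class), the prior appears once as a factor, while the product over words is what grows with tweet length.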

Thank you @jackliu333.
I just noticed your reply, my apologies for my late reply.

You made your point clear, and it makes sense to me to think about it this way.

I’m just thinking of an extreme case using the same example I provided above: we have 99 negative reviews and only one positive review, each 10 words long. The word ‘happy’ appeared ten times in the negative reviews (P(happy|neg) = 10/(99*10) = 1/99), whereas it appeared only once in the positive review, making P(happy|pos) = 1/10.
Let’s say we receive a new review that is just ‘happy!’ and we remove the punctuation. Without the prior adjustment, the likelihood ratio would be (1/10)/(1/99) = 9.9; this is greater than 1, so we predict the review as positive. With the prior adjustment, we multiply this by 1/99, which gives 1/10; this is less than 1, so we predict the review as negative. What I’m trying to understand with this example is: is the prior adjustment used to give a higher weight to the majority class rather than the minority class?