Why P(w | pos)?

I am looking at week 2 (specifically https://www.coursera.org/learn/classification-vector-spaces-in-nlp/lecture/1ODdZ/testing-naive-bayes ), and this question came to mind as I was watching the video.

Why do we care about P(word | positive)? Why don't we care about P(positive | word) instead, i.e. the probability that the sentiment is positive given that we observe that word? That seems more intuitive to me.

Thank you,



At time 1:15 in https://www.coursera.org/learn/classification-vector-spaces-in-nlp/lecture/bJXYZ/applications-of-naive-bayes , we see P(spam | email) / P(nonspam | email), which uses probabilities of the form I describe above, and that makes more sense to me.
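To make my question concrete, here is a tiny sketch with made-up counts (the word "happy" and all the numbers are hypothetical) showing the two quantities I mean, and how Bayes' rule relates them:

```python
# Hypothetical counts: how many documents of each class contain the word "happy".
n_pos = 10           # number of positive documents
n_neg = 10           # number of negative documents
happy_in_pos = 4     # positive documents containing "happy"
happy_in_neg = 1     # negative documents containing "happy"

# The likelihood the course uses in Naive Bayes:
p_happy_given_pos = happy_in_pos / n_pos                    # P(happy | pos) = 0.4

# The posterior I would intuitively expect, via Bayes' rule:
p_pos = n_pos / (n_pos + n_neg)                             # P(pos) = 0.5
p_happy = (happy_in_pos + happy_in_neg) / (n_pos + n_neg)   # P(happy) = 0.25
p_pos_given_happy = p_happy_given_pos * p_pos / p_happy     # P(pos | happy) = 0.8
```

So the two quantities are connected by Bayes' rule, but they are clearly different numbers (0.4 vs 0.8 here), which is why I am asking which one the classifier should be built on.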