Why P(w | pos)?

I am looking at week 2 (specifically https://www.coursera.org/learn/classification-vector-spaces-in-nlp/lecture/1ODdZ/testing-naive-bayes ), and this question came to mind as I was watching the video.

Why do we care about P(word | positive)? Why don't we care about P(positive | word) instead, i.e. the probability that the sentiment is positive given that we observe that word? That seems more intuitive to me.

Thank you,



At time 1:15 in https://www.coursera.org/learn/classification-vector-spaces-in-nlp/lecture/bJXYZ/applications-of-naive-bayes , we see P(spam | email) / P(nonspam | email), which uses probabilities of the form I describe above, and that makes more sense to me.
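To make my question concrete, here is a tiny sketch with made-up counts (the word "happy" and all the numbers are hypothetical) showing the two quantities I mean, and how Bayes' rule relates them:

```python
# Hypothetical counts: how many documents of each class contain the word "happy".
n_pos = 10           # number of positive documents
n_neg = 10           # number of negative documents
happy_in_pos = 4     # positive documents containing "happy"
happy_in_neg = 1     # negative documents containing "happy"

# The likelihood the course uses in Naive Bayes:
p_happy_given_pos = happy_in_pos / n_pos                    # P(happy | pos) = 0.4

# The posterior I would intuitively expect, via Bayes' rule:
p_pos = n_pos / (n_pos + n_neg)                             # P(pos) = 0.5
p_happy = (happy_in_pos + happy_in_neg) / (n_pos + n_neg)   # P(happy) = 0.25
p_pos_given_happy = p_happy_given_pos * p_pos / p_happy     # P(pos | happy) = 0.8
```

So the two quantities are connected by Bayes' rule, but they are clearly different numbers (0.4 vs 0.8 here), which is why I am asking which one the classifier should be built on.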