Corpus: I am happy, are you? Yes, I am
Query:
What is the trigram probability P(happy | I am)?
To my understanding:
C(I am happy) = 1
C(I am) = 1
I assumed C(I am) = 1 because, for an occurrence of “I am” to count toward the trigram denominator, it has to be followed by some word, doesn't it?
The “I am” at the end of the corpus is not followed by any other word, so it may not qualify for a count increment in the denominator.
Is my understanding correct?
Or should the C(I am) used in the denominator be 2?
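
To make the two candidate denominators concrete, here is a minimal Python sketch that counts them on the corpus above. It rests on assumptions that are mine and not part of any particular textbook or library: punctuation is stripped, case is kept, no sentence-boundary markers are added, and the helper name `ngrams` is purely illustrative.

```python
import re
from collections import Counter

corpus = "I am happy, are you? Yes, I am"

# Assumption: keep only alphabetic tokens, preserve case, add no padding symbols.
tokens = re.findall(r"[A-Za-z]+", corpus)

def ngrams(seq, n):
    """Return every contiguous n-token window of seq as a tuple (illustrative helper)."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

bigram_counts = Counter(ngrams(tokens, 2))
trigram_counts = Counter(ngrams(tokens, 3))

history = ("I", "am")

# Numerator: C(I am happy)
c_trigram = trigram_counts[history + ("happy",)]

# Candidate denominator 1: every occurrence of "I am", including the one at the end.
c_bigram_all = bigram_counts[history]

# Candidate denominator 2: only occurrences of "I am" that are followed by a word,
# i.e. the total count of trigrams that start with "I am".
c_bigram_followed = sum(c for tri, c in trigram_counts.items() if tri[:2] == history)

p_followed = c_trigram / c_bigram_followed
p_all = c_trigram / c_bigram_all

print(c_trigram, c_bigram_all, c_bigram_followed)  # 1 2 1
print(p_followed, p_all)                           # 1.0 0.5
```

Under these assumptions the two conventions give P(happy | I am) = 1/1 and 1/2 respectively. The only difference is whether the final “I am”, which has no following word, is counted in the denominator.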